Whisper is an automatic speech recognition (ASR) system from OpenAI, trained with large-scale weak supervision. It is a general-purpose model that can transcribe multilingual speech, translate speech into English, and identify the spoken language. It is built on a Transformer sequence-to-sequence model in which these tasks are represented jointly as a single sequence of tokens for the decoder to predict, so one model handles all of them. It comes in five sizes (tiny, base, small, medium, and large), each trading speed against accuracy. The code and model weights are freely available under the MIT license.
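
As a rough illustration of how the model sizes and tasks surface in practice, here is a minimal sketch that assumes the open-source `openai-whisper` Python package and `ffmpeg` are installed. The helper names `model_name` and `run_whisper` and the constant `MODEL_SIZES` are hypothetical and exist only for this example; `whisper.load_model` and `model.transcribe` come from the package itself.

```python
# Sizes named in the text, ordered from fastest to most accurate.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def model_name(size: str, english_only: bool = False) -> str:
    """Return the checkpoint name for a size (hypothetical helper)."""
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {MODEL_SIZES}")
    # English-only checkpoints end in ".en"; none is published for "large".
    if english_only and size != "large":
        return f"{size}.en"
    return size


def run_whisper(audio_path: str, size: str = "base", translate: bool = False) -> dict:
    """Transcribe one audio file, or translate it into English (hypothetical helper)."""
    import whisper  # imported lazily so model_name() works without the package

    model = whisper.load_model(model_name(size))
    # task="translate" requests English output instead of a same-language transcript.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    # The result also reports the language used for decoding.
    return {"text": result["text"], "language": result["language"]}
```

Calling `run_whisper("speech.mp3")`, where `speech.mp3` stands in for any local audio file, would return the transcript together with the language code; passing `translate=True` asks for English output instead. A smaller size such as `tiny` runs faster at some cost in accuracy, which is the tradeoff described above.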

