Whisper Provider Setup

Note: Whisper is capable of transcribing many languages, but can only translate a language into English. It does not support translating to other languages.

Whisper (based on OpenAI Whisper) uses a neural network powered by your CPU or NVIDIA graphics card to generate subtitles for your media.

Whisper supports transcribing in many languages as well as translating from a language to English. The provider works best when it knows the audio language ahead of time. Make sure the 'Deep analyze media file to get audio tracks language' option is enabled to ensure the best results.
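The transcribe/translate rule above can be expressed as a small decision function. This is a minimal sketch, not Bazarr's actual code: `choose_task` and `build_query` are hypothetical helpers, language codes are assumed to be two-letter codes such as `en`, and the `task` and `language` query parameter names are assumptions about what the Whisper web service accepts.

```python
from typing import Optional
from urllib.parse import urlencode


def choose_task(audio_language: Optional[str], subtitle_language: str) -> str:
    """Pick the Whisper task for a requested subtitle language.

    Whisper can transcribe audio in its own language or translate it
    into English; any other target language is unsupported.
    """
    if audio_language is not None and audio_language == subtitle_language:
        return "transcribe"
    if subtitle_language == "en":
        return "translate"
    if audio_language is None:
        # Assumption: with an unknown audio language, Whisper guesses it and
        # we hope the guess matches the requested subtitle language.
        return "transcribe"
    raise ValueError(
        f"cannot produce {subtitle_language!r} subtitles from "
        f"{audio_language!r} audio: Whisper only translates into English"
    )


def build_query(task: str, audio_language: Optional[str]) -> str:
    """Build a query string; the language is omitted when it is unknown."""
    params = {"task": task}
    if audio_language is not None:
        params["language"] = audio_language
    return urlencode(params)
```

Passing the known audio language spares Whisper from guessing, which is why the deep analyze option matters for the quality of the result.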

Whisper-generated subtitles have a fixed score: 241/360 (~67%) for episodes and 61/120 (~51%) for movies. If you want them to be "downloaded" automatically, lower the minimum score so that it does not exceed these values.
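The percentages above follow directly from the fixed scores. The sketch below reproduces the arithmetic and shows how a minimum score setting would interact with it. The function names are hypothetical, and the "at or above the minimum" comparison is an assumption about how the threshold is applied; only the scores themselves come from this page.

```python
# Fixed Whisper scores quoted on this page: (score, maximum) per media type.
WHISPER_SCORES = {"episode": (241, 360), "movie": (61, 120)}


def whisper_score_percent(media_type: str) -> float:
    """Return the fixed Whisper score as a percentage of the maximum."""
    score, maximum = WHISPER_SCORES[media_type]
    return 100.0 * score / maximum


def passes_minimum(media_type: str, minimum_percent: float) -> bool:
    """True if a Whisper subtitle would meet the given minimum score.

    Assumption: a subtitle is accepted when its score is at or above
    the configured minimum.
    """
    return whisper_score_percent(media_type) >= minimum_percent


if __name__ == "__main__":
    for kind in WHISPER_SCORES:
        print(kind, round(whisper_score_percent(kind), 1))
```

For example, a minimum score of 60% for movies would reject every Whisper subtitle, while 50% would accept them.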

whisper-asr-webservice

Bazarr's Whisper provider communicates with ahmetoner/whisper-asr-webservice. Refer to its documentation for more assistance in setting it up.

Choosing a Model

Larger models are more accurate but take longer to run. Choose the largest model that your CPU/GPU can run at a speed you are comfortable with.

Available ASR_MODELs are tiny, base, small, medium, large (only OpenAI Whisper), large-v1, large-v2 and large-v3 (only OpenAI Whisper for now).
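Because some models are limited to one backend, it can help to check a model/backend pair before starting the container. The sketch below encodes only the restrictions listed above. The helper name is hypothetical, and the engine name `openai_whisper` is an assumption about the value ASR_ENGINE expects for OpenAI Whisper; `faster_whisper` is the value used in the Docker examples on this page.

```python
# Models listed on this page, in the order given.
ALL_MODELS = [
    "tiny", "base", "small", "medium",
    "large", "large-v1", "large-v2", "large-v3",
]

# Models this page marks as "only OpenAI Whisper".
OPENAI_ONLY = {"large", "large-v3"}

# Assumed ASR_ENGINE values; "openai_whisper" is not confirmed by this page.
ENGINES = {"openai_whisper", "faster_whisper"}


def is_supported(model: str, engine: str) -> bool:
    """Return True if the model/backend pair is allowed by the list above."""
    if model not in ALL_MODELS or engine not in ENGINES:
        return False
    if model in OPENAI_ONLY:
        return engine == "openai_whisper"
    return True
```

Since the restrictions are noted as applying "for now", treat the table as a snapshot and check the web service documentation for the current state.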

Choosing a Backend

whisper-asr-webservice supports multiple backends. Currently, there are two available options, selected with ASR_ENGINE:

  • openai_whisper (OpenAI Whisper)
  • faster_whisper

Docker Installation

Change ASR_MODEL to use the model you like, and change ASR_ENGINE to select the backend.

CPU - Docker CLI

docker run -d -p 9000:9000 -e ASR_MODEL=small -e ASR_ENGINE=faster_whisper onerahmet/openai-whisper-asr-webservice:latest

CPU - Docker Compose

---
version: "2.1"
services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest
    environment:
      - ASR_MODEL=small
      - ASR_ENGINE=faster_whisper
    ports:
      - 9000:9000
    restart: unless-stopped

GPU - Docker CLI

docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=small -e ASR_ENGINE=faster_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu

GPU - Docker Compose

---
version: "2.1"
services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    environment:
      - ASR_MODEL=small
      - ASR_ENGINE=faster_whisper
    ports:
      - 9000:9000
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

Docker on Windows

You can run the ASR container on your Windows PC. This is especially useful if it's your only way to get access to a powerful GPU. Follow these instructions for more information.

Bazarr Configuration

Change the endpoint to the server you are hosting the Whisper container on (127.0.0.1 if on the same machine), and adjust the timeout if you find it keeps timing out on long movies or TV shows. The endpoint must start with http://
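Before saving the settings, it can be useful to confirm that the endpoint is well formed and that the container answers. The following is a minimal sketch using only the Python standard library. Both helper names are hypothetical, and the `/docs` path is an assumption (the web service is expected to serve its interactive API documentation there); adjust it if your deployment differs.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def validate_endpoint(endpoint: str) -> str:
    """Return the endpoint without a trailing slash, or raise ValueError.

    Mirrors the rule above: the endpoint must start with http://.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme != "http" or not parsed.netloc:
        raise ValueError(
            f"endpoint must start with http:// and name a host: {endpoint!r}"
        )
    return endpoint.rstrip("/")


def is_reachable(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if the service answers an HTTP GET on the assumed /docs path."""
    base = validate_endpoint(endpoint)
    try:
        with urlopen(f"{base}/docs", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, DNS failures, timeouts and HTTP errors.
        return False
```

For a container started with the commands above on the same machine, `is_reachable("http://127.0.0.1:9000")` is the call to try. Note that this short timeout only applies to the check itself; the timeout in Bazarr's settings governs full transcription runs and needs to be far longer.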

Troubleshooting

Language detection

When Bazarr doesn't know the language of the media you're trying to get subtitles for, Whisper must guess. It only uses the first 30 seconds of audio in order to detect the language. To ensure best results, use media which has the audio language specified in the file, and make sure the deep analyze option is turned on.

Bazarr's implementation of Whisper is still in early stages. If you have any issues, follow these instructions:

  • Enable debug logging
  • Join the Bazarr Discord and post your logs and a description of the problem in the #whisper channel
  • Include an @Alex mention to get my attention