Models
Overview
Handy supports multiple speech-to-text models. All models run locally on your machine. Nothing is sent to the cloud.
Parakeet
NVIDIA’s Parakeet models are optimized for European languages. Parakeet V3 is the recommended model for most users. It’s fast, accurate, and supports 25 European languages.
Parakeet writes out numbers as words (e.g. “one”, “two”, “three”) rather than digits. Language is detected automatically and cannot be manually specified.
| Model | Size | Speed | Accuracy | Languages |
|---|---|---|---|---|
| Parakeet V3 | ~478 MB | Fast | High | 25 European languages |
| Parakeet V2 | ~473 MB | Fast | High | English only |
Whisper
OpenAI’s Whisper models support 99+ languages, and most of them also offer optional translation to English (see the table below). They are the best choice if you need broad multilingual support.
Whisper outputs numbers as digits (e.g. “1”, “2”, “3”) and lets you either choose the language to transcribe or have it detected automatically. Whisper Medium offers a good balance of speed and accuracy for users with capable hardware.
| Model | Size | Speed | Accuracy | Translation |
|---|---|---|---|---|
| Small | ~487 MB | Fast | Good | Yes |
| Medium | ~492 MB | Moderate | Better | Yes |
| Large | ~1.1 GB | Slower | Best | Yes |
| Turbo | ~1.6 GB | Moderate | High | No |
Breeze ASR
Breeze ASR is a Whisper variant optimized for Taiwanese Mandarin, with support for code-switching (mixing languages within a single utterance). It is about 1.1 GB.
Moonshine
Moonshine models are lightweight and English-only. Great for quick transcription on lower-powered hardware.
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| V2 Tiny | ~31 MB | Fastest | Lower |
| V2 Small | ~100 MB | Very fast | Good |
| V2 Medium | ~192 MB | Fast | Better |
| Base | ~58 MB | Very fast | Good |
GigaAM v3
GigaAM v3 is a Russian speech recognition model. Its output includes punctuation, Latin characters, and digits, which makes it a good choice for Russian speakers who want fast, accurate local transcription.
| Model | Size | Speed | Accuracy | Languages |
|---|---|---|---|---|
| GigaAM v3 | ~225 MB | Fast | High | Russian |
SenseVoice
SenseVoice is a fast model from FunAudioLLM that supports five languages: Mandarin Chinese, Cantonese, Japanese, Korean, and English.
| Model | Size | Speed | Accuracy | Languages |
|---|---|---|---|---|
| SenseVoice | ~160 MB | Fastest | Good | 5 languages |
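The approximate sizes in the tables above can be gathered into a simple lookup, for example to see which models fit a disk or bandwidth budget. This is a minimal Python sketch: `MODEL_SIZES_MB` and `models_within_budget` are hypothetical names, not part of Handy, and 1 GB is treated as 1000 MB, so the figures are rough.

```python
# Approximate download sizes in MB, copied from the tables on this page.
# Hypothetical lookup, not part of Handy. 1 GB is treated as 1000 MB.
MODEL_SIZES_MB = {
    "Parakeet V3": 478,
    "Parakeet V2": 473,
    "Whisper Small": 487,
    "Whisper Medium": 492,
    "Whisper Large": 1100,
    "Whisper Turbo": 1600,
    "Breeze ASR": 1100,
    "Moonshine V2 Tiny": 31,
    "Moonshine V2 Small": 100,
    "Moonshine V2 Medium": 192,
    "Moonshine Base": 58,
    "GigaAM v3": 225,
    "SenseVoice": 160,
}


def models_within_budget(budget_mb: float) -> list[str]:
    """Return the models whose approximate size fits the budget, smallest first."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    return sorted(fitting, key=lambda name: MODEL_SIZES_MB[name])


if __name__ == "__main__":
    print(models_within_budget(200))
```

With a 200 MB budget, only the four Moonshine models and SenseVoice qualify.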
Changing Your Model
Open Handy’s settings and select a model from the dropdown. New models will be downloaded on first use.
Speed vs. Accuracy
- Fastest transcription: Moonshine for English, or SenseVoice for its supported languages
- Best all-rounder: Parakeet V3 for European languages
- Best for Russian: GigaAM v3
- Best multilingual: Whisper Small or Medium
- Best accuracy: Whisper Large
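The guidance above amounts to a lookup from goal to model. A minimal Python sketch, assuming hypothetical goal keys (`fastest`, `all-rounder`, and so on) that simply mirror the list; Handy itself does not expose such a function.

```python
# Hypothetical lookup table: each key is a goal from the list above,
# each value lists the models suggested for that goal.
RECOMMENDATIONS: dict[str, list[str]] = {
    "fastest": ["Moonshine", "SenseVoice"],
    "all-rounder": ["Parakeet V3"],
    "russian": ["GigaAM v3"],
    "multilingual": ["Whisper Small", "Whisper Medium"],
    "accuracy": ["Whisper Large"],
}


def recommend(goal: str) -> list[str]:
    """Return the suggested models for a goal (case-insensitive, whitespace-tolerant)."""
    key = goal.strip().lower()
    if key not in RECOMMENDATIONS:
        valid = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {valid}")
    return RECOMMENDATIONS[key]


print(recommend("Russian"))  # ['GigaAM v3']
```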
Custom Models
As an experimental feature, you can load custom Whisper-compatible models. Place a .bin model file in the models/ directory inside your Handy app data folder (the path is shown in About). Handy detects the file the next time it launches.
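If you install custom models often, the copy step can be scripted. This is a minimal sketch, not a Handy tool: `install_custom_model` is a hypothetical helper, and you pass in the app data folder path shown in About yourself, since the sketch does not try to guess it. Restart Handy afterwards so it picks up the new file.

```python
import shutil
import sys
from pathlib import Path


def install_custom_model(model_file: Path, app_data_dir: Path) -> Path:
    """Copy a .bin model file into <app_data_dir>/models/ (hypothetical helper)."""
    if model_file.suffix.lower() != ".bin":
        raise ValueError(f"expected a .bin file, got {model_file.name}")
    models_dir = app_data_dir / "models"
    models_dir.mkdir(parents=True, exist_ok=True)  # create models/ if it is missing
    destination = models_dir / model_file.name
    shutil.copy2(model_file, destination)  # copy the file, preserving timestamps
    return destination


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: install_model.py MODEL.bin APP_DATA_DIR")
    else:
        print(install_custom_model(Path(sys.argv[1]), Path(sys.argv[2])))
```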
Hardware Considerations
Whisper models use GPU acceleration automatically:
- macOS: Metal
- Windows/Linux: Vulkan
All other models (Parakeet, Moonshine, SenseVoice, GigaAM) currently run on CPU only. GPU acceleration support is coming soon.
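The rules above can be summarized as a small function. This sketch is illustrative only: `expected_backend` is a hypothetical name, it reports what this page describes rather than querying your hardware, and treating Breeze ASR like Whisper is an assumption based on it being a Whisper variant.

```python
import platform
from typing import Optional

# Model families described as GPU-accelerated. Including Breeze ASR is an
# assumption (it is a Whisper variant); every other family runs on CPU.
GPU_FAMILIES = {"whisper", "breeze asr"}


def expected_backend(model_family: str, system: Optional[str] = None) -> str:
    """Return the backend this page describes for a model family on an OS."""
    system = system or platform.system()  # "Darwin" (macOS), "Windows", or "Linux"
    if model_family.lower() not in GPU_FAMILIES:
        return "CPU"
    if system == "Darwin":
        return "Metal"
    if system in ("Windows", "Linux"):
        return "Vulkan"
    return "unknown"  # operating systems not covered on this page


if __name__ == "__main__":
    print(expected_backend("Whisper"))
```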