Models

Overview

Handy supports multiple speech-to-text models. All models run locally on your machine. Nothing is sent to the cloud.

Parakeet

NVIDIA’s Parakeet models are optimized for European languages. Parakeet V3 is the recommended model for most users: it is fast, accurate, and supports 25 European languages.

Parakeet writes out numbers as words (e.g. “one”, “two”, “three”) rather than digits. Language is detected automatically and cannot be manually specified.

Model        Size     Speed  Accuracy  Languages
Parakeet V3  ~478 MB  Fast   High      25 European languages
Parakeet V2  ~473 MB  Fast   High      English only

Parakeet V3 Supported Languages

English, Spanish, French, German, Portuguese, Russian, Italian, Polish, Dutch, Romanian, Ukrainian, Czech, Greek, Swedish, Hungarian, Bulgarian, Danish, Finnish, Slovak, Croatian, Lithuanian, Slovenian, Latvian, Estonian, Maltese

Whisper

OpenAI’s Whisper models support 99+ languages and optional translation to English. Best choice if you need broad multilingual support.

Whisper outputs numbers as digits (e.g. “1”, “2”, “3”). You can specify the language to transcribe or let Whisper auto-detect it. Whisper Medium is a great balance of speed and accuracy for users with capable hardware.

Model   Size     Speed     Accuracy  Translation
Small   ~487 MB  Fast      Good      Yes
Medium  ~492 MB  Moderate  Better    Yes
Large   ~1.1 GB  Slower    Best      Yes
Turbo   ~1.6 GB  Moderate  High      No

Whisper Supported Languages

English, Chinese (Simplified), Chinese (Traditional), Spanish, Hindi, Arabic, French, Portuguese, Bengali, Russian, Japanese, Indonesian, German, Korean, Turkish, Vietnamese, Italian, Thai, Urdu, Polish, Ukrainian, Dutch, Romanian, Persian, Malay, Tamil, Telugu, Tagalog, Swahili, Czech

Breeze ASR

A Whisper variant optimized for Taiwanese Mandarin with code-switching support (~1.1 GB).

Moonshine

Moonshine models are lightweight and English-only. Great for quick transcription on lower-powered hardware.

Model      Size     Speed      Accuracy
V2 Tiny    ~31 MB   Fastest    Lower
V2 Small   ~100 MB  Very fast  Good
V2 Medium  ~192 MB  Fast       Better
Base       ~58 MB   Very fast  Good

GigaAM v3

GigaAM v3 is a Russian speech recognition model. It supports punctuation, Latin characters, and digits. A good choice for Russian speakers who want fast, accurate local transcription.

Model      Size     Speed  Accuracy  Languages
GigaAM v3  ~225 MB  Fast   High      Russian

SenseVoice

SenseVoice is a fast model from FunAudioLLM supporting a small set of East Asian languages plus English.

Model       Size     Speed    Accuracy  Languages
SenseVoice  ~160 MB  Fastest  Good      5 languages

SenseVoice Supported Languages

Chinese, English, Japanese, Korean, Cantonese

Changing Your Model

Open Handy’s settings and select a model from the dropdown. New models will be downloaded on first use.

Speed vs. Accuracy

  • Fastest transcription: Moonshine (English only) or SenseVoice (Chinese, English, Japanese, Korean, Cantonese)
  • Best all-rounder: Parakeet V3 for European languages
  • Best for Russian: GigaAM v3
  • Best multilingual: Whisper Small or Medium
  • Best accuracy: Whisper Large
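
If you script your own setup notes or tooling around Handy, the recommendations above can be written down as a simple lookup. This is only an illustrative sketch: the priority keys and the `recommend` function are hypothetical names and are not part of Handy or its settings.

```python
# Illustrative only: the keys below are made-up labels for the
# recommendations listed above, not Handy configuration values.
RECOMMENDATIONS = {
    "speed": ["Moonshine", "SenseVoice"],
    "all_rounder": ["Parakeet V3"],
    "russian": ["GigaAM v3"],
    "multilingual": ["Whisper Small", "Whisper Medium"],
    "accuracy": ["Whisper Large"],
}


def recommend(priority: str) -> list[str]:
    """Return the models suggested for a given priority."""
    try:
        return RECOMMENDATIONS[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None


if __name__ == "__main__":
    for priority in RECOMMENDATIONS:
        print(priority, "->", ", ".join(recommend(priority)))
```

Keeping the mapping as plain data makes it easy to adjust if the recommendations change.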

Custom Models

As an experimental feature, you can load custom Whisper-compatible models. Place a .bin model file in the models/ directory inside your Handy app data folder (check About for the path). Handy will discover it on the next launch.
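
Copying the file by hand works fine, but the step can also be scripted. The sketch below assumes you pass in the app data folder shown under About; the function name `install_custom_model` and the example paths are hypothetical, and nothing here is a Handy API.

```python
import shutil
from pathlib import Path


def install_custom_model(model_file: Path, app_data_dir: Path) -> Path:
    """Copy a Whisper-compatible .bin file into <app_data_dir>/models/.

    app_data_dir is the Handy app data folder (check About for the path);
    it is an input here because this sketch does not try to guess it.
    """
    if model_file.suffix != ".bin":
        raise ValueError(f"expected a .bin model file, got {model_file.name!r}")
    models_dir = app_data_dir / "models"
    # Create models/ if it does not exist yet.
    models_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata and returns the destination path.
    return Path(shutil.copy2(model_file, models_dir / model_file.name))


if __name__ == "__main__":
    # Placeholder paths: replace both with real locations on your machine.
    dest = install_custom_model(Path("my-model.bin"), Path("path/to/handy/app-data"))
    print(f"Copied to {dest}; restart Handy to pick it up.")
```

Because Handy discovers custom models at launch, restart the app after copying the file.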

Hardware Considerations

Whisper models use GPU acceleration automatically:

  • macOS: Metal
  • Windows/Linux: Vulkan

All other models (Parakeet, Moonshine, SenseVoice, GigaAM) currently run on CPU only. GPU acceleration support is coming soon.
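
As a quick reference, the rules above can be summarized in a small lookup. This is a sketch of the documented behavior only, not how Handy selects a backend internally; the function and constant names are hypothetical, and model families not named in this section (such as Breeze ASR) are deliberately rejected rather than guessed.

```python
# GPU backend per operating system for Whisper models, as listed above.
GPU_BACKENDS = {"macos": "Metal", "windows": "Vulkan", "linux": "Vulkan"}

# Model families stated above to run on CPU only for now.
CPU_ONLY_FAMILIES = {"parakeet", "moonshine", "sensevoice", "gigaam"}


def acceleration_backend(model_family: str, os_name: str) -> str:
    """Return the compute backend documented for a model family and OS."""
    family = model_family.lower()
    if family in CPU_ONLY_FAMILIES:
        return "CPU"
    if family != "whisper":
        # Not covered by this section, so do not guess.
        raise ValueError(f"no documented backend for {model_family!r}")
    try:
        return GPU_BACKENDS[os_name.lower()]
    except KeyError:
        raise ValueError(f"unknown operating system: {os_name!r}") from None


if __name__ == "__main__":
    print(acceleration_backend("Whisper", "macOS"))
    print(acceleration_backend("Parakeet", "Linux"))
```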