Speech to Speech

Record or upload audio, optionally add a text prompt, and get back a spoken reply. Gemma 4 handles the multimodal audio understanding and Kokoro TTS synthesizes the speech, all in a single request to /api/v1/multimodal-generate.
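As a rough illustration of what "one request" could look like from a script, here is a minimal Python sketch that only builds the POST request. The endpoint path comes from the text above; everything else is an assumption made for illustration: the JSON body, the field names `audio_base64` and `prompt`, and the `http://localhost:8000` base URL are hypothetical, and the real request schema and response format may differ.

```python
import base64
import json
import urllib.request
from typing import Optional

# Path taken from the page text; the host below is a placeholder.
ENDPOINT_PATH = "/api/v1/multimodal-generate"


def build_request(
    audio: bytes,
    prompt: Optional[str] = None,
    base_url: str = "http://localhost:8000",  # assumed, not documented
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying audio and an optional prompt."""
    # Hypothetical schema: audio sent as base64 text inside a JSON body.
    payload = {"audio_base64": base64.b64encode(audio).decode("ascii")}
    if prompt:
        # The text prompt is optional, so the field is only added when given.
        payload["prompt"] = prompt
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. How the spoken reply comes back (raw audio bytes, a URL, or an encoded field) is not stated here, so the sketch stops before parsing a response.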
