Speech to Speech
Record or upload audio, optionally add a text prompt, and get back a spoken reply.
Powered by Gemma 4 multimodal audio understanding and Kokoro TTS, with both steps handled in a single request to
/api/v1/multimodal-generate.
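
As a rough illustration of the single-request flow described above, the sketch below builds a multipart POST for /api/v1/multimodal-generate using only the Python standard library. Only the endpoint path comes from this page. The form field names ("audio", "prompt"), the WAV content type, and the helper name build_request are assumptions made for illustration, so check them against the actual API before relying on them.

```python
import uuid
import urllib.request

API_PATH = "/api/v1/multimodal-generate"  # endpoint path named on this page


def build_request(base_url, audio_bytes, prompt=None, filename="input.wav"):
    """Build (but do not send) a multipart POST with audio and an optional prompt.

    Assumption: the server accepts form fields called "audio" and "prompt".
    These names are hypothetical, not a documented schema.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # Optional text prompt, sent as a plain form field (hypothetical name).
    if prompt is not None:
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="prompt"\r\n\r\n'
                f"{prompt}\r\n"
            ).encode("utf-8")
        )

    # Audio payload, sent as a file field (hypothetical name and content type).
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        ).encode("utf-8")
        + audio_bytes
        + b"\r\n"
    )

    # Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        base_url.rstrip("/") + API_PATH,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. How the spoken reply comes back (raw audio bytes, JSON with encoded audio, or something else) is not stated here, so response handling is left out of the sketch.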