New, competitive and affordable speech to text API
Pricing is $0.00005 USD per second of audio, which is $0.003 USD per minute or $0.18 USD per hour. Billing is by the second, so you're never charged for audio you didn't send.
There is a free tier of 5.5 hours of audio transcription per month
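To make the arithmetic concrete, here is a minimal sketch of a cost estimator built from the figures above. It assumes the free tier is simply deducted from the month's total usage before billing; that is an assumption for illustration, not confirmed billing logic, and the function names are hypothetical.

```python
PRICE_PER_SECOND_USD = 0.00005          # price quoted above
FREE_SECONDS_PER_MONTH = 5.5 * 3600     # 5.5 free hours, expressed in seconds


def raw_cost_usd(seconds):
    """Cost of `seconds` of audio, ignoring the free tier."""
    return seconds * PRICE_PER_SECOND_USD


def monthly_cost_usd(total_seconds):
    """Estimated monthly bill.

    Assumption: the free tier is subtracted from the month's total
    usage before any charge applies.
    """
    billable_seconds = max(0.0, total_seconds - FREE_SECONDS_PER_MONTH)
    return billable_seconds * PRICE_PER_SECOND_USD


print(f"1 minute:  ${raw_cost_usd(60):.4f}")
print(f"1 hour:    ${raw_cost_usd(3600):.2f}")
print(f"10 hours in a month: ${monthly_cost_usd(10 * 3600):.2f}")
```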
Over 8x cheaper than Google
Yep, as of today Google Cloud Speech-to-Text costs 0.006 USD for each 15 seconds of audio. At best that works out to $0.0004 per second, so you pay at least 8x more on Google Cloud Speech-to-Text.
It is also 5x cheaper than AssemblyAI, with a much larger free tier too.
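The Google comparison can be checked with a few lines of arithmetic. The sketch below uses only the prices quoted in this post and assumes that each request is rounded up to the next 15-second increment on Google's side; that rounding rule is an assumption drawn from the "at best" wording above, and the function names are hypothetical.

```python
import math

OUR_PRICE_PER_SECOND_USD = 0.00005   # per-second price quoted in this post
GOOGLE_PRICE_PER_15S_USD = 0.006     # Google price quoted in this post


def our_cost_usd(seconds):
    """Per-second billing: pay for exactly the audio sent."""
    return seconds * OUR_PRICE_PER_SECOND_USD


def google_cost_usd(seconds):
    """Assumption: audio length is rounded up to whole 15-second blocks."""
    blocks = math.ceil(seconds / 15)
    return blocks * GOOGLE_PRICE_PER_15S_USD


for seconds in (15, 16, 60):
    ratio = google_cost_usd(seconds) / our_cost_usd(seconds)
    print(f"{seconds:>3}s of audio: ratio {ratio:.1f}x")
```

Under that assumption the ratio is 8x only when the clip length is an exact multiple of 15 seconds, and higher for anything in between.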
Step 1 - Get an API Key
Sign up at Text-Generator.io, then get your API key
Step 2 - Profit
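import requests

# Replace with the API key from your Text-Generator.io account.
headers = {"secret": "YOUR_SECRET_HERE"}
data = {
    # Publicly reachable URL of the audio file to transcribe.
    "audio_url": "http://www.fit.vutbr.cz/~motlicek/sympatex/f2bjrop1.0.wav",
    # Set to True to receive English text regardless of the spoken language.
    "translate_to_english": False,
}
response = requests.post(
    "https://api4.text-generator.io/api/v1/audio-extraction",
    json=data,
    headers=headers,
)
# Fail loudly on HTTP errors instead of trying to parse an error body.
response.raise_for_status()
json_response = response.json()
print(json_response)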
Results
The response is an object containing the full "text" and a list of "segments": the same text broken into chunks, each with start and end timestamps in seconds.
{
  "text": "Wanted Chief Justice of the Massachusetts Supreme Court. In April, the SJC's current leader, Edward Hennessy, reaches the mandatory retirement age of 70, and a successor is expected.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0,
      "end": 3.72,
      "text": " Wanted Chief Justice of the Massachusetts Supreme Court.",
      "temperature": 0,
      "avg_logprob": -0.21680092811584473,
      "compression_ratio": 1.2551724137931035,
      "no_speech_prob": 0.06906628608703613
    },
    {
      "id": 1,
      "seek": 0,
      "start": 3.72,
      "end": 7.2,
      "text": " In April, the SJC's current leader, Edward Hennessy,",
      "temperature": 0,
      "avg_logprob": -0.21680092811584473,
      "compression_ratio": 1.2551724137931035,
      "no_speech_prob": 0.06906628608703613
    },
    {
      "id": 2,
      "seek": 0,
      "start": 7.2,
      "end": 9.8,
      "text": " reaches the mandatory retirement age of 70,",
      "temperature": 0,
      "avg_logprob": -0.21680092811584473,
      "compression_ratio": 1.2551724137931035,
      "no_speech_prob": 0.06906628608703613
    },
    {
      "id": 3,
      "seek": 980,
      "start": 9.8,
      "end": 30.8,
      "text": " and a successor is expected.",
      "temperature": 0,
      "avg_logprob": -0.7248780992296007,
      "compression_ratio": 0.7777777777777778,
      "no_speech_prob": 0.005156728904694319
    }
  ],
  "language": "en"
}
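As an illustration of working with the "segments" list, here is a minimal sketch that turns a response like the one above into timestamped lines. The helper name `format_segments` is hypothetical, and the sketch relies only on the "start", "end" and "text" fields shown in the sample response.

```python
def format_segments(response):
    """Return one '[start - end] text' line per segment in the response."""
    lines = []
    for segment in response.get("segments", []):
        start = float(segment["start"])
        end = float(segment["end"])
        # Segment text in the sample starts with a space, so strip it.
        text = segment["text"].strip()
        lines.append(f"[{start:.2f}s - {end:.2f}s] {text}")
    return lines


# A trimmed-down copy of the sample response above.
sample_response = {
    "text": "Wanted Chief Justice of the Massachusetts Supreme Court.",
    "segments": [
        {
            "id": 0,
            "start": 0,
            "end": 3.72,
            "text": " Wanted Chief Justice of the Massachusetts Supreme Court.",
        },
    ],
    "language": "en",
}

for line in format_segments(sample_response):
    print(line)
```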
Details
MP3 and WAV files are supported.
You can also send a "translate_to_english" parameter to receive English text even when another language is spoken. With this set to false, many languages are supported and the text is returned as is, in the language that was spoken.
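A minimal sketch of switching translation on, reusing the endpoint, header and field names from the request earlier in this post. The helper names `build_payload` and `transcribe` are hypothetical, and the `timeout` value is an arbitrary choice rather than a documented requirement.

```python
API_URL = "https://api4.text-generator.io/api/v1/audio-extraction"


def build_payload(audio_url, translate_to_english=False):
    """Build the JSON body for the audio-extraction endpoint."""
    return {
        "audio_url": audio_url,
        "translate_to_english": bool(translate_to_english),
    }


def transcribe(audio_url, secret, translate_to_english=False):
    """Send the request and return the parsed JSON response."""
    import requests  # third-party: pip install requests

    response = requests.post(
        API_URL,
        json=build_payload(audio_url, translate_to_english),
        headers={"secret": secret},
        timeout=300,  # arbitrary; long audio may take a while to process
    )
    response.raise_for_status()
    return response.json()


# Payload asking for English output regardless of the spoken language.
print(build_payload("https://example.com/interview.mp3", translate_to_english=True))
```

Calling `transcribe(url, "YOUR_SECRET_HERE", translate_to_english=True)` would then send the request; the example URL above is a placeholder.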
Under the hood
Under the hood, a large audio model and a large language model both analyse the audio at different resolutions. The transcribed text is generated token by token via beam search, using a GPT-2 tokenization scheme.
We found that a beam size of 4 gives sufficiently good results.
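For readers unfamiliar with beam search, here is a toy sketch of the idea. It is not the production decoder: the probability table is made up for illustration, and a real system would score tokens with the audio and language models instead of a lookup table. It shows why keeping several hypotheses (here 4) can beat greedy decoding (beam size 1).

```python
import math

# Made-up next-token probabilities, keyed by the previous token.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "w": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
    "w": {"</s>": 1.0},
}


def beam_search(model, beam_size=4, max_len=10, start="<s>", end="</s>"):
    """Return (log-probability, tokens) of the best hypothesis found."""
    beams = [(0.0, [start])]   # live hypotheses: (cumulative log-prob, tokens)
    finished = []              # hypotheses that have produced the end token
    for _ in range(max_len):
        # Extend every live hypothesis by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the `beam_size` highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            if tokens[-1] == end:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # include hypotheses cut off at max_len
    return max(finished, key=lambda c: c[0])


greedy_score, greedy_tokens = beam_search(TOY_MODEL, beam_size=1)
beam_score, beam_tokens = beam_search(TOY_MODEL, beam_size=4)
print("greedy:", greedy_tokens, round(math.exp(greedy_score), 2))
print("beam 4:", beam_tokens, round(math.exp(beam_score), 2))
```

In this toy table greedy decoding commits to the locally best first token and ends with a less probable sequence overall, while the wider beam keeps the alternative alive long enough to find the better one.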
In the future, audio analysis will be integrated into our text generation API and crawler, allowing classification, question answering and analysis of conversations with embedded or linked audio and video files.
Plug
We are on a mission to bring about affordable AI for everyone.
Text Generator offers an API for text and code generation. Secure (no PII is stored on our servers), affordable, flexible and accurate.
Try examples yourself at: Text Generator Playground
Sign up