Request Body
Base64-encoded, 16kHz WAV audio. Maximum size is 25MB or 6 minutes of audio.
Optional list of ISO 639-1 language codes that the user is expected to speak. A list of size 1 forces the transcription into the specified language. Omitting the list triggers autodetection across the full list of languages, which is less accurate.
Optional contextual information about the circumstances surrounding the user's dictation. Flow can use this information to make its output more accurate, for example by getting names right or resolving speech ambiguities. All properties are optional and use default values if not provided.
See Request Schema page for an exhaustive list of context attributes.
Legacy API schema for providing context. Use the equivalent fields in the context field instead. If both context and properties are provided, properties will be ignored.
See Request Schema page for an exhaustive list of properties.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
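As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name `auth_headers` is hypothetical, and the `Content-Type` value is taken from the `application/json` body type listed on this page.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Return HTTP headers carrying the bearer token (hypothetical helper)."""
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {
        # Header form documented above: "Bearer <token>"
        "Authorization": f"Bearer {token}",
        # The request body is JSON
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed to whichever HTTP client you use.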
Body
application/json
Base64-encoded, 16kHz WAV audio in PCM16 format (16-bit signed integer). Maximum size is 25MB or 6 minutes of audio.
Example:
"UklGRiQA...."
The list of languages the user might speak. Set it to a single language to skip language detection, or leave it empty to search the entire language list.
Example:
["en", "fr"]
Additional information about the context surrounding the dictation
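The body fields described above could be assembled as in the following Python sketch. This page does not show the JSON key names for the audio and language list, so `"audio"` and `"language"` are assumptions; only `context` is named in the text. The helper names, the reading of 25MB as 25 MiB, and applying the size limit to the raw file rather than the Base64 string are also assumptions. Check the Request Schema page before relying on any of them.

```python
import base64
import io
import json
import wave

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB of raw WAV
MAX_SECONDS = 6 * 60          # 6 minutes of audio


def encode_wav(wav_bytes: bytes) -> str:
    """Check the documented audio constraints, then Base64-encode the file."""
    if len(wav_bytes) > MAX_BYTES:
        raise ValueError("audio exceeds the 25MB limit")
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError("audio must be sampled at 16kHz")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be PCM16 (16-bit samples)")
        duration = wav.getnframes() / wav.getframerate()
    if duration > MAX_SECONDS:
        raise ValueError("audio exceeds the 6 minute limit")
    return base64.b64encode(wav_bytes).decode("ascii")


def build_request_body(audio_b64, languages=None, context=None) -> str:
    """Serialize the request body; key names are assumed, not confirmed."""
    body = {"audio": audio_b64}  # assumed key name
    if languages:
        for code in languages:
            # ISO 639-1 codes are two letters, e.g. "en" or "fr"
            if len(code) != 2 or not code.isalpha():
                raise ValueError(f"not an ISO 639-1 code: {code!r}")
        body["language"] = [code.lower() for code in languages]  # assumed key
    if context:
        body["context"] = context  # attributes: see the Request Schema page
    return json.dumps(body)
```

Passing a single code such as `["en"]` forces that language, while leaving `languages` empty omits the list so the service falls back to autodetection.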
Response
Successful transcription
Unique identifier for the transcription
Example:
"550e8400-e29b-41d4-a716-446655440000"
The transcribed text with formatting
Example:
"Testing testing 1, 2, 3"
Detected language code
Example:
"en"
Total processing time in milliseconds
Example:
432
Number of tokens used
Example:
9
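A successful response could be read into a typed structure as sketched below. The response key names are not shown on this page, so `"id"`, `"text"`, `"detected_language"`, `"total_time"` and `"generated_tokens"` are hypothetical placeholders for the five fields described above. Replace them with the names from the actual response schema.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Transcription:
    id: str                 # unique identifier for the transcription
    text: str               # transcribed text with formatting
    detected_language: str  # detected language code, e.g. "en"
    total_time_ms: int      # total processing time in milliseconds
    tokens: int             # number of tokens used


def parse_transcription(payload: Mapping[str, Any]) -> Transcription:
    """Map a decoded JSON response onto Transcription (key names assumed)."""
    return Transcription(
        id=str(payload["id"]),
        text=str(payload["text"]),
        detected_language=str(payload["detected_language"]),
        total_time_ms=int(payload["total_time"]),
        tokens=int(payload["generated_tokens"]),
    )
```

A missing key raises `KeyError`, which surfaces schema mismatches early instead of silently producing empty fields.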