POST /api
{
  "audio": "UklGRiQA....",
  "language": ["en"],
  "context": {
    "app": {
      "type": "email"
    },
    "dictionary_context": [],
    "textbox_contents": {
      "before_text": "",
      "selected_text": "",
      "after_text": ""
    },
    // ... for a full list of available fields, see the "Request Schema" page
  }
}
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Testing testing 1, 2, 3",
  "detected_language": "en",
  "total_time": 432,
  "generated_tokens": 9
}
Convert audio to text with support for multiple languages and context awareness. Use your API key for authentication.

Request Body

audio
string
required
Base64-encoded, 16 kHz WAV audio. Maximum size is 25 MB or 6 minutes of audio.
language
array
Optional list of ISO 639-1 language codes that the user is expected to speak. A list containing a single code forces transcription into that language. If no list is provided, the language is auto-detected from the full list of languages (less accurate).
context
object
Optional contextual information about the circumstances surrounding the user's dictation. Flow can use this information to make its output more accurate, for example by getting names right and resolving speech ambiguities. All properties are optional and fall back to default values if not provided.
See Request Schema page for an exhaustive list of context attributes.
properties
object
deprecated
Legacy API schema for providing context. Use the equivalent fields in the context field instead. If both context and properties are provided, properties will be ignored.
See Request Schema page for an exhaustive list of properties.
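The request body fields described above can be assembled as in the following minimal sketch. The helper name build_transcription_payload is hypothetical and not part of any official SDK. Because the Body section further down marks language and context as required, the sketch always sends both and defaults them to an empty list and an empty object.

```python
from typing import Any, Optional


def build_transcription_payload(
    audio_b64: str,
    languages: Optional[list[str]] = None,
    context: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a request body from the documented fields.

    Hypothetical helper for illustration only.
    """
    if not audio_b64:
        raise ValueError("audio is required")
    return {
        # Base64-encoded WAV audio (required).
        "audio": audio_b64,
        # One code forces that language; an empty list requests
        # detection across the full language list.
        "language": list(languages or []),
        # All context properties are optional; an empty object
        # leaves every property at its default.
        "context": dict(context or {}),
    }


payload = build_transcription_payload(
    "UklGRiQA....",
    languages=["en"],
    context={"app": {"type": "email"}},
)
```

The legacy properties field is deliberately left out, since it is ignored whenever context is present.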

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
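As a rough illustration of the header format, the sketch below builds (but does not send) an authenticated request using only the Python standard library. API_URL is a placeholder, since the full endpoint URL is not given here, and build_request is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder: substitute the real endpoint URL for your account.
API_URL = "https://api.example.com/api"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create a POST request carrying a Bearer token and a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # Header of the form "Bearer <token>".
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_request(
    "YOUR_API_KEY",
    {"audio": "UklGRiQA....", "language": ["en"], "context": {}},
)
```

Passing the returned object to urllib.request.urlopen would send it. Keep the token out of source control, for example by reading it from an environment variable.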

Body

application/json
audio
string
required

Base64-encoded, 16 kHz WAV audio in PCM16 format (16-bit signed integer). Maximum size is 25 MB or 6 minutes of audio.

Example:

"UklGRiQA...."

language
string[]
required

The list of languages the user might speak. Provide a single language to skip language detection, or an empty list to search the entire language list.

Example:
["en", "fr"]
context
object
required

Additional information about the context surrounding the dictation
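To make the audio format concrete, here is a minimal sketch that wraps raw PCM16 samples in a 16 kHz WAV container and Base64-encodes the result with Python's standard wave and base64 modules. Several details are assumptions: mono audio, "25MB" read as 25 * 1024 * 1024 bytes, and the size limit applied to the WAV bytes before Base64 encoding. encode_pcm16_wav is a hypothetical helper name.

```python
import base64
import io
import wave

SAMPLE_RATE = 16_000             # 16 kHz, as the audio field requires
SAMPLE_WIDTH = 2                 # PCM16: 2 bytes per sample
MAX_BYTES = 25 * 1024 * 1024     # assumption: "25MB" measured on the WAV bytes
MAX_SECONDS = 6 * 60             # 6 minutes


def encode_pcm16_wav(pcm_frames: bytes) -> str:
    """Wrap mono PCM16 samples in a WAV file and return it as Base64 text."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # assumption: mono input
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_frames)
    wav_bytes = buffer.getvalue()

    # Enforce both documented limits before encoding.
    duration = len(pcm_frames) / (SAMPLE_WIDTH * SAMPLE_RATE)
    if duration > MAX_SECONDS:
        raise ValueError("audio is longer than 6 minutes")
    if len(wav_bytes) > MAX_BYTES:
        raise ValueError("audio is larger than 25 MB")

    return base64.b64encode(wav_bytes).decode("ascii")


# One second of silence, used here only as stand-in audio.
audio_b64 = encode_pcm16_wav(b"\x00\x00" * SAMPLE_RATE)
```

Under the mono assumption, six minutes of audio is 360 × 16,000 × 2 = 11,520,000 bytes, so the duration limit is reached well before the size limit.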

Response

Successful transcription

id
string<uuid>

Unique identifier for the transcription

Example:

"550e8400-e29b-41d4-a716-446655440000"

text
string

The transcribed text with formatting

Example:

"Testing testing 1, 2, 3"

detected_language
string

Detected language code

Example:

"en"

total_time
integer

Total processing time in milliseconds

Example:

432

generated_tokens
integer

Number of tokens used

Example:

9
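The response fields above map naturally onto a small typed record. The following sketch parses the example response into a dataclass. The Transcription class and parse_transcription function are hypothetical names chosen for illustration, and total_time is renamed to total_time_ms on the Python side to make the unit explicit.

```python
import json
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class Transcription:
    id: UUID                 # unique identifier for the transcription
    text: str                # transcribed text with formatting
    detected_language: str   # detected language code
    total_time_ms: int       # total processing time in milliseconds
    generated_tokens: int    # number of tokens used


def parse_transcription(raw: str) -> Transcription:
    """Parse a successful response body into a Transcription record."""
    data = json.loads(raw)
    return Transcription(
        id=UUID(data["id"]),
        text=data["text"],
        detected_language=data["detected_language"],
        total_time_ms=int(data["total_time"]),
        generated_tokens=int(data["generated_tokens"]),
    )


example = """{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Testing testing 1, 2, 3",
  "detected_language": "en",
  "total_time": 432,
  "generated_tokens": 9
}"""
result = parse_transcription(example)
```

A missing field raises KeyError and a malformed id raises ValueError, which surfaces unexpected response shapes early.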