REST with API key auth

{
  "audio": "UklGRiQA....",
  "language": ["en"],
  "context": {
    "app": {
      "type": "email"
    },
    "dictionary_context": [],
    "textbox_contents": {
      "before_text": "",
      "selected_text": "",
      "after_text": ""
    },
    // ... for a full list of available fields, see the "Request Schema" page
  }
}

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Testing testing 1, 2, 3",
  "detected_language": "en",
  "total_time": 432,
  "generated_tokens": 9
}

POST

api

Convert audio to text with support for multiple languages and context awareness. Use your API key for authentication.

Request Body

audio

string

required

Base64-encoded 16 kHz WAV audio. Maximum size is 25MB or 6 minutes of audio.
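The constraints on audio can be checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library, not an official client. The function name encode_wav is hypothetical. It assumes that "25MB" means 25 MiB of raw WAV data (the page does not say whether the limit applies before or after Base64 encoding), and it checks for 16-bit samples because the Body section describes the audio as PCM16.

```python
import base64
import io
import wave

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB of raw WAV data
MAX_SECONDS = 6 * 60          # 6 minutes of audio
SAMPLE_RATE = 16000           # 16 kHz
SAMPLE_WIDTH = 2              # bytes per sample, i.e. 16-bit PCM


def encode_wav(wav_bytes: bytes) -> str:
    """Validate WAV bytes against the documented limits, then Base64-encode them."""
    if len(wav_bytes) > MAX_BYTES:
        raise ValueError("audio is larger than 25MB")
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected 16 kHz audio, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != SAMPLE_WIDTH:
            raise ValueError("expected 16-bit samples")
        duration = wav.getnframes() / wav.getframerate()
        if duration > MAX_SECONDS:
            raise ValueError("audio is longer than 6 minutes")
    return base64.b64encode(wav_bytes).decode("ascii")
```

Every WAV file begins with the bytes RIFF, so a valid encoded value always starts with "UklGR", as in the example value shown on this page.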

language

array

Optional list of ISO 639-1 language codes that the user is expected to speak. Setting the list size to 1 forces the transcription into the specified language. Omitting the list triggers autodetection across the full list of languages, which is less accurate.
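As an illustration of how the language list shapes a request, here is a hedged sketch of a payload builder. The name build_payload is hypothetical, and the validation only checks that each code looks like a two-letter ISO 639-1 code, not that the service supports it. The builder always sends language and context, even when empty, so the payload is valid whether or not the service treats those fields as required.

```python
import re
from typing import Any, Dict, List, Optional


def build_payload(
    audio_b64: str,
    languages: Optional[List[str]] = None,
    context: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body from Base64 audio, language codes, and context."""
    codes = list(languages or [])
    for code in codes:
        # Shape check only: two lowercase letters, as in ISO 639-1.
        if not re.fullmatch(r"[a-z]{2}", code):
            raise ValueError(f"not a two-letter language code: {code!r}")
    return {
        "audio": audio_b64,
        "language": codes,         # [] requests autodetection, one entry forces that language
        "context": context or {},  # all context properties are optional
    }
```

Passing ["en"] forces English, passing ["en", "fr"] narrows detection to those two languages, and passing nothing leaves detection open to every language.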

context

object

Optional contextual information about the circumstances surrounding the user's dictation. Flow can use this information to make its output more accurate, for example by getting names right or resolving speech ambiguities. All properties are optional and fall back to default values if not provided.

See Request Schema page for an exhaustive list of context attributes.
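A context object with the fields from the example request could be assembled as follows. This is a sketch rather than a complete schema: make_context is a hypothetical helper, it covers only the three fields shown in the example, and it assumes that dictionary_context holds plain strings such as names, which this page does not confirm.

```python
from typing import Any, Dict, List, Optional


def make_context(
    app_type: str = "email",
    dictionary: Optional[List[str]] = None,
    before_text: str = "",
    selected_text: str = "",
    after_text: str = "",
) -> Dict[str, Any]:
    """Build a context object limited to the fields in the example request."""
    return {
        "app": {"type": app_type},
        # Assumption: entries are plain strings (names, unusual terms).
        "dictionary_context": list(dictionary or []),
        "textbox_contents": {
            "before_text": before_text,
            "selected_text": selected_text,
            "after_text": after_text,
        },
    }
```

Called with no arguments, the helper reproduces the context object from the example request.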

properties

object

deprecated

Legacy API schema for providing context. Use the equivalent fields in the context field instead. If both context and properties are provided, properties will be ignored.

See Request Schema page for an exhaustive list of properties.

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
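The header format can be illustrated with a short sketch that prepares, but does not send, an authenticated POST request using Python's standard library. The URL below is a placeholder, since the full endpoint path is not given in this section, and build_request is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder only: substitute the real endpoint URL from your account documentation.
API_URL = "https://api.example.invalid/api"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request that carries the Bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The prepared request can then be passed to urllib.request.urlopen. Keeping the API key out of source code, for example in an environment variable, is a common precaution.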

Body

application/json

audio

string

required

Base64-encoded 16 kHz WAV audio in PCM16 format (16-bit signed integer). Maximum size is 25MB or 6 minutes of audio.

Example:

"UklGRiQA...."

language

string[]

required

The list of languages the user might speak. Set it to a single language to skip language detection, or leave it empty to search the entire language list.

Example:

["en", "fr"]

context

object

required

Additional information about the context surrounding the dictation

Response

Successful transcription

id

string<uuid>

Unique identifier for the transcription

Example:

"550e8400-e29b-41d4-a716-446655440000"

text

string

The transcribed text with formatting

Example:

"Testing testing 1, 2, 3"

detected_language

string

Detected language code

Example:

"en"

total_time

integer

Total processing time in milliseconds

Example:

432

generated_tokens

integer

Number of tokens used

Example:

9
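On the client side, the response fields listed above can be read into a small typed record. The following is a sketch under the assumption that a successful response contains exactly the five documented fields. The names Transcription and parse_transcription are hypothetical, and total_time is renamed total_time_ms here only to make the unit explicit.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcription:
    id: str                 # UUID of the transcription
    text: str               # transcribed text with formatting
    detected_language: str  # detected language code
    total_time_ms: int      # total processing time in milliseconds
    generated_tokens: int   # number of tokens used


def parse_transcription(raw: str) -> Transcription:
    """Parse a successful JSON response body into a Transcription."""
    data = json.loads(raw)
    return Transcription(
        id=data["id"],
        text=data["text"],
        detected_language=data["detected_language"],
        total_time_ms=int(data["total_time"]),
        generated_tokens=int(data["generated_tokens"]),
    )
```

A missing field raises KeyError, which surfaces an unexpected response shape early instead of passing partial data along.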
