Request Schema

All transcription endpoints accept the same three inputs: `audio`, `language`, and `context`. A fourth input, `properties`, is deprecated.

`audio`

Base64-encoded, 16kHz WAV audio of the recorded speech.

This is the only required field. Maximum size is 25MB or 6 minutes of audio.
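A minimal Python sketch of preparing the `audio` field. It assumes the 25MB limit applies to the raw WAV bytes, since the text above does not say whether the size is measured before or after Base64 encoding. `encode_wav_for_request` and `make_silent_wav` are hypothetical helper names, not part of the API.

```python
import base64
import io
import wave

MAX_AUDIO_BYTES = 25 * 1024 * 1024  # documented 25MB limit (assumed to mean raw WAV bytes)
MAX_AUDIO_SECONDS = 6 * 60          # documented 6 minute limit
REQUIRED_SAMPLE_RATE = 16_000       # documented 16kHz sample rate


def encode_wav_for_request(wav_bytes: bytes) -> str:
    """Check a WAV payload against the documented limits and return it Base64 encoded."""
    if len(wav_bytes) > MAX_AUDIO_BYTES:
        raise ValueError("audio exceeds 25MB")
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if rate != REQUIRED_SAMPLE_RATE:
        raise ValueError(f"expected 16kHz audio, got {rate}Hz")
    if seconds > MAX_AUDIO_SECONDS:
        raise ValueError("audio is longer than 6 minutes")
    return base64.b64encode(wav_bytes).decode("ascii")


def make_silent_wav(seconds: float, rate: int = REQUIRED_SAMPLE_RATE) -> bytes:
    """Demo helper: build a mono, 16-bit WAV of silence in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


# One second of silence stands in for a real recording.
audio_field = encode_wav_for_request(make_silent_wav(1.0))
```

Validating on the client avoids uploading a payload that would be rejected for its length or sample rate.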

`language`

Optional list of ISO 639-1 language codes that the user is expected to speak. A list with a single entry forces the transcription into that language. If no list is provided, Flow attempts autodetection across the full list of languages, which is less accurate.
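The three cases for `language` (omitted, one code, several codes) can be sketched in Python as follows. `build_language_field` is a hypothetical helper, and treating an empty list the same as an omitted field is an assumption rather than documented behavior.

```python
from typing import Dict, List, Optional


def build_language_field(codes: Optional[List[str]]) -> Dict[str, List[str]]:
    """Return the request fragment for `language`, or {} to leave autodetection on."""
    if not codes:
        # Assumption: an empty list is handled like an omitted field.
        return {}
    normalized = [code.strip().lower() for code in codes]
    for code in normalized:
        # ISO 639-1 codes are exactly two letters, e.g. "en" or "es".
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"not a two-letter ISO 639-1 code: {code!r}")
    return {"language": normalized}


autodetect = build_language_field(None)         # field omitted, autodetection
forced = build_language_field(["en"])           # single entry forces English
expected = build_language_field(["en", "es"])   # user may speak either language
```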

`context`

Optional contextual information about the circumstances surrounding the user's dictation. Flow can use this information to make its output more accurate, for example by getting names right and resolving speech ambiguities.

All properties are optional and will use default values if not provided.

app

object

Information about the application the user is dictating into, used to customize the writing style.

name

string

The name of the application.

type

string

The type of the application. The following types are currently supported:

email: Email clients and anywhere the user would be trying to dictate an email.
ai: Applications where the user is conversing with an AI chatbot / agent and not a human.
other: Any application that does not fit in the two groups above.

dictionary_context

array

default:"[]"

List of uncommon names or words relevant to the context that might be mentioned by the user.

user_identifier

string

The user's ID in the app they are dictating in, such as their email address in an email client.

user_first_name

string

First name of the speaker, used to make sure Flow spells the speaker’s name correctly.

user_last_name

string

Last name of the speaker.

textbox_contents

object

Text under and surrounding the cursor in the active textbox. Flow uses this information to decide casing, spacing and punctuation.

before_text

object

The text immediately before the cursor.

selected_text

object

The text the user has highlighted.

after_text

object

The text immediately after the cursor.
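As an illustration of how the three `textbox_contents` fields relate to each other, here is a small Python sketch that splits a textbox's text around the current selection. `split_textbox` is a hypothetical helper. The schema above lists the three fields as type object, so passing plain strings is an assumption.

```python
from typing import Dict


def split_textbox(text: str, selection_start: int, selection_end: int) -> Dict[str, str]:
    """Split textbox text into the parts before, inside, and after the selection.

    With no selection, pass the cursor position as both start and end.
    """
    if not 0 <= selection_start <= selection_end <= len(text):
        raise ValueError("selection is outside the text")
    return {
        # Assumption: plain strings are accepted for these fields.
        "before_text": text[:selection_start],
        "selected_text": text[selection_start:selection_end],
        "after_text": text[selection_end:],
    }


# The user has highlighted "see" in the sentence below.
textbox_contents = split_textbox("Hi Sam, see you tomorrow.", 8, 11)
```

Here `before_text` ends with a space and `after_text` starts with one, which is the kind of information Flow can use to decide spacing around the dictated text.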

screenshot

string

Screenshot of the screen or the app the user is dictating in, for when the user is referencing something on the screen.

content_text

string

Plaintext content of the current page in the app the user is dictating in. This is a more efficient way of providing context than a screenshot.

content_html

string

HTML content of the app the user is dictating in (a more feature-rich alternative to content_text).

conversation

object

Chatbot-style history of the conversation the user is dictating in. This typically applies to messaging and AI apps.

string

required

Identifier of the conversation, for example the name of the group chat in a messaging app.

participants

array

List of names of the other people in the conversation, used to spell uncommon names correctly.

messages

array

List of recent messages in the conversation, in chronological order. Used to identify the proper context and the names of things being mentioned that are not conversation participants.

role

string

required

Type of the message sender. One of: user, human, assistant

content

string

required

Content of the message.
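The following Python sketch assembles part of a `context` object from the fields described above and validates the enumerated values. `make_message` and `make_context` are hypothetical helpers, and the application and user names are made up. The `conversation` object is not assembled because the name of its required identifier field is not shown above, so the sketch stops at building the `messages` list.

```python
import json
from typing import Dict, Iterable, List, Optional

ALLOWED_APP_TYPES = {"email", "ai", "other"}
ALLOWED_ROLES = {"user", "human", "assistant"}


def make_message(role: str, content: str) -> Dict[str, str]:
    """Build one item for the conversation `messages` array."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {sorted(ALLOWED_ROLES)}")
    return {"role": role, "content": content}


def make_context(
    app_name: str,
    app_type: str,
    dictionary_context: Iterable[str] = (),
    user_first_name: Optional[str] = None,
    user_last_name: Optional[str] = None,
) -> dict:
    """Build a `context` object; optional fields are left out when not provided."""
    if app_type not in ALLOWED_APP_TYPES:
        raise ValueError(f"app type must be one of {sorted(ALLOWED_APP_TYPES)}")
    context: dict = {
        "app": {"name": app_name, "type": app_type},
        "dictionary_context": list(dictionary_context),
    }
    if user_first_name:
        context["user_first_name"] = user_first_name
    if user_last_name:
        context["user_last_name"] = user_last_name
    return context


# Messages are listed oldest first, matching the chronological order requirement.
messages: List[Dict[str, str]] = [
    make_message("human", "Are we still on for Friday?"),
    make_message("user", "Yes, see you at noon."),
]
context = make_context("ExampleMail", "email", ["Anneliese"], user_first_name="Dana")
print(json.dumps(context, indent=2))
```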

`properties`

This field has been deprecated and its functionality replaced by the context field above. We strongly encourage you to migrate to the new schema. You can find the equivalent fields in the new request type. If both context and properties are provided, properties will be ignored.

language

string

default:"en"

2-letter ISO language code (e.g. ‘en’, ‘es’)

app_type

string

default:"other"

Flow adapts its formatting depending on whether the user is prompting an AI, writing an email, or doing another task. Options: ‘ai’, ‘email’, ‘other’

dictionary

array

default:"[]"

List of dictionary words to help with transcription accuracy

after_text

string

default:true

The text immediately after the cursor. Flow uses it to decide spacing / punctuation.

before_text

string

default:true

The text immediately before the cursor. Flow uses it to decide spacing / punctuation.

selected_text

string

default:true

The text the user has highlighted. Flow uses it to decide spacing / punctuation.
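For migrating away from `properties`, one possible mapping onto the new fields is sketched below in Python. The mapping is inferred from matching field names and descriptions on this page and is not an official migration table. `properties_to_request_fields` is a hypothetical helper.

```python
def properties_to_request_fields(properties: dict) -> dict:
    """Translate a deprecated `properties` object into `language` and `context` fields.

    Assumed mapping:
      language                 -> language (a list with a single entry)
      app_type                 -> context.app.type
      dictionary               -> context.dictionary_context
      before/selected/after    -> context.textbox_contents.*
    """
    fields: dict = {}
    context: dict = {}

    if "language" in properties:
        # A single-entry list forces transcription into that language.
        fields["language"] = [properties["language"]]
    if "app_type" in properties:
        context["app"] = {"type": properties["app_type"]}
    if "dictionary" in properties:
        context["dictionary_context"] = list(properties["dictionary"])

    # Assumption: the string values can be carried over unchanged.
    textbox = {
        key: properties[key]
        for key in ("before_text", "selected_text", "after_text")
        if key in properties
    }
    if textbox:
        context["textbox_contents"] = textbox

    if context:
        fields["context"] = context
    return fields


old_properties = {
    "language": "es",
    "app_type": "email",
    "dictionary": ["Anneliese"],
    "before_text": "Hola ",
}
new_fields = properties_to_request_fields(old_properties)
```

Because `properties` is ignored whenever `context` is present, a request should send one or the other rather than mixing the two.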
