audio
Base64-encoded, 16kHz WAV audio of the recorded speech.
This is the only required field. Maximum size is 25MB or 6 minutes of audio.
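As a minimal sketch of preparing this field on the client, the snippet below checks a WAV payload against the limits stated above and then Base64-encodes it. The function name `encode_audio` and the constants are hypothetical, and the sketch assumes the 25MB limit applies to the raw WAV bytes rather than to the encoded string, which this section does not specify.

```python
import base64
import io
import wave

MAX_BYTES = 25 * 1024 * 1024  # 25MB limit (assumed to apply to the raw WAV bytes)
MAX_SECONDS = 6 * 60          # 6 minute limit
REQUIRED_RATE = 16_000        # 16kHz sample rate


def encode_audio(wav_bytes: bytes) -> str:
    """Validate a WAV payload against the documented limits, then Base64-encode it."""
    if len(wav_bytes) > MAX_BYTES:
        raise ValueError("audio exceeds 25MB")

    # Read the WAV header to get the sample rate and the number of frames.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate

    if rate != REQUIRED_RATE:
        raise ValueError(f"expected 16kHz audio, got {rate}Hz")
    if seconds > MAX_SECONDS:
        raise ValueError("audio is longer than 6 minutes")

    return base64.b64encode(wav_bytes).decode("ascii")
```

Validating before upload avoids sending a request that would be rejected for size or duration.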
language
Optional list of (ISO 639-1) language codes that the user is expected to speak.
Providing exactly one code forces the transcription into that language. If the field is omitted, Flow attempts autodetection across the full list of languages, which is less accurate.
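The sketch below shows one way a client might assemble a request body with the optional language list. The key names audio and language come from this reference; the helper `build_request` and its simple two-letter check are assumptions for illustration and do not verify that a code is actually assigned in ISO 639-1.

```python
import re

# Shape check only: two lowercase ASCII letters, as ISO 639-1 codes are written.
ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")


def build_request(audio_b64: str, languages=None) -> dict:
    """Build a request body; omit `language` entirely to let the service autodetect."""
    body = {"audio": audio_b64}
    if languages:
        invalid = [code for code in languages if not ISO_639_1_SHAPE.fullmatch(code)]
        if invalid:
            raise ValueError(f"not two-letter language codes: {invalid}")
        body["language"] = list(languages)
    return body
```

Calling `build_request(audio, ["en"])` pins the transcription to English, while `build_request(audio)` leaves the field out and relies on autodetection.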
context
Optional contextual information about the circumstances surrounding the user's dictation. Flow can use this information to make its output more accurate, for example by getting names right and resolving speech ambiguities.
All properties are optional and will use default values if not provided.
Information about the application the user is dictating into, used to customize the writing style.
List of uncommon names or words relevant to the context that might be mentioned by the user.
User ID in the app that the person is dictating in, like their email address in an email client application.
First name of the speaker, used to make sure Flow spells the speaker’s name correctly.
Last name of the speaker.
Text under and surrounding the cursor in the active textbox. Flow uses this information to decide casing, spacing and punctuation.
Screenshot of the screen or the app the user is dictating in, for when the user is referencing something on the screen.
Plaintext content of the current page in the app the user is dictating in. This is a more efficient way of providing context than a screenshot.
HTML content of the app the user is dictating in (a more feature-rich alternative to content_text).
Chatbot-style history of the conversation the user is dictating in. This typically applies to messaging and AI apps.
properties
This field has been deprecated and its functionality replaced by the context field above. We strongly encourage you to migrate to the new schema. You can find the equivalent fields in the new request type. If both context and properties are provided, properties will be ignored.
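During a migration, a client could mirror the precedence rule above by dropping the deprecated field before sending. This is only a sketch: the helper name `drop_deprecated_properties` is hypothetical, and it does not translate values from properties into their context equivalents.

```python
def drop_deprecated_properties(body: dict) -> dict:
    """Return a copy of the request body without `properties` when `context` is present.

    This mirrors the documented behavior: if both fields are sent,
    `properties` is ignored, so there is no reason to transmit it.
    """
    if "context" in body and "properties" in body:
        return {key: value for key, value in body.items() if key != "properties"}
    return dict(body)
```

A body that carries only properties is returned unchanged, so older callers keep working until they move to context.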