Access the Flow WebSocket API
How it works
Connect:
Open a WebSocket connection to the server
API-key endpoint: wss://platform-api.wisprflow.ai/api/v1/dash/ws?api_key=Bearer%20<YOUR_API_KEY>
Client auth endpoint (Recommended): wss://platform-api.wisprflow.ai/api/v1/dash/client_ws?client_key=Bearer%20<CLIENT_KEY>
We recommend using the client endpoint with client-side authentication, which connects directly to our servers and lowers latency.
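Note that the `Bearer ` prefix in the query parameter must be URL-encoded (the `%20` in the URLs above). A small sketch of building the connection URL — the `buildWsUrl` helper is hypothetical, but `encodeURIComponent` handles the encoding:

```javascript
// Hypothetical helper: build the client-auth WebSocket URL.
// encodeURIComponent turns "Bearer <key>" into "Bearer%20<key>".
function buildWsUrl(clientKey) {
  const base = "wss://platform-api.wisprflow.ai/api/v1/dash/client_ws";
  return `${base}?client_key=${encodeURIComponent("Bearer " + clientKey)}`;
}
```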
Send Messages:
- Auth: Authenticate and begin a new transcription session
- Append: Send audio chunks sequentially, ideally 1-second chunks
Make sure the chunks you send are of the same duration. Varying durations can cause unexpected failures.
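One way to guarantee equal-duration chunks is to split the captured samples into fixed-size slices before sending. A minimal sketch, assuming 16 kHz mono audio, 1-second chunks, and zero-padding of the final partial chunk (the padding strategy is an assumption, not part of the API):

```javascript
const SAMPLE_RATE = 16000;          // required sample rate
const CHUNK_SAMPLES = SAMPLE_RATE;  // 1-second chunks

// Split a Float32Array of samples into equal-length chunks.
function chunkSamples(samples) {
  const chunks = [];
  for (let i = 0; i + CHUNK_SAMPLES <= samples.length; i += CHUNK_SAMPLES) {
    chunks.push(samples.slice(i, i + CHUNK_SAMPLES));
  }
  // Zero-pad a shorter tail so every chunk has the same duration (assumption).
  const remainder = samples.length % CHUNK_SAMPLES;
  if (remainder > 0) {
    const tail = new Float32Array(CHUNK_SAMPLES);
    tail.set(samples.slice(samples.length - remainder));
    chunks.push(tail);
  }
  return chunks;
}
```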
Receive Responses:
Partial and final results are returned during the session.
| Type | Description | Example |
|---|---|---|
| `auth` | Authenticates the client with the server | `{ "type": "auth", "access_token": "..." }` |
| `append` | Sends a chunk of audio data and (optionally) metadata to the server | `{ "type": "append", "audio": "...chunk_audio...", "position": 0 }` |
| `commit` | Marks the end of the audio stream | `{ "type": "commit", "total_packets": 284 }` |
Once connected, the first message you send must be an auth message, with any API properties you'd like to pass in. After receiving confirmation that the session has started, you can send multiple chunks of audio data. You must stream base64-encoded single-channel 16-bit (int16) PCM wav data sampled at 16 kHz. When you finish sending all audio chunks, you MUST send a commit message that includes the total number of audio chunks sent.
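The required audio format can be produced from float samples (e.g. from the Web Audio API) by scaling to the int16 range and base64-encoding the bytes. A sketch, assuming Node's `Buffer` for the base64 step (in the browser you would build a binary string and call `btoa`); whether a WAV header must be prepended is not shown here:

```javascript
// Convert mono float samples in [-1, 1] to base64-encoded 16-bit PCM.
function floatToPcm16Base64(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the int16 range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return Buffer.from(pcm.buffer).toString("base64");
}
```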
Step-by-Step Guide
Establish WebSocket Connection
Connect to the WebSocket server using your API key:

```javascript
const ws = new WebSocket("wss://platform-api.wisprflow.ai/api/v1/dash/ws?api_key=Bearer%20<YOUR_API_KEY>");

ws.onopen = () => {
  console.log("WebSocket connected");
};

ws.onerror = (error) => {
  console.error("WebSocket error:", error);
};

ws.onclose = () => {
  console.log("WebSocket closed");
};
```
Start a Session
Send an auth message to initialize the transcription session:

```javascript
ws.send(JSON.stringify({
  type: "auth",
  access_token: "...",
  language: ["en"], // if known beforehand, set to the list of languages the user speaks
  context: { // optional contextual information used to improve dictation quality when applicable
    app: {
      name: "Weather Forecast Chatbot",
      type: "ai" // one of {email, ai, other}
    },
    dictionary_context: [], // list of uncommon names or words relevant to the context
    user_identifier: "john_doe_1",
    user_first_name: "John",
    user_last_name: "Doe",
    textbox_contents: {
      before_text: "", // text immediately before the cursor position
      selected_text: "", // text highlighted by the cursor for replacement
      after_text: "" // text immediately after the cursor position
    },
    screenshot: null, // screenshot for when the user is speaking about something on the screen
    content_text: null, // plaintext content of the app the user is dictating in
    content_html: null, // HTML content of the app the user is dictating in
    conversation: { // chatbot-style conversation history (messaging or AI app)
      id: "ai_chat_126", // conversation identifier
      participants: ["John Doe", "AI Assistant"], // list of people the user might be addressing
      messages: [ // list of messages in the conversation, in chronological order
        {
          role: "user", // one of {user, human, assistant}
          content: "How is the weather in SF today?"
        },
        {
          role: "assistant",
          content: "It's partly cloudy. Do you want to know the temperature?"
        }
      ]
    }
  }
}));
```
Upon successful authentication, the server will respond with an "auth" message.
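Since audio should only be streamed after the session is confirmed, it can help to wrap the confirmation in a promise. A sketch — `waitForMessage` is a hypothetical helper, and the exact shape of the auth confirmation is not shown here, so the predicate in the usage example is an assumption:

```javascript
// Resolve once a server message matching `predicate` arrives, or reject on timeout.
function waitForMessage(ws, predicate, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), timeoutMs);
    ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (predicate(msg)) {
        clearTimeout(timer);
        resolve(msg);
      }
    };
  });
}

// Usage (hypothetical): await waitForMessage(ws, (m) => m.type === "auth");
```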
Stream Audio
Send audio chunks using append messages. Ensure each chunk is Base64-encoded and that each message includes a sequential position (the index of its first packet):
```javascript
ws.send(JSON.stringify({
  type: "append",
  position: 0,
  audio_packets: {
    packets: [base64Audio1, base64Audio2, base64Audio3, ...], // you can send any number of packets
    volumes: [volume1, volume2, volume3, ...], // the number of volumes must match the number of packets
    packet_duration: BUFFER_SIZE / audioContext.sampleRate, // in seconds
    audio_encoding: "wav", // PCM wav data
    byte_encoding: "base64" // set to "binary" for reduced message size (see "Network Optimization" below)
  }
}));
```
For example, if you send 4 packets in each of the first and second append messages, the third append message must have position: 8.
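The position rule amounts to keeping a running count of packets sent. A minimal sketch of that bookkeeping — `buildAppend` is a hypothetical helper, and the constant volumes are placeholders:

```javascript
let position = 0; // running count of packets sent so far

// Build an append message and advance the position counter.
function buildAppend(packets, packetDuration) {
  const msg = {
    type: "append",
    position,
    audio_packets: {
      packets,
      volumes: packets.map(() => 1.0), // placeholder: one volume per packet
      packet_duration: packetDuration, // in seconds
      audio_encoding: "wav",
      byte_encoding: "base64"
    }
  };
  position += packets.length;
  return msg;
}
```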
Receive Final Transcription
When all audio chunks have been sent, send a commit message with the total number of packets sent:
```javascript
ws.send(JSON.stringify({
  type: "commit",
  total_packets: n
}));
```
```javascript
ws.onmessage = (event) => {
  const response = JSON.parse(event.data);
  if (response.status === "text") {
    if (response.final) { // final transcription result (websocket will close)
      console.log("Final server response");
    } else { // intermediate transcription result (roughly every 30 seconds)
      console.log("Interim server response");
    }
  } else if (response.status === "error") { // unrecoverable (websocket will close)
    console.log("Server returned error response");
  } else { // other information
    console.log("Server returned other response");
  }
};
```
Responses
Connection Authenticated
Commit Acknowledgement
```json
{
  "status": "info",
  "message": {
    "event": "commit_received"
  }
}
```
Partial Transcriptions
```json
{
  "status": "text",
  "position": 180,
  "final": false,
  "body": {
    "text": "This is a partial transcription...",
    "detected_language": "en"
  }
}
```
Final Transcription
```json
{
  "status": "text",
  "position": 205,
  "final": true,
  "body": {
    "text": "This is a partial transcription... and this is the full text.",
    "detected_language": "en"
  }
}
```
Network Optimization
To reduce the message size being sent over the network, instead of encoding binary data (audio and screenshot) using base64, you can serialize the message into the MessagePack format and send as binary data.
To enable binary transmission mode, add the following field to the websocket header:
Also, set byte_encoding: "binary" in the append messages.
When binary transmission mode is activated, the responses received over the websocket will also be serialized using MessagePack and will need deserialization by the client.
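A sketch of building and sending an append message in binary mode. The `sendBinaryAppend` helper is hypothetical, and the encoder is injected so the snippet stays self-contained; in practice you would pass `encode` from a MessagePack library such as `@msgpack/msgpack`:

```javascript
// Build an append message with raw (non-base64) audio bytes and send it
// as a binary frame, serialized by the injected MessagePack encoder.
function sendBinaryAppend(ws, encode, position, rawPackets, packetDuration) {
  const message = {
    type: "append",
    position,
    audio_packets: {
      packets: rawPackets, // raw bytes (e.g. Uint8Array), not base64 strings
      volumes: rawPackets.map(() => 1.0), // placeholder: one volume per packet
      packet_duration: packetDuration, // in seconds
      audio_encoding: "wav",
      byte_encoding: "binary" // required in binary transmission mode
    }
  };
  ws.send(encode(message)); // sent as a binary WebSocket frame
}
```

Remember that responses will also arrive MessagePack-encoded, so the `onmessage` handler must decode them instead of calling `JSON.parse`.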