Access the Flow WebSocket API

How it works

Connect:

Open a WebSocket connection to the server.

  • API-key endpoint: wss://api.wisprflow.ai/api/v1/dash/ws?api_key=<YOUR_API_KEY>
  • Client auth endpoint (recommended): wss://api.wisprflow.ai/api/v1/dash/client_ws?client_key=<CLIENT_KEY>

We recommend the client endpoint with client-side authentication, which connects directly to our servers and lowers latency.

Send Messages:

  • Start: Begin a new transcription session
  • Append: Send audio chunks sequentially, ideally 1-second chunks
Make sure the chunks you send are of the same duration. Varying durations can cause unexpected failures.
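One way to guarantee equal-duration chunks is to buffer captured samples and only emit full fixed-size packets, carrying any remainder over to the next capture callback. The sketch below assumes 16 kHz mono samples in a Float32Array; `CHUNK_SAMPLES` (1 second at 16 kHz) is an illustrative choice, not an API requirement:

```javascript
// Split captured samples into equal fixed-duration chunks (sketch).
// CHUNK_SAMPLES is an assumption: 1 second of audio at 16 kHz.
const CHUNK_SAMPLES = 16000;

function chunkSamples(samples) {
  const chunks = [];
  // Only emit full chunks so every packet has the same duration;
  // keep the remainder buffered until enough samples arrive.
  const fullChunks = Math.floor(samples.length / CHUNK_SAMPLES);
  for (let i = 0; i < fullChunks; i++) {
    chunks.push(samples.subarray(i * CHUNK_SAMPLES, (i + 1) * CHUNK_SAMPLES));
  }
  const remainder = samples.subarray(fullChunks * CHUNK_SAMPLES);
  return { chunks, remainder };
}
```

Prepend the returned `remainder` to the next batch of captured samples so no audio is dropped between callbacks.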

Receive Responses:

Partial and final results are returned during the session.
TYPE   | DESCRIPTION                                                   | EXAMPLE
auth   | Authenticates the client with the server                      | { "type": "auth", "access_token": "..." }
append | Sends a chunk of audio data and (optionally) metadata         | { "type": "append", "audio": "...chunk_audio...", "position": 0 }
commit | Marks the end of audio                                        | { "type": "commit", "total_packets": 284 }
Once connected, the first message you send must be an auth message, along with any API properties you'd like to pass. After receiving confirmation that the session has started, you can send multiple chunks of audio data. Audio must be streamed as base64-encoded single-channel 16-bit (int16) PCM WAV data sampled at 16 kHz. When you finish sending all audio chunks, you MUST send a commit message that includes the total number of audio chunks sent.
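Audio captured via the Web Audio API typically arrives as Float32 samples in [-1, 1], so each chunk needs to be converted to int16 PCM and base64-encoded before it goes into an append message. A minimal conversion sketch (it uses Node's Buffer for the base64 step; in a browser you would base64-encode the bytes another way, e.g. btoa over a binary string):

```javascript
// Convert Float32 samples ([-1, 1]) to 16-bit PCM and base64-encode (sketch).
// The API expects base64-encoded single-channel int16 PCM at 16 kHz.
function encodeChunk(float32Samples) {
  const int16 = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the int16 range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return Buffer.from(int16.buffer).toString("base64");
}
```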

Step-by-Step Guide

Establish WebSocket Connection

Connect to the WebSocket server using your API key:
const ws = new WebSocket("wss://api.wisprflow.ai/api/v1/dash/ws?api_key=<YOUR_API_KEY>");

ws.onopen = () => {
  console.log("WebSocket connected");
};

ws.onerror = (error) => {
  console.error("WebSocket error:", error);
};

ws.onclose = () => {
  console.log("WebSocket closed");
};
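Since the auth message must be the first thing sent, it can be convenient to wrap connection setup in a promise and await the open event before proceeding. A small sketch (the injectable `WS` parameter is not part of the API; it just makes the helper testable):

```javascript
// Wrap connection setup in a promise so later steps can await the open socket.
// The WS parameter is an illustrative convenience; it defaults to the
// global WebSocket and lets tests inject a fake implementation.
function connect(url, WS = WebSocket) {
  return new Promise((resolve, reject) => {
    const ws = new WS(url);
    ws.onopen = () => resolve(ws); // resolve once the socket is open
    ws.onerror = (err) => reject(err); // fail fast on connection errors
  });
}
```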

Start a Session

Send an auth message to authenticate and initialize the transcription session:
ws.send(JSON.stringify({
  type: "auth",
  access_token: "...",
  language: ["en"], // If known beforehand, set to list of languages the user speaks
  context: { // Optional contextual information used to improve dictation quality when applicable
    app: {
      name: "Weather Forecast Chatbot",
      type: "ai" // one of {email, ai, other}
    },
    dictionary_context: [], // list of uncommon names or words relevant to the context
    user_identifier: "john_doe_1",
    user_first_name: "John",
    user_last_name: "Doe",
    textbox_contents: {
      before_text: "", // text immediately before cursor position
      selected_text: "", // text highlighted by cursor for replacement
      after_text: "" // text immediately after cursor position
    },
    screenshot: null, // screenshot for when the user is speaking about something on screen
    content_text: null, // plaintext content of the app user is dictating in
    content_html: null, // HTML content of the app user is dictating in
    conversation: { // chatbot style history of conversation (messaging or AI app)
      id: "ai_chat_126", // conversation identifier
      participants: ["John Doe", "AI Assistant"], // list of people the user might be addressing
      messages: [ // list of messages in conversation, in chronological order
        {
          role: "user", // One of {user, human, assistant}
          content: "How is the weather in SF today?"
        },
        {
          role: "assistant",
          content: "It's partly cloudy. Do you want to know the temperature?"
        }
      ]
    }
  }
}));
Upon successful authentication, the server will respond with an "auth" message.
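Because audio capture often starts before the server confirms authentication, it can help to queue packets until the auth response arrives. The sketch below is an illustrative client-side pattern, not part of the API; only the `{ "status": "auth" }` message shape comes from the docs:

```javascript
// Minimal client-side gate: buffer audio until the server confirms auth (sketch).
function createSession(send) {
  let authenticated = false;
  const pending = []; // messages captured before auth completes
  return {
    // Queue audio messages until the server has confirmed authentication.
    sendAudio(msg) {
      if (authenticated) send(msg);
      else pending.push(msg);
    },
    // Feed every incoming server message through here.
    onMessage(response) {
      if (response.status === "auth") {
        authenticated = true;
        while (pending.length) send(pending.shift()); // flush buffered audio
      }
    },
  };
}
```

In practice `send` would be `(msg) => ws.send(msg)` on the open WebSocket.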

Stream Audio

Send audio chunks using append messages. Ensure each chunk is Base64-encoded and that each message carries the correct position: the index of its first packet in the overall stream.
ws.send(JSON.stringify({
  type: "append",
  position: 0,
  audio_packets: {
    packets: [base64Audio1, base64Audio2, base64Audio3, ...], // you can send any number of packets
    volumes: [volume1, volume2, volume3, ...], // The number of volumes must match the number of packets
    packet_duration: BUFFER_SIZE / audioContext.sampleRate, // in seconds
    audio_encoding: "wav", // pcm wav data 
    byte_encoding: "base64" // set to "binary" for reduced message size (see "Network Optimization" below)
  }
}));
If you send 4 packets in the first and second append messages, the third append message must have position: 8
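The position bookkeeping can be wrapped in a small helper that advances a counter by the number of packets in each message; the running total is then exactly the `total_packets` value the final commit needs. A sketch (the helper itself is illustrative; the message fields match the append format above):

```javascript
// Track the running packet position across append messages (sketch).
function makeAppender(ws, packetDuration) {
  let position = 0; // index of the first packet in the next append message
  return function appendPackets(packets, volumes) {
    ws.send(JSON.stringify({
      type: "append",
      position,
      audio_packets: {
        packets,
        volumes,
        packet_duration: packetDuration, // in seconds
        audio_encoding: "wav",
        byte_encoding: "base64",
      },
    }));
    position += packets.length; // next message starts where this one ended
    return position; // running total; use as total_packets in the final commit
  };
}
```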

Receive Final Transcription

When all audio chunks have been sent, send a commit message with the total number of packets sent:
ws.send(JSON.stringify({
  type: "commit",
  total_packets: n
}));

ws.onmessage = (event) => {
  const response = JSON.parse(event.data);
  if (response.status === "text") {
    if (response.final) { // final transcription result (websocket will close)
      console.log("Final server response");
    } else { // intermediate transcription result (roughly every 30 seconds)
      console.log("Interim server response");
    }
  } else if (response.status === "error") { // unrecoverable (websocket will close)
    console.log("Server returned error response");
  } else { // other information
    console.log("Server returned other response");
  }
};

Responses

Connection Authenticated

{
  "status": "auth"
}

Commit Acknowledgement

{
  "status": "info",
  "message": {
    "event": "commit_received"
  }
}

Partial Transcriptions

{
  "status": "text",
  "position": 180,
  "final": false,
  "body": {
    "text": "This is a partial transcription...",
    "detected_language": "en"
  }
}

Final Transcription

{
  "status": "text",
  "position": 205,
  "final": true,
  "body": {
    "text": "This is a partial transcription... and this is the full text.",
    "detected_language": "en"
  }
}

Network Optimization

To reduce the size of messages sent over the network, you can serialize each message into the MessagePack format and send it as binary data, instead of base64-encoding the binary payloads (audio and screenshot). To enable binary transmission mode, add the following header to the WebSocket handshake:
Encoding: 'msgpack'
Also, set byte_encoding: "binary" in the append messages.
When binary transmission mode is active, responses received over the WebSocket will also be serialized using MessagePack and must be deserialized by the client.
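The saving comes mostly from avoiding base64, which inflates binary payloads by roughly 33%. A quick Node sketch that compares a base64-in-JSON append payload against the raw audio bytes (the message shape matches the append format above; the numbers are illustrative):

```javascript
// Rough size comparison: base64-in-JSON vs the raw audio bytes (sketch).
const audio = Buffer.alloc(16000 * 2); // 1 s of 16 kHz int16 PCM = 32,000 bytes
const jsonMessage = JSON.stringify({
  type: "append",
  audio_packets: { packets: [audio.toString("base64")] },
});
// The JSON message is noticeably larger than the raw bytes it carries,
// which is what binary MessagePack transmission avoids.
console.log(jsonMessage.length, audio.length);
```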