“We consistently achieved 100% accuracy” - MacLife
Flow’s Voice Interface API lets you create magical voice interactions in your applications. Flow converts audio into text with high accuracy across 100+ languages through a fast API. What separates Flow from traditional “speech-to-text” models is that it is optimized to understand what people say and to output text in their style: it applies auto-edits, removes filler words, gets names right, and makes the message more concise while maintaining its tone, among other things.
  • Auto edits: “Let’s meet at 6pm, actually let’s do 7” -> “Let’s meet at 7pm”
  • Concise while maintaining tone and removing filler words: “Search for, um, what’s it called, my flight tickets” -> “Search for my flight tickets”
  • Getting names right: “Hi Tony, how’s it going?” -> “Hi Tanay, how’s it going?”
This lets you create in your application the same magical experience that users get from the Wispr Flow desktop and mobile clients, which let them use voice everywhere on their computers.

Two easy ways to integrate Flow

  1. WebSocket API (recommended): Stream audio via WebSocket to /ws (lower latency, with a streaming response)
  2. REST API: Send a POST request to /api (slower; use it when WebSockets are not available)
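
As a rough illustration of the two options above, the following sketch derives the /ws and /api URLs from a base host and splits audio into chunks suitable for streaming. The base URL, the helper names `endpoint_url` and `chunk_audio`, and the audio format (16 kHz, 16-bit mono PCM) are assumptions made for this example; they are not taken from the Flow API reference, which should be consulted for the real host and message format.

```python
from typing import Iterator
from urllib.parse import urlsplit, urlunsplit

# Hypothetical base URL: replace it with the host given in the Flow API reference.
BASE_URL = "https://flow.example.com"


def endpoint_url(base_url: str, use_websocket: bool) -> str:
    """Return the /ws URL (WebSocket) or the /api URL (REST) for a base host."""
    parts = urlsplit(base_url)
    if use_websocket:
        # WebSocket URLs use ws:// or wss:// in place of http:// or https://.
        scheme = "wss" if parts.scheme == "https" else "ws"
        path = "/ws"
    else:
        scheme = parts.scheme
        path = "/api"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


def chunk_audio(audio: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes.

    The default of 3200 bytes is 100 ms of 16 kHz, 16-bit mono PCM
    (an assumed format, not one confirmed by the Flow documentation).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

With the WebSocket option, a client would send each chunk as it is recorded; with the REST option, it would send the whole recording in a single POST request to the /api URL.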

Two ways to authenticate

  1. API-key based auth: Keep your API key securely on your backend. This is easier, but it has higher latency because requests cannot be sent directly from the client and must pass through your backend
  2. Client-side auth (recommended): Generate access tokens for your clients so they can access the API directly. This gives lower latency and also makes it easier to implement with WebSockets.
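
To make the client-side option more concrete, here is a minimal sketch of a client-side token cache. It assumes that your backend exposes some way to mint a short-lived access token (represented here by a caller-supplied `fetch_token` function) and that each token carries an expiry time. The names `AccessToken` and `TokenCache`, the expiry field, and the 30-second refresh margin are hypothetical; the source does not specify how tokens are issued or how long they live.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AccessToken:
    value: str
    expires_at: float  # Unix timestamp in seconds (assumed field)


class TokenCache:
    """Reuse one access token until it is close to expiry.

    `fetch_token` is a hypothetical callable that asks your own backend
    for a fresh token; the backend is the only place the API key lives.
    """

    def __init__(
        self,
        fetch_token: Callable[[], AccessToken],
        refresh_margin: float = 30.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._token: Optional[AccessToken] = None

    def get(self) -> str:
        """Return a usable token value, fetching a new one only when needed."""
        now = self._clock()
        if (
            self._token is None
            or self._token.expires_at - now <= self._refresh_margin
        ):
            self._token = self._fetch_token()
        return self._token.value
```

The design point is that the client contacts your backend only occasionally to obtain a token and sends audio straight to Flow, whereas with API-key auth every request takes the extra hop through your backend.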