The idea arrived while I was walking. A clear solution to the problem I'd been stuck on for two days. I pulled out my phone, opened a note-taking app, and started talking.
Nothing appeared on screen.
I kept talking. Still nothing. I stopped mid-sentence, stared at the loading spinner, and by the time text finally showed up, the second half of the thought was gone.
That's the cost of latency. Not just frustration. Lost ideas.
The invisible threshold your brain enforces
Your brain predicts the next word while you're speaking. When text appears fast enough, you stay in the flow — your thoughts pour out without you thinking about the tool.
But when the screen lags even half a second, something breaks. You pause to check if the app heard you. That pause costs you context. The thought you were building collapses. You forget what you were about to say next.
I've tested this with dozens of apps. The pattern is always the same: when the first characters appear in under 300 milliseconds, people keep talking. When it takes longer, they start babysitting the app instead of thinking.
What killed my flow (and how I fixed it)
I used to use a popular transcription app that sent everything to the cloud. Beautiful interface. Great marketing. Terrible for thinking.
Here's what would happen: I'd start a voice note walking to my car. The app would show a pulsing microphone icon. I'd talk for 15 seconds. Nothing on screen. I'd stop talking, worried it wasn't recording. Then suddenly, a wall of text would appear — transcribed in the cloud, sent back to my phone.
By that point, I'd lost the thread.
The fix was switching to on-device transcription. Same phone, same voice, completely different experience. Text started streaming in under 300ms. I could see my words forming as I spoke them. My brain stayed in flow because there was no gap between speaking and seeing.
The two delays that hurt most
Time to first character. This is the gap between when you start speaking and when the first word shows up on screen. If this takes longer than a second, people stop trusting the tool. They'll check. They'll re-record. They'll switch apps.
Under 300ms? They keep going without thinking about it.
Jitter. When speed swings unpredictably — fast sometimes, slow other times — your brain never knows whether to trust the tool. You start recording defensively, speaking slower than natural, checking constantly.
Consistent speed matters more than peak speed.
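If you want to put numbers on these two delays, here's a rough sketch of how you could measure them. The `CaptureLatency` type is my own invention, not part of any SDK: feed it the moment recording started plus the timestamps when new text appeared, and it gives you time to first character and jitter.

```swift
import Foundation

/// Hypothetical latency metrics for one recording session.
/// `recordingStart` is when the user began speaking; `characterArrivals`
/// are the timestamps at which new text appeared on screen.
struct CaptureLatency {
    let recordingStart: Date
    let characterArrivals: [Date]

    /// Time to first character, in milliseconds. Under ~300 ms feels instant.
    var timeToFirstCharacterMs: Double? {
        guard let first = characterArrivals.first else { return nil }
        return first.timeIntervalSince(recordingStart) * 1000
    }

    /// Jitter: standard deviation of the gaps between text updates, in ms.
    /// Lower is better; a consistent stream matters more than a fast average.
    var jitterMs: Double? {
        guard characterArrivals.count > 2 else { return nil }
        let gaps = zip(characterArrivals.dropFirst(), characterArrivals)
            .map { $0.timeIntervalSince($1) * 1000 }
        let mean = gaps.reduce(0, +) / Double(gaps.count)
        let variance = gaps.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(gaps.count)
        return variance.squareRoot()
    }
}
```

Two sessions can have the same average speed and feel completely different: the one with low jitter is the one you stop noticing.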
What "fast enough" actually feels like
I asked ten people to try Brain Dump for a week and describe when it felt fast. The pattern was clear:
- Text starts appearing in under 300ms. Not after they finish the sentence. As they're speaking.
- Characters stream smoothly. No jumps. No sudden dumps of text.
- No spinner before seeing words. The mic icon is fine. A loading spinner before text? That kills flow.
- They can talk for a minute without thinking about the app. If they're monitoring the screen instead of their thoughts, it's too slow.
One person described it perfectly: "It feels like the app is keeping up with my brain instead of me waiting for the app."
Five fixes that work
After testing this across different phones, networks, and environments, these five changes make the biggest difference:
1. Use on-device transcription
Network round trips add hundreds of milliseconds. Every packet that leaves your phone and comes back costs you time. On-device transcription removes that entire path.
Brain Dump uses Apple's on-device speech recognition by default. Same quality, no network dependency, instant results.
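For the curious, here's roughly what the on-device path looks like with Apple's Speech framework. This isn't Brain Dump's actual code, just a minimal sketch, and it skips permission prompts and audio-session setup. The two lines that matter: `requiresOnDeviceRecognition = true` (nothing leaves the phone) and `shouldReportPartialResults = true` (text streams as you speak).

```swift
import Speech
import AVFoundation

// Minimal sketch of on-device streaming transcription with Apple's Speech
// framework. Authorization checks and AVAudioSession configuration omitted.
final class LocalTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func start(onPartial: @escaping (String) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true      // stream text as it forms
        request.requiresOnDeviceRecognition = true     // never touch the network
        self.request = request

        // Feed microphone buffers straight into the recognition request.
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let result = result {
                // Partial results arrive continuously, so text appears as you speak.
                onPartial(result.bestTranscription.formattedString)
            }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        task?.cancel()
    }
}
```

The whole pipeline lives on the phone, so the time to first character is bounded by the device, not by your signal bars.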
2. Position your mic correctly
I used to hold my phone like I was making a call — right against my ear. Terrible for transcription.
The sweet spot: 15-25cm from your mouth, microphone facing you. Too close and breath pops distort the audio. Too far and background noise competes.
One fix that surprised me: face away from wind. Even a slight breeze hitting the mic creates noise that slows transcription.
3. Speak in short sentences
Your brain and the transcription model both work better with natural pauses. Long rambling sentences force the model to hold context longer before deciding how to segment words.
I started speaking like I write: short sentences, clear breaks. Transcription got faster and more accurate.
4. Close competing apps
If you have three apps trying to use the microphone, or a dozen apps crushing your CPU, transcription slows down. This was especially noticeable on older iPhones.
Before recording something important, I do a quick swipe-up and close apps I'm not using. Makes a difference on iPhone 12 and earlier.
5. Keep cloud polish separate from capture
Some apps try to transcribe and polish in one step. They transcribe your voice, send it to ChatGPT for cleanup, then show you the polished version. It feels magical when it works, but the extra round trip adds serious latency.
Brain Dump splits these: capture is instant and local, polish is optional and happens after. You stay in flow during capture.
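If you're building something like this yourself, the structure is simple: write the raw transcript to local storage the instant capture ends, then kick off polish as an optional background task. The helper names below (`saveLocally`, `polishInCloud`) are hypothetical placeholders, not Brain Dump's API; the shape is what matters.

```swift
import Foundation

// Sketch of the capture/polish split: persist the raw transcript immediately,
// then do cloud cleanup off the critical path. Helper names are hypothetical.
func finishCapture(rawTranscript: String) {
    saveLocally(rawTranscript)               // instant: the idea is safe now

    Task {
        // Optional, best-effort polish. If it fails or is slow, the raw
        // note is already on disk and flow was never interrupted.
        if let polished = try? await polishInCloud(rawTranscript) {
            saveLocally(polished)
        }
    }
}

// Placeholder implementations so the sketch compiles.
func saveLocally(_ text: String) { /* write to local storage */ }
func polishInCloud(_ text: String) async throws -> String { text }
```

The design choice is that capture never waits on the network; polish is a bonus that arrives whenever it arrives.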
Test this yourself in 60 seconds
Want to know if your current tool is fast enough? Try this:
- Open your transcription app (Brain Dump or whatever you use).
- Start recording. Say a title first: "Latency test."
- Read three short sentences out loud. Watch for when the first character appears.
- If you see a spinner or wait more than a second, switch to Airplane Mode and try again.
If offline mode is faster, your bottleneck is the network. If it's still slow, the app is doing too much processing.
The real cost of slow capture
I lost ideas for years before I realized the problem wasn't my memory. It was latency.
Every time I had to wait for an app, every time I stopped mid-thought to check if it was recording, every time I lost flow because text wasn't appearing — those weren't small frustrations. They were lost work.
The ideas you have while walking, driving, showering — those are often your best ideas because your brain is relaxed enough to make connections. But they're also fragile. If you can't capture them instantly, they disappear.
Fast capture isn't a nice-to-have feature. It's the difference between writing down the thought and losing it forever.
Bottom line
If you only fix one thing about your voice capture workflow, fix time to first character. Get it under 300ms. Everything else — accuracy, formatting, polish — gets easier once people can stay in flow.
Because the real goal isn't transcription. It's thinking. And thinking requires flow.
Related: Offline capture in Airplane Mode for the fastest possible transcription, and wired vs wireless mics for better signal quality while walking.

