There is a technical reason to stream AI responses: latency. A response that begins appearing in under a second feels fast even if it takes twenty seconds to complete. But the more interesting effect of streaming is psychological. It transforms the interaction from a request-response transaction into something that feels like a conversation.
The thinking effect
When tokens appear progressively, the user watches the AI think. They see the response take shape. They can anticipate where it is going. They can decide, mid-stream, whether the direction is right or whether they want to interrupt and redirect.
This is fundamentally different from a loading spinner followed by a wall of text. The spinner creates a binary experience: waiting, then done. Streaming creates a continuous experience: the response is alive, evolving, building toward something. The user is engaged throughout rather than disengaged during the wait and overwhelmed by the result.
Designing for the stream
Streaming requires different design thinking than static responses. The UI needs to handle partial content gracefully. Formatting must be applied progressively as tokens arrive - a heading should look like a heading as soon as enough tokens have arrived to identify it, not only after the entire response is complete.
We pay particular attention to the rhythm of the stream. Tool calls that happen mid-response - the AI deciding to look something up or perform an operation - create natural pauses. We use these pauses productively, showing the user what the AI is doing rather than presenting a frozen stream. "Looking up your recent campaign performance" is more useful than a hanging cursor.
The interruption contract
Streaming creates an implicit contract with the user: you can stop this at any time. If the AI is heading in the wrong direction, the user can interrupt and redirect without waiting for it to finish generating content they do not want. This sense of control is psychologically important. It reinforces that the human is directing the AI, not waiting on it.
We design the interruption experience carefully. Stopping a stream should feel as natural as interrupting a colleague mid-sentence. The AI acknowledges the interruption, processes the redirect, and continues from the new direction without losing the context of the conversation so far.
Beyond text
Streaming is not limited to text responses. When the AI performs a sequence of operations - creating content, then scheduling it, then composing a promotional email - each step streams its progress. The user watches the workflow unfold rather than waiting for a batch result. They can intervene at any step.
This operational streaming transforms complex multi-step workflows from opaque batch operations into transparent, supervisable processes. The user is not just a requester who waits for results. They are a collaborator who watches, evaluates, and guides the work in progress.
Streaming is not about speed. It is about presence.
- Cleo's Team