Voice mode

Voice mode is a live conversation with the chat assistant — not just dictation. You speak, it speaks back, and the same tools the text chat has are available the whole time. Your phone connects directly to OpenAI’s realtime model over WebRTC for low latency.

  • Moonjar app installed and signed in.
  • Microphone permission granted (the app asks the first time).
  • Read What the chat knows about you if you want to know what gets sent to OpenAI in voice mode. It is more than just your audio.

  1. Open Moonjar and tap the Chat tab.
  2. Tap Conversation mode (the live-voice icon in the chat composer).
  3. Wait for the connection to open, then start speaking.
  4. The assistant replies out loud. Tap Interrupt to stop the reply mid-sentence and say something else.
  5. Tap End to close the session.

Your phone opens a WebRTC peer connection straight to OpenAI’s realtime model. Moonjar’s server only mints a short-lived (60-second) token to start the session — once the connection is up, your audio goes phone-to-OpenAI, not phone-to-Moonjar-to-OpenAI. That’s the latency win.
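
The 60-second lifetime can be pictured with a small expiry check. This is a minimal sketch, not Moonjar's implementation: `SessionToken`, `mint_session_token`, and the random token value are hypothetical, and how the real token is generated is not described here. Only the 60-second figure comes from the text above.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

TOKEN_TTL_SECONDS = 60  # the short lifetime described above


@dataclass(frozen=True)
class SessionToken:
    """Hypothetical token record: an opaque value plus its expiry time."""
    value: str
    expires_at: float  # Unix timestamp, in seconds

    def is_valid(self, now: float) -> bool:
        # The token can start a session only before it expires.
        return now < self.expires_at


def mint_session_token(now: Optional[float] = None) -> SessionToken:
    """Hypothetical server-side helper: issue a token that expires after the TTL."""
    issued_at = time.time() if now is None else now
    return SessionToken(secrets.token_urlsafe(32), issued_at + TOKEN_TTL_SECONDS)
```

The token only has to survive long enough to open the connection. After that the audio flows over the peer connection, so an expired token does not end a session that is already running.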

When you speak, OpenAI’s realtime model transcribes your speech and decides whether to call a tool, answer directly, or both. Tool calls come back over the WebRTC data channel; Moonjar runs them and feeds the results back to the model. Then the model speaks the reply.
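
The tool round-trip can be sketched as a small dispatcher. The event shapes (`response.output_item.done` carrying a `function_call` item, answered with `conversation.item.create` and `response.create`) follow OpenAI's Realtime API as best understood here and should be treated as assumptions. The function name `handle_data_channel_event` and the tool registry are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List


def handle_data_channel_event(raw: str, tools: Dict[str, Callable[..., Any]]) -> List[dict]:
    """Return the messages to send back over the data channel for one incoming event."""
    event = json.loads(raw)
    item = event.get("item", {})
    # Assumed shape: a finished output item of type "function_call" is a tool call.
    if event.get("type") != "response.output_item.done" or item.get("type") != "function_call":
        return []  # not a tool call, so there is nothing to run

    fn = tools.get(item["name"])
    if fn is None:
        output: Any = {"error": f"unknown tool: {item['name']}"}
    else:
        # Arguments arrive as a JSON string; decode them into keyword arguments.
        output = fn(**json.loads(item.get("arguments") or "{}"))

    return [
        {   # Attach the tool result to the conversation, keyed by call_id.
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps(output),
            },
        },
        {"type": "response.create"},  # ask the model to continue and speak the reply
    ]
```

Returning the outgoing messages instead of sending them directly keeps the dispatcher independent of the WebRTC layer, which makes it easy to exercise without a live connection.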

A short context block is sent at the start of the session — your collections, recent documents, last 20 memories, location, and custom instructions — so the assistant can answer “what’s on my plate” without burning a tool round-trip on every question.
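
As an illustration of what such a context block might look like, here is a minimal sketch. The function name, the section titles, and the plain-text layout are all hypothetical; only the list of ingredients and the cap of 20 memories come from the text above. It assumes memories are ordered oldest to newest.

```python
from typing import List, Optional

MAX_MEMORIES = 20  # "last 20 memories" in the text above


def build_context_block(
    collections: List[str],
    recent_documents: List[str],
    memories: List[str],
    location: Optional[str],
    custom_instructions: str,
) -> str:
    """Assemble the session-start context as plain text (hypothetical layout)."""
    recent_memories = memories[-MAX_MEMORIES:]  # keep only the newest entries
    sections = [
        ("Collections", ", ".join(collections)),
        ("Recent documents", ", ".join(recent_documents)),
        ("Memories", "\n".join(f"- {m}" for m in recent_memories)),
        ("Location", location or ""),
        ("Custom instructions", custom_instructions),
    ]
    # Skip empty sections so the block stays short.
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections if body)
```

Because the block is sent once at session start, anything that changes during the conversation would still need a tool call to look up.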

  • OpenAI sees more than just audio. Memories, document titles, collection names, and your location are sent as system instructions. See What the chat knows about you.
  • A few tools are turned off. Tools that don’t make sense for voice — deep research (long-running), suggest actions (renders buttons), generate insights (background job), show calculation (renders a card) — are skipped. Everything else is available.
  • Voice mode counts against your voice-minutes cap, separate from text chat.
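
The tool restriction above amounts to a simple deny-list. A minimal sketch, assuming hypothetical tool identifiers (the real names are not given here):

```python
from typing import Iterable, List

# Hypothetical identifiers for the four tools the list above says are skipped.
VOICE_EXCLUDED_TOOLS = frozenset(
    {"deep_research", "suggest_actions", "generate_insights", "show_calculation"}
)


def tools_for_voice(all_tools: Iterable[str]) -> List[str]:
    """Keep every tool except those that need on-screen UI or long-running jobs."""
    return [name for name in all_tools if name not in VOICE_EXCLUDED_TOOLS]
```

A deny-list matches the "everything else is available" wording: a newly added tool would be offered in voice mode unless it is explicitly excluded.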

Microphone permission denied. On iOS, turn microphone access back on for Moonjar under Settings → Apps → Moonjar.

Background noise triggering false interruptions. Voice mode uses server-side voice detection at a high threshold to avoid this, but in a noisy room it can still misfire. Hold the phone closer to your mouth.
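
For readers curious what "a high threshold" could mean in practice, here is a sketch of a server-side voice activity detection (VAD) setting. The `turn_detection` fields follow OpenAI's Realtime API as best understood here (their exact placement inside the session object can differ between API versions), and the value 0.8 is purely illustrative. The threshold Moonjar actually uses is not stated.

```python
def vad_session_update(threshold: float = 0.8) -> dict:
    """Build a session update that enables server-side VAD at the given threshold.

    The default of 0.8 is an illustrative assumption, not Moonjar's real value.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return {
        "type": "session.update",
        "session": {
            # A higher threshold needs louder input before it counts as speech.
            "turn_detection": {"type": "server_vad", "threshold": threshold},
        },
    }
```

A higher threshold trades sensitivity for robustness: quiet speech is more likely to be missed, but background noise is less likely to register as an interruption.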

The assistant keeps getting cut off. A misfiring voice detector can chop replies short. When you want to cut in, tap the on-screen Interrupt button rather than relying on automatic voice detection.