Dictating to Claude Code on a phone
A mobile wrapper page for the ttyd browser terminal adds an arrow-key toolbar and voice dictation via the Web Speech API. The result is a terminal that is usable from a phone, including speaking commands and code directly into it.
The problem with ttyd on a phone
The ttyd terminal works fine in a mobile browser, but two things make it painful:
- No arrow keys. The on-screen keyboard covers most of the screen and lacks the cursor keys, Esc, Tab, and Ctrl combos a terminal constantly needs.
- Typing long prompts is slow. Dictating to Claude is faster than typing on a phone keyboard.
Mobile wrapper page
A separate Caddy route at /code-mobile serves a static HTML page that embeds the ttyd terminal in an iframe and adds a toolbar below it. The Caddyfile route:
```
handle /code-mobile {
    @tailnet header Tailscale-User-Login *
    handle @tailnet {
        root * /home/elendal/IdeaProjects/claude-code-mobile
        rewrite * /code-mobile.html
        file_server
    }
    respond "Forbidden" 403
}
```
The ttyd instance for mobile runs on a separate port (7683) with a smaller font size, behind its own Caddy route at /code-term*. The iframe points to /code-term/.
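The `/code-term*` route is plausibly just a reverse proxy to the mobile ttyd port. A sketch (the `handle_path` prefix stripping is an assumption, not copied from the actual config):

```
handle_path /code-term/* {
    reverse_proxy 127.0.0.1:7683
}
```

The systemd unit would then run something like `ttyd -p 7683 -W -t fontSize=11 bash`, where `-W` makes the terminal writable and `-t` passes xterm client options such as the smaller font size; the exact flags and font value here are assumptions.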
The key toolbar injects escape sequences directly into xterm.js's hidden textarea using the native value setter and a synthetic input event:
```javascript
function send(seq) {
    const doc = document.getElementById('term').contentDocument;
    const ta = doc && doc.querySelector('textarea'); // xterm.js's hidden helper textarea
    if (!ta) return;
    ta.focus();
    // Take the native value setter from the iframe's own realm:
    // prototypes differ across documents, so the parent page's
    // HTMLTextAreaElement.prototype would be the wrong one.
    const nativeInputSetter = Object.getOwnPropertyDescriptor(
        window.frames[0].HTMLTextAreaElement.prototype, 'value').set;
    nativeInputSetter.call(ta, seq);
    ta.dispatchEvent(new Event('input', { bubbles: true }));
}
```
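Each toolbar button just calls `send` with the raw VT100/xterm escape sequence for its key. A minimal sketch of the mapping and wiring (the `data-key` button markup is an assumption, not the page's actual structure):

```javascript
// Standard VT100/xterm sequences for the keys the toolbar provides.
const KEYS = {
  Up: '\x1b[A',
  Down: '\x1b[B',
  Right: '\x1b[C',
  Left: '\x1b[D',
  Esc: '\x1b',
  Tab: '\t',
  Enter: '\r',
  'Ctrl-C': '\x03',
};

// Browser-only wiring: one button per key, e.g. <button data-key="Up">.
// Guarded so the mapping itself can be used outside a browser.
if (typeof document !== 'undefined') {
  for (const [label, seq] of Object.entries(KEYS)) {
    const btn = document.querySelector(`[data-key="${label}"]`);
    if (btn) btn.addEventListener('click', () => send(seq));
  }
}
```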
Regular text (from voice dictation) needs InputEvent with a data property — xterm.js reads e.data for printable characters and ignores Event('input') without it:
```javascript
function sendText(text) {
    const doc = document.getElementById('term').contentDocument;
    const ta = doc && doc.querySelector('textarea');
    if (!ta) return;
    ta.focus();
    const setter = Object.getOwnPropertyDescriptor(
        window.frames[0].HTMLTextAreaElement.prototype, 'value').set;
    setter.call(ta, text);
    ta.dispatchEvent(new InputEvent('input', { bubbles: true, data: text, inputType: 'insertText' }));
}
```
Voice dictation
The Web Speech API (webkitSpeechRecognition) is built into Chrome on Android. No model download, no server, no audio encoding issues.
Toggle-to-talk: one tap starts recording (button turns red), another tap stops it. Each final result is injected into the terminal as it arrives, so long dictation sessions work naturally.
```javascript
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.lang = 'en-US';
recognition.continuous = true;
recognition.interimResults = false;
recognition.onresult = e => {
    const result = e.results[e.results.length - 1];
    if (!result.isFinal) return;
    const text = result[0].transcript.trim();
    if (text) sendText(text + ' ');
};
```
`continuous: true` keeps the session open until explicitly stopped; `interimResults: false` means only finalized text is processed, avoiding duplicate injections.
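The toggle itself needs only a boolean and the API's own `start()`/`stop()` methods. A sketch of the button wiring, assuming a `#mic` button and the `recognition` object above (the element id and CSS class here are made up):

```javascript
let listening = false;

// Flip the dictation state and return the new state so the
// click handler can update the button styling.
function toggleDictation(rec) {
  listening = !listening;
  if (listening) rec.start(); else rec.stop();
  return listening;
}

// Browser-only wiring (assumed markup: <button id="mic">).
if (typeof document !== 'undefined') {
  const mic = document.getElementById('mic');
  mic.addEventListener('click', () => {
    const on = toggleDictation(recognition);
    mic.classList.toggle('recording', on); // red while recording
  });
}
```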
What I tried first: Whisper in the browser
The original plan was to run Whisper locally via HuggingFace Transformers.js — no server calls, fully private. The pipeline API is clean:
```javascript
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3/dist/transformers.min.js';

const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-base.en', { dtype: 'q8' });
```
It works in desktop Chrome. On Android Chrome it doesn't, for two reasons:
- Audio capture. ScriptProcessorNode (deprecated) doesn't fire onaudioprocess on mobile. AudioWorkletNode (the modern replacement) fires, but requires the graph to be connected to audioCtx.destination — and even then, mobile Chrome may keep the AudioContext suspended unless it's resumed from a synchronous user-gesture handler, which is awkward with async setup code.
- Codec decoding. MediaRecorder on Android Chrome produces audio/webm;codecs=opus, which AudioContext.decodeAudioData on the same browser rejects. Transformers.js uses decodeAudioData internally when you pass it a blob URL, and MediaRecorder on Chrome has no built-in WAV recording option.
The Web Speech API sidesteps all of this. If offline/private transcription becomes a requirement, the audio capture problem would need a different approach — possibly recording to a remote server and running Whisper there.
Files
- `~/IdeaProjects/claude-code-mobile/code-mobile.html` — wrapper page
- `~/IdeaProjects/infra/Caddyfile` — `/code-mobile` and `/code-term*` routes
- `~/.config/systemd/user/claude-code-ttyd-mobile.service` — ttyd instance on port 7683