Processing Audio with Whisper and Managing State
You've already built a skill to process text messages from Telegram. Now, let's enhance it to handle audio messages and prevent duplicate entries. By the end of this lesson, you'll be able to integrate Whisper for audio transcription into your AI skills and manage processing state to avoid duplicate work.
Core idea
When building AI agents that interact with external services, you often encounter two challenges: processing different data types (like audio) and ensuring you only process new data. For audio, you need a tool to convert speech to text. For avoiding duplicates, you need a way to track what's already been processed.
Whisper is an open-source speech-to-text model developed by OpenAI. It can transcribe audio in many languages and can run locally on your machine, so your audio data doesn't leave your computer. When you instruct Claude Code to "transcribe audio using local Whisper," it will typically install and run the necessary model for you. The first run downloads the model, which might take a few minutes.
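To make the idea concrete, here is a minimal sketch of local transcription. It assumes the open-source openai-whisper Python package and ffmpeg are installed. The helper names transcribe_file and load_local_whisper, and the choice of the "base" model, are illustrative assumptions rather than what Claude Code necessarily generates.

```python
from typing import Any


def transcribe_file(model: Any, audio_path: str) -> str:
    """Return the transcript of one audio file as plain text.

    `model` is any object with a Whisper-style `transcribe(path)` method
    that returns a dict containing a "text" key.
    """
    result = model.transcribe(audio_path)
    return result["text"].strip()


def load_local_whisper(model_name: str = "base") -> Any:
    """Load a Whisper model on this machine (weights download on first use)."""
    # Imported lazily so the rest of the module works without the package.
    import whisper  # from the `openai-whisper` package (assumed installed)

    return whisper.load_model(model_name)
```

With a downloaded voice note saved as, say, voice.ogg (a hypothetical file name), calling transcribe_file(load_local_whisper(), "voice.ogg") would return the spoken text as a string, ready for classification.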
To prevent reprocessing old data, you can rely on unique identifiers provided by the external service. Telegram, for example, assigns a unique update_id to every incoming update, and each new message sent to your bot arrives as its own update. These IDs increase sequentially. By storing the update_id of the last processed message, your skill can pick up exactly where it left off and process only messages with a higher update_id. This keeps runs efficient and prevents your output files from filling up with redundant entries.
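The state-tracking idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the skill's actual implementation: the function names are hypothetical, the updates are assumed to be dicts with an "update_id" key (as in Telegram's JSON responses), and the ID is assumed to live in a .last_update_id file.

```python
from pathlib import Path
from typing import Dict, List, Optional

STATE_FILE = Path(".last_update_id")  # assumed location of the stored ID


def load_last_update_id(path: Path = STATE_FILE) -> Optional[int]:
    """Return the stored update_id, or None if nothing was processed yet."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        # Missing or unreadable state is treated as "no history".
        return None


def select_new_updates(updates: List[Dict], last_id: Optional[int]) -> List[Dict]:
    """Keep only updates whose update_id is higher than the stored one."""
    if last_id is None:
        return list(updates)
    return [u for u in updates if u["update_id"] > last_id]


def save_last_update_id(updates: List[Dict], path: Path = STATE_FILE) -> None:
    """Persist the highest update_id seen; leave the file alone if none."""
    if updates:
        path.write_text(str(max(u["update_id"] for u in updates)))
```

A run would load the stored ID, filter the fetched updates, process them, and then save the new highest ID. If the skill calls Telegram's getUpdates method directly, passing an offset of the stored ID plus one asks Telegram to return only newer updates, which achieves the same effect on the server side.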
Walkthrough
Let's enhance your existing /telegram-notes skill to handle audio messages using Whisper and manage the processing state with update_id.
Task: Modify your /telegram-notes skill to include Whisper transcription and state management.
- Open your SKILL.md file: Navigate to .claude/skills/telegram-notes/SKILL.md in your project's file explorer (e.g., VS Code).
- Add Whisper transcription: Locate the part of your skill that processes messages. You'll need to add an instruction for Claude Code to transcribe audio messages using Whisper before it attempts to classify them.
This instruction tells Claude Code to identify audio messages and use Whisper to convert them into text, making them available for subsequent classification.
- Implement state management: You need to instruct Claude Code to save the update_id of the last processed message and then use this information on subsequent runs.
This ensures that your skill will only fetch and process new messages, avoiding duplicates.
- Save and restart Claude Code: Save the SKILL.md file. Then, exit Claude Code (by typing /quit or pressing Ctrl+C) and restart it by typing claude in your terminal. This reloads your updated skill.
- Test your updated skill: Send a mix of 5-10 new text and voice messages to your Telegram bot. Then, run your skill: /telegram-notes. Verify that both text and voice messages are processed and classified correctly in your ideas.md and tasks.md files.
- Test duplicate prevention: Send a couple more new messages (text or voice) to your bot. Then, run /telegram-notes again. Check your ideas.md and tasks.md files. The new messages should be added, but the previously processed messages should not be duplicated.
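If checking the files by eye gets tedious, a small script can flag repeated entries. The sketch below assumes each note occupies its own line in ideas.md and tasks.md, which may not match how your skill formats its output. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable


def find_duplicate_lines(path: Path) -> Dict[str, int]:
    """Return each non-empty line that appears more than once, with its count."""
    if not path.exists():
        return {}
    lines = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
    counts = Counter(line for line in lines if line)
    return {line: n for line, n in counts.items() if n > 1}


def report_duplicates(filenames: Iterable[str] = ("ideas.md", "tasks.md")) -> None:
    """Print a one-line duplicate summary for each notes file."""
    for name in filenames:
        dupes = find_duplicate_lines(Path(name))
        status = "no duplicates" if not dupes else f"{len(dupes)} repeated line(s)"
        print(f"{name}: {status}")
```

Running report_duplicates() from the project folder after the second /telegram-notes run gives a quick signal. Note that legitimately repeated lines, such as identical headings, would also be reported.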
Common mistakes
- Forgetting to restart Claude Code: If you modify a skill's SKILL.md file, Claude Code needs to be restarted for the changes to take effect.
- Not specifying "local Whisper": If you don't specify "local Whisper," Claude Code might try to use a cloud-based transcription service, which could have privacy or cost implications.
- Incorrectly handling update_id: Ensure the logic for saving and reading the .last_update_id file is clear and correctly implemented in your skill's instructions to avoid processing old messages or missing new ones.
Key takeaways
- Whisper is an OpenAI model for local, multi-language speech-to-text transcription.
- You can instruct Claude Code to use Whisper by simply asking it to "transcribe audio using local Whisper."
- update_id is a unique, sequential identifier for Telegram messages, useful for tracking processing state.
- Storing the last processed update_id allows your skill to avoid reprocessing old data and prevent duplicates.
- Restart Claude Code after modifying SKILL.md for changes to take effect.