Doubao Voice Input Guide

Doubao Voice Input is a speech recognition feature in HagiCode built on the ByteDance Doubao Open Platform’s speech recognition service. It converts your speech to text as you speak, so you can write content and complete operations hands-free. HagiCode also uses the current project’s context to recognize domain-specific vocabulary and technical terms more accurately, which helps most in technical discussions.

In the HagiCode platform, Doubao Voice Input is particularly suitable for:

  • Quickly inputting proposals: Create and submit proposals by simply speaking, significantly improving efficiency without typing
  • Providing comments: Add comments and feedback during code reviews or document reviews with voice input
  • Replying to messages: Quickly respond to messages in conversations and discussions without typing
  • Long-form content creation: Quickly generate project documentation, technical specifications, meeting notes, and other long-form content

Its key features include:

  • Deep Integration with Context Awareness: HagiCode automatically uses the current project context (such as code structure, technology stack, and domain terminology) to improve recognition of domain vocabulary and technical terms
  • Free Trial Hours: The Doubao platform gives new users 20 hours of free recognition time, so you can try voice input at no cost
  • Fast: Recognition runs in real time, so text appears as you speak
  • High Accuracy: Doubao’s speech recognition models are combined with project context for reliable results
  • Seamless Integration: Voice input is built into HagiCode’s message input box, so there is no need to switch applications
  • Easy to Use: Click the microphone button to start

Before using Doubao Voice Input, you need to:

  1. Get a Doubao Open Platform Account

  2. Create an Application and Get Credentials

    • Create a speech recognition application on the platform
    • Get your APP ID and Access Token
  3. Ensure Network Connection

    • Speech recognition service requires network connection
    • Ensure your device can access the Doubao API service

Here are the basic steps to use Doubao Voice Input:

  1. Get APP ID and Access Token
  2. Configure voice recognition in HagiCode
  3. Test API Key
  4. Find voice input box
  5. Click microphone button
  6. Authorize microphone permission
  7. Start recording
  8. View recognition results in real-time
  9. Click to stop recording
  10. Text insertion complete

Doubao Voice Input requires running in an HTTP/HTTPS environment.

| Environment | Support Status | Description |
| --- | --- | --- |
| localhost | Supported | Local development environment (http://localhost or http://127.0.0.1) |
| HTTPS Remote Server | Supported | Web applications deployed to public networks with HTTPS |
| HTTP Remote Server | Not supported | Web applications deployed to public networks with HTTP |
| file:// protocol | Not supported | Directly opening HTML files is not supported |

HagiCode’s Desktop version has a built-in local HTTP environment and fully supports voice input. You can use speech recognition directly without additional configuration.

Local host mode supports voice input:

  • Both HTTP and HTTPS are supported when using localhost or 127.0.0.1
  • HTTPS is required when deployed to public networks
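
The environment rules above can be sketched as a small check. This is a minimal illustration, not HagiCode's actual logic: `isVoiceInputSupported` is a hypothetical helper that only looks at the page URL. In a real browser, `window.isSecureContext` and the presence of `navigator.mediaDevices` are the runtime signals to rely on.

```typescript
// Hypothetical helper, not part of HagiCode's API.
// Decides whether a page URL falls into one of the supported environments.
function isVoiceInputSupported(pageUrl: string): boolean {
  const url = new URL(pageUrl);
  const isLocal = url.hostname === "localhost" || url.hostname === "127.0.0.1";

  if (url.protocol === "https:") return true;   // HTTPS works locally and remotely
  if (url.protocol === "http:") return isLocal; // plain HTTP only on the local machine
  return false;                                 // file:// and anything else
}
```

For example, `isVoiceInputSupported("http://localhost:3000/")` returns `true`, while the same call with `http://example.com/` or a `file://` URL returns `false`.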

To get your APP ID:

  1. Visit the Doubao Speech Recognition Console
  2. Log in or register an account
  3. Go to the console and create a new speech recognition application
  4. On the application details page, find and copy the APP ID

To get your Access Token:

  1. Open the Doubao Open Platform console
  2. Go to your speech recognition application
  3. Find the API key management area
  4. Generate or copy the Access Token

To configure voice recognition in HagiCode:

  1. Open the HagiCode application
  2. Go to Settings → Voice Recognition Settings
  3. Fill in the following information in the configuration form:
    • Provider: Select doubao (Doubao)
    • APP ID: Paste the APP ID you got from the Doubao platform
    • Access Token: Paste the Access Token you got from the Doubao platform
  4. (Optional) Adjust other configuration parameters as needed
  5. Click the Test API Key button to verify the configuration
  6. After successful verification, the configuration is automatically saved to browser local storage
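
Saving to browser local storage means the settings stay in one browser profile and are readable by scripts running on the same origin. The sketch below shows the general pattern only. The storage key, the `SavedVoiceSettings` shape, and both function names are assumptions for illustration. HagiCode's real key and format are not documented here.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical shape and storage key (assumptions, not HagiCode's real ones).
interface SavedVoiceSettings {
  provider: string;
  appId: string;
  accessToken: string;
}
const VOICE_SETTINGS_KEY = "voice-recognition-settings";

// Serializes the settings as JSON under a single key.
function saveVoiceSettings(store: KeyValueStore, settings: SavedVoiceSettings): void {
  store.setItem(VOICE_SETTINGS_KEY, JSON.stringify(settings));
}

// Returns the saved settings, or null when nothing usable is stored.
function loadVoiceSettings(store: KeyValueStore): SavedVoiceSettings | null {
  const raw = store.getItem(VOICE_SETTINGS_KEY);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as SavedVoiceSettings;
  } catch {
    return null; // corrupted entry: treat as "not configured"
  }
}
```

In a browser you would pass `window.localStorage` as the store. Because the settings live in that browser only, expect to enter the credentials again on another browser or device.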

| Field | Required | Description | Default |
| --- | --- | --- | --- |
| Provider | Yes | Speech recognition service provider | doubao |
| APP ID | Yes | Application unique identifier, obtained from Doubao Open Platform | - |
| Access Token | Yes | Authentication access token, obtained from Doubao Open Platform | - |
| Service URL | No | API service address; the default value usually works | wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async |
| Hotword Table ID | No | Improves recognition accuracy for specific vocabulary; see Hotword Table Documentation | - |
| Max Recording Duration | No | Maximum duration for a single recording, range 10-600 seconds | 300 seconds |
| Sample Rate | No | Audio sample rate; 16000 Hz is supported | 16000 Hz |
| Bit Depth | No | Audio bit depth | 16-bit |
| Channel Count | No | Audio channel count (mono) | 1 |
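
To make the defaults and limits of these fields concrete, here is a minimal sketch of how they could be modeled and validated. The property and function names (`DoubaoVoiceConfig`, `buildVoiceConfig`, `validateVoiceConfig`, `pcmBytes`) are hypothetical. Only the default values and the 10-600 second range come from the field descriptions above.

```typescript
// Hypothetical field names; defaults and ranges follow the documented fields.
interface DoubaoVoiceConfig {
  provider: "doubao";
  appId: string;
  accessToken: string;
  serviceUrl: string;
  hotwordTableId?: string;
  maxRecordingSeconds: number;
  sampleRateHz: number;
  bitDepth: number;
  channelCount: number;
}

type VoiceConfigInput =
  Partial<DoubaoVoiceConfig> & { appId: string; accessToken: string };

const DEFAULT_SERVICE_URL =
  "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async";

// Fills every optional field with its documented default.
function buildVoiceConfig(input: VoiceConfigInput): DoubaoVoiceConfig {
  return {
    provider: "doubao",
    appId: input.appId,
    accessToken: input.accessToken,
    serviceUrl: input.serviceUrl ?? DEFAULT_SERVICE_URL,
    hotwordTableId: input.hotwordTableId,
    maxRecordingSeconds: input.maxRecordingSeconds ?? 300,
    sampleRateHz: input.sampleRateHz ?? 16000,
    bitDepth: input.bitDepth ?? 16,
    channelCount: input.channelCount ?? 1,
  };
}

// Returns a list of problems; an empty list means the config looks valid.
function validateVoiceConfig(cfg: DoubaoVoiceConfig): string[] {
  const problems: string[] = [];
  if (cfg.appId.trim() === "") problems.push("APP ID is required");
  if (cfg.accessToken.trim() === "") problems.push("Access Token is required");
  if (cfg.maxRecordingSeconds < 10 || cfg.maxRecordingSeconds > 600) {
    problems.push("Max Recording Duration must be between 10 and 600 seconds");
  }
  if (cfg.sampleRateHz !== 16000) problems.push("Sample Rate must be 16000 Hz");
  return problems;
}

// Uncompressed PCM size of a recording at the configured audio format.
function pcmBytes(cfg: DoubaoVoiceConfig, seconds: number): number {
  return cfg.sampleRateHz * (cfg.bitDepth / 8) * cfg.channelCount * seconds;
}
```

With the defaults (16000 Hz, 16-bit, mono), one second of audio is 32,000 bytes, so a full 300-second recording is about 9.6 MB of raw PCM.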

After configuration, verify that your APP ID and Access Token are valid:

  1. Click Test API Key button
  2. The system will call the test interface to verify your configuration
  3. If the configuration is correct, a success message will be displayed
  4. If the configuration is incorrect, an error message will be displayed. Please check:
    • Whether APP ID and Access Token are correct
    • Whether network connection is normal
    • Whether Access Token has expired

In HagiCode, text input boxes with a microphone icon support voice input. These components are typically called VoiceTextArea.

Common voice input locations include:

  • Message input boxes
  • Text editing areas
  • Various form fields that require text input

To start voice input:

  1. Find the input box with a microphone icon
  2. Click the microphone button
  3. The browser will request microphone permission
  4. Click Allow to authorize microphone access
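
The permission prompt in step 3 comes from the browser's standard `navigator.mediaDevices.getUserMedia()` call. The sketch below shows how a page could request audio in the documented format and turn a rejected request into a readable hint. `buildAudioConstraints` and `describeMicrophoneError` are hypothetical helpers, and how HagiCode itself handles these cases is an assumption.

```typescript
// Constraints matching the documented audio format (16000 Hz, mono).
// Browsers treat bare values as preferences and may deliver a different native rate.
function buildAudioConstraints(sampleRateHz = 16000, channelCount = 1) {
  return {
    audio: { sampleRate: sampleRateHz, channelCount: channelCount },
    video: false,
  };
}

// Maps the DOMException name of a rejected getUserMedia() call to a hint.
function describeMicrophoneError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
      return "Microphone permission was denied. Allow it in the browser's site settings.";
    case "NotFoundError":
      return "No microphone was found on this device.";
    case "NotReadableError":
      return "The microphone is in use by another application or cannot be read.";
    default:
      return "Could not start recording: " + errorName;
  }
}

// Browser-only usage (not executed here):
//   try {
//     const stream = await navigator.mediaDevices.getUserMedia(buildAudioConstraints());
//     // ...start recording from `stream`
//   } catch (err) {
//     console.warn(describeMicrophoneError((err as DOMException).name));
//   }
```

If you dismissed or blocked the prompt earlier, the browser usually does not ask again. In that case, re-enable the microphone for the site in the browser's permission settings.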

After you grant permission, voice recognition starts automatically and you will see:

  • Waveform Animation: An animated waveform appears inside the microphone button while recording is in progress
  • Duration Display: The current recording duration appears below the button
  • Real-time Recognition: Recognized text appears temporarily at the cursor position

During recording, the speech recognition engine will convert your speech to text in real-time:

  • Recognized text will be temporarily displayed in gray in the input box
  • Recognition results will continuously update as you speak
  • Mandarin Chinese is recognized with high accuracy

To stop voice recognition, you can:

  1. Click the microphone button: Clicking the button again stops recording
  2. Click the input box: Clicking elsewhere in the input box also stops recording

After you stop, the final recognized text is inserted into the input box, and you can continue editing or send it.
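
The difference between the temporary gray text and the final inserted text can be modeled as a small state machine. This is an illustrative sketch, not HagiCode's implementation. All names are hypothetical, and it assumes each recognition update carries the full text recognized so far in the current recording, so a new update replaces the previous preview.

```typescript
// Hypothetical model of the input box during recording.
interface VoiceInputState {
  committed: string; // text that is really in the input box
  caret: number;     // cursor position where recognition started
  interim: string;   // gray preview text, replaced on every update
}

function startRecording(text: string, caret: number): VoiceInputState {
  return { committed: text, caret: caret, interim: "" };
}

// Each recognition update replaces the previous preview instead of appending to it.
function onRecognitionUpdate(state: VoiceInputState, text: string): VoiceInputState {
  return { ...state, interim: text };
}

// What the user sees: committed text with the preview spliced in at the caret.
function displayedText(state: VoiceInputState): string {
  return (
    state.committed.slice(0, state.caret) +
    state.interim +
    state.committed.slice(state.caret)
  );
}

// Stopping inserts the last preview for real and moves the caret past it.
function stopRecording(state: VoiceInputState): VoiceInputState {
  return {
    committed: displayedText(state),
    caret: state.caret + state.interim.length,
    interim: "",
  };
}
```

Until `stopRecording` runs, `committed` is unchanged, which is why the preview can keep changing as you speak without disturbing the text already in the box.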