Doubao Voice Input Guide
Overview
What is Doubao Voice Input
Doubao Voice Input is a speech recognition feature in HagiCode, built on the ByteDance Doubao Open Platform’s speech recognition service. It converts your speech to text as you speak, so you can dictate input instead of typing. HagiCode also passes the current project’s context to the recognizer, which improves recognition of domain-specific vocabulary and technical terms in technical discussions.
Key Use Cases
In the HagiCode platform, Doubao Voice Input is particularly suitable for:
- Quickly inputting proposals: Create and submit proposals by speaking instead of typing
- Providing comments: Add comments and feedback by voice during code reviews or document reviews
- Replying to messages: Respond to messages in conversations and discussions without typing
- Long-form content creation: Dictate project documentation, technical specifications, meeting notes, and other long-form content
Why Choose Doubao Voice Input
- Deep Integration with Context Awareness: HagiCode supplies the current project context (such as code structure, technology stack, and domain terminology) to Doubao speech recognition, which improves recognition of domain vocabulary and technical terms
- Free Trial Hours: Doubao Platform provides 20 hours of free recognition time for new users, so you can try voice input at no cost
- Extremely Fast: Recognition runs in real time, so text appears as you speak
- High Accuracy: Powered by Doubao’s advanced speech recognition models combined with project context for precise and reliable results
- Seamless Integration: Directly integrated into HagiCode’s message input box, no need to switch applications
- Easy to Use: Simply click the microphone to start, intuitive and straightforward
Quick Start
Prerequisites
Before using Doubao Voice Input, you need to:
- Get a Doubao Open Platform Account
  - Visit Doubao Speech Recognition Console
  - Register or log in to your account
- Create an Application and Get Credentials
  - Create a speech recognition application on the platform
  - Get your APP ID and Access Token
- Ensure Network Connection
  - Speech recognition service requires network connection
  - Ensure your device can access the Doubao API service
Basic Usage Flow
Here are the basic steps to use Doubao Voice Input:
- Get APP ID and Access Token
- Configure voice recognition in product
- Test API Key
- Find voice input box
- Click microphone button
- Authorize microphone permission
- Start recording
- View recognition results in real-time
- Click to stop recording
- Text insertion complete
Technical Requirements
HTTP Environment
Doubao Voice Input must run in a page served from localhost or over HTTPS. Pages served over plain HTTP from a remote server, or opened through the file:// protocol, are not supported.
| Environment | Support Status | Description |
|---|---|---|
| localhost | ✓ | Local development environment (http://localhost or http://127.0.0.1) |
| HTTPS Remote Server | ✓ | Web applications deployed to public networks with HTTPS |
| HTTP Remote Server | ✗ | Web applications deployed to public networks with HTTP |
| file:// protocol | ✗ | Directly opening HTML files is not supported |
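The support table above can be expressed as a small check. This is a minimal sketch, not HagiCode's actual implementation: the function name `supportsVoiceInput` is hypothetical, and it only mirrors the four rows of the table. In a real browser page, `window.isSecureContext` is the more direct signal, since browsers generally restrict microphone access to secure contexts.

```typescript
// Hypothetical helper that mirrors the environment support table.
// It is an illustration only, not part of HagiCode.
function supportsVoiceInput(pageUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(pageUrl);
  } catch {
    return false; // not a parseable absolute URL
  }

  const isLoopback =
    url.hostname === "localhost" || url.hostname === "127.0.0.1";

  if (url.protocol === "https:") return true; // HTTPS, local or remote
  if (url.protocol === "http:") return isLoopback; // plain HTTP only on loopback
  return false; // file:// and any other scheme
}
```

For example, `supportsVoiceInput("http://localhost:5173")` is accepted, while the same application served from `http://example.com` is rejected until it is moved to HTTPS.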
Desktop Version
HagiCode’s Desktop version has a built-in local HTTP environment and fully supports voice input. You can use speech recognition directly without additional configuration.
Host Mode
Local host mode supports voice input:
- Both HTTP and HTTPS are supported when using localhost or 127.0.0.1
- HTTPS is required when deployed to public networks
Configuration Steps
Get APP ID
- Visit Doubao Speech Recognition Console
- Log in or register an account
- Go to the console and create a new speech recognition application
- In the application details page, find and copy the APP ID
Get Access Token
- Open the Doubao Open Platform console
- Go to your speech recognition application
- Find the API key management area
- Generate or copy the Access Token
Configure in Product
- Open the HagiCode application
- Go to Settings → Voice Recognition Settings
- Fill in the following information in the configuration form:
  - Provider: Select doubao (Doubao)
  - APP ID: Paste the APP ID you got from the Doubao platform
  - Access Token: Paste the Access Token you got from the Doubao platform
- (Optional) Adjust other configuration parameters as needed
- Click the Test API Key button to verify the configuration
- After successful verification, the configuration is automatically saved to browser local storage
Configuration Field Description
| Field | Required | Description | Default |
|---|---|---|---|
| Provider | Yes | Speech recognition service provider | doubao |
| APP ID | Yes | Application unique identifier, obtained from Doubao Open Platform | - |
| Access Token | Yes | Authentication access token, obtained from Doubao Open Platform | - |
| Service URL | No | API service address; the default value is usually appropriate | wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async |
| Hotword Table ID | No | Used to improve recognition accuracy for specific vocabulary, see Hotword Table Documentation | - |
| Max Recording Duration | No | Maximum duration for a single recording, range 10-600 seconds | 300 seconds |
| Sample Rate | No | Audio sample rate, supports 16000 Hz | 16000 Hz |
| Bit Depth | No | Audio bit depth | 16-bit |
| Channel Count | No | Audio channel count, mono | 1 |
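To make the field table concrete, the sketch below models the settings as a typed object, fills in the documented defaults, and checks the documented constraints. The property names (`appId`, `maxRecordingSeconds`, and so on) and both function names are assumptions made for illustration; the keys HagiCode actually stores in browser local storage may differ.

```typescript
// Illustrative model of the configuration table. All identifiers here are
// hypothetical; only the default values and ranges come from the table.
interface DoubaoVoiceConfig {
  provider: "doubao";
  appId: string;
  accessToken: string;
  serviceUrl: string;
  hotwordTableId: string | null;
  maxRecordingSeconds: number;
  sampleRateHz: number;
  bitDepth: number;
  channelCount: number;
}

interface DoubaoVoiceConfigInput {
  appId: string;
  accessToken: string;
  serviceUrl?: string;
  hotwordTableId?: string;
  maxRecordingSeconds?: number;
}

const DEFAULT_SERVICE_URL =
  "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async";

// Combine the required credentials with the documented defaults.
function buildVoiceConfig(input: DoubaoVoiceConfigInput): DoubaoVoiceConfig {
  return {
    provider: "doubao",
    appId: input.appId,
    accessToken: input.accessToken,
    serviceUrl: input.serviceUrl ?? DEFAULT_SERVICE_URL,
    hotwordTableId: input.hotwordTableId ?? null,
    maxRecordingSeconds: input.maxRecordingSeconds ?? 300,
    sampleRateHz: 16000,
    bitDepth: 16,
    channelCount: 1,
  };
}

// Return a list of human-readable problems; an empty list means valid.
function validateVoiceConfig(config: DoubaoVoiceConfig): string[] {
  const errors: string[] = [];
  if (config.appId.trim() === "") errors.push("APP ID is required");
  if (config.accessToken.trim() === "") errors.push("Access Token is required");
  if (config.maxRecordingSeconds < 10 || config.maxRecordingSeconds > 600) {
    errors.push("Max Recording Duration must be between 10 and 600 seconds");
  }
  if (config.sampleRateHz !== 16000) errors.push("Sample Rate must be 16000 Hz");
  return errors;
}
```

Validating locally like this catches missing credentials or an out-of-range duration before any network request is made. It does not replace the Test API Key step, which is the only way to confirm that the credentials are accepted by the service.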
Test API Key
After configuration, test whether your credentials are valid:
- Click the Test API Key button
- The system calls the test interface to verify your configuration
- If the configuration is correct, a success message is displayed
- If the configuration is incorrect, an error message is displayed. Please check:
  - Whether the APP ID and Access Token are correct
  - Whether the network connection is working
  - Whether the Access Token has expired
Find Voice Input Location
In HagiCode, text input boxes with a microphone icon support voice input. These components are typically called VoiceTextArea.
Common voice input locations include:
- Message input boxes
- Text editing areas
- Various form fields that require text input
Start Voice Recognition
- Find the input box with a microphone icon
- Click the microphone button
- The browser will request microphone permission
- Click Allow to authorize microphone access
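In a browser, a permission prompt like the one described above is normally triggered by `navigator.mediaDevices.getUserMedia`. The sketch below shows how the audio constraints and the common failure cases might be handled. It assumes that the Sample Rate and Channel Count settings map onto the request, which this guide does not confirm, and the names `buildAudioConstraints` and `classifyMicError` are hypothetical.

```typescript
// Minimal local type so the sketch does not depend on DOM typings.
interface AudioCaptureConstraints {
  audio: { sampleRate: number; channelCount: number };
  video: false;
}

type MicFailure = "permission-denied" | "no-microphone" | "unknown";

// Build constraints from the documented defaults (16000 Hz, mono).
// Browsers treat these as preferences and may deliver a different rate.
function buildAudioConstraints(
  sampleRateHz: number = 16000,
  channelCount: number = 1,
): AudioCaptureConstraints {
  return { audio: { sampleRate: sampleRateHz, channelCount }, video: false };
}

// Map a getUserMedia rejection to a coarse reason the UI can explain.
function classifyMicError(err: unknown): MicFailure {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  if (name === "NotAllowedError") return "permission-denied"; // user clicked Block
  if (name === "NotFoundError") return "no-microphone"; // no input device present
  return "unknown";
}

// In a browser these would be combined roughly as follows:
//   navigator.mediaDevices
//     .getUserMedia(buildAudioConstraints())
//     .catch((err) => console.warn(classifyMicError(err)));
```

If the result is `permission-denied`, the usual remedy is to re-enable microphone access for the site in the browser's site settings and click the microphone button again.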
Recording Status Description
After you grant permission, voice recognition starts automatically, and you will see:
- Waveform Animation: An animated waveform appears inside the microphone button, indicating that recording is in progress
- Duration Display: The current recording duration appears below the button
- Real-time Recognition: Recognized text appears provisionally at the cursor position
Real-time Recognition Results Display
During recording, the speech recognition engine converts your speech to text in real time:
- Recognized text is displayed provisionally in gray in the input box
- Recognition results update continuously as you speak
- Mandarin Chinese is recognized with high accuracy
Stop Recognition
To stop voice recognition, you can:
- Click the microphone button: Click the button again to stop recording
- Click the input box: Clicking elsewhere in the input box also stops recording
After stopping, the final recognized text is inserted into the input box, and you can continue editing or send it.
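The difference between provisional (gray) text and inserted text can be modeled as a small state machine. This is a simplified sketch under stated assumptions, not the VoiceTextArea implementation: all names are hypothetical, and for brevity the recognized text is appended to the end of the existing content rather than inserted at the cursor position.

```typescript
// Hypothetical state for an input box with voice recognition.
interface VoiceInputState {
  committed: string; // text that is really in the input box
  interim: string; // provisional text shown in gray while recording
  recording: boolean;
}

function startRecording(state: VoiceInputState): VoiceInputState {
  return { ...state, interim: "", recording: true };
}

// Each recognition update replaces the previous provisional text.
function updateInterim(state: VoiceInputState, text: string): VoiceInputState {
  if (!state.recording) return state; // ignore late results after stopping
  return { ...state, interim: text };
}

// Stopping commits the latest provisional text into the input box.
function stopRecording(state: VoiceInputState): VoiceInputState {
  return {
    committed: state.committed + state.interim,
    interim: "",
    recording: false,
  };
}

// What the user sees: committed text followed by provisional text.
function displayText(state: VoiceInputState): string {
  return state.committed + state.interim;
}
```

Keeping provisional text separate from committed text means a revised recognition result only ever replaces the gray portion, so anything typed or inserted earlier is never overwritten.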