Doubao Voice Input Guide

Overview

What is Doubao Voice Input

Doubao Voice Input is a speech recognition feature built into HagiCode and backed by the ByteDance Doubao Open Platform’s speech recognition service. It converts what you say into text, so you can work hands-free and complete operations by voice. HagiCode also draws on the current project’s context to recognize domain-specific vocabulary and technical terms more accurately, which helps most in technical discussions.

Key Use Cases

In the HagiCode platform, Doubao Voice Input is particularly suitable for:

Quickly inputting proposals: Create and submit proposals by speaking instead of typing
Providing comments: Add comments and feedback by voice during code reviews or document reviews
Replying to messages: Respond quickly to messages in conversations and discussions
Long-form content creation: Dictate project documentation, technical specifications, meeting notes, and other long-form content

Why Choose Doubao Voice Input

Deep Integration with Context Awareness: HagiCode uses the current project’s context (such as code structure, technology stack, and domain terminology) to improve recognition of domain vocabulary and technical terms
Free Trial Hours: The Doubao platform gives new users 20 hours of free recognition time, so you can try voice input at no cost
Extremely Fast: Recognition runs in real time, so text appears as you speak
High Accuracy: Doubao’s speech recognition models are combined with project context for reliable results
Seamless Integration: Voice input is built into HagiCode’s message input box, so there is no need to switch applications
Easy to Use: Click the microphone to start

Quick Start

Prerequisites

Before using Doubao Voice Input, you need to:

1. Get a Doubao Open Platform account
   - Visit the Doubao Speech Recognition Console
   - Register or log in to your account
2. Create an application and get credentials
   - Create a speech recognition application on the platform
   - Get your APP ID and Access Token
3. Ensure a network connection
   - The speech recognition service requires a network connection
   - Ensure your device can reach the Doubao API service

Basic Usage Flow

Here are the basic steps to use Doubao Voice Input:

1. Get the APP ID and Access Token
2. Configure voice recognition in HagiCode
3. Test the API Key
4. Find a voice input box
5. Click the microphone button
6. Authorize microphone permission
7. Start recording
8. View recognition results in real time
9. Click to stop recording
10. The text is inserted

Technical Requirements

HTTP Environment

Doubao Voice Input runs only when the page is served over HTTPS or from localhost. Plain HTTP on a remote server and the file:// protocol are not supported.

Environment	Support Status	Description
localhost	✓	Local development environment (http://localhost or http://127.0.0.1)
HTTPS Remote Server	✓	Web applications deployed to public networks with HTTPS
HTTP Remote Server	✗	Web applications deployed to public networks with HTTP
file:// protocol	✗	Directly opening HTML files is not supported
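
The support table can be expressed as a small check. This is a minimal sketch and not HagiCode's actual code: the function name and its parameters are hypothetical and only mirror the rows above. In a browser, the standard `window.isSecureContext` property gives a related signal.

```typescript
// Hypothetical helper that mirrors the support table above.
// The name and parameters are illustrative, not part of HagiCode.
function isVoiceInputEnvironmentSupported(protocol: string, hostname: string): boolean {
  const isLocalHost = hostname === "localhost" || hostname === "127.0.0.1";
  if (protocol === "https:") {
    return true; // HTTPS is supported both locally and on remote servers
  }
  if (protocol === "http:") {
    return isLocalHost; // plain HTTP is supported only on the local machine
  }
  return false; // file:// and any other protocol
}

// In a browser, the inputs would typically come from the page location:
// isVoiceInputEnvironmentSupported(window.location.protocol, window.location.hostname);
```

The protocol strings include the trailing colon because that is how `window.location.protocol` reports them.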

Desktop Version

HagiCode’s Desktop version has a built-in local HTTP environment and fully supports voice input. You can use speech recognition directly without additional configuration.

Host Mode

Local host mode supports voice input:

Both HTTP and HTTPS are supported when using localhost or 127.0.0.1
HTTPS is required when deployed to public networks

Configuration Steps

Get APP ID

1. Visit the Doubao Speech Recognition Console
2. Log in or register an account
3. Go to the console and create a new speech recognition application
4. On the application details page, find and copy the APP ID

Get Access Token

1. Open the Doubao Open Platform console
2. Go to your speech recognition application
3. Find the API key management area
4. Generate or copy the Access Token

Configure in Product

1. Open the HagiCode application
2. Go to Settings → Voice Recognition Settings
3. Fill in the following information in the configuration form:
   - Provider: Select doubao (Doubao)
   - APP ID: Paste the APP ID you got from the Doubao platform
   - Access Token: Paste the Access Token you got from the Doubao platform
4. (Optional) Adjust other configuration parameters as needed
5. Click the Test API Key button to verify the configuration
6. After successful verification, the configuration is saved automatically to the browser’s local storage
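
Because the configuration lives in browser local storage, it stays in that browser profile and is lost when site data is cleared. The following is a rough sketch of how such persistence might look. The storage key, the field names, and both functions are assumptions made for illustration; the guide does not document HagiCode's real storage layout.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical shape of the saved settings (field names are assumptions).
interface SavedVoiceConfig {
  provider: string;
  appId: string;
  accessToken: string;
}

// Assumed key name, chosen only for this example.
const STORAGE_KEY = "voice-recognition-config";

// Serialize the settings as JSON under a single key.
function saveVoiceConfig(store: KeyValueStore, config: SavedVoiceConfig): void {
  store.setItem(STORAGE_KEY, JSON.stringify(config));
}

// Read the settings back; return null when nothing usable is stored.
function loadVoiceConfig(store: KeyValueStore): SavedVoiceConfig | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) {
    return null; // nothing saved yet
  }
  try {
    return JSON.parse(raw) as SavedVoiceConfig;
  } catch {
    return null; // unreadable entry: treat it as "not configured"
  }
}

// In a browser: saveVoiceConfig(window.localStorage, { provider: "doubao", appId: "...", accessToken: "..." });
```

Passing the store as a parameter keeps the sketch usable outside a browser, for example in tests with an in-memory object.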

Configuration Field Description

Field	Required	Description	Default
Provider	Yes	Speech recognition service provider	doubao
APP ID	Yes	Application unique identifier, obtained from Doubao Open Platform	-
Access Token	Yes	Authentication access token, obtained from Doubao Open Platform	-
Service URL	No	API service address, usually use default value	(1)
Hotword Table ID	No	Used to improve recognition accuracy for specific vocabulary, see Hotword Table Documentation	-
Max Recording Duration	No	Maximum duration for a single recording, range 10-600 seconds	300 seconds
Sample Rate	No	Audio sample rate, supports 16000 Hz	16000 Hz
Bit Depth	No	Audio bit depth	16-bit
Channel Count	No	Audio channel count, mono	1
(1) `wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async`
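
To make the recording defaults concrete, here is a short sketch that models them, clamps the duration to the documented 10-600 second range, and works out how much uncompressed audio those settings imply. The type, constant, and function names are hypothetical. Whether audio is actually sent as raw PCM is not stated in this guide, so treat the size figure as an upper-bound estimate under that assumption.

```typescript
// Hypothetical field names; the values and ranges come from the table above.
interface RecordingSettings {
  maxRecordingSeconds: number; // allowed range 10-600, default 300
  sampleRateHz: number;        // 16000
  bitDepth: number;            // 16
  channelCount: number;        // 1 (mono)
}

const DEFAULT_RECORDING_SETTINGS: RecordingSettings = {
  maxRecordingSeconds: 300,
  sampleRateHz: 16000,
  bitDepth: 16,
  channelCount: 1,
};

// Clamp a requested duration into the documented 10-600 second range.
// Non-finite input falls back to the default of 300 seconds.
function clampRecordingSeconds(seconds: number): number {
  if (!Number.isFinite(seconds)) {
    return DEFAULT_RECORDING_SETTINGS.maxRecordingSeconds;
  }
  return Math.min(600, Math.max(10, Math.round(seconds)));
}

// Uncompressed PCM size in bytes:
// sampleRate * (bitDepth / 8) * channels * seconds.
function rawPcmBytes(settings: RecordingSettings): number {
  return (
    settings.sampleRateHz *
    (settings.bitDepth / 8) *
    settings.channelCount *
    settings.maxRecordingSeconds
  );
}

// With the defaults: 16000 * 2 * 1 = 32000 bytes per second,
// so a full 300-second recording is 9,600,000 bytes of raw PCM.
```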

Test API Key

After configuration, test whether the API Key is valid:

1. Click the Test API Key button
2. The system calls the test interface to verify your configuration
3. If the configuration is correct, a success message is displayed
4. If the configuration is incorrect, an error message is displayed. Check:
   - Whether the APP ID and Access Token are correct
   - Whether the network connection is working
   - Whether the Access Token has expired

Usage

Find Voice Input Location

In HagiCode, text input boxes with a microphone icon support voice input. These components are typically called VoiceTextArea.

Common voice input locations include:

Message input boxes
Text editing areas
Various form fields that require text input

Start Voice Recognition

1. Find the input box with a microphone icon
2. Click the microphone button
3. The browser requests microphone permission
4. Click Allow to authorize microphone access
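
The permission prompt comes from the browser's standard `navigator.mediaDevices.getUserMedia` call. The sketch below shows one way a page could request a mono 16000 Hz audio stream and turn common failures into readable hints. It is illustrative only: the function names, the `MediaDevicesLike` stand-in type, and the message texts are assumptions, and HagiCode's own handling may differ. The error names are the standard ones browsers report.

```typescript
// Stand-in for navigator.mediaDevices so the sketch type-checks without DOM typings.
interface MediaDevicesLike {
  getUserMedia(constraints: {
    audio: boolean | { channelCount?: number; sampleRate?: number };
  }): Promise<unknown>;
}

type MicrophoneResult =
  | { ok: true; stream: unknown }
  | { ok: false; message: string };

// Map standard getUserMedia error names to hints (message texts are illustrative).
function describeMicrophoneError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
      return "Microphone permission was denied. Allow it in the browser's site settings and try again.";
    case "NotFoundError":
      return "No microphone was found. Connect one and try again.";
    case "NotReadableError":
      return "The microphone could not be read. Another application may be using it.";
    default:
      return "Could not start recording.";
  }
}

// Ask for a mono 16 kHz stream; browsers treat these constraints as preferences.
async function requestMicrophone(devices: MediaDevicesLike): Promise<MicrophoneResult> {
  try {
    const stream = await devices.getUserMedia({
      audio: { channelCount: 1, sampleRate: 16000 },
    });
    return { ok: true, stream };
  } catch (err) {
    const name =
      typeof err === "object" && err !== null && "name" in err
        ? String((err as { name: unknown }).name)
        : "";
    return { ok: false, message: describeMicrophoneError(name) };
  }
}

// In a browser: const result = await requestMicrophone(navigator.mediaDevices);
```

If you click Block by mistake, the browser remembers the choice, so the permission usually has to be reset in the site settings before the prompt appears again.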

Recording Status Description

After authorization succeeds, voice recognition starts automatically, and you will see:

Waveform Animation: An animated waveform appears inside the microphone button, indicating that recording is in progress
Duration Display: The current recording duration appears below the button
Real-time Recognition: Recognized text appears temporarily at the cursor position

Real-time Recognition Results Display

During recording, the speech recognition engine converts your speech to text in real time:

Recognized text is shown temporarily in gray in the input box
Recognition results update continuously as you speak
Mandarin Chinese is recognized with high accuracy

Stop Recognition

To stop voice recognition, you can:

Click the microphone button: Click the button again to stop recording
Click the input box: Clicking other areas of the input box also stops recording

After you stop, the final recognized text is inserted into the input box, and you can continue editing or send it.
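
The preview-then-insert behavior described in this section can be modeled as a small data structure: provisional text is shown at the cursor while you speak, and the final result replaces it when recording stops. This is a conceptual sketch, not the VoiceTextArea implementation. All names are hypothetical, and it assumes each recognition update replaces the previous provisional text.

```typescript
// Hypothetical model of an input box while recording is in progress.
interface VoiceDraft {
  before: string;  // text to the left of the cursor
  after: string;   // text to the right of the cursor
  interim: string; // latest provisional result (the gray text)
}

// Split the existing text at the cursor when recording starts.
function startDraft(text: string, cursor: number): VoiceDraft {
  return { before: text.slice(0, cursor), after: text.slice(cursor), interim: "" };
}

// Assumption: each update from the recognizer replaces the previous provisional text.
function updateInterim(draft: VoiceDraft, recognized: string): VoiceDraft {
  return { ...draft, interim: recognized };
}

// What the user sees in the input box during recording.
function previewText(draft: VoiceDraft): string {
  return draft.before + draft.interim + draft.after;
}

// On stop, the final result becomes ordinary text and the cursor moves past it.
function commitFinal(draft: VoiceDraft, finalText: string): { text: string; cursor: number } {
  const before = draft.before + finalText;
  return { text: before + draft.after, cursor: before.length };
}
```

Keeping the provisional text separate from the committed text means the existing content on either side of the cursor is never modified until recording stops.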