
Guide to Implementing Hotword Support for Doubao Speech Recognition

This article explains in detail how to implement hotword support for Doubao speech recognition in the HagiCode project. By using both custom hotwords and platform hotword tables, you can significantly improve recognition accuracy for domain-specific vocabulary.

Speech recognition technology has been developing for many years, yet one problem keeps bothering developers. General-purpose models handle everyday language well, but they often fall short on professional terminology, product names, and personal names. Think about it: a voice assistant in the medical field needs to accurately recognize terms like “hypertension,” “diabetes,” and “coronary heart disease”; a legal system needs to precisely capture terms such as “cause of action,” “defense,” and “burden of proof.” In these scenarios, even a capable general-purpose model is often not accurate enough.

We ran into the same challenge in the HagiCode project. As a multifunctional AI coding assistant, HagiCode needs to recognize speech containing a wide range of technical terminology, and the Doubao speech recognition API in its default configuration could not fully meet our accuracy requirements for specialized terms. This is not a weakness of Doubao as such; every domain simply has its own terminology. After some research and experimentation, we found that the Doubao speech recognition API provides hotword support, and that a straightforward configuration can significantly improve recognition accuracy for specific vocabulary. In a sense, once you tell it which words to pay attention to, it listens for them more carefully.

This article shares the complete solution we used in the HagiCode project to implement Doubao speech recognition hotwords. Both modes, custom hotwords and platform hotword tables, are supported, and they can also be combined. With this solution, developers can configure hotwords to fit their business scenarios so that the speech recognition system better “recognizes” professional, uncommon, yet critical vocabulary.

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project with a modern technology stack, designed to provide developers with an intelligent programming assistance experience. As a complex multilingual, multi-platform project, HagiCode needs to handle speech recognition scenarios involving many technical terms, which in turn drove our research into and implementation of the hotword feature.

If you are interested in HagiCode’s technical implementation, you can visit the GitHub repository for more details, or check out our official documentation for the complete installation and usage guide.

The Doubao speech recognition API provides two ways to configure hotwords, and each one has its own ideal use cases and advantages.

Custom hotword mode lets us pass hotword text directly through the corpus.context field. This approach is especially suitable for scenarios where you need to quickly configure a small number of hotwords, such as temporarily recognizing a product name or a person’s name. In HagiCode’s implementation, we parse the multi-line hotword text entered by the user into a list of strings, then format it into the context_data array required by the Doubao API. This approach is very direct: you simply tell the system which words to pay attention to, and it does exactly that.
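
To make the parsing step concrete, here is a minimal TypeScript sketch that turns multi-line input into context_data entries. The helper names splitHotwordText and toContextData are hypothetical (HagiCode's own parser is not shown at this point), and the { text } entry shape is taken from the backend payload code later in this article.

```typescript
// Hypothetical helpers, not HagiCode's actual code: convert the multi-line
// hotword text into the { text } entries used for context_data.

interface ContextDataEntry {
  text: string;
}

// Split on newlines (accepting Windows-style \r\n), trim each line,
// and drop blank lines so stray empty rows never become hotwords.
function splitHotwordText(contextText: string): string[] {
  return contextText
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0);
}

// Wrap each hotword in the { text } object shape.
function toContextData(hotwords: string[]): ContextDataEntry[] {
  return hotwords.map(text => ({ text }));
}
```

For example, toContextData(splitHotwordText('HagiCode\n\n  Doubao  \r\n')) yields two entries, one for "HagiCode" and one for "Doubao", with the blank line and surrounding spaces discarded.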

Platform hotword table mode uses the corpus.boosting_table_id field to reference a preconfigured hotword table in the Doubao self-learning platform. This approach is suitable for scenarios where you need to manage a large number of hotwords. We can create and maintain hotword tables on the Doubao self-learning platform, then reference them by ID. For a project like HagiCode, where specialized terms need to be continuously updated and maintained, this mode offers much better manageability. Once the number of hotwords grows, having a centralized place to manage them is far better than entering them manually every time.

Interestingly, these two modes can also be used together. The Doubao API supports including both custom hotwords and a platform hotword table ID in the same request, with the combination strategy controlled by the combine_mode parameter. This flexibility allows HagiCode to handle a wide range of complex professional terminology recognition needs. Sometimes, combining multiple approaches produces better results.
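
The two modes map onto two optional fields of the same corpus object. The following TypeScript sketch, built around a hypothetical buildCorpus function, uses the field names from this article (context.context_type, context.context_data, boosting_table_id) and omits the corpus entirely when neither source is configured. It deliberately leaves out combine_mode, because the article does not show where that parameter sits in the request.

```typescript
// Hypothetical sketch of the corpus object; field names follow the article,
// the function and type names are illustrative only.

interface CorpusContext {
  context_type: 'dialog_ctx';
  context_data: { text: string }[];
}

interface Corpus {
  context?: CorpusContext; // custom hotword mode
  boosting_table_id?: string; // platform hotword table mode
}

// Returns undefined when neither source is configured, so the caller
// can leave the corpus field out of the request entirely.
function buildCorpus(hotwords: string[], boostingTableId: string): Corpus | undefined {
  const corpus: Corpus = {};
  if (hotwords.length > 0) {
    corpus.context = {
      context_type: 'dialog_ctx',
      context_data: hotwords.map(text => ({ text })),
    };
  }
  // Whitespace-only IDs are treated as "not configured" in this sketch
  const tableId = boostingTableId.trim();
  if (tableId.length > 0) {
    corpus.boosting_table_id = tableId;
  }
  return Object.keys(corpus).length > 0 ? corpus : undefined;
}
```

A caller would attach the result only when it is defined, for example `if (corpus) request.corpus = corpus;`, which keeps requests without hotwords unchanged.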

In HagiCode’s frontend implementation, we defined a complete set of hotword configuration types and validation logic. The first part is the type definition:

export interface HotwordConfig {
  contextText: string;     // Multi-line hotword text
  boostingTableId: string; // Doubao platform hotword table ID
  combineMode: boolean;    // Whether to use both together
}

This simple interface contains all configuration items for the hotword feature. Among them, contextText is the part users interact with most directly: we allow users to enter one hotword phrase per line, which is very intuitive. Asking users to enter one term per line is much easier than making them understand a complicated configuration format.

Next comes the validation function. Based on the Doubao API limitations, we defined strict validation rules: at most 100 lines of hotword text, up to 50 characters per line, and no more than 5000 characters in total; boosting_table_id can be at most 200 characters and may contain only letters, numbers, underscores, and hyphens. These limits are not arbitrary; they come directly from the official Doubao documentation. API limits are API limits, and we have to follow them.

export interface HotwordValidationResult {
  isValid: boolean;
  errors: string[];
}

export function validateContextText(contextText: string): HotwordValidationResult {
  // An empty hotword field is valid: hotwords are optional
  if (!contextText || contextText.trim().length === 0) {
    return { isValid: true, errors: [] };
  }
  const errors: string[] = [];
  // Keep each line's original position so error messages match what the user sees
  const lines = contextText
    .split('\n')
    .map((text, index) => ({ text: text.trim(), lineNumber: index + 1 }))
    .filter(line => line.text.length > 0);
  if (lines.length > 100) {
    errors.push(`Hotword line count cannot exceed 100 lines; current count is ${lines.length}`);
  }
  // The total is measured on the raw text, newlines included
  const totalChars = contextText.length;
  if (totalChars > 5000) {
    errors.push(`Total hotword character count cannot exceed 5000; current count is ${totalChars}`);
  }
  for (const line of lines) {
    if (line.text.length > 50) {
      errors.push(`Hotword on line ${line.lineNumber} exceeds the 50-character limit`);
    }
  }
  return { isValid: errors.length === 0, errors };
}

export function validateBoostingTableId(boostingTableId: string): HotwordValidationResult {
  // An empty table ID is valid: the platform hotword table is optional
  if (!boostingTableId || boostingTableId.trim().length === 0) {
    return { isValid: true, errors: [] };
  }
  const errors: string[] = [];
  if (boostingTableId.length > 200) {
    errors.push(`boosting_table_id cannot exceed 200 characters; current count is ${boostingTableId.length}`);
  }
  // Only letters, digits, underscores, and hyphens are allowed
  if (!/^[a-zA-Z0-9_-]+$/.test(boostingTableId)) {
    errors.push('boosting_table_id can contain only letters, numbers, underscores, and hyphens');
  }
  return { isValid: errors.length === 0, errors };
}

These validation functions run immediately when the user configures hotwords, ensuring that problems are caught as early as possible. From a user experience perspective, this kind of instant feedback is very important. It is always better for users to know what is wrong while they are typing rather than after they submit.
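
The usage code later in this article imports a validateHotwordConfig function that is not shown. One plausible shape, sketched here with hypothetical helper names (mergeValidationResults, validateCombineMode), merges the per-field results and adds a cross-field rule. That rule, requiring both sources when combination mode is on, is an assumption for illustration, not documented HagiCode behavior.

```typescript
// Hypothetical sketch: merging per-field validation results and adding
// an ASSUMED cross-field rule for combination mode.

interface HotwordConfig {
  contextText: string;
  boostingTableId: string;
  combineMode: boolean;
}

interface HotwordValidationResult {
  isValid: boolean;
  errors: string[];
}

// Concatenate the errors of several validation results into one result.
function mergeValidationResults(...results: HotwordValidationResult[]): HotwordValidationResult {
  const errors = results.reduce<string[]>((all, result) => all.concat(result.errors), []);
  return { isValid: errors.length === 0, errors };
}

// Assumed rule: combination mode only makes sense when both sources are set.
function validateCombineMode(config: HotwordConfig): HotwordValidationResult {
  const hasContext = config.contextText.trim().length > 0;
  const hasTable = config.boostingTableId.trim().length > 0;
  if (config.combineMode && !(hasContext && hasTable)) {
    return {
      isValid: false,
      errors: ['Combination mode requires both custom hotwords and a boosting_table_id'],
    };
  }
  return { isValid: true, errors: [] };
}

// With the article's validators in scope, a combined check could read:
// mergeValidationResults(
//   validateContextText(config.contextText),
//   validateBoostingTableId(config.boostingTableId),
//   validateCombineMode(config),
// );
```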

In HagiCode’s frontend implementation, we chose to use the browser’s localStorage to store hotword configuration. There were several considerations behind this design decision. First, hotword configuration is highly personalized, and different users may have different domain-specific needs. Second, this approach simplifies the backend implementation because it does not require extra database tables or API endpoints. Finally, after users configure it once in the browser, the settings can be loaded automatically on subsequent uses, which is very convenient. Put simply, it is the easiest approach.

const HOTWORD_STORAGE_KEYS = {
  contextText: 'hotword-context-text',
  boostingTableId: 'hotword-boosting-table-id',
  combineMode: 'hotword-combine-mode',
} as const;

export const DEFAULT_HOTWORD_CONFIG: HotwordConfig = {
  contextText: '',
  boostingTableId: '',
  combineMode: false,
};

// Load hotword configuration, falling back to the defaults for missing keys
export function loadHotwordConfig(): HotwordConfig {
  const contextText =
    localStorage.getItem(HOTWORD_STORAGE_KEYS.contextText) ?? DEFAULT_HOTWORD_CONFIG.contextText;
  const boostingTableId =
    localStorage.getItem(HOTWORD_STORAGE_KEYS.boostingTableId) ?? DEFAULT_HOTWORD_CONFIG.boostingTableId;
  // localStorage stores strings only, so the boolean is saved as 'true' / 'false'
  const combineMode = localStorage.getItem(HOTWORD_STORAGE_KEYS.combineMode) === 'true';
  return { contextText, boostingTableId, combineMode };
}

// Save hotword configuration
export function saveHotwordConfig(config: HotwordConfig): void {
  localStorage.setItem(HOTWORD_STORAGE_KEYS.contextText, config.contextText);
  localStorage.setItem(HOTWORD_STORAGE_KEYS.boostingTableId, config.boostingTableId);
  localStorage.setItem(HOTWORD_STORAGE_KEYS.combineMode, String(config.combineMode));
}

The logic in this code is straightforward and clear. We read from localStorage when loading configuration, and write to localStorage when saving it. We also provide a default configuration so the system can still work properly when no configuration exists yet. There has to be a sensible default, after all.
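
Because these helpers read the global localStorage directly, they are awkward to test outside a browser. One option, sketched below with hypothetical names (KeyValueStore, loadHotwordConfigFrom, saveHotwordConfigTo, MemoryStore), is to pass the storage in as a parameter. The key names and defaults match the code above; the injection itself is an illustrative design choice, not part of HagiCode.

```typescript
// Hypothetical variant of the load/save helpers with injected storage.

interface HotwordConfig {
  contextText: string;
  boostingTableId: string;
  combineMode: boolean;
}

// The subset of the Web Storage API these helpers need.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HOTWORD_STORAGE_KEYS = {
  contextText: 'hotword-context-text',
  boostingTableId: 'hotword-boosting-table-id',
  combineMode: 'hotword-combine-mode',
} as const;

const DEFAULT_HOTWORD_CONFIG: HotwordConfig = {
  contextText: '',
  boostingTableId: '',
  combineMode: false,
};

function loadHotwordConfigFrom(store: KeyValueStore): HotwordConfig {
  return {
    contextText: store.getItem(HOTWORD_STORAGE_KEYS.contextText) ?? DEFAULT_HOTWORD_CONFIG.contextText,
    boostingTableId:
      store.getItem(HOTWORD_STORAGE_KEYS.boostingTableId) ?? DEFAULT_HOTWORD_CONFIG.boostingTableId,
    // Anything other than the string 'true' (including a missing key) reads as false
    combineMode: store.getItem(HOTWORD_STORAGE_KEYS.combineMode) === 'true',
  };
}

function saveHotwordConfigTo(store: KeyValueStore, config: HotwordConfig): void {
  store.setItem(HOTWORD_STORAGE_KEYS.contextText, config.contextText);
  store.setItem(HOTWORD_STORAGE_KEYS.boostingTableId, config.boostingTableId);
  store.setItem(HOTWORD_STORAGE_KEYS.combineMode, String(config.combineMode));
}

// Minimal in-memory store for unit tests or server-side rendering.
class MemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};

  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
  }

  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}
```

In the browser you would call loadHotwordConfigFrom(window.localStorage), since the built-in Storage object already provides getItem and setItem with these signatures.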

In HagiCode’s backend implementation, we needed to add hotword-related properties to the SDK configuration class. Taking C# language characteristics and usage patterns into account, we used List<string> to store custom hotword contexts:

public class DoubaoVoiceConfig
{
    /// <summary>
    /// App ID
    /// </summary>
    public string AppId { get; set; } = string.Empty;

    /// <summary>
    /// Access token
    /// </summary>
    public string AccessToken { get; set; } = string.Empty;

    /// <summary>
    /// Service URL
    /// </summary>
    public string ServiceUrl { get; set; } = string.Empty;

    /// <summary>
    /// Custom hotword context list
    /// </summary>
    public List<string>? HotwordContexts { get; set; }

    /// <summary>
    /// Doubao platform hotword table ID
    /// </summary>
    public string? BoostingTableId { get; set; }
}

The design of this configuration class follows HagiCode’s usual concise style. HotwordContexts is a nullable list type, and BoostingTableId is a nullable string, so when there is no hotword configuration, these properties have no effect on the request at all. If you are not using the feature, it should stay out of the way.

Payload construction is the core of the entire hotword feature. Once we have hotword configuration, we need to format it into the JSON structure required by the Doubao API. This process happens before the SDK sends the request:

private void AddCorpusToRequest(Dictionary<string, object> request)
{
    var corpus = new Dictionary<string, object>();

    // Add custom hotwords
    if (Config.HotwordContexts != null && Config.HotwordContexts.Count > 0)
    {
        corpus["context"] = new Dictionary<string, object>
        {
            ["context_type"] = "dialog_ctx",
            ["context_data"] = Config.HotwordContexts
                .Select(text => new Dictionary<string, object> { ["text"] = text })
                .ToList()
        };
    }

    // Add platform hotword table ID
    if (!string.IsNullOrEmpty(Config.BoostingTableId))
    {
        corpus["boosting_table_id"] = Config.BoostingTableId;
    }

    // Add corpus to the request only when it is not empty
    if (corpus.Count > 0)
    {
        request["corpus"] = corpus;
    }
}

This code shows how to dynamically construct the corpus field based on configuration. The key point is that we add the corpus field only when hotword configuration actually exists. This design ensures backward compatibility: when no hotwords are configured, the request structure remains exactly the same as before. Backward compatibility matters; adding a feature should not disrupt existing logic.

Between the frontend and backend, hotword parameters are passed through WebSocket control messages. HagiCode is designed so that when the frontend starts recording, it loads the hotword configuration from localStorage and sends it to the backend through a WebSocket message.

const controlMessage = {
  type: 'control',
  payload: {
    command: 'StartRecognition',
    // Example values, one hotword per line: hypertension, diabetes, coronary heart disease
    contextText: '高血压\n糖尿病\n冠心病',
    boosting_table_id: 'medical_table',
    combineMode: false
  }
};

There is one detail to note here: the frontend passes multi-line text separated by newline characters, and the backend needs to parse it. The backend WebSocket handler parses these parameters and passes them to the SDK:

private async Task HandleControlMessageAsync(
    string connectionId,
    DoubaoSession session,
    ControlMessage message)
{
    if (message.Payload is SessionControlRequest controlRequest)
    {
        // Read hotword parameters from the control message
        string? contextText = controlRequest.ContextText;
        string? boostingTableId = controlRequest.BoostingTableId;
        bool? combineMode = controlRequest.CombineMode; // not used in this excerpt

        // Parse multi-line text into a hotword list; Trim() also strips a trailing '\r'
        if (!string.IsNullOrWhiteSpace(contextText))
        {
            session.HotwordContexts = contextText
                .Split('\n', StringSplitOptions.RemoveEmptyEntries)
                .Select(s => s.Trim())
                .Where(s => s.Length > 0)
                .ToList();
        }
        else
        {
            // Clear hotwords left over from a previous recognition run in this session
            session.HotwordContexts = null;
        }

        session.BoostingTableId = boostingTableId;
    }
}

With this design, passing hotword configuration from frontend to backend becomes clear and efficient. There is nothing especially mysterious about it; the data is simply passed through layer by layer.
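
The control message shown earlier hardcodes example values. A minimal sketch of building it from a HotwordConfig instead might look like the following. buildStartRecognitionMessage and StartRecognitionMessage are hypothetical names, and the payload keys are copied verbatim from that example, including the snake_case boosting_table_id, so check them against what your backend deserializer actually expects.

```typescript
// Hypothetical helper: assemble the StartRecognition control message
// from a stored hotword configuration.

interface HotwordConfig {
  contextText: string;
  boostingTableId: string;
  combineMode: boolean;
}

interface StartRecognitionMessage {
  type: 'control';
  payload: {
    command: 'StartRecognition';
    contextText: string;
    boosting_table_id: string;
    combineMode: boolean;
  };
}

function buildStartRecognitionMessage(config: HotwordConfig): StartRecognitionMessage {
  return {
    type: 'control',
    payload: {
      command: 'StartRecognition',
      // Sent as raw multi-line text; the backend splits it into a list
      contextText: config.contextText,
      boosting_table_id: config.boostingTableId,
      combineMode: config.combineMode,
    },
  };
}

// In the browser, this would typically be sent over an open WebSocket:
// socket.send(JSON.stringify(buildStartRecognitionMessage(loadHotwordConfig())));
```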

In real usage, configuring custom hotwords is very simple. Open the speech recognition settings page in HagiCode and find the “Hotword Configuration” section. In the “Custom Hotword Text” input box, enter one hotword phrase per line.

For example, if you are developing a medical application, you could enter the following terms (hypertension, diabetes, coronary heart disease, angina, myocardial infarction, and heart failure):

高血压
糖尿病
冠心病
心绞痛
心肌梗死
心力衰竭

After you save the configuration, these hotwords are automatically passed to the Doubao API every time speech recognition starts. In our tests, recognition accuracy for the related professional terms improved noticeably once hotwords were configured.

If you need to manage a large number of hotwords, or if the hotwords need frequent updates, the platform hotword table mode is a better fit. First, create a hotword table on the Doubao self-learning platform and obtain the generated boosting_table_id, then enter this ID on the HagiCode settings page.

The Doubao self-learning platform provides capabilities such as bulk import and categorized management of hotwords, which is very practical for teams maintaining large sets of specialized terminology. Managing hotwords on the platform lets you keep them in one place and roll out updates consistently.

In some complex scenarios, you may need to use both custom hotwords and a platform hotword table at the same time. In that case, simply configure both in HagiCode and enable the “Combination Mode” switch.

In combination mode, the Doubao API considers both hotword sources at the same time, so recognition accuracy is usually higher than using either source alone. However, it is worth noting that combination mode increases request complexity, so it is best to decide whether to enable it after practical testing. More complexity is only worth it if the real-world results justify it.
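
Deciding which sources actually take effect is a small piece of logic. The usage code below calls getEffectiveHotwordMode, whose implementation the article does not show, so the following is a hypothetical resolver with a different name (resolveHotwordMode). Its precedence rule, that custom hotwords win when both sources are filled in but the combination switch is off, is an assumption chosen purely for illustration.

```typescript
// Hypothetical resolver, not HagiCode's getEffectiveHotwordMode.
// ASSUMPTION: with both sources set and combineMode off, custom hotwords win.

type HotwordMode = 'none' | 'custom' | 'table' | 'combined';

interface HotwordConfig {
  contextText: string;
  boostingTableId: string;
  combineMode: boolean;
}

function resolveHotwordMode(config: HotwordConfig): HotwordMode {
  // Whitespace-only input counts as "not configured"
  const hasContext = config.contextText.trim().length > 0;
  const hasTable = config.boostingTableId.trim().length > 0;
  if (hasContext && hasTable) {
    return config.combineMode ? 'combined' : 'custom';
  }
  if (hasContext) {
    return 'custom';
  }
  if (hasTable) {
    return 'table';
  }
  return 'none';
}
```

A sender could then include only the fields the resolved mode calls for, for example omitting the table ID when the result is 'custom'.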

Integrating the hotword feature into the HagiCode project is very straightforward. Here are some commonly used code snippets:

import {
  loadHotwordConfig,
  saveHotwordConfig,
  validateHotwordConfig,
  parseContextText,
  getEffectiveHotwordMode,
  type HotwordConfig
} from '@/types/hotword';

// Wrapped in a function so the early return on validation failure is valid
function checkHotwordConfig(): void {
  // Load and validate configuration
  const config: HotwordConfig = loadHotwordConfig();
  const validation = validateHotwordConfig(config);
  if (!validation.isValid) {
    console.error('Hotword configuration validation failed:', validation.errors);
    return;
  }

  // Parse hotword text into a list of phrases
  const hotwords = parseContextText(config.contextText);
  console.log('Parsed hotwords:', hotwords);

  // Get the effective hotword mode
  const mode = getEffectiveHotwordMode(config);
  console.log('Current hotword mode:', mode);
}

Backend usage is similarly concise:

// `logger` is assumed to be a logger instance supplied by your application (e.g. via DI)
var config = new DoubaoVoiceConfig
{
    AppId = "your_app_id",
    AccessToken = "your_access_token",
    ServiceUrl = "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async",
    // Configure custom hotwords: hypertension, diabetes, coronary heart disease
    HotwordContexts = new List<string>
    {
        "高血压",
        "糖尿病",
        "冠心病"
    },
    // Configure platform hotword table
    BoostingTableId = "medical_table_v1"
};

var client = new DoubaoVoiceClient(config, logger);
await client.ConnectAsync();
await client.SendFullClientRequest();

There are several points that deserve special attention when implementing and using the hotword feature.

First is the character limit. The Doubao API has strict restrictions on hotwords, including line count, characters per line, and total character count. If any limit is exceeded, the API returns an error. In HagiCode’s frontend implementation, we check these constraints during user input through validation functions, which prevents invalid configurations from being sent to the backend. Catching problems early is always better than waiting for the API to fail.

Second is the format of boosting_table_id. This field allows only letters, numbers, underscores, and hyphens, and it cannot contain spaces or other special characters. When creating a hotword table on the Doubao self-learning platform, be sure to follow the naming rules. That kind of strict format validation is common for APIs.

Third is backward compatibility. Hotword parameters are entirely optional. If no hotwords are configured, the system behaves exactly as it did before. This ensures that existing users are not affected in any way and makes gradual migration and upgrades easier.

Finally, there is error handling. When hotword configuration is invalid, the Doubao API returns corresponding error messages. HagiCode’s implementation records detailed logs to help developers troubleshoot issues. At the same time, the frontend displays validation errors in the UI to help users correct the configuration. Good error handling naturally leads to a better user experience.

This article has walked through the complete solution for implementing Doubao speech recognition hotwords in the HagiCode project, from requirement analysis and technical selection to code implementation, giving developers a practical example they can use as a reference.

The key points can be summarized as follows. First, the Doubao API supports both custom hotwords and platform hotword tables, and they can be used independently or in combination. Second, the frontend uses localStorage to store configuration in a simple and efficient way. Third, the backend passes hotword parameters by dynamically constructing the corpus field, preserving strong backward compatibility. Fourth, comprehensive validation logic ensures configuration correctness and avoids invalid requests. Overall, the solution is not complicated; it simply follows the API requirements carefully.

Implementing the hotword feature further strengthens HagiCode’s capabilities in the speech recognition domain. By flexibly configuring business-related professional terms, developers can help the speech recognition system better understand content from specific domains and therefore provide more accurate services. Ultimately, technology should serve real business needs, and solving practical problems is what matters most.

If you found this article helpful, feel free to give HagiCode a Star on GitHub. Your support motivates us to keep sharing technical practice and experience. In the end, writing and sharing technical content that helps others is a pleasure in itself.



This content was created with AI-assisted collaboration, reviewed by me, and reflects my own views and positions.