Skip to content

Blog

Running AI CLI Tools in Docker Containers: A Practical Guide to User Isolation and Persistent Volumes

Running AI CLI Tools in Docker Containers: A Practical Guide to User Isolation and Persistent Volumes

Section titled “Running AI CLI Tools in Docker Containers: A Practical Guide to User Isolation and Persistent Volumes”

Integrating AI coding tools like Claude Code, Codex, and OpenCode into containerized environments sounds simple, but there are hidden complexities everywhere. This article takes a deep dive into how the HagiCode project solves core challenges in Docker deployments, including user permissions, configuration persistence, and version management, so you can avoid the common pitfalls.

When we decided to run AI coding CLI tools inside Docker containers, the most intuitive thought was probably: “Aren’t containers just root? Why not install everything directly and call it done?” In reality, that seemingly simple idea hides several core problems that must be solved.

First, security restrictions are the first hurdle. Take Claude CLI as an example: it explicitly forbids running as the root user. This is a mandatory security check, and if root is detected, it refuses to start. You might think, can’t I just switch users with the USER directive? It is not that simple. There is still a mapping problem between the non-root user inside the container and the user permissions on the host machine.

Second, state persistence is the second trap. Claude Code requires login, Codex has its own configuration, and OpenCode also has a cache directory. If you have to reconfigure everything every time the container restarts, the whole idea of “automation” loses its meaning. We need these configurations to persist beyond the lifecycle of the container.

The third problem is permission consistency. Can processes inside the container access configuration files created by the host user? UID/GID mismatches often cause file permission errors, and this is extremely common in real deployments.

These problems may look independent, but in practice they are tightly connected. During HagiCode’s development, we gradually worked out a practical solution. Next, I will share the technical details and the lessons learned from those pitfalls.

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI-assisted programming platform that integrates multiple mainstream AI coding assistants, including Claude Code, Codex, and OpenCode. As a project that needs cross-platform and highly available deployment, HagiCode has to solve the full range of challenges involved in containerized deployment.

If you find the technical solution in this article valuable, that is a sign HagiCode has something real to offer in engineering practice. In that case, the HagiCode official website and GitHub repository are both worth following.

There is a common misunderstanding here: Docker containers run as root by default, so why not just install the tools as root? If you think that way, Claude CLI will quickly teach you otherwise.

Terminal window
# Run Claude CLI directly as root? No.
docker run --rm -it --user root myimage claude
# Output: Error: This command cannot be run as root user

This is a hard security restriction in Claude CLI. The reason is simple: these CLI tools read and write sensitive user configuration, including API tokens, local caches, and even scripts written by the user. Running them with root privileges introduces too much risk.

So the question becomes: how can we satisfy the CLI’s security requirements while keeping container management flexible? We need to change the way we think about it: instead of switching users at runtime, create a dedicated user during the image build stage.

Creating a dedicated user: more than just changing a name

Section titled “Creating a dedicated user: more than just changing a name”

You might think that adding a single USER line to the Dockerfile is enough. That is indeed the simplest approach, but it is not robust enough.

HagiCode’s approach is to create a hagicode user with UID 1000, which usually matches the default user on most host machines:

RUN groupadd -o -g 1000 hagicode && \
useradd -o -u 1000 -g 1000 -s /bin/bash -m hagicode && \
mkdir -p /home/hagicode/.claude && \
chown -R hagicode:hagicode /home/hagicode

But this only solves the built-in user inside the image. What if the host user is UID 1001? You still need to support dynamic mapping when the container starts.

docker-entrypoint.sh contains the key logic:

Terminal window
if [ -n "$PUID" ] && [ -n "$PGID" ]; then
if ! id hagicode >/dev/null 2>&1; then
groupadd -g "$PGID" hagicode
useradd -u "$PUID" -g "$PGID" -s /bin/bash -m hagicode
fi
fi

The advantage of this design is clear: use the default UID 1000 at image build time, then adjust dynamically at runtime through the PUID and PGID environment variables. No matter what UID the host user has, ownership of configuration files remains correct.

The design philosophy of persistent volumes

Section titled “The design philosophy of persistent volumes”

Each AI CLI tool has its own preferred configuration directory, so they need to be mapped one by one:

CLI ToolPath in ContainerNamed Volume
Claude/home/hagicode/.claudeclaude-data
Codex/home/hagicode/.codexcodex-data
OpenCode/home/hagicode/.config/opencodeopencode-config-data

Why use named volumes instead of bind mounts? Three reasons:

  1. Simpler management: Named volumes are managed automatically by Docker, so you do not need to create host directories manually.
  2. Permission isolation: The initial contents of the volumes are created by the user inside the container, avoiding permission conflicts with the host.
  3. Independent migration: Volumes can exist independently of containers, so data is not lost when images are upgraded.

docker-compose-builder-web automatically generates the corresponding volume configuration:

volumes:
claude-data:
codex-data:
opencode-config-data:
services:
hagicode:
volumes:
- claude-data:/home/hagicode/.claude
- codex-data:/home/hagicode/.codex
- opencode-config-data:/home/hagicode/.config/opencode
user: "${PUID:-1000}:${PGID:-1000}"

Pay attention to the user field here: PUID and PGID are injected through environment variables to ensure that processes inside the container run with an identity that matches the host user. This detail matters because permission issues are painful to debug once they appear.

Version management: baked-in versions with runtime overrides

Section titled “Version management: baked-in versions with runtime overrides”

Pinning Docker image versions is essential for reproducibility. But in real development, we often need to test a newer version or urgently fix a bug. If we had to rebuild the image every time, the workflow would be far too inefficient.

HagiCode’s strategy is fixed versions as the default, with runtime overrides as an extension mechanism. It is a pragmatic engineering compromise between stability and flexibility.

Dockerfile.template pins versions here:

USER hagicode
WORKDIR /home/hagicode
# Configure the global npm install path
RUN mkdir -p /home/hagicode/.npm-global && \
npm config set prefix '/home/hagicode/.npm-global'
# Install CLI tools using pinned versions
RUN npm install -g @anthropic-ai/claude-code@2.1.71 && \
npm install -g @openai/codex@0.112.0 && \
npm install -g opencode-ai@1.2.25 && \
npm cache clean --force

docker-entrypoint.sh supports runtime overrides:

Terminal window
install_cli_override_if_needed() {
local package_name="$2"
local override_version="$5"
if [ -n "$override_version" ]; then
gosu hagicode npm install -g "${package_name}@${override_version}"
fi
}
# Example usage
install_cli_override_if_needed "" "@anthropic-ai/claude-code" "" "" "${CLAUDE_CODE_CLI_VERSION}"

This lets you test a new version through an environment variable without rebuilding the image:

Terminal window
docker run -e CLAUDE_CODE_CLI_VERSION=2.2.0 myimage

This design is practical because nobody wants to rebuild an image every time they test a new feature.

In addition to configuring CLI tools manually, some scenarios require automatic configuration injection. The most typical example is an API token.

Terminal window
if [ -n "$ANTHROPIC_AUTH_TOKEN" ]; then
mkdir -p /home/hagicode/.claude
cat > /home/hagicode/.claude/settings.json <<EOF
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "${ANTHROPIC_AUTH_TOKEN}"
}
}
EOF
chown -R hagicode:hagicode /home/hagicode/.claude
fi

Two things matter here: pass sensitive information through environment variables instead of hard-coding it into the image, and make sure the ownership of configuration files is set correctly, otherwise the CLI tools will not be able to read them.

This is the easiest trap to fall into. The host user has UID 1001, while the container uses 1000, so files created on one side cannot be accessed on the other.

Terminal window
# Correct approach: make the container match the host user
docker run \
-e PUID=$(id -u) \
-e PGID=$(id -g) \
myimage

This issue is very common, and it can be frustrating the first time you run into it.

Configuration disappears after container restart

Section titled “Configuration disappears after container restart”

If you find yourself logging in again after every restart, check whether you forgot to mount a persistent volume:

volumes:
- claude-data:/home/hagicode/.claude

Nothing is more frustrating than carefully setting up a configuration only to see it disappear.

Do not run npm install -g directly inside a running container. The correct approaches are:

  1. Set an environment variable to trigger override installation.
  2. Or rebuild the image.
Terminal window
# Option 1: runtime override
docker run -e CLAUDE_CODE_CLI_VERSION=2.2.0 myimage
# Option 2: rebuild the image
docker build -t myimage:v2 .

There is more than one road to Rome, but some roads are smoother than others.

  • Pass API tokens through environment variables instead of writing them into the image.
  • Set configuration file permissions to 600.
  • Always run the application as a non-root user.
  • Update CLI versions regularly to fix security vulnerabilities.

Security is always important, but the real challenge is consistently enforcing it in practice.

If you want to support a new CLI tool in the future, there are only three steps:

  1. Dockerfile.template: add the installation step.
  2. docker-entrypoint.sh: add the version override logic.
  3. docker-compose-builder-web: add the persistent volume mapping.

This template-based design makes extension simple without changing the core logic.

Running AI CLI tools in Docker containers involves three core challenges: user permissions, configuration persistence, and version management. By combining dedicated users, named-volume isolation, and environment-variable-based overrides, the HagiCode project built a deployment architecture that is both secure and flexible.

Key design points:

  • User isolation: Create a dedicated user during the image build stage, with runtime support for dynamic PUID/PGID mapping.
  • Persistence strategy: Each CLI tool gets its own named volume, so restarts do not affect configuration.
  • Version flexibility: Fixed defaults ensure reproducibility, while runtime overrides provide room for testing.
  • Automated configuration: Sensitive configuration can be injected automatically through environment variables.

This solution has been running stably in the HagiCode project for some time, and I hope it offers useful reference points for developers with similar needs.

Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

Technical Analysis of the HagiCode Soul Platform: The Evolution from Emerging Needs to an Independent Platform

Technical Analysis of the HagiCode Soul Platform: The Evolution from Emerging Needs to an Independent Platform

Section titled “Technical Analysis of the HagiCode Soul Platform: The Evolution from Emerging Needs to an Independent Platform”

Writing technical articles is not really such a grand thing. It is mostly just a matter of organizing the pitfalls you have run into and the detours you have taken. We have all been inexperienced before, after all. This article takes an in-depth look at the design philosophy, architectural evolution, and core technical implementation of Soul in the HagiCode project, and explores how an independent platform can provide a more focused experience for creating and sharing Agent personas.

In the practice of building AI Agents, we often run into a question that looks simple but is actually crucial: how do we give different Agents stable and distinctive language styles and personality traits?

It is a slightly frustrating question, honestly. In the early Hero system of HagiCode, different Heroes (Agent instances) were mainly distinguished through profession settings and generic prompts. That approach came with some fairly obvious pain points, and anyone who has tried something similar has probably felt the same.

First, language style was difficult to keep consistent. The same “developer engineer” role might sound professional and rigorous one day, then casual and loose the next. This was not a model problem so much as the absence of an independent personality configuration layer to constrain and guide the output style.

Second, the sense of character was generally weak. When we described an Agent’s traits, we often had to rely on vague adjectives like “friendly,” “professional,” or “humorous,” without concrete language rules to support those abstract descriptions. Put plainly, it sounded nice in theory, but there was little to hold onto in practice.

Third, persona configurations were almost impossible to reuse. Suppose we carefully designed the speaking style of a “catgirl waitress” and wanted to reuse that expression style in another business scenario. In practice, we would almost have to configure it again from scratch. Sometimes you do not want to possess something beautiful, only reuse it a little… and even that turns out to be hard.

To solve those real problems, we introduced the Soul mechanism: an independent language style configuration layer separate from equipment and descriptions. Soul can define an Agent’s speaking habits, tone preferences, and wording boundaries, can be shared and reused across multiple Heroes, and can also be injected into the system prompt automatically on the first Session call.

Some people might say that this is just configuring a few prompts. But sometimes the real question is not whether something can be done; it is how to do it more elegantly. As Soul matured, we realized it had enough depth to develop independently. A dedicated Soul platform could let users focus on creating, sharing, and browsing interesting persona configurations without being distracted by the rest of the Hero system. That is how the standalone platform at soul.hagicode.com came into being.

HagiCode is an open-source AI coding assistant project built with a modern technology stack and aimed at giving developers a smooth intelligent programming experience. The Soul platform approach shared in this article comes from our own hands-on exploration while building HagiCode to solve the practical problem of Agent persona management. If you find the approach valuable, then it probably means we have accumulated a certain amount of engineering judgment in practice, and the HagiCode project itself may also be worth a closer look.

The Technical Architecture Evolution of the Soul Platform

Section titled “The Technical Architecture Evolution of the Soul Platform”

The Soul platform did not appear all at once. It went through three clear stages. The story began abruptly and concluded naturally.

Phase 1: Soul Configuration Embedded in Hero

Section titled “Phase 1: Soul Configuration Embedded in Hero”

The earliest Soul implementation existed as a functional module inside the Hero workspace. We added an independent SOUL editing area to the Hero UI, supporting both preset application and text fine-tuning.

Preset application let users choose from classic persona templates such as “professional developer engineer” and “catgirl waitress.” Text fine-tuning let users personalize those presets further. On the backend, the Hero entity gained a Soul field, with SoulCatalogId used to identify its source.

This stage solved the question of whether the capability existed at all, and it grew forward somewhat awkwardly, like anything young does. But as Soul content became richer, the limitations of an architecture tightly coupled with the Hero system started to show.

To provide a better Soul discovery and reuse experience, we built a SOUL Marketplace catalog page with support for browsing, searching, viewing details, and favoriting.

At this stage, we introduced a combinatorial design built from 50 main Catalogs (base roles) and 10 orthogonal rules (expression styles). The main Catalogs defined the Agent’s core persona, with abstract character settings such as “Mistport Traveler” and “Night Hunter.” The orthogonal rules defined how the Agent expressed itself, with language style traits such as “Concise & Professional” and “Verbose & Friendly.”

50 x 10 = 500 possible combinations gave users a wide configuration space for personas. It is not an overwhelming number, but it is not small either. There are many roads to Rome, after all; some are simply easier to walk than others. On the backend, the full SOUL catalog was generated through catalog-sources.json, while the frontend presented those catalog entries as an interactive card list.

The in-site Marketplace was a good transitional solution, but only that: transitional. It was still attached to the main system, and for users who only wanted Soul functionality, the access path remained too deep. Not everyone wants to take the scenic route just to do something simple.

Phase 3: Splitting into an Independent Platform

Section titled “Phase 3: Splitting into an Independent Platform”

In the end, we decided to move Soul into an independent repository (repos/soul). The Marketplace in the original main system was changed into an external jump guide, while the new platform adopted a Builder-first design philosophy: the homepage is the creation workspace by default, so users can start building their own persona configuration the moment they open the site.

The technology stack was also comprehensively upgraded in this stage: Vite 8 + React 19 + TypeScript 5.9, a unified design language through the shadcn/ui component system, and Tailwind CSS 4 theme variables. The improvement in frontend engineering laid a solid foundation for future feature iteration.

Everything faded away… no, actually, everything was only just beginning.

One core design principle of the Soul platform is local-first. That means the homepage must remain fully functional without a backend, and failure to load remote materials must never block page entry.

There is nothing especially miraculous about that. It simply means thinking one step further when designing the system. Using a local snapshot as the baseline and remote data as enhancement lets the product remain basically usable under any network condition. Concretely, we implemented a two-layer material architecture:

export async function loadBuilderMaterials(): Promise<BuilderMaterials> {
const localMaterials = createLocalMaterials(snapshot) // local baseline
try {
const inspirationFragments = await fetchMarketplaceItems() // remote enhancement
return { ...localMaterials, inspirationFragments, remoteState: "ready" }
} catch (error) {
return { ...localMaterials, remoteState: "fallback" } // graceful degradation
}
}

Local materials come from build-time snapshots of the main system documentation and include the complete data for 50 base roles and 10 expression rules. Remote materials come from Souls published by users and fetched through the Marketplace API. Together, they give users a full spectrum of materials, from official templates to community creativity. If that sounds dramatic, it really is just local plus remote.

The core data abstraction of Soul is the SoulFragment:

export type SoulFragment = {
fragmentId: string
group: "main-catalog" | "expression-rule" | "published-soul"
title: string
summary: string
content: string
keywords: string[]
localized?: Partial<Record<AppLocale, LocalizedFragmentContent>>
sourceRef: SoulFragmentSourceRef
meta: SoulFragmentMeta
}

The group field distinguishes fragment types: the main catalog defines the character core, orthogonal rules define expression style, and user-published Souls are marked as published-soul. The localized field supports multilingual presentation, allowing the same fragment to display different titles and descriptions in different language environments. Internationalization is something you really want to think about early, and in this case we actually did.

The Builder draft state encapsulates the user’s current editing state:

export type SoulBuilderDraft = {
draftId: string
name: string
selectedMainFragmentId: string | null
selectedRuleFragmentId: string | null
inspirationSoulId: string | null
mainSlotText: string
ruleSlotText: string
customPrompt: string
previewText: string
updatedAt: string
}

Each fragment selected in the editor has its content concatenated into the corresponding slot, forming the final preview text. mainSlotText corresponds to the main role content, ruleSlotText corresponds to the expression rule content, and customPrompt is the user’s additional instruction text.

Preview compilation is the core capability of Soul Builder. It assembles user-selected fragments and custom text into a system prompt that can be copied directly:

export function compilePreview(
draft: Pick<SoulBuilderDraft, "mainSlotText" | "ruleSlotText" | "customPrompt">,
fragments: {
mainFragment: SoulFragment | null
ruleFragment: SoulFragment | null
inspirationFragment: SoulFragment | null
}
): PreviewCompilation {
// Assembly logic: main role + expression rule + inspiration reference + custom content
}

The compilation result is shown in the central preview panel, where users can see the final effect in real time and copy it to the clipboard with one click. It sounds simple, and it is. But simple things are often the most useful.

Frontend state management in Soul Builder follows one important principle: clear separation of state boundaries. More specifically, drawer state is not persisted and does not write directly into the draft. Only explicit Builder actions trigger meaningful state changes.

// Domain state (useSoulBuilder)
export function useSoulBuilder() {
// Material loading and caching
// Slot aggregation and preview compilation
// Copy actions and feedback messages
// Locale-safe descriptors
}
// Presentation state (useHomeEditorState)
export function useHomeEditorState() {
// activeSlot, drawerSide, drawerOpen
// default focus behavior
}

That separation ensures both edit-state safety and responsive UI behavior. Opening and closing the drawer is purely a UI interaction and should not trigger complicated persistence logic. It may sound obvious, but it matters: UI state and business state should be separated clearly so interface interactions do not pollute the core data model.

Soul Builder uses a single-drawer mode: only one slot drawer may be open at a time. Clicking the mask, pressing the ESC key, or switching slots automatically closes the current drawer. This simplifies state management and also matches common drawer interaction patterns on mobile.

Closing the drawer does not clear the current editing content, so when users come back, their context is preserved. This kind of “lightweight” drawer design avoids interrupting the user’s flow. Nobody wants carefully written content to disappear because of one accidental click.

Internationalization is an important capability of the Soul platform. System copy fully supports bilingual switching, while user draft text is never rewritten when the language changes, because draft text is user-authored free input rather than system-translated content.

Official inspiration cards (Marketplace Souls) keep the upstream display name while also providing a best-effort English summary. For Souls with Chinese names, we generate English versions through predefined mapping rules:

// English name mapping for main roles
const mainNameEnglishMap = {
"雾港旅人": "Mistport Traveler",
"夜航猎手": "Night Hunter",
// ...
}
// English name mapping for orthogonal rules
const ruleNameEnglishMap = {
"简洁干练": "Concise & Professional",
"啰嗦亲切": "Verbose & Friendly",
// ...
}

The mapping table itself looks simple enough, but keeping it in good shape still takes care. There are 50 main roles and 10 orthogonal rules, which means 500 combinations in total. That is not huge, but it is enough to deserve respect.

Bulk generation of the Soul Catalog happens on the backend, where C# is used to automate the creation of 50 x 10 = 500 combinations:

foreach (var main in source.MainCatalogs)
{
foreach (var orthogonal in source.OrthogonalCatalogs)
{
var catalogId = $"soul-{main.Index:00}-{orthogonal.Index:00}";
var displayName = BuildNickname(main, orthogonal);
var soulSnapshot = BuildSoulSnapshot(main, orthogonal);
// Write to the database...
}
}

The nickname generation algorithm combines the main role name with the expression rule name to create imaginative Agent codenames:

private static readonly string[] MainHandleRoots = [
"雾港", "夜航", "零帧", "星渊", "霓虹", "断云", ...
];
private static readonly string[] OrthogonalHandleSuffixes = [
"旅人", "猎手", "术师", "行者", "星使", ...
];
// Combination examples: 雾港旅人, 夜航猎手, 零帧术师...

Soul snapshot assembly follows a fixed template format that combines the main role core, signature traits, expression rule core, and output constraints together:

private static string BuildSoulSnapshot(main, orthogonal) => string.Join('\n', [
$"你的人设内核来自「{main.Name}」:{main.Core}",
$"保持以下标志性语言特征:{main.Signature}",
$"你的表达规则来自「{orthogonal.Name}」:{orthogonal.Core}",
$"必须遵循这些输出约束:{orthogonal.Signature}"
]);

Template assembly may sound terribly dull, but without that sort of dull work, interesting products rarely appear.

After splitting Soul from the main system into an independent platform, one important challenge was handling existing user data. It is a familiar problem: splitting things apart is easy, migration is not. We adopted three safeguards:

Backward compatibility protection. Previously saved Hero SOUL snapshots remain visible, and historical snapshots can still be previewed even if they no longer have a Marketplace source ID. In other words, none of the user’s prior configurations are lost; only where they appear has changed.

Main system API deprecation. The in-site Marketplace API returns HTTP status 410 Gone together with a migration notice that guides users to soul.hagicode.com.

Hero SOUL form refactoring. A migration notice block was added to the Hero Soul editing area to clearly tell users that the Soul platform is now independent and to provide a one-click jump button:

HeroSoulForm.tsx
<div className="rounded-2xl border border-orange-200/70 bg-orange-50/80 p-4">
<div>{t('hero.soul.migrationTitle')}</div>
<p>{t('hero.soul.migrationDescription')}</p>
<Button onClick={onOpenSoulPlatform}>
{t('hero.soul.openSoulPlatformAction')}
</Button>
</div>

Looking back at the development of the Soul platform as a whole, there are a few practical lessons worth sharing. They are not grand principles, just things learned from real mistakes.

Local-first runtime assumptions. When designing features that depend on remote data, always assume the network may be unavailable. Using local snapshots as the baseline and remote data as enhancement ensures the product remains basically usable under any network condition.

Clear separation of state boundaries. UI state and business state should be distinguished clearly so interface interactions do not pollute the core data model. Drawer toggles are purely UI state and should not be mixed with draft persistence.

Design for internationalization early. If your product has multilingual requirements, it is best to think about them during the data model design phase. The localized field adds some structural complexity, but it greatly reduces the long-term maintenance cost of multilingual content.

Automate the material synchronization workflow. Local materials for the Soul platform come from the main system documentation. When upstream documentation changes, there needs to be a mechanism to sync it into frontend snapshots. We designed the npm run materials:sync script to automate that process and keep materials aligned with upstream.

Based on the current architecture, the Soul platform could move in several directions in the future. These are only tentative ideas, but perhaps they can be useful as a starting point.

Community sharing ecosystem. Support user uploads and sharing of custom Souls, with rating, commenting, and recommendation mechanisms so excellent Soul configurations can be discovered and reused by more people.

Multimodal expansion. Beyond text style, the platform could also support dimensions such as voice style configuration, emoji usage preferences, and code style and formatting rules. It sounds attractive in theory; implementation may tell a more complicated story.

Intelligent assistance. Automatically recommend Souls based on usage scenarios, support style transfer and fusion, and even run A/B tests on the real-world effectiveness of different Souls. There is no better way to know than to try.

Cross-platform synchronization. Support importing persona configurations from other AI platforms, provide a standardized Soul export format, and integrate with mainstream Agent frameworks.

This article shares the full evolution of the HagiCode Soul platform from its earliest emerging need to an independent platform. We discussed why a Soul mechanism is needed to solve Agent persona consistency, analyzed the three stages of architectural evolution (embedded configuration, in-site Marketplace, and independent platform), examined the core data model, state management, preview compilation, and internationalization design in depth, and summarized practical migration lessons.

The essence of Soul is an independent persona configuration layer separated from business logic. It makes the language style of AI Agents definable, reusable, and shareable. From a technical perspective, the design itself is not especially complicated, but the problem it solves is real and broadly relevant.

If you are also building AI Agent products, it may be worth asking whether your persona configuration solution is flexible enough. The Soul platform’s practical experience may offer a few useful ideas.

Perhaps one day you will run into a similar problem as well. If this article can help a little when that happens, that is probably enough.


If you found this article helpful, feel free to give the project a Star on GitHub. The public beta has already started, and you are welcome to install it and try it out.

Thank you for reading. If you found this article useful, likes, bookmarks, and shares are all appreciated. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

Technical Analysis of the HagiCode Skill System: Building a Scalable AI Skill Management Platform

Technical Analysis of the HagiCode Skill System: Building a Scalable AI Skill Management Platform

Section titled “Technical Analysis of the HagiCode Skill System: Building a Scalable AI Skill Management Platform”

This article takes an in-depth look at the architecture and implementation of the Skill management system in the HagiCode project, covering the technical details behind four core capabilities: local global management, marketplace search, intelligent recommendations, and trusted provider management.

In the field of AI coding assistants, how to extend the boundaries of AI capabilities has always been a core question. Claude Code itself is already strong at code assistance, but different development teams and different technology stacks often need specialized capabilities for specific scenarios, such as handling Docker deployments, database optimization, or frontend component generation. That is exactly where a Skill system becomes especially important.

During the development of the HagiCode project, we ran into a similar challenge: how do we let Claude Code “learn” new professional skills like a person would, while still maintaining a solid user experience and good engineering maintainability? This problem is both hard and simple in its own way. Around that question, we designed and implemented a complete Skill management system.

This article walks through the technical architecture and core implementation of the system in detail. It is intended for developers interested in AI extensibility and command-line tool integration. It might be useful to you, or it might not, but at least it is written down now.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project designed to help development teams improve engineering efficiency. The project’s stack includes ASP.NET Core, the Orleans distributed framework, a TanStack Start + React frontend, and the Skill management subsystem introduced in this article.

The GitHub repository is HagiCode-org/site. If you find the technical approach in this article valuable, feel free to give it a Star. More Stars tend to improve the mood, after all.

The Skill system uses a frontend-backend separated architecture. There is nothing especially mysterious about that.

Frontend uses TanStack Start + React to build the user interface, with Redux Toolkit managing state. The four main capabilities map directly to four Tab components: Local Skills, Skill Gallery, Intelligent Recommendations, and Trusted Providers. In the end, the design is mostly about making the user experience better.

Backend is based on ASP.NET Core + ABP Framework, using Orleans Grain for distributed state management. The online API client wraps the IOnlineApiClient interface to communicate with the remote skill catalog service.

The overall architectural principle is to separate command execution from business logic. Through the adapter pattern, the implementation details of npm/npx command execution are hidden inside independent modules. After all, nobody really wants command-line calls scattered all over the codebase.

Core Capability 1: Local Global Management

Section titled “Core Capability 1: Local Global Management”

Local global management is the most basic module. It is responsible for listing installed skills and supporting uninstall operations. There is nothing overly complicated here; it is mostly about doing the basics well.

The implementation lives in LocalSkillsTab.tsx and LocalSkillCommandAdapter.cs. The core idea is to wrap the npx skills command, parse its JSON output, and convert it into internal data structures. It sounds simple, and in practice it mostly is.

public async Task<IReadOnlyList<LocalSkillInventoryResponseDto>> GetLocalSkillsAsync(
CancellationToken cancellationToken = default)
{
var result = await _commandAdapter.ListGlobalSkillsAsync(cancellationToken);
return result.Skills.Select(skill => new LocalSkillInventoryResponseDto
{
Name = skill.Name,
Version = skill.Version,
Source = skill.Source,
InstalledPath = skill.InstalledPath,
Description = skill.Description
}).ToList();
}

The data flow is very clear: the frontend sends a request -> SkillGalleryAppService receives it -> LocalSkillCommandAdapter executes the npx command -> the JSON result is parsed -> a DTO is returned. Each step follows naturally from the previous one.

Skill uninstallation uses the npx skills remove -g <skillName> -y command, and the system automatically handles dependencies and cleanup. Installation metadata is stored in managed-install.json inside the skill directory, recording information such as install time and source version for later updates and auditing. Some things are simply worth recording.

Skill installation requires several coordinated steps. In truth, it is not especially complicated:

public async Task<SkillInstallResultDto> InstallAsync(
SkillInstallRequestDto request,
CancellationToken cancellationToken = default)
{
// 1. Normalize the installation reference
var normalized = _referenceNormalizer.Normalize(
request.SkillId,
request.Source,
request.SkillSlug,
request.Version);
// 2. Check prerequisites
await _prerequisiteChecker.CheckAsync(cancellationToken);
// 3. Acquire installation lock
using var installLock = await _lockProvider.AcquireAsync(normalized.SkillId);
// 4. Execute installation command
var result = await _installCommandRunner.ExecuteAsync(
new SkillInstallCommandExecutionRequest
{
Command = $"npx skills add {normalized.FullReference} -g -y",
Timeout = TimeSpan.FromMinutes(4)
},
cancellationToken);
// 5. Persist installation metadata
await _metadataStore.WriteAsync(normalized.SkillPath, request);
return new SkillInstallResultDto { Success = result.Success };
}

Several key design patterns are used here: the reference normalizer converts different input formats, such as tanweai/pua and @opencode/docker-skill, into a unified internal representation; the installation lock mechanism ensures only one installation operation can run for the same skill at a time; and streaming output pushes installation progress to the frontend in real time through Server-Sent Events, so users can watch terminal-like logs as they happen.

In the end, all of these patterns are there for one purpose: to keep the system simpler to use and maintain.

Marketplace search lets users discover and install skills from the community. One person’s ability is always limited; collective knowledge goes much further.

The search feature relies on the online API https://api.hagicode.com/v1/skills/search. To improve response speed, the system implements caching. Cache is a bit like memory: if you keep useful things around, you do not have to think so hard the next time.

private async Task<IReadOnlyList<SkillGallerySkillDto>> SearchCatalogAsync(
string query,
CancellationToken cancellationToken,
IReadOnlySet<string>? allowedSources = null)
{
var cacheKey = $"skill_search:{query}:{string.Join(",", allowedSources ?? Array.Empty<string>())}";
if (_memoryCache.TryGetValue(cacheKey, out var cached))
return (IReadOnlyList<SkillGallerySkillDto>)cached!;
var response = await _onlineApiClient.SearchAsync(
new SearchSkillsRequest
{
Query = query,
Limit = _options.LimitPerQuery,
},
cancellationToken);
var results = response.Skills
.Where(skill => allowedSources is null || allowedSources.Contains(skill.Source))
.Select(skill => new SkillGallerySkillDto { ... })
.ToList();
_memoryCache.Set(cacheKey, results, TimeSpan.FromMinutes(10));
return results;
}

Search results support filtering by trusted sources, so users only see skill sources they trust. Seed queries such as popular and recent are used to initialize the catalog, allowing users to see recommended popular skills the first time they open it. First impressions still matter.

Core Capability 3: Intelligent Recommendations

Section titled “Core Capability 3: Intelligent Recommendations”

Intelligent recommendations are the most complex part of the system. They can automatically recommend the most suitable skills based on the current project context. Complex as it is, it is still worth building.

The full recommendation flow is divided into five stages:

1. Build project context
2. AI generates search queries
3. Search the online catalog in parallel
4. AI ranks the candidates
5. Return the recommendation list

First, the system analyzes characteristics such as the project’s technology stack, programming languages, and domain structure to build a “project profile.” That profile is a bit like a resume, recording the key traits of the project.

Then an AI Grain is used to generate targeted search queries. This design is actually quite interesting: instead of directly asking the AI, “What skills should I recommend?”, we first ask it to think about “What search terms are likely to find relevant skills?” Sometimes the way you ask the question matters more than the answer itself:

var queryGeneration = await aiGrain.GenerateSkillRecommendationQueriesAsync(
projectContext, // Project context
locale, // User language preference
maxQueries, // Maximum number of queries
effectiveSearchHero); // AI model selection

Next, those search queries are executed in parallel to gather a candidate skill list. Parallel processing is, at the end of the day, just a way to save time.

Finally, another AI Grain ranks the candidate skills. This step considers factors such as skill relevance to the project, trust status, and user historical preferences:

var ranking = await aiGrain.RankSkillRecommendationsAsync(
projectContext,
candidates,
installedSkillNames,
locale,
maxRecommendations,
effectiveRankingHero);
response.Items = MergeRecommendations(projectContext, candidates, ranking, maxRecommendations);

AI models can respond slowly or become temporarily unavailable. Even the best systems stumble sometimes. For that reason, the system includes a deterministic fallback mechanism: when the AI service is unavailable, it uses a rule-based heuristic algorithm to generate recommendations, such as inferring likely required skills from dependencies in package.json.

Put plainly, this fallback mechanism is simply a backup plan for the system.

Core Capability 4: Trusted Provider Management

Section titled “Core Capability 4: Trusted Provider Management”

Trusted provider management allows users to control which skill sources are considered trustworthy. Trust is still something users should be able to define for themselves.

Trusted providers support two matching rules: exact match (exact) and prefix match (prefix).

public static TrustedSkillProviderResolutionSnapshot Resolve(
TrustedSkillProviderSnapshot snapshot,
string source)
{
var normalizedSource = Normalize(source);
foreach (var entry in snapshot.Entries.OrderBy(e => e.SortOrder))
{
if (!entry.IsEnabled) continue;
foreach (var rule in entry.MatchRules)
{
bool isMatch = rule.MatchType switch
{
TrustedSkillProviderMatchRuleType.Exact
=> string.Equals(normalizedSource, Normalize(rule.Value),
StringComparison.OrdinalIgnoreCase),
TrustedSkillProviderMatchRuleType.Prefix
=> normalizedSource.StartsWith(Normalize(rule.Value) + "/",
StringComparison.OrdinalIgnoreCase),
_ => false
};
if (isMatch)
return new TrustedSkillProviderResolutionSnapshot
{
IsTrustedSource = true,
ProviderId = entry.ProviderId,
DisplayName = entry.DisplayName
};
}
}
return new TrustedSkillProviderResolutionSnapshot { IsTrustedSource = false };
}

Built-in trusted providers include well-known organizations and projects such as Vercel, Azure, anthropics, Microsoft, and browser-use. Custom providers can be added through configuration files by specifying a provider ID, display name, badge label, matching rules, and more. The world is large enough that only trusting a few built-ins would never be enough.

Trusted configuration is persisted using an Orleans Grain:

public class TrustedSkillProviderGrain : Grain<TrustedSkillProviderState>,
ITrustedSkillProviderGrain
{
public async Task UpdateConfigurationAsync(TrustedSkillProviderSnapshot snapshot)
{
State.Snapshot = snapshot;
await WriteStateAsync();
}
public Task<TrustedSkillProviderSnapshot> GetConfigurationAsync()
{
return Task.FromResult(State.Snapshot);
}
}

The benefit of this approach is that configuration changes are automatically synchronized across all nodes, without any need to refresh caches manually. Automation is, ultimately, about letting people worry less.

The Skill system needs to execute various npx commands. If that logic were scattered everywhere, the code would quickly become difficult to maintain. That is why we designed an adapter interface. Design patterns, in the end, exist to make code easier to maintain:

public interface ISkillInstallCommandRunner
{
Task<SkillInstallCommandExecutionResult> ExecuteAsync(
SkillInstallCommandExecutionRequest request,
CancellationToken cancellationToken = default);
}

Different commands have different executor implementations, but all of them implement the same interface, making testing and replacement straightforward.

Installation progress is pushed to the frontend in real time through Server-Sent Events:

public async Task InstallWithProgressAsync(
SkillInstallRequestDto request,
IServerStreamWriter<SkillInstallProgressEventDto> stream,
CancellationToken cancellationToken)
{
var process = new Process
{
StartInfo = new ProcessStartInfo
{
FileName = "npx",
Arguments = $"skills add {request.FullReference} -g -y",
RedirectStandardOutput = true,
RedirectStandardError = true,
UseShellExecute = false
}
};
process.OutputDataReceived += async (sender, e) =>
{
await stream.WriteAsync(new SkillInstallProgressEventDto
{
EventType = "output",
Data = e.Data ?? string.Empty
});
};
process.Start();
process.BeginOutputReadLine();
await process.WaitForExitAsync(cancellationToken);
}

On the frontend, users can see terminal-like output in real time, which makes the experience very intuitive. Real-time feedback helps people feel at ease.

Take installing the pua skill as an example (it is a popular community skill):

  1. Open the Skills drawer and switch to the Skill Gallery tab
  2. Enter pua in the search box
  3. Click the search result to view the skill details
  4. Click the Install button
  5. Switch to the Local Skills tab to confirm the installation succeeded

The installation command is npx skills add tanweai/pua -g -y, and the system handles all the details automatically. There are not really that many steps once you take them one by one.

If your team has its own skill repository, you can add it as a trusted source:

providerId: "my-team"
displayName: "My Team Skills"
badgeLabel: "MyTeam"
isEnabled: true
sortOrder: 100
matchRules:
- matchType: "prefix"
value: "my-team/"
- matchType: "exact"
value: "my-team/special-skill"

This way, all skills from your team will display a trusted badge, making users more comfortable installing them. Labels and signals do help people feel more confident.

Creating a custom skill requires the following structure:

my-skill/
├── SKILL.md # Skill metadata (YAML front matter)
├── index.ts # Skill entry point
├── agents/ # Supported agent configuration
└── references/ # Reference resources

An example SKILL.md format:

---
name: my-skill
description: A brief description of what this skill does
---
# My Skill
Detailed documentation...
  1. Network requirements: skill search and installation require access to api.hagicode.com and the npm registry
  2. Node.js version: Node.js 18 or later is recommended
  3. Permission requirements: global npm installation permissions are required
  4. Concurrency control: only one install or uninstall operation can run for the same skill at a time
  5. Timeout settings: the default timeout for installation is 4 minutes, but complex scenarios may require adjustment

These notes exist, ultimately, to help things go smoothly.

This article introduced the complete implementation of the Skill management system in the HagiCode project. Through a frontend-backend separated architecture, the adapter pattern, Orleans-based distributed state management, and related techniques, the system delivers:

  • Local global management: a unified skill management interface built by wrapping npx skills commands
  • Marketplace search: rapid discovery of community skills through the online API and caching mechanisms
  • Intelligent recommendations: AI-powered skill recommendations based on project context
  • Trust management: a flexible configuration system that lets users control trust boundaries

This design approach is not only applicable to Skill management. It is also useful as a reference for any scenario that needs to integrate command-line tools while balancing local storage and online services.

If this article helped you, feel free to give us a Star on GitHub: github.com/HagiCode-org/site. You can also visit the official site to learn more: hagicode.com.

You may think this system is well designed, or you may not. Either way, that is fine. Once code is written, someone will use it, and someone will not.

Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it. This content was produced with AI-assisted collaboration, and the final content was reviewed and approved by the author.

I Might Be Replaced by an Agent, So I Ran the Numbers

I Might Be Replaced by an Agent, So I Ran the Numbers

Section titled “I Might Be Replaced by an Agent, So I Ran the Numbers”

Quantifying AI replacement risk with data: a deep dive into how the HagiCode team uses six core formulas to redefine how knowledge workers evaluate their competitiveness.

With AI technology advancing at breakneck speed, every knowledge worker is facing an urgent question: In the AI era, will I be replaced?

It sounds a little alarmist, but plenty of people are quietly uneasy about it. You just finish learning a new framework, and AI is already telling you your role might be automated away; you finally master a language, and then discover that someone using AI is producing three times as much as you. If you are reading this, you have probably felt at least some of that anxiety.

And honestly, that anxiety is not irrational. No one wants to admit that the skills they spent years building could be outperformed by a single ChatGPT session. Still, anxiety is one thing; life goes on.

Traditional discussions usually start from the question of “what AI can do,” but that framing misses two critical dimensions:

  1. The business perspective: whether a company is willing to equip an employee with AI tools depends on whether AI costs make economic sense relative to labor costs. It is not enough for AI to be capable of replacing a role; the company also has to run the numbers. Capital is not a charity, and every dollar has to count.
  2. The efficiency perspective: AI-driven productivity gains need to be quantified instead of being reduced to the vague claim that “using AI makes you stronger.” Maybe your efficiency doubles with AI, but someone else gets a 5x improvement. That gap matters. It is like school: everyone sits in the same class, but some score 90 while others barely pass.

So the real question is: how do we turn this fuzzy anxiety into measurable indicators?

It is always better to know where you stand than to fumble around in the dark. That is what we are talking about today: the design logic behind the AI productivity calculator built by the HagiCode team.

So I made a site: https://cost.hagicode.com.

HagiCode is an open-source AI coding assistant project built to help developers code more efficiently.

What is interesting is that while building their own product, the HagiCode team accumulated a lot of hands-on experience around AI productivity. They realized that the value of an AI tool cannot be assessed in isolation from a company’s employment costs. Based on that insight, the team decided to build a productivity calculator to help knowledge workers evaluate their competitiveness in the AI era more scientifically.

Plenty of people could build something like this. The difference is that very few are willing to do it seriously. The HagiCode team spent time on it as a way of giving something back to the developer community.

The design shared in this article is a summary of HagiCode’s experience applying AI in real engineering work. If you find this evaluation framework valuable, it suggests that HagiCode really does have something to offer in engineering practice. In that case, the HagiCode project itself is also worth paying attention to.

A company’s real cost for an employee is far more than salary alone. A lot of people only realize this when changing jobs: you negotiate a monthly salary of 20,000 CNY, but take home only 14,000. On the company side, the spend is not just 20,000 either. Social insurance, housing fund contributions, training, and recruiting costs all have to be included.

According to the implementation in calculate-ai-risk.ts:

Total annual employment cost = Annual salary x (1 + city coefficient) + Annual salary / 12

The city coefficient reflects differences in hiring and retention costs across cities:

City tierRepresentative citiesCoefficient
Tier 1Beijing / Shanghai / Shenzhen / Guangzhou0.4
New Tier 1Hangzhou / Chengdu / Suzhou / Nanjing0.3
Tier 2Wuhan / Xi’an / Tianjin / Zhengzhou0.2
OtherYichang / Luoyang and others0.1

A Tier 1 city coefficient of 0.4 means the company needs to pay roughly 40% extra in recruiting, training, insurance, and similar overhead. The all-in cost of hiring someone in Beijing really is much higher than in a Tier 2 city.

The cost of living in major cities is high too. You could think of it as another version of a “drifter tax.”

Different AI models have separate input and output pricing, and the gap can be huge. In coding scenarios, the input/output ratio is roughly 3:1. You might give the AI a block of code to review, while its analysis is usually much shorter than the input.

The blended unit price formula is:

Blended unit price = (input-output ratio x input price + output price) / (input-output ratio + 1)

Take GPT-5 as an example:

  • Input: $2.5/1M tokens
  • Output: $15/1M tokens
  • Blended = (3 x 2.5 + 15) / 4 = $5.625/1M tokens

For models priced in USD, you also need to convert using an exchange rate. The HagiCode team currently sets that rate to 7.25 and updates it as the market changes.

Exchange rates are like the stock market: no one can predict them exactly. You just follow the trend.

Average daily AI cost = Average daily token demand (M) x blended unit price (CNY/1M)
Annual AI cost = Average daily AI cost x 264 working days

264 = 22 days/month x 12 months, which is the number of working days in a standard year. Why not use 365? Because you have to account for weekends, holidays, sick leave, and so on.

We are not robots, after all. AI may not need rest, but people still need room to breathe.

4. The Core Innovation: Equivalent Headcount

Section titled “4. The Core Innovation: Equivalent Headcount”

This is the heart of the whole evaluation system, and also where the HagiCode team’s insight shows most clearly.

Affordable workflow count = Total annual employment cost / Annual AI cost
Affordability ratio = min(affordable workflow count, 1)
Equivalent headcount = 1 + (productivity multiplier - 1) x affordability ratio

That formula looks a little abstract, so let me unpack it.

The traditional view would simply say, “your efficiency improved by 2x.” But this formula introduces a crucial constraint: is the company’s AI budget sustainable?

For example, Xiao Ming improves his efficiency by 3x, but his annual AI usage costs 300,000 CNY while the company is only paying him a salary of 200,000 CNY. In that case, his personal productivity may be impressive, but it is not sustainable. No company is going to lose money just to keep him operating at peak efficiency.

That is what the affordability ratio means. If the company can only afford 0.5 of an AI workflow, then Xiao Ming’s equivalent headcount is 1 + (3 - 1) x 0.5 = 2 people, not 3.

The key insight: what matters is not just how large your productivity multiplier is, but whether the company can afford the AI investment required to sustain that multiplier.

The logic is simple once you see it. Most people just do not think from that angle. We are used to looking at the world from our own side, not from the boss’s side, where money does not come out of thin air either.

AI cost ratio = Annual AI cost / Total annual employment cost
Productivity gain = Productivity multiplier - 1
Cost-benefit ratio = Productivity gain / AI cost ratio
  • Cost-benefit ratio < 1: the AI investment is not worth it; the productivity gain does not justify the cost
  • Cost-benefit ratio 1-2: barely worth it
  • Cost-benefit ratio > 2: high return, strongly recommended

This metric is especially useful for managers because it helps them quickly judge whether a given role is worth equipping with AI tools.

At the end of the day, ROI is what matters. You can talk about higher efficiency all you want, but if the cost explodes, no one is going to buy the argument.

Risk is categorized according to equivalent headcount:

Equivalent headcountRisk levelConclusion
>= 2.0High riskIf your coworkers gain the same conditions, they become a serious threat to you
1.5 - 2.0WarningCoworkers have begun to build a clear productivity advantage
< 1.5SafeFor now, you can still maintain a gap

After seeing that table, you probably have a rough sense of where you stand. Still, there is no point in panicking. Anxiety does not solve problems. It is better to think about how to raise your own productivity multiplier.

To make the results more fun, the calculator introduces a system of seven special titles. These titles are persisted through localStorage, allowing users to unlock and display their own “achievements.”

Title IDNameUnlock condition
craftsman-spiritCraftsman SpiritAverage daily token usage = 0
prompt-alchemistPrompt AlchemistDaily tokens <= 20M and productivity multiplier >= 6
all-in-operatorAll-In OperatorDaily tokens >= 150M and productivity multiplier >= 3
minimalist-runnerMinimalist RunnerDaily tokens <= 5M and productivity multiplier >= 2
cost-tamerCost TamerCost-benefit ratio >= 2.5 and AI cost ratio <= 15%
danger-oracleDanger OracleEquivalent headcount >= 2.5 or entering the high-risk zone
budget-coordinatorBudget CoordinatorAffordable workflow count >= 8

Each title also carries a hidden meaning:

TitleHidden meaning
Craftsman SpiritYou can still do fine without AI, but you need unique competitive strengths
Prompt AlchemistYou achieve high output with very few tokens; a classic power-user profile
All-In OperatorHigh input, high output; suitable for high-frequency scenarios
Minimalist RunnerLightweight AI usage; suitable for light-assistance scenarios
Cost TamerExtremely high ROI; the kind of employee companies love
Danger OracleYou are already, or soon will be, in a high-risk group
Budget CoordinatorYou can operate multiple AI workflows at the same time

Gamification is really just a way to make dry data a little more entertaining. After all, who does not like collecting achievements? Like badges in a game, they may not have much practical value, but they still feel good to earn.

Data Sources: An Authoritative Pricing System

Section titled “Data Sources: An Authoritative Pricing System”

The calculator’s pricing data comes from multiple official API pricing pages to keep the results authoritative and up to date:

This data is updated regularly, with the latest refresh on 2026-03-19.

Data only matters when it is current. Once it is outdated, it stops being useful. On that front, the HagiCode team has been quite responsible about keeping things updated.

Suppose you are a developer in Beijing with an annual salary of 400,000 CNY, using Claude Sonnet 4.6, consuming 50M tokens per day on average, and estimating that AI gives you a 3x productivity boost. The simulated input looks like this:

const input = {
annualIncomeCny: 400000,
cityTier: "tier1", // Beijing
modelId: "claude-sonnet-4-6",
performanceMultiplier: 3.0,
dailyTokenUsageM: 50,
}
// Calculation process
// Total annual employment cost = 400k x (1 + 0.4) + 400k/12 ~= 603.3k
// Annual AI cost ~= 50 x 7.125 x 264 ~= 94k
// Affordable workflow count ~= 603.3 / 94 ~= 6.4 workflows
// Equivalent headcount = 1 + (3 - 1) x 1 = 3 people

Conclusion: if one of your coworkers has the same conditions, their output would be equivalent to three people. You are already in the high-risk zone.

If you discover that your current AI usage is “not worth it” (cost-benefit ratio < 1), you can consider:

  1. Reducing token usage: use more efficient prompts and cut down ineffective requests
  2. Choosing a more cost-effective model: for example, DeepSeek-V3 (priced in CNY and cheaper)
  3. Increasing your productivity multiplier: learn advanced Agent usage techniques and truly turn AI into productivity

In the end, all of this comes down to the art of balance. Use too much and you waste money; use too little and nothing changes. The key is finding the sweet spot.

When designing this calculator, the HagiCode team made several engineering decisions worth learning from:

  1. Pure frontend computation: all calculations run in the browser, with no backend API dependency, which protects user privacy
  2. Configuration-driven: all formulas, pricing, and role data are centralized in configuration files, so future updates do not require changing core code logic
  3. Multilingual support: supports both Chinese and English
  4. Instant feedback: results update in real time as soon as the user changes inputs
  5. Detailed formula display: every result includes the full calculation formula to help users understand it

This design makes the calculator easy to maintain and extend, while also serving as a reference template for similar data-driven applications.

Good architecture, like good code, takes time to build up. The HagiCode team put real thought into it.

The core value of the AI productivity calculator is that it turns the vague anxiety of an “AI replacement threat” into metrics that can be quantified and compared.

The equivalent headcount formula, 1 + (productivity multiplier - 1) x affordability ratio, is the core innovation of the entire framework. It considers not only productivity gains, but also whether a company can afford the AI cost, making the evaluation much closer to reality.

This framework tells us one thing clearly: in the AI era, not knowing where you stand is the most dangerous position of all.

Instead of worrying, let the data speak.

A lot of fear comes from the unknown. Once you quantify everything, the situation no longer feels quite so terrifying. At worst, you improve yourself or change tracks. Life is long, and there is no need to hang everything on a single tree.


Visit cost.hagicode.com now and complete your AI productivity assessment.



Data source: cost.hagicode.com | Powered by HagiCode

In the end, a line of poetry came to mind: “This feeling might have become a thing to remember, yet even then one was already lost.” The AI era is much the same. Instead of waiting until you are replaced and filled with regret, it is better to start taking action now…

Thank you for reading. If you found this article useful, likes, bookmarks, and shares are all welcome. This content was created with AI-assisted collaboration, and the final version was reviewed and confirmed by the author.

Hagicode.Libs: Engineering Practice for Unified Integration of Multiple AI Coding Assistant CLIs

Hagicode.Libs: Engineering Practice for Unified Integration of Multiple AI Coding Assistant CLIs

Section titled “Hagicode.Libs: Engineering Practice for Unified Integration of Multiple AI Coding Assistant CLIs”

During the development of the HagiCode project, we needed to integrate multiple AI coding assistant CLIs at the same time, including Claude Code, Codex, and CodeBuddy. Each CLI has different interfaces, parameters, and output formats, and the repeated integration code made the project harder and harder to maintain. In this article, we share how we built a unified abstraction layer with HagiCode.Libs to solve this engineering pain point. You could also say it is simply some hard-earned experience gathered from the pitfalls we have already hit.

The market for AI coding assistants is quite lively now. Besides Claude Code, there are also OpenAI’s Codex, Zhipu’s CodeBuddy, and more. As an AI coding assistant project, HagiCode needs to integrate these different CLI tools across multiple subprojects, including desktop, backend, and web.

At first, the problem was manageable. Integrating one CLI was only a few hundred lines of code. But as the number of CLIs we needed to support kept growing, things started to get messy.

Each CLI has its own command-line argument format, different environment variable requirements, and a wide variety of output formats. Some output JSON, some output streaming JSON, and some output plain text. On top of that, there are cross-platform compatibility issues. Executable discovery and process management work very differently between Windows and Unix systems, so code duplication kept increasing. In truth, it was just a bit more Ctrl+C and Ctrl+V, but maintenance quickly became painful.

The most frustrating part was that every time we wanted to add support for a new CLI capability, we had to change the same code in several projects. That approach was clearly not sustainable in the long run. Code has a temper too; duplicate it too many times and it starts causing trouble.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project that needs to maintain multiple subprojects at the same time, including a frontend VSCode extension, backend AI services, and a cross-platform desktop client. In a way, it was exactly this complex, multi-language, multi-platform environment that led to the birth of HagiCode.Libs. You could say we were forced into it, and so be it.

Although these AI coding assistant CLIs each have their own characteristics, from a technical perspective they share several obvious traits:

Similar interaction patterns: they all start a CLI process, send a prompt, receive streaming responses, parse messages, and then either end or continue the session. At the end of the day, the whole flow follows the same basic mold.

Similar configuration needs: they all need API key authentication, working directory setup, model selection, tool permission control, and session management. After all, everyone is making a living from APIs; the differences are mostly a matter of flavor.

The same cross-platform challenges: they all need to solve executable path resolution (claude vs claude.exe vs /usr/local/bin/claude), process startup and environment variable handling, shell command escaping, and argument construction. Cross-platform work is painful no matter how you describe it. Only people who have stepped into the traps really understand the difference between Windows and Unix.

Based on this analysis, we needed a unified abstraction layer that could provide a consistent interface, encapsulate cross-platform CLI discovery logic, handle streaming output parsing, and support both dependency injection and non-DI scenarios. It is the kind of problem that makes your head hurt just thinking about it, but you still have to face it. After all, it is our own project, so we have to finish it even if we have to cry our way through it.

We created HagiCode.Libs, a lightweight .NET 10 library workspace released under the MIT license and now published on GitHub. It may not be some world-shaking masterpiece, but it is genuinely useful for solving real problems.

HagiCode.Libs/
├── src/
│ ├── HagiCode.Libs.Core/ # Core capabilities
│ │ ├── Discovery/ # CLI executable discovery
│ │ ├── Process/ # Cross-platform process management
│ │ ├── Transport/ # Streaming message transport
│ │ └── Environment/ # Runtime environment resolution
│ ├── HagiCode.Libs.Providers/ # Provider implementations
│ │ ├── ClaudeCode/ # Claude Code provider
│ │ ├── Codex/ # Codex provider
│ │ └── Codebuddy/ # CodeBuddy provider
│ ├── HagiCode.Libs.ConsoleTesting/ # Testing framework
│ ├── HagiCode.Libs.ClaudeCode.Console/
│ ├── HagiCode.Libs.Codex.Console/
│ └── HagiCode.Libs.Codebuddy.Console/
└── tests/ # xUnit tests

When designing HagiCode.Libs, we followed a few principles. They all came from lessons learned the hard way:

Zero heavy framework dependencies: it does not depend on ABP or any other large framework, which keeps it lightweight. These days, the fewer dependencies you have, the fewer headaches you get. Most people have already been beaten up by dependency hell at least once.

Cross-platform support: native support for Windows, macOS, and Linux, without writing separate code for different platforms. One codebase that runs everywhere is a pretty good thing.

Streaming processing: CLI output is handled with asynchronous streams, which fits modern .NET programming patterns much better. Times change, and async is king.

Flexible integration: it supports dependency injection scenarios while also allowing direct instantiation. Different people have different preferences, so we wanted it to be convenient either way.

If your project already uses dependency injection, such as ASP.NET Core or the generic host, you can integrate it directly. It is a small thing, but a well-behaved one:

using HagiCode.Libs.Providers;
using Microsoft.Extensions.DependencyInjection;
var services = new ServiceCollection();
services.AddHagiCodeLibs();
await using var provider = services.BuildServiceProvider();
var claude = provider.GetRequiredService<ICliProvider<ClaudeCodeOptions>>();
var options = new ClaudeCodeOptions
{
ApiKey = "your-api-key",
Model = "claude-sonnet-4-20250514"
};
await foreach (var message in claude.ExecuteAsync(options, "Hello, Claude!"))
{
Console.WriteLine($"{message.Type}: {message.Content}");
}

If you are writing a simple script or working in a non-DI scenario, creating an instance directly also works. Put simply, it depends on your personal preference:

var claude = new ClaudeCodeProvider();
var options = new ClaudeCodeOptions
{
ApiKey = "sk-ant-xxx",
Model = "claude-sonnet-4-20250514"
};
await foreach (var message in claude.ExecuteAsync(options, "Help me write a quicksort"))
{
// Handle messages
}

Both approaches use the same underlying implementation, so you can choose the integration style that best fits your project. There is no universal right answer in this world. What suits you is the best option. It may sound cliché, but it is true.

Each provider has its own dedicated testing console project, making it easier to validate the integration independently. Testing is one of those things where if you are going to do it, you should do it properly:

Terminal window
# Claude Code tests
dotnet run --project src/HagiCode.Libs.ClaudeCode.Console -- --test-provider
dotnet run --project src/HagiCode.Libs.ClaudeCode.Console -- --test-all claude
# CodeBuddy tests
dotnet run --project src/HagiCode.Libs.Codebuddy.Console -- --test-provider codebuddy-cli
# Codex tests
dotnet run --project src/HagiCode.Libs.Codex.Console -- --test-provider codex-cli

The testing scenarios cover several key cases:

  • Ping: health check to confirm the CLI is available
  • Simple Prompt: basic prompt test
  • Complex Prompt: multi-turn conversation test
  • Session Restore/Resume: session recovery test
  • Repository Analysis: repository analysis test

This standalone testing console design is especially useful during debugging because it lets us quickly identify whether the issue is in the HagiCode.Libs layer or in the CLI itself. Debugging is really just about finding where the problem is. Once the direction is right, you are already halfway there.

Cross-platform compatibility is one of the core goals of HagiCode.Libs. We configured the GitHub Actions workflow .github/workflows/cli-discovery-cross-platform.yml to run real CLI discovery validation across ubuntu-latest, macos-latest, and windows-latest.

This ensures that every code change does not break cross-platform compatibility. During local development, you can also reproduce it with the following commands. After all, you cannot ask CI to take the blame for everything. Your local environment should be able to run it too:

Terminal window
npm install --global @anthropic-ai/claude-code@2.1.79
HAGICODE_REAL_CLI_TESTS=1 dotnet test --filter "Category=RealCli"

HagiCode.Libs uses asynchronous streams to process CLI output. Compared with traditional callback or event-based approaches, this fits the asynchronous programming style of modern .NET much better. In the end, this is simply how technology moves forward, whether anyone likes it or not:

public async IAsyncEnumerable<CliMessage> ExecuteAsync(
TOptions options,
string prompt,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
// Start the CLI process
// Parse streaming JSON output
// Yield the CliMessage sequence
}

The message types include:

  • user: user message
  • assistant: assistant response
  • tool_use: tool invocation
  • result: session end

This design lets callers handle streaming output flexibly, whether for real-time display, buffered post-processing, or forwarding to other services. Why worry whether the sky is sunny or cloudy? What matters is that once the idea opens up, you can use it however you like.

The HagiCode.Libs.Exploration module provides Git repository discovery and status checking, which is especially useful in repository analysis scenarios. This feature was also born out of necessity, because HagiCode needs to analyze repositories:

// Discover Git repositories
var repositories = await GitRepositoryDiscovery.DiscoverAsync("/path/to/search");
// Get repository information
var info = await GitRepository.GetInfoAsync(repoPath);
Console.WriteLine($"Branch: {info.Branch}, Remote: {info.RemoteUrl}");
Console.WriteLine($"Has uncommitted changes: {info.HasUncommittedChanges}");

HagiCode’s code analysis capabilities use this module to identify project structure and Git status. It is a good example of making full use of what we built.

Based on our practice in the HagiCode project, there are several points that deserve special attention. They are all real issues that need to be handled carefully:

API key security: do not hardcode API keys in your code. Use environment variables or configuration management instead. HagiCode.Libs supports passing configuration through Options objects, making it easier to integrate with different configuration sources. When it comes to security, there is no such thing as being too careful.

CLI version pinning: in CI/CD, we pin specific versions, such as @anthropic-ai/claude-code@2.1.79, to reduce uncertainty caused by version drift. It is also a good idea to use fixed versions in local development. Versioning can be painful. If you do not pin versions, the problem will teach you a lesson very quickly.

Test categorization: default tests use fake providers to keep them deterministic and fast, while real CLI tests must be enabled explicitly. This gives CI fast feedback while still allowing real-environment validation when needed. Striking that balance is never easy. Speed and stability always require trade-offs.

Session management: different CLIs have different session recovery mechanisms. Claude Code uses the .claude/ directory to store sessions, while Codex and CodeBuddy each have their own approaches. When using them, be sure to check their respective documentation and understand the details of their session persistence mechanisms. There is no harm in understanding it clearly.

HagiCode.Libs is the unified abstraction layer we built during the development of HagiCode to solve the repeated engineering work involved in multi-CLI integration. By providing a consistent interface, encapsulating cross-platform details, and supporting flexible integration patterns, it greatly reduces the engineering complexity of integrating multiple AI coding assistants. Much may fade away, but the experience remains.

If you also need to integrate multiple AI CLI tools in your project, or if you are interested in cross-platform process management and streaming message handling, feel free to check it out on GitHub. The project is released under the MIT license, and contributions and feedback are welcome. In the end, it is a happy coincidence that we met here, so since you are already here, we might as well become friends.

The approach shared in this article was shaped by real pitfalls and real optimization work inside HagiCode. What else could we do? Running into pitfalls is normal. If you think this solution is valuable, then perhaps our engineering work is doing all right. And HagiCode itself may also be worth your attention. You might even find a pleasant surprise.


If this article helped you:

Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

Why HagiCode Chose Hermes as Its Integrated Agent Core

Why HagiCode Chose Hermes as Its Integrated Agent Core

Section titled “Why HagiCode Chose Hermes as Its Integrated Agent Core”

When building an AI-assisted coding platform, choosing the right Agent core directly determines the upper limit of the system’s capabilities. Some things simply cannot be forced; pick the wrong framework, and no amount of effort will make it feel right. This article shares the thinking behind HagiCode’s technical selection and our hands-on experience integrating Hermes Agent.

When building an AI-assisted coding product, one of the hardest parts is choosing the underlying Agent framework. There are actually quite a few options on the market, but some are too limited in functionality, some are overly complex to deploy, and others simply do not scale well enough. What we needed was a solution that could run on a $5 VPS while also being able to connect to a GPU cluster. That requirement may not sound extreme, but it is enough to scare plenty of teams away.

In practice, many so-called “all-in-one Agents” either only run in the cloud or require absurdly high local deployment costs. After spending two weeks researching different approaches, we made a bold decision: rebuild the entire Agent core around Hermes as the underlying engine for our integrated Agent.

Everything that followed may simply have been fate.

The approach shared in this article comes from real-world experience in the HagiCode project. HagiCode is an AI-assisted coding platform that provides developers with an intelligent coding assistant through a VSCode extension, a desktop client, and web services. You may have used similar tools before and felt they were just missing that final touch; we understand that feeling well.

Before diving into Hermes itself, it helps to explain why HagiCode needed something like it in the first place. Things rarely work exactly the way you want, so you need a practical reason to commit to a technical direction.

As an AI coding assistant, HagiCode needs to support several usage scenarios at the same time:

  • Local development environments: developers want to run it on their own machines so data never leaves the local environment. These days, data security is never a trivial concern.
  • Team collaboration environments: small teams should be able to share an Agent deployment running on a server. Saving money matters, and everyone has limits.
  • Elastic cloud expansion: when handling complex tasks, the system should automatically scale out to a GPU cluster. It is always better to be prepared.

This “we want everything at once” requirement is what led us to Hermes. Whether it was the perfect choice, I cannot say for sure, but at the time we did not see a better option.

Hermes Agent is an autonomous AI Agent created by Nous Research. Some readers may not be familiar with Nous Research; they are the lab behind open-source large models such as Hermes, Nomos, and Psyché. They have built many excellent things, even if they are still more underappreciated than they deserve.

Unlike traditional IDE coding assistants or simple API chat wrappers, Hermes has a defining trait: the longer it runs, the more capable it becomes. It is not designed to complete a task once and stop; it keeps learning and accumulating experience over long-running operation. In that sense, it feels a little like a person.

Several of Hermes’s core capabilities happen to align very closely with HagiCode’s needs.

This means HagiCode can choose the most suitable deployment model based on each user’s scenario: individuals run it locally, teams deploy it on servers, and complex tasks use GPU resources. One codebase handles all of it. In a world this busy, saving one layer of complexity is already a win.

Multi-platform messaging gateway Hermes natively supports Telegram, Discord, Slack, WhatsApp, and more. For HagiCode, this means we can support AI assistants on those channels much more easily in the future. More paths forward are always welcome.

Rich tool system Hermes comes with 40+ built-in tools and supports MCP (Model Context Protocol) extensions. This is essential for a coding assistant: executing shell commands, working with the file system, and calling Git all depend on tool support. An Agent without tools is like a bird without wings.

Cross-session memory Hermes includes a persistent memory system and uses FTS5 full-text search to recall historical conversations. That allows the Agent to remember prior context instead of “losing its memory” every time. Sometimes people wish they could forget things that easily, but reality is usually less generous.

Now that the “why” is clear, let us look at the “how.” Once something makes sense in theory, the next step is to build it.

In HagiCode’s architecture, all AI Providers implement a unified IAIProvider interface:

public sealed class HermesCliProvider : IAIProvider, IVersionedAIProvider
{
public ProviderCapabilities Capabilities { get; } = new ProviderCapabilities
{
SupportsStreaming = true, // Supports streaming output
SupportsTools = true, // Supports tool invocation
SupportsSystemMessages = true, // Supports system prompts
SupportsArtifacts = false
};
}

This abstraction layer allows HagiCode to switch seamlessly between different AI Providers. Whether the backend is OpenAI, Claude, or Hermes, the upper-layer calling pattern stays exactly the same. In plain terms, it keeps things simple.

Hermes communicates through ACP (Agent Communication Protocol). This protocol is designed specifically for Agent communication, and its main methods include:

MethodDescription
initializeInitialize the connection and obtain the protocol version and client capabilities
authenticateHandle authentication and support multiple authentication methods
session/newCreate a new session and configure the working directory and MCP servers
session/promptSend a prompt and receive a response

HagiCode implements the ACP transport layer through StdioAcpTransport, launching a Hermes subprocess and communicating with it over standard input and output. It may sound complicated, but in practice it is manageable as long as you have enough patience.

Configuration is managed through the HermesPlatformConfiguration class:

public sealed class HermesPlatformConfiguration : IAcpPlatformConfiguration
{
public string ExecutablePath { get; set; } = "hermes";
public string Arguments { get; set; } = "acp";
public int StartupTimeoutMs { get; set; } = 5000;
public string ClientName { get; set; } = "HagiCode";
public HermesAuthenticationConfiguration Authentication { get; set; }
public HermesSessionDefaultsConfiguration SessionDefaults { get; set; }
}

Configure Hermes in appsettings.json:

{
"Providers": {
"HermesCli": {
"ExecutablePath": "hermes",
"Arguments": "acp",
"StartupTimeoutMs": 10000,
"ClientName": "HagiCode",
"Authentication": {
"PreferredMethodId": "api-key",
"MethodInfo": {
"api-key": "your-api-key-here"
}
},
"SessionDefaults": {
"Model": "claude-sonnet-4-20250514",
"ModeId": "default"
}
}
}
}

Configuration often looks simple on paper, but getting every detail right still takes real effort.

HagiCode uses Orleans to build its distributed system, and the Hermes integration is implemented through the following components:

  • HermesGrain: An Orleans Grain implementation that handles session execution
  • HermesPlatformConfiguration: Platform-specific configuration
  • HermesAcpSessionAdapter: ACP session adapter
  • HermesConsole: A dedicated validation console

The name Orleans does have a certain charm to it. Even if this Orleans has nothing to do with the legendary city, a good name never hurts.

The following is the core execution logic of the Hermes Provider:

private async IAsyncEnumerable<AIStreamingChunk> StreamCoreAsync(
AIRequest request,
string? embeddedCommandPrompt,
[EnumeratorCancellation] CancellationToken cancellationToken)
{
// 1. Create transport layer and launch Hermes subprocess
await using var transport = new StdioAcpTransport(
platformConfiguration.GetExecutablePath(),
platformConfiguration.GetArguments(),
platformConfiguration.GetEnvironmentVariables(),
platformConfiguration.GetStartupTimeout(),
_loggerFactory.CreateLogger<StdioAcpTransport>());
await transport.ConnectAsync(cancellationToken);
// 2. Initialize and obtain protocol version and authentication methods
var initializeResult = await SendHermesRequestAsync(
transport, nextRequestId++, "initialize",
BuildInitializeParameters(platformConfiguration), cancellationToken);
// 3. Handle authentication
var authMethods = ParseAuthMethods(initializeResult);
if (!isAuthenticated)
{
var methodId = platformConfiguration.Authentication.ResolveMethodId(authMethods);
await SendHermesRequestAsync(transport, nextRequestId++, "authenticate", ...);
}
// 4. Create session
var newSessionResult = await SendHermesRequestAsync(
transport, nextRequestId++, "session/new",
BuildNewSessionParameters(platformConfiguration, workingDirectory, model), cancellationToken);
var sessionId = ParseSessionId(newSessionResult);
// 5. Execute prompt and collect streaming responses
await foreach (var payload in transport.ReceiveMessagesAsync(cancellationToken))
{
// Handle session/update notifications and convert them into streaming chunks
if (TryParseSessionNotification(root, out var notification))
{
if (_responseMapper.TryConvertToStreamingChunk(notification, out var chunk))
{
yield return chunk;
}
}
}
}

With code, the details eventually become familiar. What matters most is the overall approach.

To ensure Hermes remains available, HagiCode implements a health check mechanism:

public async Task<ProviderTestResult> PingAsync(CancellationToken cancellationToken = default)
{
var response = await ExecuteAsync(
new AIRequest
{
Prompt = "Reply with exactly PONG.",
CessionId = null,
AllowedTools = Array.Empty<string>(),
WorkingDirectory = ResolveWorkingDirectory(null)
},
cancellationToken);
var success = string.Equals(response.Content.Trim(), "PONG", StringComparison.OrdinalIgnoreCase);
return new ProviderTestResult
{
ProviderName = Name,
Success = success,
ResponseTimeMs = stopwatch.ElapsedMilliseconds,
ErrorMessage = success ? null : $"Unexpected Hermes ping response: '{response.Content}'."
};
}

That is roughly what a “health check” looks like here. In some ways, people are not so different: it helps to check in from time to time, even if no one tells us exactly what to look for.

There are a few pitfalls worth understanding before integrating Hermes. Everyone steps into a few traps sooner or later.

Hermes supports multiple authentication methods, including API keys and tokens, so you need to choose based on the actual deployment scenario. Misconfiguration can cause connection failures, and the resulting error messages are not always intuitive. Sometimes the reported error is far away from the real root cause, which means slow and careful debugging is unavoidable.

When creating a session, you can configure a list of MCP servers so Hermes can call external tools. But keep the following points in mind:

  • MCP server addresses must be reachable
  • Timeouts must be configured reasonably
  • The system needs degradation handling when a server is unavailable

In practice, defensive thinking matters more than people expect.

Each session must specify a working directory so Hermes can access project files correctly. In multi-project scenarios, the working directory needs to switch dynamically. It sounds straightforward, but there are more edge cases than you might think.

Hermes responses may be split across session/update notifications and the final result, so they must be merged correctly. Otherwise, content may be lost.

Runtime errors should be returned explicitly instead of silently falling back to another Provider. That way, users know the issue came from Hermes rather than wondering why the system suddenly switched models behind the scenes.

HagiCode’s decision to use Hermes as its integrated Agent core was not a casual impulse. It was a careful choice based on practical requirements and the technical characteristics of the framework. Whether it proves to be the perfect long-term answer is still too early to say, but so far it has been serving us well.

Hermes gives HagiCode the flexibility to adapt to a wide range of scenarios. Its powerful tool system and MCP support allow the AI assistant to do real work, while the ACP protocol and Provider abstraction layer keep the integration process clear and controllable.

If you are choosing an Agent framework for your own AI project, I hope this article offers a useful reference. Picking the right underlying architecture can make everything that follows much easier.

Thank you for reading. If you found this article useful, you are welcome to support it with a like, bookmark, or share. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

.NET Code Protection in Practice: From Obfuscation to Virtual Machine Protection

.NET Code Protection in Practice: From Obfuscation to Virtual Machine Protection

Section titled “.NET Code Protection in Practice: From Obfuscation to Virtual Machine Protection”

This article explains how to implement a multi-layered code protection strategy in .NET projects, covering the full path from basic obfuscation to professional virtual machine protection.

In .NET application development, protecting core code such as license validation, business logic, and sensitive configuration from decompilation and reverse engineering is, frankly, a topic you cannot avoid. As the .NET ecosystem has matured, developers have gained access to a range of protection options, from built-in obfuscation attributes to professional virtualization-based protection tools.

As a complex multilingual monorepo project, HagiCode includes desktop applications, build systems, and license management capabilities. The code inevitably contains license validation logic, sensitive configuration such as API keys and product IDs, and business-critical logic. Those parts need serious protection, because no one wants their hard work to be exposed so easily.

This article shares the code protection approach we actually adopted in the HagiCode project and summarizes the full journey from early pitfalls to later optimization. Hopefully it gives you some useful ideas.

HagiCode is an open source AI coding assistant project dedicated to providing developers with an intelligent programming experience. The project uses a monorepo architecture and simultaneously maintains a VSCode extension, backend AI services, a cross-platform desktop client, and more. That multi-language, multi-platform complexity makes code protection an engineering challenge we have to face head-on.

The approach shared in this article is the result of real trial and error during HagiCode development. If you want to see how we solved these technical problems, keep reading. You may find a few unexpected takeaways.

1. Microsoft’s Built-in Obfuscation Attribute

Section titled “1. Microsoft’s Built-in Obfuscation Attribute”

.NET Framework provides a built-in [ObfuscationAttribute], which is the most basic and commonly used code obfuscation marker. This attribute lives in the System.Reflection namespace and allows you to apply baseline protection to code without introducing third-party tools.

Core features:

  • Feature property: Specifies the obfuscation feature, such as "ultra" (high obfuscation) or "all" (full obfuscation)
  • Exclude property: true means exclude from obfuscation, and false means apply obfuscation
  • Can be applied to classes, methods, properties, and other type members

In the HagiCode project, you can see it used like this:

[Obfuscation(Feature = "ultra", Exclude = false)]
public async Task<LicenseValidationResult?> ValidateLicenseAsync(...)

The advantages of this approach are fairly obvious:

  • No extra dependencies; it is built into .NET Framework out of the box
  • Can be recognized and processed by third-party obfuscation tools
  • Does not significantly increase the size of the compiled assembly

That said, it also has limitations. It is only a marker, and the actual obfuscation result depends on the tool implementation. It cannot provide virtual machine protection-level security.

VMP is a professional code protection tool that provides high-level protection by compiling code into virtual machine instructions. Unlike simple name obfuscation, VMP actually transforms code logic into a form that conventional decompilers cannot reconstruct.

Protection level classification:

LevelVirtualizationMutationAnti-debuggingString EncryptionUse Cases
HIGHfullhighenabledenabledLicense validation, session concurrency, sensitive constants
MEDIUMpartialmediumenabledenabledBusiness logic, domain models
LOWnonelowdisableddisabledUtility classes, non-critical code

The HagiCode project defines a declarative attribute system for marking code that needs protection:

// High-priority protection
[VmProtect(VmProtectionPriority.High, Reason = "Contains license verification logic")]
public class KeygenClient { ... }
// Exclude from protection
[VmExclude(Reason = "Public API that must remain unchanged")]
public class PublicApi { ... }
// Inherited protection
[VmProtect(Priority.High, ProtectDerived = true)]
public class BaseLicenseValidator { ... }

VMP protection does not only matter at runtime. It also needs to be automated as part of the build pipeline, because doing it manually would be far too tedious. HagiCode’s build system supports several modes:

  • Native Windows mode: Invoke the VMProtect tool directly
  • Linux Docker container mode: Run VMP inside a container to solve cross-platform compatibility issues
  • Attribute scanning: Automatically discover protection markers in code
  • Validation mechanism: Confirm that protection has been applied successfully

Taken together, these capabilities make the process much easier to manage.

1. Using Microsoft’s Built-in Obfuscation Attribute

Section titled “1. Using Microsoft’s Built-in Obfuscation Attribute”

Apply ObfuscationAttribute directly in code:

using System.Reflection;
[Obfuscation(Feature = "ultra", Exclude = false)]
public class LicenseService
{
[Obfuscation(Feature = "ultra", Exclude = false)]
public async Task<bool> ValidateLicenseAsync(string key)
{
// License validation logic
}
[Obfuscation(Feature = "flow", Exclude = false)]
private string DecryptToken(string encrypted)
{
// Decryption logic
}
}

Sometimes you need to let test assemblies access internal members while still keeping production code secure:

AssemblyInfo.cs
[assembly: InternalsVisibleTo("HagiCode.Application.Tests")]
[assembly: InternalsVisibleTo("DynamicProxyGenAssembly2")] // for Moq

This makes testing much more convenient, because the code still needs to be tested properly.

2. Custom Attribute Definitions for VMP Protection

Section titled “2. Custom Attribute Definitions for VMP Protection”

Create custom protection attributes to control VMP behavior:

using System;
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method | AttributeTargets.Property)]
public class VmProtectAttribute : Attribute
{
public VmProtectionPriority Priority { get; set; }
public string? Reason { get; set; }
public bool ProtectDerived { get; set; }
}
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method | AttributeTargets.Property)]
public class VmExcludeAttribute : Attribute
{
public string? Reason { get; set; }
}
public enum VmProtectionPriority
{
None = 0,
Low = 1,
Medium = 2,
High = 3
}

Custom attributes are often easier to work with because they reflect your own protection requirements directly.

vmp_config.yml
protection:
priority_mode: "attribute" # Attribute-based priority
default_level: "medium"
tools:
- name: "vmprotect"
path: "C:\\Program Files\\VMProtect Ultimate\\VMProtect.exe"
protection_levels:
high:
virtualization: "full"
mutation: "high"
anti_debug: true
anti_dump: true
encrypt_strings: true
encrypt_resources: true
medium:
virtualization: "partial"
mutation: "medium"
anti_debug: true
encrypt_strings: true
low:
virtualization: "none"
mutation: "low"
anti_debug: false

A clearer configuration makes the system much easier to maintain later.

1. Protection Practices for Critical Components

Section titled “1. Protection Practices for Critical Components”

According to HagiCode’s code-protection specification, the following components must use HIGH-priority protection:

// Production constants - must be encrypted and protected by VMP
[VmProtect(VmProtectionPriority.High, Reason = "Production constants")]
public static class ProductionConstants
{
// Encrypted string accessor, protected by VMP
[VmProtect(VmProtectionPriority.High)]
public static string GetLicenseServerUrl(IOptions<LicenseOptions> options) => ...;
}
// License validation logic
[VmProtect(VmProtectionPriority.High, Reason = "License verification logic")]
public class KeygenClient : IKeygenClient
{
[Obfuscation(Feature = "ultra", Exclude = false)]
public async Task<LicenseValidationResult?> ValidateLicenseAsync(...) { ... }
}
// Machine fingerprint service
[VmProtect(VmProtectionPriority.High)]
public class MachineFingerprintService : IMachineFingerprintService { ... }

Critical code deserves stronger protection, because exposing core logic would cause real problems.

2. String Encryption and Runtime Decryption

Section titled “2. String Encryption and Runtime Decryption”

Encrypt strings at build time and decrypt them at runtime:

public static class StringDecryption
{
[VmProtect(VmProtectionPriority.High, Reason = "CRITICAL SECURITY")]
public static string DecryptString(byte[] encryptedData, byte[] key, byte[] iv)
{
using var aes = Aes.Create();
aes.Key = key;
aes.IV = iv;
using var decryptor = aes.CreateDecryptor();
using var ms = new MemoryStream(encryptedData);
using var cs = new CryptoStream(ms, decryptor, CryptoStreamMode.Read);
using var reader = new StreamReader(cs);
return reader.ReadToEnd();
}
}
// Production constant accessor (lazy loading + caching)
public static class ProductionConstants
{
private static string? _cachedLicenseServerUrl;
public static string GetLicenseServerUrl(IOptions<LicenseOptions> options)
{
if (_cachedLicenseServerUrl == null)
{
var encrypted = GetEncryptedLicenseServerUrl();
#if DEBUG
_cachedLicenseServerUrl = options.Value.PrimaryServer.Url;
#else
_cachedLicenseServerUrl = StringDecryption.DecryptString(
encrypted,
GetEncryptionKey(),
GetEncryptionIV());
#endif
}
return _cachedLicenseServerUrl;
}
}

This step matters because sensitive information should never be left in plaintext.

After the build, you must verify whether protection was applied successfully; otherwise, you cannot be sure it is actually working:

// Example verification script
public bool VerifyProtection(string assemblyPath)
{
// 1. Check the VMP signature
var bytes = File.ReadAllBytes(assemblyPath);
var vmpSignature = Encoding.ASCII.GetBytes("VMProtect");
if (bytes.Any(b => vmpSignature.Contains(b)))
{
return true;
}
// 2. Check for file size changes (the protected file is usually larger)
var originalInfo = new FileInfo(assemblyPath.Replace(".dll", ".bak"));
if (originalInfo.Exists)
{
var sizeRatio = (double)new FileInfo(assemblyPath).Length / originalInfo.Length;
return sizeRatio > 1.1;
}
return false;
}

Verification is always worth doing, because otherwise problems can slip through unnoticed.

There are several pitfalls here that deserve special attention:

  1. Do not obfuscate all code: Public APIs, interface definitions, and DTO classes usually do not need protection. Excessive obfuscation can hurt performance and debugging. The HagiCode project learned this the hard way.

  2. Protect key accessors: Methods that retrieve encryption keys must receive the same or a higher protection level than the encrypted data itself; otherwise the whole setup loses its value.

  3. Balance testing and production: DEBUG builds should skip encryption to make development and debugging easier, while RELEASE builds should enable full protection. Remember to separate them with conditional compilation such as #if DEBUG.

  4. Consider the Docker environment: Running VMP on Linux requires a containerized approach to ensure tool compatibility. HagiCode uses a Wine + VMP container solution to solve the cross-platform problem.

  5. Verification is mandatory: After the build finishes, you must verify that protection was applied successfully. Otherwise sensitive code may still be exposed, and the verification code shown earlier exists for exactly this purpose.

With this multi-layer protection strategy, HagiCode built a comprehensive code security system that spans from baseline obfuscation to virtual machine protection:

  • Layer 1: Use ObfuscationAttribute for baseline marking and provide hints to third-party tools
  • Layer 2: Use custom VmProtectAttribute declarations to express protection intent and priority
  • Layer 3: Use VMP virtual machine protection to transform critical code into irreversible virtual machine instructions
  • Layer 4: Automatically scan and apply protection during the build, then verify the result

This approach can resist ordinary decompilation tools while also standing up better against advanced reverse engineering attacks. If you are building a .NET application that needs code protection, I hope this gives you at least a useful reference point.


If this article helped you:

Building an AI Adventure Party: A Practical Guide to Multi-Agent Collaboration Configuration in HagiCode

Building an AI Adventure Party: A Practical Guide to Multi-Agent Collaboration Configuration in HagiCode

Section titled “Building an AI Adventure Party: A Practical Guide to Multi-Agent Collaboration Configuration in HagiCode”

In modern software development, a single AI Agent is no longer enough for complex needs. How can multiple AI assistants from different companies collaborate within the same project? This article shares the multi-Agent collaboration configuration approach that the HagiCode project developed through real-world practice.

Many developers have likely had this experience: bringing an AI assistant into a project really does improve coding efficiency. But as requirements grow more complex, one AI Agent starts to fall short. You want it to handle code review, documentation generation, unit tests, and more at the same time, but the result is often that it cannot balance everything well, and output quality becomes inconsistent.

What is even more frustrating is that once you try to introduce multiple AI assistants, things get more complicated. Each Agent has its own configuration method, API interface, and execution logic, and they may even conflict with one another. It is like a sports team where every player is individually strong, but nobody knows how to coordinate, so the whole match turns into chaos.

The HagiCode project ran into the same problem during development. As a complex project involving a frontend VSCode extension, backend AI services, and a cross-platform desktop client, in the 2026-03 version at that time we needed to integrate multiple AI assistants from different companies at once: Claude Code, Codex, CodeBuddy, iFlow, and more. Figuring out how to let them coexist harmoniously in the same project while making the best use of their individual strengths became a critical problem we had to solve.

That alone would already be enough trouble. After all, who wants to deal with a group of AI tools fighting each other every day?

The approach shared in this article is the multi-Agent collaboration configuration practice we developed in the HagiCode project through real trial and error and repeated optimization. If you are also struggling with multiple AI assistants working together, this article may give you some ideas. Maybe. Every project is different, after all.

HagiCode is an AI coding assistant project that adopts an “adventure party” model in which multiple AI engines work together. Project repository: github.com/HagiCode-org/site.

The multi-Agent configuration approach shared here is one of the core techniques that allows HagiCode to maintain efficient development in complex projects. There is nothing especially mystical about it - it just turns a group of AIs into an adventure party that can actually coordinate.

HagiCode’s Multi-Agent Architecture Design

Section titled “HagiCode’s Multi-Agent Architecture Design”

From “Going Solo” to “Team Collaboration”

Section titled “From “Going Solo” to “Team Collaboration””

In the early days of the HagiCode project, we also tried using a single AI Agent to handle everything. We quickly discovered a clear bottleneck in that approach: different tasks demand different strengths. Some tasks require stronger contextual understanding, while others need more precise code editing. One Agent has a hard time excelling at all of them.

That made us realize that multiple Agents had to work together. But the problem was this: how do you let AI products from different companies coexist peacefully in the same project? We needed to solve several core issues:

  1. Configuration management complexity: each Agent has different configuration methods, API interfaces, and execution modes
  2. Unified communication protocol: we need a standardized way for different Agents to exchange data
  3. Task coordination and division of labor: how do we assign work reasonably so each Agent can play to its strengths

With those questions in mind, we started designing HagiCode’s multi-Agent architecture. It was not really that complicated in the end; we just had to think it through clearly.

After multiple iterations, this is the architecture we settled on:

┌─────────────────────────────────────────────────────────────────┐
│ AIProviderFactory │
│ (Factory pattern for unified management of all AI Providers) │
├─────────────────────────────────────────────────────────────────┤
│ ClaudeCodeCli │ CodexCli │ CodebuddyCli │ IFlowCli │
│ (Anthropic) │ (OpenAI) │ (Zhipu GLM) │ (Zhipu) │
└─────────────────────────────────────────────────────────────────┘

The core idea is to let different AI Agents be managed by the same code through a unified Provider interface. At the same time, the factory pattern is used to dynamically create and configure these Providers, ensuring scalability and flexibility across the system.

It is like division of labor in daily life. Everyone has a role; here we simply turned that idea into code architecture.

Agent Types and Division of Responsibilities

Section titled “Agent Types and Division of Responsibilities”

Based on HagiCode’s real-world experience, we assigned different responsibilities to each Agent:

AgentProviderModelPrimary Use
ClaudeCodeCliAnthropicglm-5-turboGenerate technical solutions and Proposals
CodexCliOpenAI/Zedgpt-5.4Execute precise code changes
CodebuddyCliZhipuglm-4.7Refine proposal descriptions and documentation
IFlowCliZhipuglm-4.7Archive proposals and historical records (configuration at the time; now legacy-compatible only)
OpenCodeCli--General-purpose code editing
GitHubCopilotMicrosoft-Assisted programming and code completion

The logic behind this division of labor is simple: every Agent has its own area of strength. Claude Code performs well at understanding and analyzing complex requirements, so it handles early solution design. Codex is more precise when modifying code, so it is better suited for concrete implementation work. CodeBuddy offers strong cost performance, which makes it a great fit for refining documentation.

After all, the right tool for the right job is usually the best choice. There are many roads to Rome; some are simply easier to walk than others.

To manage different AI Agents in a unified way, we first need to define a common interface. In HagiCode, that interface looks like this:

public interface IAIProvider
{
// Unified Provider interface
Task<IAIProvider?> GetProviderAsync(AIProviderType providerType);
Task<IAIProvider?> GetProviderAsync(string providerName, CancellationToken cancellationToken);
}

The interface looks simple, but it is the foundation of the entire multi-Agent system. With a unified interface, we can call AI products from different companies in exactly the same way, no matter what is underneath.

This is really just a matter of making complex things simple. Simple is beautiful, after all.

Once the interface is unified, the next question is how to create these Provider instances. HagiCode uses the factory pattern:

private IAIProvider? CreateProvider(AIProviderType providerType, ProviderConfiguration config)
{
return providerType switch
{
AIProviderType.ClaudeCodeCli =>
ActivatorUtilities.CreateInstance<ClaudeCodeCliProvider>(_serviceProvider, Options.Create(config)),
AIProviderType.CodebuddyCli =>
ActivatorUtilities.CreateInstance<CodebuddyCliProvider>(_serviceProvider, Options.Create(config)),
AIProviderType.CodexCli =>
ActivatorUtilities.CreateInstance<CodexCliProvider>(_serviceProvider, Options.Create(config)),
AIProviderType.IFlowCli =>
ActivatorUtilities.CreateInstance<IFlowCliProvider>(_serviceProvider, Options.Create(config)),
_ => null
};
}

This uses dependency injection through ActivatorUtilities.CreateInstance, which can dynamically create Provider instances at runtime while automatically injecting dependencies. The benefit of this design is that when a new Agent type is added, you only need to add the corresponding Provider class and then add one more case branch in the factory method. There is no need to modify the existing code at all.

That is reason enough. Who wants to rewrite a pile of old code every time a new feature is added?

To make configuration more flexible, we also implemented a type-mapping mechanism:

public static AIProviderTypeExtensions
{
private static readonly Dictionary<string, AIProviderType> _typeMap = new(
StringComparer.OrdinalIgnoreCase)
{
["ClaudeCodeCli"] = AIProviderType.ClaudeCodeCli,
["CodebuddyCli"] = AIProviderType.CodebuddyCli,
["CodexCli"] = AIProviderType.CodexCli,
["IFlowCli"] = AIProviderType.IFlowCli,
// ...more type mappings
};
}

The purpose of this mapping table is to convert string-form Provider names into enum types. This allows configuration files to use intuitive string names, while the internal code uses type-safe enums for processing.

Configuration should be as intuitive as possible. Nobody wants to memorize a pile of obscure code names.

In practice, everything can be configured in appsettings.json:

AI:
Providers:
Providers:
ClaudeCodeCli:
Enabled: true
Model: glm-5-turbo
WorkingDirectory: /path/to/project
CodebuddyCli:
Enabled: true
Model: glm-4.7
CodexCli:
Enabled: true
Model: gpt-5.4
IFlowCli:
Enabled: true
Model: glm-4.7

Each Provider can independently configure parameters such as enablement, model version, and working directory. This design preserves flexibility while remaining easy to manage and maintain.

In some ways, configuration files are like life’s options: you can choose to enable or disable certain things. The only difference is that code choices are easier to regret later.

With the unified technical architecture in place, the next step is making multiple Agents work together. HagiCode designed a task flow mechanism so different Agents can handle different stages of the work:

Proposal creation (user)
[Claude Code] ──generate proposal──▶ Proposal document
│ │
│ ▼
│ [Codebuddy] ──refine description──▶ Refined proposal
│ │
│ ▼
│ [Codex] ──execute changes──▶ Code changes
│ │
│ ▼
└──────────────────────▶ [iFlow] ──archive──▶ Historical records

The benefit of this division of labor is that each Agent only needs to focus on the tasks it does best, rather than trying to do everything. Claude Code generates proposals from scratch. Codebuddy makes proposal descriptions clearer. Codex turns proposals into actual code changes. iFlow archives and preserves those changes.

This is really just teamwork, the same as in daily life. Everyone has a role, and only together can something big get done. Here, the team members just happen to be AIs.

In actual operation, we summarized the following lessons:

1. Agent selection strategy matters

Tasks should not be assigned casually; they should be matched to each Agent’s strengths:

  • Proposal generation: use Claude Code, because it has stronger contextual understanding
  • Code execution: use Codex, because it is more precise for code modification
  • Proposal refinement: use Codebuddy, because it offers strong cost performance
  • Archival storage: use iFlow, because it is stable and reliable

After all, putting the right person on the right task is a timeless principle.

2. Configuration isolation ensures stability

Each Agent’s configuration is managed independently, supports environment-variable overrides, and uses separate working directories. As a result, a configuration error in one Agent does not affect the others.

This is like personal boundaries in life. Everyone needs their own space; non-interference makes coexistence possible.

3. Error-handling mechanism

A failure in a single Agent should not affect the overall workflow. We implemented a fallback strategy: when one Agent fails, the system can automatically switch to a backup plan or skip that step and continue with later tasks. At the same time, complete logging makes troubleshooting easier afterward.

Nobody can guarantee that errors will never happen. The key is how you handle them. Life works much the same way.

4. Monitoring and observability

Through the ACP protocol (our custom communication protocol based on JSON-RPC 2.0), we can track the execution status of each Agent. Session isolation ensures concurrency safety, while dynamic caching improves performance.

The things you cannot see are often the ones most likely to go wrong. Some visibility is always better than flying blind.

After adopting this multi-Agent collaboration configuration, the HagiCode project’s development efficiency improved significantly. Specifically:

  1. Task-handling capacity doubled: in the past, one Agent had to handle many kinds of tasks at once; now tasks can be processed in parallel, and throughput has increased dramatically
  2. More stable output quality: each Agent focuses only on what it does best, so consistency and quality both improve
  3. Lower maintenance cost: unified interfaces and configuration management make the whole system easier to maintain and extend
  4. Adding new Agents is simple: to integrate a new AI product, you only need to implement the interface and add configuration, without changing the core logic

This approach not only solved HagiCode’s own problems, but also proved that multi-Agent collaboration is a viable architectural choice.

The gains were quite noticeable. The process was just a bit of a hassle.

This article shared the HagiCode project’s practical experience with multi-Agent collaboration configuration. The main takeaways include:

  1. Standardized interfaces: IAIProvider unifies the behavior of different Agents, allowing the code to ignore which company’s product is underneath
  2. Factory pattern: ActivatorUtilities.CreateInstance dynamically creates Provider instances, supporting runtime configuration and dependency injection
  3. Protocol unification: the ACP protocol provides standardized communication between Agents through a bidirectional mechanism based on JSON-RPC 2.0
  4. Task routing: assign work reasonably across different Agents so each can play to its strengths, instead of expecting one Agent to do everything

This design not only solves the problem of “multiple Agents fighting each other,” but also uses the adventure party task flow mechanism to make the development process more automated and specialized.

If you are also considering introducing multiple AI assistants, I hope this article gives you some useful reference points. Of course, every project is different, and the specific approach still needs to be adjusted to the actual situation. There is no one-size-fits-all solution; the best solution is the one that fits you.

Beautiful things or people do not need to be possessed. As long as they remain beautiful, simply appreciating that beauty is enough. Technical solutions are the same: the one that suits you is the best one…

Building an AI Adventure Party: HagiCode Multi-Agent Collaboration Configuration in Practice

Building an AI Adventure Party: HagiCode Multi-Agent Collaboration Configuration in Practice

Section titled “Building an AI Adventure Party: HagiCode Multi-Agent Collaboration Configuration in Practice”

In modern software development, a single AI Agent is no longer enough to meet complex requirements. How can multiple AI assistants from different companies collaborate within the same project? This article shares the multi-Agent collaboration configuration approach that the HagiCode project developed through real-world practice.

Many developers have probably had this experience: after introducing an AI assistant into a project, productivity really does improve. But as requirements become more and more complex, one AI Agent starts to feel insufficient. You want it to handle code review, documentation generation, unit testing, and other tasks at the same time, but the result is often that it cannot keep everything balanced, and the output quality becomes inconsistent.

What is even more frustrating is that once you try to bring in multiple AI assistants, the problem becomes more complicated. Each Agent has its own configuration method, API interface, and execution logic, and they may even conflict with one another. It is like a sports team in which every player is talented, but nobody knows how to work together, so the match turns into a mess.

The HagiCode project ran into the same challenge during development. As a complex project involving a frontend VSCode extension, backend AI services, and a cross-platform desktop client, we needed to connect multiple AI assistants from different companies at the same time: Claude Code, Codex, CodeBuddy, iFlow, and more. How to let them coexist harmoniously in the same project and make the most of their strengths became a key problem we had to solve.

That alone would already be enough trouble. After all, who wants to deal with a bunch of fighting AIs every day?

The approach shared in this article is the multi-Agent collaboration configuration practice that we developed in the HagiCode project through real trial and error and repeated optimization. If you are also struggling with multiple AI assistants working together, this article may give you some inspiration. Maybe. Every project is different, after all.

HagiCode is an AI coding assistant project that adopts an “adventure party” model in which multiple AI engines work together. Project repository: github.com/HagiCode-org/site.

The multi-Agent configuration approach shared in this article is one of the core technologies that allows HagiCode to maintain efficient development in complex projects. There is nothing especially magical about it; it simply turns a group of AIs into an adventure party that can actually coordinate.

HagiCode’s Multi-Agent Architecture Design

Section titled “HagiCode’s Multi-Agent Architecture Design”

From “Going Solo” to “Team Collaboration”

Section titled “From “Going Solo” to “Team Collaboration””

In the early days of the HagiCode project, we also tried using a single AI Agent to handle every task. We soon discovered a clear bottleneck in that approach: different tasks require different strengths. Some tasks need stronger contextual understanding, while others need more precise code modification capabilities. One Agent has a hard time excelling at everything.

That made us realize that multiple Agents had to work together. But the problem was this: how do you let AI products from different companies coexist peacefully in the same project? We needed to solve several core issues:

  1. Configuration management complexity: each Agent has different configuration methods, API interfaces, and execution modes
  2. Unified communication protocol: we need a standardized way for different Agents to exchange data
  3. Task coordination and division of labor: how do we assign work reasonably so each Agent can play to its strengths

With those questions in mind, we started designing HagiCode’s multi-Agent architecture. It was not actually that complicated; we just had to think it through clearly.

After multiple iterations, this is the architecture we settled on:

┌─────────────────────────────────────────────────────────────────┐
│ AIProviderFactory │
│ (Factory pattern for unified management of all AI Providers) │
├─────────────────────────────────────────────────────────────────┤
│ ClaudeCodeCli │ CodexCli │ CodebuddyCli │ IFlowCli │
│ (Anthropic) │ (OpenAI) │ (Zhipu GLM) │ (Zhipu) │
└─────────────────────────────────────────────────────────────────┘

The core idea is to let different AI Agents be managed by the same set of code through a unified Provider interface. At the same time, the factory pattern is used to dynamically create and configure these Providers, ensuring scalability and flexibility across the system.

It is like division of labor in everyday life. Everyone has their own role; here we simply turned that idea into code architecture.

Agent Types and Division of Responsibilities

Section titled “Agent Types and Division of Responsibilities”

Based on HagiCode’s real-world experience, we assigned different responsibilities to each Agent:

AgentProviderModelPrimary Use
ClaudeCodeCliAnthropicglm-5-turboGenerate technical solutions and Proposals
CodexCliOpenAI/Zedgpt-5.4Execute precise code changes
CodebuddyCliZhipuglm-4.7Refine proposal descriptions and documentation
IFlowCliZhipuglm-4.7Archive proposals and historical records
OpenCodeCli--General-purpose code editing
GitHubCopilotMicrosoft-Assisted programming and code completion

The logic behind this division of labor is simple: every Agent has its own area of strength. Claude Code performs well at understanding and analyzing complex requirements, so it handles early solution design. Codex is more precise when modifying code, so it is better suited for concrete implementation work. CodeBuddy offers strong cost performance, which makes it ideal for refining proposal text and documentation.

After all, the right tool for the right job is the best choice. There are many roads to Rome; some are simply easier to walk than others.

To manage different AI Agents in a unified way, we first need to define a common interface. In HagiCode, that interface looks like this:

public interface IAIProvider
{
// Unified Provider interface
Task<IAIProvider?> GetProviderAsync(AIProviderType providerType);
Task<IAIProvider?> GetProviderAsync(string providerName, CancellationToken cancellationToken);
}

The interface looks simple, but it is the foundation of the entire multi-Agent system. With a unified interface, we can call AI products from different companies in the same way regardless of which company is behind them.

This is really just about making complex things simple. Simple is beautiful, after all.

Once the interface is unified, the next question is how to create these Provider instances. HagiCode uses the factory pattern:

private IAIProvider? CreateProvider(AIProviderType providerType, ProviderConfiguration config)
{
return providerType switch
{
AIProviderType.ClaudeCodeCli =>
ActivatorUtilities.CreateInstance<ClaudeCodeCliProvider>(_serviceProvider, Options.Create(config)),
AIProviderType.CodebuddyCli =>
ActivatorUtilities.CreateInstance<CodebuddyCliProvider>(_serviceProvider, Options.Create(config)),
AIProviderType.CodexCli =>
ActivatorUtilities.CreateInstance<CodexCliProvider>(_serviceProvider, Options.Create(config)),
AIProviderType.IFlowCli =>
ActivatorUtilities.CreateInstance<IFlowCliProvider>(_serviceProvider, Options.Create(config)),
_ => null
};
}

This uses dependency injection through ActivatorUtilities.CreateInstance, which can dynamically create Provider instances at runtime while automatically injecting dependencies. The benefit of this design is that when a new Agent type is added, you only need to add the corresponding Provider class and then add one more case branch in the factory method. There is no need to modify the existing code at all.

That is reason enough. Who wants to rewrite a pile of old code every time a new feature is added?

To make configuration more flexible, we also implemented a type-mapping mechanism:

public static AIProviderTypeExtensions
{
private static readonly Dictionary<string, AIProviderType> _typeMap = new(
StringComparer.OrdinalIgnoreCase)
{
["ClaudeCodeCli"] = AIProviderType.ClaudeCodeCli,
["CodebuddyCli"] = AIProviderType.CodebuddyCli,
["CodexCli"] = AIProviderType.CodexCli,
["IFlowCli"] = AIProviderType.IFlowCli,
// ...more type mappings
};
}

The purpose of this mapping table is to convert string-form Provider names into enum types. This allows configuration files to use intuitive string names, while the internal code uses type-safe enums for processing.

Configuration should be as intuitive as possible. Nobody wants to memorize a pile of complicated code names.

In practice, everything can be configured in appsettings.json:

AI:
Providers:
Providers:
ClaudeCodeCli:
Enabled: true
Model: glm-5-turbo
WorkingDirectory: /path/to/project
CodebuddyCli:
Enabled: true
Model: glm-4.7
CodexCli:
Enabled: true
Model: gpt-5.4
IFlowCli:
Enabled: true
Model: glm-4.7

Each Provider can independently configure parameters such as enablement, model version, and working directory. This design preserves flexibility while remaining easy to manage and maintain.

Configuration files are a bit like life’s options: you can choose to enable or disable certain things. The only difference is that code choices are easier to regret later.

With the unified technical architecture in place, the next step is making multiple Agents work together. HagiCode designed a task flow mechanism so different Agents can handle different stages of the work:

Proposal creation (user)
[Claude Code] ──generate proposal──▶ Proposal document
│ │
│ ▼
│ [Codebuddy] ──refine description──▶ Refined proposal
│ │
│ ▼
│ [Codex] ──execute changes──▶ Code changes
│ │
│ ▼
└──────────────────────▶ [iFlow] ──archive──▶ Historical records

The benefit of this division of labor is that each Agent only needs to focus on the tasks it does best, rather than trying to do everything. Claude Code is responsible for generating proposals from scratch. Codebuddy makes proposal descriptions clearer. Codex turns proposals into actual code changes. iFlow archives and preserves those changes.

This is really just teamwork, much like in everyday life. Everyone has their own role, and only together can something big get done. The only difference is that the team members here happen to be AIs.

In actual operation, we summarized the following lessons:

1. Agent selection strategy matters

Tasks should not be assigned casually; they should be matched to each Agent’s strengths:

  • Proposal generation: use Claude Code, because it has stronger contextual understanding
  • Code execution: use Codex, because it is more precise for code modification
  • Proposal refinement: use Codebuddy, because it offers strong cost performance
  • Archival storage: use iFlow, because it is stable and reliable

After all, putting the right person on the right task is a timeless principle.

2. Configuration isolation ensures stability

Each Agent’s configuration is managed independently, supports environment-variable overrides, and uses separate working directories. As a result, a configuration error in one Agent does not affect the others.

This is like personal boundaries in life. Everyone needs their own space; non-interference makes harmonious coexistence possible.

3. Error-handling mechanism

A failure in a single Agent should not affect the overall workflow. We implemented a fallback strategy: when one Agent fails, the system can automatically switch to a backup plan or skip that step and continue with later tasks. At the same time, complete logging makes troubleshooting easier afterward.

Nobody can guarantee that errors will never happen. The key is how you handle them. Life works much the same way.

4. Monitoring and observability

Through the ACP protocol (our custom communication protocol based on JSON-RPC 2.0), we can track the execution status of each Agent. Session isolation ensures concurrency safety, while dynamic caching improves performance.

The things you cannot see are often the ones most likely to go wrong. Some visibility is always better than flying blind.

After adopting this multi-Agent collaboration configuration, the HagiCode project’s development efficiency improved significantly. Specifically:

  1. Task-handling capacity doubled: in the past, one Agent had to handle many kinds of tasks at once; now tasks can be processed in parallel, and throughput has increased dramatically
  2. More stable output quality: each Agent focuses only on what it does best, so consistency and quality both improve
  3. Lower maintenance cost: unified interfaces and configuration management make the whole system easier to maintain and extend
  4. Adding new Agents is simple: to integrate a new AI product, you only need to implement the interface and add configuration, without changing the core logic

This approach not only solved HagiCode’s own problems, but also proved that multi-Agent collaboration is a viable architectural choice.

The gains were quite noticeable. The process was just a bit of a hassle.

This article shared the HagiCode project’s practical experience with multi-Agent collaboration configuration. The main takeaways include:

  1. Standardized interfaces: IAIProvider unifies the behavior of different Agents, allowing the code to ignore which company’s product is underneath
  2. Factory pattern: ActivatorUtilities.CreateInstance dynamically creates Provider instances, supporting runtime configuration and dependency injection
  3. Protocol unification: the ACP protocol provides standardized communication between Agents through a bidirectional mechanism based on JSON-RPC 2.0
  4. Task routing: assign work reasonably across different Agents so each can play to its strengths, instead of expecting one Agent to do everything

This design not only solves the problem of “multiple Agents fighting each other,” but also uses the adventure party task flow mechanism to make the development process more automated and specialized.

If you are also considering introducing multiple AI assistants, I hope this article gives you some useful reference points. Of course, every project is different, and the specific approach still needs to be adjusted to the actual situation. There is no one-size-fits-all solution; the best solution is the one that fits you.

Beautiful things or people do not need to be possessed. As long as they remain beautiful, simply appreciating that beauty is enough. Technical solutions are the same: the one that suits you is the best one…


If this article was helpful to you, feel free to give the project a Star on GitHub. Your support is what keeps us sharing more. The public beta has already started, and you are welcome to install it and give it a try.


Thank you for reading. If you found this article useful, please click the like button below so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by me, and reflects my own views and position.

How Gamification Design Makes AI Coding More Fun

How Gamification Design Makes AI Coding More Fun

Section titled “How Gamification Design Makes AI Coding More Fun”

Traditional AI coding tools are actually quite powerful; they just lack a bit of warmth. When we were building HagiCode, we thought: if we are going to write code anyway, why not turn it into a game?

Anyone who has used an AI coding assistant has probably had this experience: at first it feels fresh and exciting, but after a while it starts to feel like something is missing. The tool itself is powerful, capable of code generation, autocomplete, and Bug fixes, but… it does not feel very warm, and over time it can become monotonous and dull.

That alone is enough to make you wonder who wants to stare at a cold, impersonal tool every day.

It is a bit like playing a game. If all you do is finish a task list, with no character growth, no achievement unlocks, and no team coordination, it quickly stops being fun. Beautiful things and people do not need to be possessed to be appreciated; their beauty is enough on its own. Programming tools do not even offer that kind of beauty, so it is easy to lose heart.

We ran into exactly this problem while developing HagiCode. As a multi-AI assistant collaboration platform, HagiCode needs to keep users engaged over the long term. But in reality, even a great tool is hard to stick with if it lacks any emotional connection.

To solve this pain point, we made a bold decision: turn programming into a game. Not the superficial kind with a simple points leaderboard, but a true role-playing gamified experience. The impact of that decision may be even bigger than you imagine.

After all, people need a bit of ritual in their lives.

The ideas shared in this article come from our practical experience on the HagiCode project. HagiCode is a multi-AI assistant collaboration platform that supports Claude Code, Codex, Copilot, OpenCode, and other AI assistants working together. If you are interested in multi-AI collaboration or gamified programming, visit github.com/HagiCode-org/site to learn more.

There is nothing especially mysterious about it. We simply turned programming into an adventure.

The essence of gamification is not just “adding a leaderboard.” It is about building a complete incentive system so users can feel growth, achievement, and social recognition while doing tasks.

HagiCode’s gamification design revolves around one core idea: every AI assistant is a “Hero,” and the user is the captain of this Hero team. You lead these Heroes to conquer various “Dungeons” (programming tasks). Along the way, Heroes gain experience, level up, unlock abilities, and your team earns achievements as well.

This is not a gimmick. It is a design grounded in human behavioral psychology. When tasks are given meaning and progress feedback, people’s engagement and persistence increase significantly.

As the old saying goes, “This feeling can become a memory, though at the time it left us bewildered.” We bring that emotional experience into the tool, so programming is no longer just typing code, but a journey worth remembering.

Hero is the core concept in HagiCode’s gamification system. Each Hero represents one AI assistant. For example, Claude Code is a Hero, and Codex is also a Hero.

A Hero has three equipment slots, and the design is surprisingly elegant:

  1. CLI slot (main class): Determines the Hero’s base ability, such as whether it is Claude Code or Codex
  2. Model slot (secondary class): Determines which model is used, such as Claude 4.5 or Claude 4.6
  3. Style slot (style): Determines the Hero’s behavior style, such as “Fengluo Strategist” or another style

The combination of these three slots creates unique Hero configurations. Much like equipment builds in games, you choose the right setup based on the task. After all, what suits you best is what matters most. Life is similar: many roads lead to Rome, but some are smoother than others.

Each Hero has its own XP and level:

type HeroProgressionSnapshot = {
currentLevel: number; // Current level
totalExperience: number; // Total experience
currentLevelStartExperience: number; // Experience at the start of the current level
nextLevelExperience: number; // Experience required for the next level
experienceProgressPercent: number; // Progress percentage
remainingExperienceToNextLevel: number; // Experience still needed for the next level
lastExperienceGain: number; // Most recent experience gained
lastExperienceGainAtUtc?: string | null; // Time when experience was gained
};

Levels are divided into four stages, and each stage has an immersive name:

export const resolveHeroProgressionStage = (level?: number | null): HeroProgressionStage => {
const normalizedLevel = Math.max(1, level ?? 1);
if (normalizedLevel <= 100) return 'rookieSprint'; // Rookie sprint
if (normalizedLevel <= 300) return 'growthRun'; // Growth run
if (normalizedLevel <= 700) return 'veteranClimb'; // Veteran climb
return 'legendMarathon'; // Legend marathon
};

From “rookie” to “legend,” this growth path gives users a clear sense of direction and achievement. It mirrors personal growth in life, from confusion to maturity, only made more tangible here.

To create a Hero, you need to configure three slots:

const heroDraft: HeroDraft = {
name: 'Athena',
icon: 'hero-avatar:storm-03',
description: 'A brilliant strategist',
executorType: AIProviderType.CLAUDE_CODE_CLI,
slots: {
cli: {
id: 'profession-claude-code',
parameters: { /* CLI-related parameters */ }
},
model: {
id: 'secondary-claude-4-sonnet',
parameters: { /* Model-related parameters */ }
},
style: {
id: 'fengluo-strategist',
parameters: { /* Style-related parameters */ }
}
}
};

Every Hero has a unique avatar, description, and professional identity, which gives what would otherwise be a cold AI assistant more personality and warmth. After all, who wants to work with a tool that has no character?

A “Dungeon” is a classic game concept representing a challenge that requires a team to clear. In HagiCode, each workflow is a Dungeon.

Dungeon organizes workflows into different “Dungeons”:

  • Proposal generation dungeon: Responsible for generating technical proposals
  • Proposal execution dungeon: Responsible for executing tasks in proposals
  • Proposal archive dungeon: Responsible for organizing and archiving completed proposals

Each dungeon has its own Captain Hero, and the captain is automatically chosen as the first enabled Hero.

This is really just division of labor, like in everyday life, except turned into a game mechanic.

You can configure different Hero squads for different dungeons:

const dungeonRoster: HeroDungeonRoster = {
scriptKey: 'proposal.generate',
displayName: 'Proposal Generation',
members: [
{ heroId: 'hero-1', name: 'Athena', executorType: 'ClaudeCode' },
{ heroId: 'hero-2', name: 'Apollo', executorType: 'Codex' }
]
};

For example, you can use Athena for generating proposals because it is good at strategy, and Apollo for implementing code because it is good at execution. That way, every Hero can play to its strengths. It is like forming a band: each person has an instrument, and together they create something beautiful.

Dungeon uses fixed scriptKey values to identify different workflows:

// Script keys map to different workflows
const dungeonScripts = {
'proposal.generate': 'Proposal Generation',
'proposal.execute': 'Proposal Execution',
'proposal.archive': 'Proposal Archive'
};

The task state flow is: queued (waiting) -> dispatching (being assigned) -> dispatched (assigned). The whole process is automated and requires no manual intervention. That is also part of our lazy side, because who wants to manage this stuff by hand?

XP is the core feedback mechanism in the gamification system. Users gain XP by completing tasks, XP levels up Heroes, and leveling up unlocks new abilities, forming a positive feedback loop.

In HagiCode, XP can be earned through the following activities:

  • Completing code execution
  • Successfully calling tools
  • Generating proposals
  • Session management operations
  • Project operations

Every time a valid action is completed, the corresponding Hero gains XP. Just like growth in life, every step counts, only here that growth is quantified.

XP and level progress are visualized in real time:

type HeroDungeonMember = {
heroId: string;
name: string;
icon?: string | null;
executorType: PCode_Models_AIProviderType;
currentLevel?: number; // Current level
totalExperience?: number; // Total experience
experienceProgressPercent?: number; // Progress percentage
};

Users can always see each Hero’s level and progress, and that immediate feedback is the key to gamification design. People need feedback, otherwise how would they know they are improving?

Achievements are another important element in gamification. They provide long-term goals and milestone-driven satisfaction.

HagiCode supports multiple types of achievements:

  • Code generation achievements: Generate X lines of code, generate Y files
  • Session management achievements: Complete Z conversations
  • Project operation achievements: Work across W projects

These achievements are really like milestones in life, except we have turned them into a game mechanic.

Achievements have three states:

type AchievementStatus = 'unlocked' | 'in-progress' | 'locked';

The three states have clear visual distinctions:

  • Unlocked: Gold gradient with a halo effect
  • In progress: Blue pulse animation
  • Locked: Gray, with unlock conditions shown

Each achievement clearly displays its trigger condition, so users know what to do next. When people feel lost, a little guidance always helps.

When an achievement is unlocked, a celebration animation is triggered. That kind of positive reinforcement gives users the satisfying feeling of “I did it” and motivates them to keep going. Small rewards in life work the same way: they may be small, but the happiness can last a long time.

Battle Report is one of HagiCode’s signature features. At the end of each day, it generates a full-screen battle-style report.

Battle Report displays the following information:

type HeroBattleReport = {
reportDate: string;
summary: {
totalHeroCount: number; // Total number of Heroes
activeHeroCount: number; // Number of active Heroes
totalBattleScore: number; // Total battle score
mvp: HeroBattleHero; // Most valuable Hero
};
heroes: HeroBattleHero[]; // Detailed data for all Heroes
};
  • Total team score
  • Number of active Heroes
  • Number of tool calls
  • Total working time
  • MVP (Most Valuable Hero)
  • Detailed card for each Hero

The MVP is the best-performing Hero of the day and is highlighted in the report. This is not just data statistics, but a form of honor and recognition. After all, who does not want to be recognized?

Each Hero card includes:

  • Level progress
  • XP gained
  • Number of executions
  • Usage time

These metrics help users clearly understand how the team is performing. Seeing the results of your own effort is satisfying in itself.

HagiCode’s gamification system uses a modern technology stack and design patterns. There is nothing especially magical about it; we just chose tools that fit the job.

// React + TypeScript for the frontend
import React from 'react';
// Framer Motion for animations
import { AnimatePresence, motion } from 'framer-motion';
// Redux Toolkit for state management
import { useAppDispatch, useAppSelector } from '@/store';
// shadcn/ui for UI components
import { Dialog, DialogContent } from '@/components/ui/dialog';

Framer Motion handles all animation effects, shadcn/ui provides the foundational UI components, and Redux Toolkit manages the complex gamification state. Good tools make good work.

HagiCode uses a Glassmorphism + Tech Dark design style:

/* Primary gradient */
background: linear-gradient(135deg, #22C55E 0%, #25c2a0 50%, #06b6d4 100%);
/* Glass effect */
backdrop-filter: blur(12px);
/* Glow effect */
background: radial-gradient(circle at center, rgba(34, 197, 94, 0.15) 0%, transparent 70%);

The green gradient combined with glassmorphism creates a technical, futuristic atmosphere. Visual beauty is part of the user experience too.

Framer Motion is used to create smooth entrance animations:

<motion.div
animate={{ opacity: 1, y: 0 }}
initial={{ opacity: 0, y: 18 }}
transition={{ duration: 0.35, ease: 'easeOut', delay: index * 0.08 }}
className="card"
>
{/* Card content */}
</motion.div>

Each card enters one after another with a delay of 0.08 seconds, creating a fluid visual effect. Smooth animation improves the experience. That part is hard to argue with.

Gamification data is stored using the Grain storage system to ensure state consistency. Even fine-grained data like accumulated Hero XP can be persisted accurately. No one wants to lose the experience they worked hard to earn.

Creating your first Hero is actually quite simple:

  1. Go to the Hero management page
  2. Click the “Create Hero” button
  3. Configure the three slots (CLI, Model, Style)
  4. Give the Hero a name and description
  5. Save it, and your first Hero is born

It is like meeting a new friend: you give them a name, learn what makes them special, and then head off on an adventure together.

Building a team is also simple:

  1. Go to the Dungeon management page
  2. Choose the dungeon you want to configure, such as “Proposal Generation”
  3. Select members from your Hero list
  4. The system automatically selects the first enabled Hero as Captain
  5. Save the configuration

This is simply the process of forming a team, much like building a team in real life where everyone has their own role.

At the end of each day, you can view the day’s Battle Report:

  1. Click the “Battle Report” button
  2. View the day’s work results in a full-screen display
  3. Check the MVP and the detailed data for each Hero
  4. Share it with team members if you want

This is also a kind of ritual, a way to see how much effort you put in today and how far you still are from your goal.

Use React.memo to avoid unnecessary re-renders:

const HeroCard = React.memo(({ hero }: { hero: HeroDungeonMember }) => {
// Component implementation
});

Performance matters too. No one wants to use a laggy tool.

Detect the user’s motion preference settings and provide a simplified experience for motion-sensitive users:

const prefersReducedMotion = useReducedMotion();
const duration = prefersReducedMotion ? 0 : 0.35;

Not everyone likes animation, and respecting user preferences is part of good design.

Keep legacyIds to support migration from older versions:

type HeroDungeonMember = {
heroId: string;
legacyIds?: string[]; // Supports legacy ID mapping
// ...
};

No one wants to lose data just because of a version upgrade.

Use i18n translation keys for all text to make multi-language support easy:

const displayName = t(`dungeon.${scriptKey}`, { defaultValue: displayName });

Language should never be a barrier to using the product.

Gamification is not just a simple points leaderboard, but a complete incentive system. Through the Hero system, Dungeon system, XP and level system, achievement system, and Battle Report, HagiCode transforms programming work into a heroic journey full of adventure.

The core value of this system lies in:

  • Emotional connection: Giving cold AI assistants personality
  • Positive feedback: Every action produces immediate feedback
  • Long-term goals: Levels and achievements provide a growth path
  • Team identity: A sense of collaboration within Dungeon teams
  • Honor and recognition: Battle Report and MVP showcases

Gamification design makes programming no longer dull, but an interesting adventure. While completing coding tasks, users also experience the fun of character growth, team collaboration, and achievement unlocking, which improves retention and activity.

At its core, programming is already an act of creation. We just made the creative process a little more fun.

If this article helped you:


Thank you for reading. If you found this article useful, please click the like button below so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by me, and reflects my own views and position.

ImgBin CLI Tool Design: HagiCode's Image Asset Management Approach

ImgBin CLI Tool Design: HagiCode’s Image Asset Management Approach

Section titled “ImgBin CLI Tool Design: HagiCode’s Image Asset Management Approach”

This article explains how to build an automatable image asset pipeline from scratch, covering CLI tool design, a Provider Adapter architecture, and metadata management strategies.

Honestly, I did not expect image asset management to keep us tangled up for this long.

During HagiCode development, we ran into a problem that looked simple on the surface but was surprisingly thorny in practice: generating and managing image assets. In a way, it was like the dramas of adolescence - calm on the outside, turbulent underneath.

As the project accumulated more documentation and marketing materials, we needed a large number of supporting images. Some had to be AI-generated, some had to be selected from an existing asset library, and others needed AI recognition plus automatic labeling. The problem was that all of this had long been handled through scattered scripts and manual steps. Every time we generated an image, we had to run a script by hand, organize metadata by hand, and create thumbnails by hand. That alone was annoying enough, but the bigger issue was that everything was scattered everywhere. When we wanted to find something, we could not. When we needed to reuse something, we could not.

The pain points were concrete:

  1. No unified entry point: the logic for image generation was spread across different scripts, so batch execution was basically impossible.
  2. Missing metadata: generated images had no unified metadata.json, which meant no reliable searchability or traceability.
  3. High manual organization cost: titles and tags had to be sorted out one by one by hand, which was inefficient.
  4. No automation: automatically generating visual assets in a CI/CD pipeline? Not a chance.

We did think about just leaving it alone. But projects still need to move forward. Since we could not avoid the problem, we figured we might as well solve it. So we decided to upgrade ImgBin from a set of scattered scripts into an image asset pipeline that can be executed automatically. Some problems, after all, do not disappear just because you look away.

The approach shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is an AI coding assistant project that simultaneously maintains multiple components, including a VSCode extension, backend AI services, and a cross-platform desktop client. In a complex, multilingual, cross-platform environment like this, standardized image asset management becomes a key part of improving development efficiency.

You could say this was one of those small growing pains in HagiCode’s journey. Every project has moments like that: a minor issue that looks insignificant, yet somehow manages to take up half the day.

HagiCode’s build system is based on the TypeScript + Node.js ecosystem, so ImgBin naturally adopted the same tech stack to keep the project technically consistent. Once you are used to one stack, switching to something else just feels like unnecessary trouble.


ImgBin uses a layered architecture that cleanly separates CLI commands, application services, third-party API adapters, and the infrastructure layer:

Component hierarchy
├── CLI Entry (cli.ts) Global argument parsing, command routing
├── Commands (commands/*) generate | batch | annotate | thumbnail
├── Application Services job-runner | metadata | thumbnail | asset-writer
├── Provider Adapters image-api-provider | vision-api-provider
└── Infrastructure Layer config | logger | paths | schema

The benefit of this layered design is clear responsibility boundaries. It also makes testing easier because external dependencies can be mocked cleanly. In practice, it just means each layer does its own job without getting in the way of the others, so when something breaks, it is easier to figure out why.

ImgBin uses a model of “one asset, one directory.” Every time an image is generated, it creates a structure like this:

library/
└── 2026-03/
└── orange-dashboard/
├── original.png # Original image
├── thumbnail.webp # 512x512 thumbnail
└── metadata.json # Structured metadata

The advantages of this model are:

  1. Self-contained: all files for a single asset live in the same directory, making migration and backup convenient.
  2. Traceable: metadata.json makes it possible to trace generation time, prompt, model, and other details.
  3. Extensible: if more variants are needed later, such as thumbnails in multiple sizes, we can simply add new files in the same directory.

Beautiful things do not always need to be possessed. Sometimes it is enough that they remain beautiful, and that you can quietly appreciate them. That may sound a little far afield, but the logic still holds here: once images are kept together, they are more pleasant to look at and much easier to find.

metadata.json is the core of the entire system. It uses a layered storage strategy that separates fields into three categories:

{
"schemaVersion": 2,
"assetId": "orange-dashboard",
"slug": "orange-dashboard",
"title": "Orange Dashboard",
"tags": ["dashboard", "hero", "orange"],
"source": { "type": "generated" },
"paths": {
"assetDir": "library/2026-03/orange-dashboard",
"original": "original.png",
"thumbnail": "thumbnail.webp"
},
"generated": {
"prompt": "orange dashboard for docs hero",
"provider": "azure-openai-image-api",
"model": "gpt-image-1.5"
},
"recognized": {
"title": "Orange Dashboard",
"tags": ["dashboard", "ui", "orange"],
"description": "A modern orange dashboard with charts and metrics"
},
"status": {
"generation": "succeeded",
"recognition": "succeeded",
"thumbnail": "succeeded"
},
"timestamps": {
"createdAt": "2026-03-11T04:01:19.570Z",
"updatedAt": "2026-03-11T04:02:09.132Z"
}
}
  • generated: records the original information from image generation, such as the prompt, provider, and model.
  • recognized: stores AI recognition results, such as auto-generated titles, tags, and descriptions.
  • manual: stores manually curated results. Data in this area has the highest priority and will not be overwritten by AI recognition.

This layered strategy resolves one of our earlier core conflicts: when AI recognition and manual curation disagree, which one should win? The answer is manual input. AI recognition is there to assist, not to decide. That question also became clearer over time - machines are still machines, and in the end, people still need to make the call.


Another core part of ImgBin is the Provider Adapter pattern. We abstract external APIs behind a unified interface so that even if we switch AI service providers, we do not need to change the business logic.

In a way, it is a bit like relationships - outward appearances can change, but what matters is that the inner structure stays the same. Once the interface is fixed, the internal implementation can vary freely.

interface ImageGenerationProvider {
// Generate an image and return its Buffer
generate(options: GenerateOptions): Promise<Buffer>;
// Get the list of supported models
getSupportedModels(): Promise<string[]>;
}
interface GenerateOptions {
prompt: string;
model?: string;
size?: '1024x1024' | '1792x1024' | '1024x1792';
quality?: 'standard' | 'hd';
format?: 'png' | 'webp' | 'jpeg';
}
interface VisionRecognitionProvider {
// Recognize image content and return structured metadata
recognize(imageBuffer: Buffer): Promise<RecognitionResult>;
// Get the list of supported models
getSupportedModels(): Promise<string[]>;
}
interface RecognitionResult {
title?: string;
tags: string[];
description?: string;
confidence: number;
}

The advantages of this interface design are:

  1. Testable: in unit tests, we can pass in mock providers instead of making real external API calls.
  2. Extensible: adding a new provider only requires implementing the interface; caller code does not need to change.
  3. Replaceable: production can use Azure OpenAI while testing can use a local model, with configuration being the only thing that changes.

Sometimes project work feels like that too. On the surface it looks like we just swapped an API, but the internal logic remains exactly the same, and that makes the whole thing a lot less scary.


ImgBin provides four core commands to cover different usage scenarios:

Terminal window
# Simplest usage
imgbin generate --prompt "orange dashboard for docs hero"
# Generate a thumbnail and AI annotations at the same time
imgbin generate --prompt "orange dashboard" --annotate --thumbnail
# Specify an output directory
imgbin generate --prompt "orange dashboard" --output ./library

Batch jobs are defined through YAML or JSON manifest files, which makes them suitable for CI/CD workflows:

assets/jobs/launch.yaml
defaults:
annotate: true
thumbnail: true
libraryRoot: ./library
jobs:
- prompt: "orange dashboard hero"
slug: orange-dashboard
tags: [dashboard, hero, orange]
- prompt: "pricing grid for docs"
slug: pricing-grid
tags: [pricing, grid, docs]

Run the command:

Terminal window
imgbin batch assets/jobs/launch.yaml

The batch job design supports failure isolation: items in the manifest are processed one by one, and a failure in one item does not affect the others. You can also preview the job with --dry-run without actually executing it.

And the best part is that it tells you exactly what succeeded and what failed. Unlike some things in life, where failure happens and you are left not even knowing how it happened.

Run AI recognition on existing images to automatically generate titles, tags, and descriptions:

Terminal window
# Annotate a single image
imgbin annotate ./library/2026-03/orange-dashboard
# Annotate an entire directory in batch
imgbin annotate ./library/2026-03/

Generate thumbnails for existing images:

Terminal window
# Generate a thumbnail
imgbin thumbnail ./library/2026-03/orange-dashboard

The manifest format for batch jobs supports flexible configuration. Defaults can be set globally, and individual jobs can override them:

# Global defaults
defaults:
annotate: true # Enable AI annotation by default
thumbnail: true # Generate thumbnails by default
libraryRoot: ./library
model: gpt-image-1.5
jobs:
# Minimal configuration: only provide a prompt
- prompt: "first image"
# Full configuration
- prompt: "second image"
slug: custom-slug
tags: [tag1, tag2]
annotate: false # Do not run AI annotation for this job
model: dall-e-3 # Use a different model for this job

When executed, ImgBin processes jobs one by one. The result of each job is written to its corresponding metadata.json. Even if one job fails, the others are unaffected. After all jobs complete, the CLI outputs a summary report:

✓ orange-dashboard (succeeded)
✓ pricing-grid (succeeded)
✗ hero-banner (failed: API rate limit exceeded)
2/3 succeeded, 1 failed

Some things cannot be rushed. Taking them one at a time is often the steadier path. Maybe that is the philosophy behind batch jobs.


ImgBin supports flexible configuration through environment variables:

Terminal window
# ImgBin working directory
IMGBIN_WORKDIR=/path/to/imgbin
# Executable path (for invocation inside scripts)
IMGBIN_EXECUTABLE=/path/to/imgbin/dist/cli.js
# Asset library root
IMGBIN_LIBRARY_ROOT=./.imgbin-library
# Azure OpenAI configuration (if using the Azure provider)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=***
AZURE_OPENAI_IMAGE_DEPLOYMENT=gpt-image-1

Configuration is one of those things that can feel both important and not that important at the same time. In the end, whatever feels comfortable and fits your workflow best is usually the right choice.


During implementation, we summarized a few key points:

Interface definitions should be clear and complete, including input parameters, return values, and error handling. It is also a good idea to provide both synchronous and asynchronous invocation styles for different scenarios.

That is one small piece of hard-earned experience. Once an interface is set, nobody wants to keep changing it later.

When one item fails in a batch job, the CLI should:

  1. Write detailed error information to a separate log file.
  2. Continue executing other jobs instead of interrupting the whole process.
  3. Return a non-zero exit code at the end to indicate that some jobs failed.
  4. Clearly display the execution result of every job in the summary report.

Some failures are just failures. There is no point pretending otherwise. It is better to acknowledge them openly and then figure out how to solve them. The same logic applies to projects and to life.

Recognition results are written to the recognized section by default, while manually edited fields are marked in manual. Metadata updates follow an append-only strategy: unless --force is explicitly passed, existing manually curated results are not overwritten.

That point became clear too - some things, once overwritten, are just gone. It is often better to preserve them, because the record itself has value.

Use fs.mkdir({ recursive: true }) to ensure directory creation remains atomic and to avoid race conditions in concurrent scenarios.

Maybe that is what security feels like - being stable when stability matters, moving fast when speed matters, and never getting stuck second-guessing.


As the core tool for image asset management in the HagiCode project, ImgBin solves our problems through the following design choices:

  1. Unified entry point: the CLI covers generation, annotation, thumbnails, and all other core operations.
  2. Metadata-driven: every asset has a complete metadata.json, enabling search and traceability.
  3. Provider Adapter: flexible abstraction for external APIs, making testing and extension easier.
  4. Batch job support: batch image generation can be automated within CI/CD workflows.

Everything else may have faded, but this approach really did end up proving useful.

This solution not only improves HagiCode’s own development efficiency, but also forms a reusable framework for image asset management. If you are building a similarly multi-component project, I believe ImgBin’s design ideas may give you some inspiration.

Youth is all about trying things and making a bit of a mess. If you never put yourself through that, how would you know what you are really capable of?



Thank you for reading. If you found this article helpful, please click the like button below so more people can discover it.

This content was produced with AI-assisted collaboration, reviewed by me, and reflects my own views and position.

Primary profession management in hero settings

Primary profession management in hero settings

Section titled “Primary profession management in hero settings”
  • Hero settings now include a dedicated Primary Professions tab to toggle availability.
  • Enablement is persisted at the system level and gates hero availability in dungeon selection and status checks.
  • CLI detection surfaces availability and version; enablement toggles stay locked until the CLI is detected.

Practical Guide to Integrating CodeBuddy CLI into a C# Backend

Practical Guide to Integrating CodeBuddy CLI into a C# Backend

Section titled “Practical Guide to Integrating CodeBuddy CLI into a C# Backend”

This article walks through a complete approach to integrating CodeBuddy CLI into a C# backend project so you can deliver AI coding assistant capabilities end to end.

In modern AI coding assistant development, a single AI Provider often cannot satisfy complex and changing development scenarios. HagiCode, as a multifunctional AI coding assistant, needs to support multiple AI Providers to deliver a better user experience. Users should have enough freedom to choose. In early 2026, the project faced a key decision: how to restore CodeBuddy ACP (Agent Communication Protocol) integration capabilities in the C# backend.

The project had previously implemented CodeBuddy integration, but the related code was removed during a refactor. There is not much to complain about there; during iterative development, something always gets left behind. The goal of this technical solution was to fully restore that capability and improve the architecture so it would be more robust and maintainable.

If you are also considering connecting multiple AI coding assistants to your own project, the approach below may give you some ideas. It reflects lessons we summarized after stepping into plenty of pitfalls, and maybe it can help you avoid a few detours.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project that supports multiple AI Providers and cross-platform operation. To satisfy different user preferences, we need to switch flexibly among different AI coding assistants, which is exactly why we built the CodeBuddy integration described here.

HagiCode uses a modular design, with AI Providers implemented as pluggable components. This architecture lets us add new AI support easily without affecting existing features. When a design is done well up front, it saves a lot of trouble later. If you are interested in our technical architecture, you can view the full source code on GitHub.

The integration between C# and CodeBuddy uses a clear layered architecture. This design makes responsibilities explicit and makes long-term maintenance much easier:

┌─────────────────────────────────────────────┐
│ Provider Contract Layer │
│ AIProviderType enum + extension methods │
├─────────────────────────────────────────────┤
│ Provider Factory Layer │
│ AIProviderFactory dependency injection factory │
├─────────────────────────────────────────────┤
│ Provider Implementation Layer │
│ CodebuddyCliProvider concrete implementation │
├─────────────────────────────────────────────┤
│ ACP Infrastructure Layer │
│ ACPSessionManager / StdioAcpTransport │
│ AcpRpcClient / AcpAgentClient │
└─────────────────────────────────────────────┘

What are the benefits of this layering? Put simply, each layer stays out of the others’ way. If we later want to change the communication mechanism, for example from stdio to WebSocket, we only need to modify the bottom layer, and the business logic above it stays untouched. Nobody wants a communication change to ripple through the entire codebase.

The Provider contract layer is the foundation of the entire architecture. We define the AIProviderType enum, where CodebuddyCli = 3 is used as the enum value, and implement bidirectional mapping between strings and enums through extension methods. That allows strings in configuration files to be converted conveniently into enums, and enums to be converted back to strings for debugging output.

The Provider factory layer is responsible for creating the corresponding Provider instance based on configuration. It uses .NET dependency injection together with ActivatorUtilities.CreateInstance for dynamic creation. The advantage of the factory pattern is that when adding a new Provider, you only need to add the creation logic instead of modifying existing code.

The Provider implementation layer is where the actual work happens. CodebuddyCliProvider implements the IAIProvider interface and provides two invocation modes: ExecuteAsync for non-streaming calls and StreamAsync for streaming calls.

The ACP infrastructure layer provides the communication foundation underneath. This layer handles all protocol details, including process management, message serialization, and response parsing. It is the foundation that keeps everything above it stable.

CodeBuddy uses Stdio (standard input/output) to communicate with external processes. The startup command is simple:

Terminal window
codebuddy --acp

After that, JSON-RPC messages are exchanged through standard input and output. This approach has several advantages:

  1. Fast startup: local process communication avoids network latency
  2. Simple configuration: you only need to specify the executable path
  3. Environment isolation: each session runs in an independent process, so they do not affect one another

Environment variable injection is supported during communication. Common examples include:

  • CODEBUDDY_API_KEY: API key authentication
  • CODEBUDDY_INTERNET_ENVIRONMENT: network environment configuration

As with communication between people, it helps to choose a convenient channel first.

ACP is based on JSON-RPC 2.0. The message format looks roughly like this:

// Request message
{
"jsonrpc": "2.0",
"id": 1,
"method": "agent/prompt",
"params": {
"prompt": "Help me write a sorting algorithm",
"sessionId": "session-123"
}
}
// Response message
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"content": "Here is the AI response..."
}
}

In the real implementation, we encapsulate all of these protocol details so the upper business layer only needs to care about the prompt and response.

First, restore the CodeBuddy type in the enum file:

PCode.Models/AIProviderType.cs
public enum AIProviderType
{
ClaudeCodeCli = 0,
CodexCli = 1,
GitHubCopilot = 2,
CodebuddyCli = 3, // Restore this enum value
OpenCodeCli = 4,
IFlowCli = 5,
}

Then add string mapping in the extension methods so the configuration file can specify the Provider by string:

AIProviderTypeExtensions.cs
private static readonly Dictionary<string, AIProviderType> _typeMap = new(
StringComparer.OrdinalIgnoreCase)
{
["CodebuddyCli"] = AIProviderType.CodebuddyCli,
["Codebuddy"] = AIProviderType.CodebuddyCli,
["codebuddy"] = AIProviderType.CodebuddyCli,
// ... Mappings for other providers
};

Add a CodeBuddy creation branch in the factory class:

AIProviderFactory.cs
private IAIProvider? CreateProvider(AIProviderType providerType, ProviderConfiguration config)
{
return providerType switch
{
AIProviderType.CodebuddyCli =>
ActivatorUtilities.CreateInstance<CodebuddyCliProvider>(
_serviceProvider,
Options.Create(config)),
// ... Other providers
_ => throw new NotSupportedException($"Provider {providerType} not supported")
};
}

This uses dependency injection through ActivatorUtilities, which automatically handles constructor parameter injection and is very convenient.

Below is the core implementation of CodebuddyCliProvider, covering both streaming and non-streaming invocation modes:

public class CodebuddyCliProvider : IAIProvider
{
private readonly ILogger<CodebuddyCliProvider> _logger;
private readonly IACPSessionManager _sessionManager;
private readonly ProviderConfiguration _config;
public string Name => "CodebuddyCli";
public bool SupportsStreaming => true;
public ProviderCapabilities Capabilities { get; }
public CodebuddyCliProvider(
ILogger<CodebuddyCliProvider> logger,
IACPSessionManager sessionManager,
IOptions<ProviderConfiguration> config)
{
_logger = logger;
_sessionManager = sessionManager;
_config = config.Value;
// Define the capabilities of the current Provider
Capabilities = new ProviderCapabilities
{
SupportsStreaming = true,
SupportsTools = true,
SupportsSystemMessages = true,
SupportsArtifacts = false,
MaxTokens = 8192
};
}
// Non-streaming call: return all results together after completion
public async Task<AIResponse> ExecuteAsync(
AIRequest request,
CancellationToken cancellationToken = default)
{
// Create an independent session for the request
var session = await _sessionManager.CreateSessionAsync(
"CodebuddyCli",
request.WorkingDirectory,
cancellationToken,
request.SessionId);
try
{
var fullPrompt = BuildPrompt(request);
await session.SendPromptAsync(fullPrompt, cancellationToken);
var responseBuilder = new StringBuilder();
var toolCalls = new List<AIToolCall>();
// Collect all response chunks
await foreach (var chunk in StreamFromSession(session, cancellationToken))
{
if (!string.IsNullOrEmpty(chunk.Content))
{
responseBuilder.Append(chunk.Content);
}
// Handle tool calls...
}
return new AIResponse
{
Content = AIResultContentSanitizer.SanitizeResultContent(
responseBuilder.ToString()),
ToolCalls = toolCalls,
Provider = Name,
Model = string.Empty
};
}
finally
{
// Release session resources
await session.DisposeAsync();
}
}
// Streaming call: return response chunks in real time
public async IAsyncEnumerable<AIStreamingChunk> StreamAsync(
AIRequest request,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
var session = await _sessionManager.CreateSessionAsync(
"CodebuddyCli",
request.WorkingDirectory,
cancellationToken);
try
{
var fullPrompt = BuildPrompt(request);
await session.SendPromptAsync(fullPrompt, cancellationToken);
await foreach (var chunk in StreamFromSession(session, cancellationToken))
{
yield return chunk;
}
}
finally
{
await session.DisposeAsync();
}
}
private async IAsyncEnumerable<AIStreamingChunk> StreamFromSession(
IACPSession session,
[EnumeratorCancellation] CancellationToken cancellationToken)
{
// Iterate through all updates in the session
await foreach (var notification in session.ReceiveUpdatesAsync(cancellationToken))
{
switch (notification.Update)
{
case AgentMessageChunkSessionUpdate agentMessage:
// Handle text content chunks
if (agentMessage.Content is AcpImp.TextContentBlock textContent)
{
yield return new AIStreamingChunk
{
Content = textContent.Text,
Type = StreamingChunkType.ContentDelta,
IsComplete = false
};
}
break;
case ToolCallSessionUpdate toolCall:
// Handle tool calls
yield return new AIStreamingChunk
{
Content = string.Empty,
Type = StreamingChunkType.ToolCallDelta,
ToolCallDelta = new AIToolCallDelta
{
Id = toolCall.ToolCallId,
Name = toolCall.Kind.ToString(),
Arguments = toolCall.RawInput?.ToString()
}
};
break;
case AcpImp.PromptCompletedSessionUpdate:
// Response complete
yield break;
}
}
}
// Build the full prompt
private string BuildPrompt(AIRequest request, string? embeddedCommandPrompt = null)
{
var sb = new StringBuilder();
// Embedded command prompt, if present
if (!string.IsNullOrEmpty(embeddedCommandPrompt))
{
sb.AppendLine(embeddedCommandPrompt);
sb.AppendLine();
}
// System message
if (!string.IsNullOrEmpty(request.SystemMessage))
{
sb.AppendLine(request.SystemMessage);
sb.AppendLine();
}
// User prompt
sb.Append(request.Prompt);
return sb.ToString();
}
}

There are several key points in this code:

  1. Session management: each request creates an independent session and releases resources after the request completes. This is a lesson learned through trial and error. If session reuse is not handled well, state pollution appears easily.

  2. Streaming processing: IAsyncEnumerable allows the response to be returned while it is still being generated, instead of waiting for all content to finish. This is especially important for long-text scenarios and significantly improves the user experience.

  3. Tool calls: CodeBuddy supports tool calling (Function Calling), handled through ToolCallSessionUpdate. This capability is critical for complex code editing tasks.

  4. Content filtering: AIResultContentSanitizer is used to filter Think block content and keep the output clean.

Add the related services during module registration:

PCodeClaudeHelperModule.cs
public void ConfigureModule(IServiceCollection context)
{
// Register Provider
context.Services.AddTransient<CodebuddyCliProvider>();
// Register ACP infrastructure
context.Services.AddSingleton<IACPSessionManager, ACPSessionManager>();
context.Services.AddSingleton<IAcpPlatformConfigurationResolver, AcpPlatformConfigurationResolver>();
context.Services.AddSingleton<IAIRequestToAcpMapper, AIRequestToAcpMapper>();
context.Services.AddSingleton<IAcpToAIResponseMapper, AcpToAIResponseMapper>();
}

Add CodeBuddy-related configuration to appsettings.json:

AI:
# Default Provider to use
DefaultProvider: "CodebuddyCli"
# Provider configuration
Providers:
CodebuddyCli:
Type: "CodebuddyCli"
WorkingDirectory: "C:/projects/my-app"
ExecutablePath: "C:/tools/codebuddy.cmd"
# Platform-specific configuration
PlatformConfigurations:
CodebuddyCli:
ExecutablePath: "C:/tools/codebuddy.cmd"
Arguments: "--acp"
StartupTimeoutMs: 5000
EnvironmentVariables:
CODEBUDDY_API_KEY: "${CODEBUDDY_API_KEY}"
CODEBUDDY_INTERNET_ENVIRONMENT: "production"

The corresponding configuration model definition:

public class CodebuddyPlatformConfiguration : IAcpPlatformConfiguration
{
public string ProviderName => "CodebuddyCli";
public AcpTransportType TransportType => AcpTransportType.Stdio;
public string ExecutablePath { get; set; } = "codebuddy";
public string Arguments { get; set; } = "--acp";
public int StartupTimeoutMs { get; set; } = 5000;
public Dictionary<string, string?>? EnvironmentVariables { get; set; }
}

We ran into several typical pitfalls during implementation, and sharing them here may help others avoid the same detours:

  1. Session leak issue: at first, sessions were not released correctly, which exhausted process resources. The solution was to use try-finally to ensure resources are released for every request.

  2. Environment variable passing: Windows and Linux use different environment variable syntax, so we later standardized on Dictionary<string, string?> to handle this.

  3. Timeout configuration: CLI startup takes time, so we set a 5-second startup timeout to avoid fast request failures.

  4. Encoding issues: on Windows, the default encoding may cause garbled Chinese text, so UTF-8 encoding is explicitly specified when starting the process.

  1. Session pool: for frequent short requests, consider implementing a session pool to reuse processes
  2. Connection cache: the factory class already supports caching Provider instances
  3. Async first: use asynchronous programming throughout to avoid blocking threads

Performance is always worth optimizing. The longer users wait, the worse the experience becomes.

This article introduced a complete solution for integrating CodeBuddy CLI into a C# backend, covering the entire process from architecture design to concrete implementation. Through a layered architecture, we separate protocol details from business logic, making the code clearer and easier to maintain.

Key takeaways:

  • Use a layered architecture with a Provider contract layer, factory layer, implementation layer, and infrastructure layer
  • Use JSON-RPC over Stdio for inter-process communication
  • Implement flexible configuration and extensibility through dependency injection
  • Provide both streaming and non-streaming invocation modes

This approach is not only suitable for CodeBuddy; adding new AI Providers can follow the same pattern. If you are also building a similar multi-AI-Provider integration, I hope this article gives you a useful reference.



If this article helped you:

Practical Multi-AI Provider Architecture in the HagiCode Platform

Practical Multi-AI Provider Architecture in the HagiCode Platform

Section titled “Practical Multi-AI Provider Architecture in the HagiCode Platform”

This article shares the technical approach we used under the Orleans Grain architecture to integrate two AI tools, iflow and OpenCode, through a unified IAIProvider interface, and compares the implementation differences between WebSocket and HTTP communication in detail.

There is nothing especially mysterious about it. While building HagiCode, we ran into a very practical problem: users wanted to work with different AI tools. That is hardly surprising, since everyone has their own habits. Some prefer Claude Code, some love GitHub Copilot, and some teams use tools they developed themselves.

Our initial solution was simple and direct: write dedicated integration code for each AI tool. But the drawbacks showed up quickly. The codebase filled up with if-else branches, every change required testing in multiple places, and every new tool meant writing another pile of logic from scratch.

Later, I realized it would be better to create a unified IAIProvider interface and abstract the capabilities shared by all AI providers. That way, no matter which tool is used underneath, the upper layers can call it in the same way.

Recently, the project needed to integrate two new tools: iflow and OpenCode. Both support the ACP protocol, but their communication styles are different. iflow uses WebSocket, while OpenCode uses an HTTP API. That became a useful architectural test: adapt two different transport modes behind one unified interface.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI-assisted development platform built on the Orleans Grain architecture. It integrates with different AI providers through a unified IAIProvider interface, allowing users to flexibly choose the AI tools they prefer.

First, we defined the IAIProvider interface and abstracted the capabilities that every AI provider needs to implement:

public interface IAIProvider
{
string Name { get; }
bool SupportsStreaming { get; }
ProviderCapabilities Capabilities { get; }
Task<AIResponse> ExecuteAsync(AIRequest request, CancellationToken cancellationToken = default);
IAsyncEnumerable<AIStreamingChunk> StreamAsync(AIRequest request, CancellationToken cancellationToken = default);
Task<ProviderTestResult> PingAsync(CancellationToken cancellationToken = default);
IAsyncEnumerable<AIStreamingChunk> SendMessageAsync(AIRequest request, string? embeddedCommandPrompt = null, CancellationToken cancellationToken = default);
}

This interface includes several key methods:

  • ExecuteAsync: execute a one-shot AI request
  • StreamAsync: get streaming responses for real-time display
  • PingAsync: perform a health check to verify whether the provider is available
  • SendMessageAsync: send a message with support for embedded commands

IFlowCliProvider: A WebSocket-Based Implementation

Section titled “IFlowCliProvider: A WebSocket-Based Implementation”

iflow uses WebSocket for ACP communication. The overall architecture looks like this:

IFlowCliProvider → ACPSessionManager → WebSocketAcpTransport → iflow CLI
Dynamic port allocation + process management

The core flow is also fairly straightforward:

  1. ACPSessionManager creates and manages ACP sessions.
  2. WebSocketAcpTransport handles WebSocket communication.
  3. A port is allocated dynamically, and the iflow process is started with iflow --experimental-acp --port.
  4. IAIRequestToAcpMapper and IAcpToAIResponseMapper convert requests and responses.

Here is the core code:

private async IAsyncEnumerable<AIStreamingChunk> StreamCoreAsync(
AIRequest request,
string? embeddedCommandPrompt,
[EnumeratorCancellation] CancellationToken cancellationToken)
{
// Resolve working directory
var resolvedWorkingDirectory = ResolveWorkingDirectory(request);
var effectiveRequest = ApplyEmbeddedCommandPrompt(request, embeddedCommandPrompt);
// Create ACP session
await using var session = await _sessionManager.CreateSessionAsync(
Name,
resolvedWorkingDirectory,
cancellationToken,
request.SessionId);
// Send prompt
var prompt = _requestMapper.ToPromptString(effectiveRequest);
var promptResponse = await session.SendPromptAsync(prompt, cancellationToken);
// Receive streaming response
await foreach (var notification in session.ReceiveUpdatesAsync(cancellationToken))
{
if (_responseMapper.TryConvertToStreamingChunk(notification, out var chunk))
{
if (chunk.Type == StreamingChunkType.Metadata && chunk.IsComplete)
{
yield return chunk;
yield break;
}
yield return chunk;
}
}
}

There are a few design points worth calling out here:

  • Use await using to ensure the session is released correctly and avoid resource leaks.
  • Return streaming responses through IAsyncEnumerable, which naturally supports async streams.
  • Use Metadata chunks to determine completion and ensure the full response has been received.

OpenCodeCliProvider: An HTTP API-Based Implementation

Section titled “OpenCodeCliProvider: An HTTP API-Based Implementation”

OpenCode provides its service through an HTTP API, so the architecture is slightly different:

OpenCodeCliProvider → OpenCodeRuntimeManager → OpenCodeClient → OpenCode HTTP API
OpenCodeProcessManager → opencode process management

A notable feature of OpenCode is that it uses an SQLite database to persist session bindings. That makes session recovery and prompt-response recovery possible:

private async Task<OpenCodePromptExecutionResult> ExecutePromptAsync(
AIRequest request,
string? embeddedCommandPrompt,
CancellationToken cancellationToken)
{
var prompt = BuildPrompt(request, embeddedCommandPrompt);
var resolvedWorkingDirectory = ResolveWorkingDirectory(request.WorkingDirectory);
var client = await _runtimeManager.GetClientAsync(resolvedWorkingDirectory, cancellationToken);
var bindingSessionId = request.SessionId;
var boundSession = TryGetBinding(bindingSessionId, resolvedWorkingDirectory);
// Try to use the already bound session
if (boundSession is not null)
{
try
{
return await PromptSessionAsync(
client,
boundSession,
BuildPromptRequest(request, prompt, CreatePromptMessageId()),
request.Model ?? _settings.Model,
cancellationToken);
}
catch (OpenCodeApiException ex) when (IsStaleBinding(ex))
{
// The session has expired, remove the binding
RemoveBinding(bindingSessionId);
}
}
// Create a new session
var session = await client.Session.CreateAsync(new OpenCodeSessionCreateRequest
{
Title = BuildSessionTitle(request)
}, cancellationToken);
BindSession(bindingSessionId, session.Id, resolvedWorkingDirectory);
return await PromptSessionAsync(client, session.Id, ...);
}

This implementation has several interesting highlights:

  • Session binding mechanism: the same SessionId reuses the same OpenCode session, avoiding repeated session creation.
  • Expiration handling: when a session is found to be expired, the binding is automatically cleaned up.
  • Database persistence: bindings are stored in SQLite and remain effective after restart.
AspectIFlowCliProviderOpenCodeCliProvider
CommunicationWebSocket (ACP)HTTP API
Process managementACPSessionManagerOpenCodeProcessManager
Port allocationDynamic portNo port (uses HTTP)
Session managementACPSessionOpenCodeSession
PersistenceIn-memory cacheSQLite database
Startup commandiflow --experimental-acp --portopencode
LatencyLower (long-lived connection)Relatively higher (HTTP requests)

Which approach you choose depends mainly on your needs. WebSocket is better for scenarios with high real-time requirements, while an HTTP API is simpler and easier to debug.

First, enable the two providers in the configuration file:

AI:
Providers:
IFlowCli:
Type: "IFlowCli"
Enabled: true
ExecutablePath: "iflow"
Model: null
WorkingDirectory: null
OpenCodeCli:
Type: "OpenCodeCli"
Enabled: true
ExecutablePath: "opencode"
Model: "anthropic/claude-sonnet-4"
WorkingDirectory: null
OpenCode:
Enabled: true
BaseUrl: "http://localhost:38376"
ExecutablePath: "opencode"
StartupTimeoutSeconds: 30
RequestTimeoutSeconds: 120
// Get provider through the factory
var provider = await _providerFactory.GetProviderAsync(AIProviderType.IFlowCli);
// Execute an AI request
var request = new AIRequest
{
Prompt = "请帮我重构这个函数",
WorkingDirectory = "/path/to/project",
Model = "claude-sonnet-4"
};
// Get the complete response
var response = await provider.ExecuteAsync(request, cancellationToken);
Console.WriteLine(response.Content);
// Or use streaming responses
await foreach (var chunk in provider.StreamAsync(request, cancellationToken))
{
if (chunk.Type == StreamingChunkType.ContentDelta)
{
Console.Write(chunk.Content);
}
}
// Get provider through the factory
var provider = await _providerFactory.GetProviderAsync(AIProviderType.OpenCodeCli);
var request = new AIRequest
{
Prompt = "请帮我分析这个错误",
WorkingDirectory = "/path/to/project",
Model = "anthropic/claude-sonnet-4"
};
var response = await provider.ExecuteAsync(request, cancellationToken);
Console.WriteLine(response.Content);

Before startup or before use, you can check whether the provider is available:

var iflowResult = await iflowProvider.PingAsync(cancellationToken);
if (!iflowResult.Success)
{
Console.WriteLine($"IFlow is unavailable: {iflowResult.ErrorMessage}");
return;
}
var openCodeResult = await openCodeProvider.PingAsync(cancellationToken);
if (!openCodeResult.Success)
{
Console.WriteLine($"OpenCode is unavailable: {openCodeResult.ErrorMessage}");
return;
}

Both providers support embedded commands, such as /file:xxx:

var request = new AIRequest
{
Prompt = "分析这个文件的问题",
SystemMessage = "你是一个代码分析专家"
};
await foreach (var chunk in provider.SendMessageAsync(
request,
embeddedCommandPrompt: "/file:src/main.cs",
cancellationToken))
{
Console.Write(chunk.Content);
}

IFlow uses long-lived WebSocket connections, so resource management deserves special attention:

  • Use await using to ensure sessions are released properly.
  • Cancellation triggers process cleanup.
  • ACPSessionManager supports a maximum session count limit.

OpenCode process management is relatively simpler, and OpenCodeRuntimeManager handles it automatically.

Both providers have complete error handling:

  • IFlow errors are propagated through ACP session updates.
  • OpenCode errors are thrown through OpenCodeApiException.
  • It is recommended that the caller catch and handle these exceptions.
  • IFlow WebSocket communication has lower latency than HTTP.
  • OpenCode session reuse can reduce the overhead of HTTP requests.
  • The factory cache mechanism avoids repeatedly creating providers.
  • In high-concurrency scenarios, pay close attention to the limits on process count and connection count.

The executable path is validated at startup, but runtime issues can still happen. PingAsync is a useful tool for verifying whether the configuration is correct:

// Check at startup
var provider = await _providerFactory.GetProviderAsync(providerType);
var result = await provider.PingAsync(cancellationToken);
if (!result.Success)
{
_logger.LogError("Provider {ProviderType} is unavailable: {Error}", providerType, result.ErrorMessage);
}

This article shares the technical approach used by the HagiCode platform when integrating the two AI tools iflow and OpenCode. Through a unified IAIProvider interface, we adapted different communication styles, WebSocket and HTTP, while keeping the upper-layer calling pattern consistent.

The core idea is actually quite simple:

  1. Define a unified interface abstraction.
  2. Build adapter layers for different implementations.
  3. Manage everything uniformly through the factory pattern.

That gives the system good extensibility. When a new AI tool needs to be integrated later, all we need to do is implement the IAIProvider interface without changing too much existing code.

If you are also working on multi-AI-tool integration, I hope this article is helpful.


If this article helped you:

Complete Guide to Codex SDK Console Message Parsing

Complete Guide to Codex SDK Console Message Parsing

Section titled “Complete Guide to Codex SDK Console Message Parsing”

This article explains the Codex SDK event stream mechanism, message type parsing, and best practices in real projects, helping developers quickly master the core skills behind AI execution services.

When building an AI execution service based on the Codex SDK, we inevitably run into a practical question: how should we handle the streamed event messages returned by Codex? These messages contain important information such as execution status, output content, and error details, so they deserve careful handling.

As part of the HagiCode project, we needed a reliable executor for AI coding assistant scenarios. That is exactly why we decided to study the Codex SDK event stream mechanism in depth. After all, only by understanding how the underlying messages work can we build a truly enterprise-grade AI execution platform.

The Codex SDK is a programming-assistance SDK released by OpenAI. It returns execution results through an Event Stream. Unlike the traditional request-response model, Codex uses streamed events so that we can:

  • Get execution progress in real time
  • Handle errors promptly
  • Obtain detailed token usage statistics
  • Support long-running complex tasks

Understanding these event types and parsing them correctly is essential for implementing a fully capable AI executor. In the end, nobody wants to work with a black box.

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project dedicated to providing developers with intelligent coding support. During development, we needed to build a reliable AI execution service to handle user code execution requests, which is the direct reason we introduced the Codex SDK.

As an AI coding assistant, HagiCode needs to deal with a variety of complex code execution scenarios: getting execution progress in real time, handling errors promptly, and collecting detailed token usage statistics. By deeply understanding the Codex SDK event stream mechanism, we can build an executor that meets production environment requirements. Ultimately, whether it is software or real life, everything benefits from steady accumulation and refinement.

The Codex SDK uses the thread.runStreamed() method to return an asynchronous event iterator:

import { Codex } from '@openai/codex-sdk';
const client = new Codex({
apiKey: process.env.CODEX_API_KEY,
baseUrl: process.env.CODEX_BASE_URL,
});
const thread = client.startThread({
workingDirectory: '/path/to/project',
skipGitRepoCheck: false,
});
const { events } = await thread.runStreamed('your prompt here', {
outputSchema: {
type: 'object',
properties: {
output: { type: 'string' },
status: { type: 'string', enum: ['ok', 'action_required'] },
},
required: ['output', 'status'],
},
});
for await (const event of events) {
// Handle each event
}
Event TypeDescriptionKey Data
thread.startedThread started successfullythread_id
item.updatedMessage content updateditem.text
item.completedMessage completeditem.text
turn.completedExecution completedusage (token usage)
turn.failedExecution failederror.message
errorError eventmessage

In real projects, HagiCode’s executor component is built on top of these event types. We need to handle each kind of event carefully to ensure a smooth user experience. Good systems are built by taking details seriously.

Message content is extracted through an event handler:

private handleThreadEvent(event: ThreadEvent, onMessage: (content: string) => void): void {
// Only handle message update and completion events
if (event.type !== 'item.updated' && event.type !== 'item.completed') {
return;
}
// Only handle agent message content
if (event.item.type !== 'agent_message') {
return;
}
// Extract text content
onMessage(event.item.text);
}

Key points:

  • Only handle item.updated and item.completed events
  • Only handle content of type agent_message
  • The message content is in the event.item.text field

Codex supports JSON structured output. You can specify the return format through the outputSchema parameter:

const DEFAULT_OUTPUT_SCHEMA = {
type: 'object',
properties: {
output: { type: 'string' },
status: { type: 'string', enum: ['ok', 'action_required'] },
},
required: ['output', 'status'],
additionalProperties: false,
} as const;

The parsing function attempts to parse JSON, and if that fails it falls back to the raw text.

function toStructuredOutput(raw: string): StructuredOutput {
try {
const parsed = JSON.parse(raw) as Partial<StructuredOutput>;
if (typeof parsed.output === 'string') {
return {
output: parsed.output,
status: parsed.status === 'action_required' ? 'action_required' : 'ok',
};
}
} catch {
// JSON parsing failed, fall back to the raw text
}
return {
output: raw,
status: 'ok',
};
}
private async runWithStreaming(
thread: Thread,
input: CodexStageExecutionInput
): Promise<{ output: string; usage: Usage | null }> {
const abortController = new AbortController();
const timeoutHandle = setTimeout(() => {
abortController.abort();
}, Math.max(1000, input.timeoutMs));
let latestMessage = '';
let usage: Usage | null = null;
let emittedLength = 0;
try {
const { events } = await thread.runStreamed(input.prompt, {
outputSchema: DEFAULT_OUTPUT_SCHEMA,
signal: abortController.signal,
});
for await (const event of events) {
// Handle message content
this.handleThreadEvent(event, (nextContent) => {
const delta = nextContent.slice(emittedLength);
if (delta.length > 0) {
emittedLength = nextContent.length;
input.callbacks?.onChunk?.(delta); // Streaming callback
}
latestMessage = nextContent;
});
// Process different data based on the event type
if (event.type === 'thread.started') {
this.threadId = event.thread_id;
} else if (event.type === 'turn.completed') {
usage = event.usage;
} else if (event.type === 'turn.failed') {
throw new CodexExecutorError('gateway_unavailable', event.error.message, true);
} else if (event.type === 'error') {
throw new CodexExecutorError('gateway_unavailable', event.message, true);
}
}
} catch (error) {
if (abortController.signal.aborted) {
throw new CodexExecutorError(
'upstream_timeout',
`Codex stage timed out after ${input.timeoutMs}ms`,
true
);
}
throw error;
} finally {
clearTimeout(timeoutHandle);
}
const structured = toStructuredOutput(latestMessage);
return { output: structured.output, usage };
}

Map specific error patterns to concrete error codes so the upper layers can handle them more easily:

function mapError(error: unknown): CodexExecutorError {
if (error instanceof CodexExecutorError) {
return error;
}
const message = error instanceof Error ? error.message : String(error);
const normalized = message.toLowerCase();
// Authentication errors - not retryable
if (normalized.includes('401') ||
normalized.includes('403') ||
normalized.includes('api key') ||
normalized.includes('auth')) {
return new CodexExecutorError('auth_invalid', message, false);
}
// Rate limit errors - retryable
if (normalized.includes('429') || normalized.includes('rate limit')) {
return new CodexExecutorError('rate_limited', message, true);
}
// Timeout errors - retryable
if (normalized.includes('timeout') || normalized.includes('aborted')) {
return new CodexExecutorError('upstream_timeout', message, true);
}
// Default error
return new CodexExecutorError('gateway_unavailable', message, true);
}
export type CodexErrorCode =
| 'auth_invalid' // Authentication failure
| 'upstream_timeout' // Upstream timeout
| 'rate_limited' // Rate limited
| 'gateway_unavailable'; // Gateway unavailable
export class CodexExecutorError extends Error {
readonly code: CodexErrorCode;
readonly retryable: boolean;
constructor(code: CodexErrorCode, message: string, retryable: boolean) {
super(message);
this.name = 'CodexExecutorError';
this.code = code;
this.retryable = retryable;
}
}

Working Directory and Environment Configuration

Section titled “Working Directory and Environment Configuration”

The Codex SDK requires the working directory to be a valid Git repository.

export function validateWorkingDirectory(
workingDirectory: string,
skipGitRepoCheck: boolean
): void {
const resolvedWorkingDirectory = path.resolve(workingDirectory);
if (!existsSync(resolvedWorkingDirectory)) {
throw new CodexExecutorError(
'gateway_unavailable',
'Working directory does not exist.',
false
);
}
if (!statSync(resolvedWorkingDirectory).isDirectory()) {
throw new CodexExecutorError(
'gateway_unavailable',
'Working directory is not a directory.',
false
);
}
if (skipGitRepoCheck) {
return;
}
const gitDir = path.join(resolvedWorkingDirectory, '.git');
if (!existsSync(gitDir)) {
throw new CodexExecutorError(
'gateway_unavailable',
'Working directory is not a git repository.',
false
);
}
}

The Codex SDK needs to load environment variables from the login shell so the AI Agent can access system commands:

function parseEnvironmentOutput(output: Buffer): Record<string, string> {
const parsed: Record<string, string> = {};
for (const entry of output.toString('utf8').split('\0')) {
if (!entry) continue;
const separatorIndex = entry.indexOf('=');
if (separatorIndex <= 0) continue;
const key = entry.slice(0, separatorIndex);
const value = entry.slice(separatorIndex + 1);
if (key.length > 0) {
parsed[key] = value;
}
}
return parsed;
}
function tryLoadEnvironmentFromShell(shellPath: string): Record<string, string> | null {
const result = spawnSync(shellPath, ['-ilc', 'env -0'], {
env: process.env,
stdio: ['ignore', 'pipe', 'pipe'],
timeout: 5000,
});
if (result.error || result.status !== 0) {
return null;
}
return parseEnvironmentOutput(result.stdout);
}
export function createExecutorEnvironment(
envOverrides: Record<string, string> = {}
): Record<string, string> {
// Load environment variables from the login shell
const consoleEnv = loadConsoleEnvironmentFromShell();
return {
...process.env,
...consoleEnv,
...envOverrides,
};
}

In the HagiCode project, we use the following approach to initialize the Codex client and execute tasks:

import { Codex } from '@openai/codex-sdk';
async function executeWithCodex(prompt: string, workingDir: string) {
const client = new Codex({
apiKey: process.env.CODEX_API_KEY,
env: { PATH: process.env.PATH },
});
const thread = client.startThread({
workingDirectory: workingDir,
});
const { events } = await thread.runStreamed(prompt);
let result = '';
for await (const event of events) {
if (event.type === 'item.updated' && event.item.type === 'agent_message') {
result = event.item.text;
}
if (event.type === 'turn.completed') {
console.log('Token usage:', event.usage);
}
}
// Try to parse JSON output
try {
const parsed = JSON.parse(result);
return parsed.output;
} catch {
return result;
}
}
export class CodexSdkExecutor {
private readonly config: CodexRuntimeConfig;
private readonly client: Codex;
private threadId: string | null = null;
async executeStage(input: CodexStageExecutionInput): Promise<CodexStageExecutionResult> {
const maxAttempts = Math.max(1, this.config.retryCount + 1);
let attempt = 0;
let lastError: CodexExecutorError | null = null;
while (attempt < maxAttempts) {
attempt += 1;
try {
const thread = this.getThread(input.workingDirectory);
const { output, usage } = await this.runWithStreaming(thread, input);
return {
output,
usage,
threadId: this.threadId!,
attempts: attempt,
latencyMs: Date.now() - startedAt,
};
} catch (error) {
const mappedError = mapError(error);
lastError = mappedError;
// Non-retryable error or max retry attempts reached
if (!mappedError.retryable || attempt >= maxAttempts) {
throw mappedError;
}
// Wait before retrying
await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
}
}
throw lastError!;
}
}
  • Make sure the working directory is a valid Git repository
  • Use the PROJECT_ROOT environment variable to specify it explicitly
  • During debugging, you can set CODEX_SKIP_GIT_REPO_CHECK=true to skip the check
  • Pass only the required environment variables through a whitelist mechanism
  • Use the login shell to load the full environment
  • Avoid passing sensitive information
  • Set reasonable timeouts based on task complexity
  • Implement exponential backoff for retryable errors
  • Record retry counts and reasons
  • Distinguish between retryable and non-retryable errors
  • Provide clear error messages and suggestions
  • Use unified error codes so upper layers can handle them consistently
  • Implement incremental output callbacks to improve user experience
  • Correctly handle incremental message updates
  • Record token usage for cost analysis

In the actual production environment of the HagiCode project, we have already verified the effectiveness of the best practices above. This approach has helped us build a stable and reliable AI execution service. In the end, practical validation matters more than theory alone.

The Codex SDK event stream mechanism provides strong capabilities for building AI execution services. By correctly parsing different kinds of events, we can:

  • Get execution status and output in real time
  • Implement reliable error handling and retry mechanisms
  • Obtain detailed execution statistics
  • Build a full-featured AI execution platform

The core concepts and code samples introduced in this article can be applied directly in real projects, helping developers get started quickly with Codex SDK integration. If you find this approach valuable, it also reflects the strength of HagiCode’s engineering practice and makes HagiCode itself worth following.


Thank you for reading. If you found this article helpful, please click the like button below so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by the author, and reflects the author’s own views and positions.

HagiCode Multi-AI Provider Switching and Interoperability Implementation Plan

HagiCode Multi-AI Provider Switching and Interoperability Implementation Plan

Section titled “HagiCode Multi-AI Provider Switching and Interoperability Implementation Plan”

In the modern developer-tooling ecosystem, developers often need to use different AI coding assistants to support their work. Anthropic’s Claude Code CLI and OpenAI’s Codex CLI each have their own strengths: Claude is known for outstanding code understanding and long-context handling, while Codex excels at code generation and tool usage.

This article takes an in-depth look at how the HagiCode project achieves seamless switching and interoperability across multiple AI providers, including the core architectural design, key implementation details, and practical considerations.

The core challenge faced by the HagiCode project is supporting multiple AI CLIs on the same platform, so users can:

  1. Flexibly switch between AI providers based on their needs
  2. Maintain session continuity during provider switching
  3. Unify the API differences across different CLIs behind a common abstraction
  4. Reserve extension points for adding new AI providers in the future
  1. Unifying interface differences: Claude Code CLI is invoked through command-line calls, while Codex CLI uses a JSON event stream
  2. Handling streaming responses: Both providers support streaming responses, but with different data formats
  3. Tool-calling semantics: Claude and Codex differ in how they represent tool calls and manage their lifecycle
  4. Session lifecycle: The system must correctly manage session creation, restoration, and termination for each provider

HagiCode uses the Provider Pattern combined with the Factory Pattern to abstract AI service invocation. The core ideas of this design are:

  1. Unified interface abstraction: Define the IAIProvider interface as the common abstraction for all AI providers
  2. Factory-created instances: Use AIProviderFactory to dynamically create the corresponding provider instance based on type
  3. Intelligent selection logic: Use AIProviderSelector to automatically select the most suitable provider based on scenario and configuration
  4. Session state management: Persist the binding relationship between sessions and CLI threads in the database
ComponentResponsibilityLanguage
IAIProviderUnified provider interfaceC#
AIProviderFactoryCreate and manage provider instancesC#
AIProviderSelectorSelect providers intelligentlyC#
ClaudeCodeCliProviderClaude Code CLI implementationC#
CodexCliProviderCodex CLI implementationC#
AgentCliManagerDesktop-side CLI managementTypeScript

The IAIProvider interface defines the unified provider abstraction:

public interface IAIProvider
{
/// <summary>
/// Provider display name
/// </summary>
string Name { get; }
/// <summary>
/// Whether streaming responses are supported
/// </summary>
bool SupportsStreaming { get; }
/// <summary>
/// Provider capability description
/// </summary>
ProviderCapabilities Capabilities { get; }
/// <summary>
/// Execute a single AI request
/// </summary>
Task<AIResponse> ExecuteAsync(AIRequest request, CancellationToken cancellationToken = default);
/// <summary>
/// Execute a streaming AI request
/// </summary>
IAsyncEnumerable<AIStreamingChunk> StreamAsync(AIRequest request, CancellationToken cancellationToken = default);
/// <summary>
/// Check provider connectivity and responsiveness
/// </summary>
Task<ProviderTestResult> PingAsync(CancellationToken cancellationToken = default);
/// <summary>
/// Send a message with an embedded command
/// </summary>
IAsyncEnumerable<AIStreamingChunk> SendMessageAsync(
AIRequest request,
string? embeddedCommandPrompt = null,
CancellationToken cancellationToken = default);
}

Key characteristics of this interface design:

  • Unified request/response model: All providers use the same AIRequest and AIResponse types
  • Streaming support: Standardize streaming output through IAsyncEnumerable<AIStreamingChunk>
  • Capability description: ProviderCapabilities describes the features supported by the provider (streaming, tools, maximum tokens, and so on)
  • Embedded commands: SendMessageAsync supports embedding OpenSpec commands into prompts
public enum AIProviderType
{
ClaudeCodeCli, // Anthropic Claude Code
OpenCodeCli, // Other CLIs (extensible)
GitHubCopilot, // GitHub Copilot
CodebuddyCli, // Codebuddy
CodexCli // OpenAI Codex
}

This enum provides a type-safe representation for all providers supported by the system.

The AIProviderFactory is responsible for creating and managing provider instances:

public class AIProviderFactory : IAIProviderFactory
{
private readonly ConcurrentDictionary<AIProviderType, IAIProvider> _cache;
private readonly IOptions<AIProviderOptions> _options;
private readonly IServiceProvider _serviceProvider;
public Task<IAIProvider?> GetProviderAsync(AIProviderType providerType)
{
// Use caching to avoid duplicate creation
if (_cache.TryGetValue(providerType, out var cached))
return Task.FromResult<IAIProvider?>(cached);
// Get provider configuration from settings
var aiOptions = _options.Value;
if (!aiOptions.Providers.TryGetValue(providerType, out var config))
{
_logger.LogWarning("Provider '{ProviderType}' not found in configuration", providerType);
return Task.FromResult<IAIProvider?>(null);
}
// Create provider by type
var provider = providerType switch
{
AIProviderType.ClaudeCodeCli =>
_serviceProvider.GetService(typeof(ClaudeCodeCliProvider)) as IAIProvider,
AIProviderType.CodexCli =>
_serviceProvider.GetService(typeof(CodexCliProvider)) as IAIProvider,
AIProviderType.GitHubCopilot =>
_serviceProvider.GetService(typeof(CopilotAIProvider)) as IAIProvider,
_ => null
};
if (provider != null)
{
_cache[providerType] = provider;
}
return Task.FromResult<IAIProvider?>(provider);
}
}

Advantages of the factory pattern:

  • Instance caching: Avoid repeatedly creating the same type of provider
  • Dependency injection: Create instances through IServiceProvider, with dependency injection support
  • Configuration-driven: Read provider settings from configuration files
  • Exception handling: Return null when creation fails, making it easier for upper layers to handle errors

The AIProviderSelector implements provider-selection strategies:

public class AIProviderSelector : IAIProviderSelector
{
private readonly BusinessLayerConfiguration _configuration;
private readonly IAIProviderFactory _providerFactory;
private readonly IMemoryCache _cache;
public async Task<AIProviderType> SelectProviderAsync(
BusinessScenario scenario,
CancellationToken cancellationToken = default)
{
// 1. Try getting a provider from scenario mapping
if (_configuration.ScenarioProviderMapping.TryGetValue(scenario, out var providerType))
{
if (await IsProviderAvailableAsync(providerType, cancellationToken))
{
_logger.LogDebug("Selected provider '{Provider}' for scenario '{Scenario}'",
providerType, scenario);
return providerType;
}
_logger.LogWarning("Configured provider '{Provider}' for scenario '{Scenario}' is not available",
providerType, scenario);
}
// 2. Try the default provider
if (await IsProviderAvailableAsync(_configuration.DefaultProvider, cancellationToken))
{
_logger.LogDebug("Using default provider '{Provider}' for scenario '{Scenario}'",
_configuration.DefaultProvider, scenario);
return _configuration.DefaultProvider;
}
// 3. Try the fallback chain
foreach (var fallbackProvider in _configuration.FallbackChain)
{
if (await IsProviderAvailableAsync(fallbackProvider, cancellationToken))
{
_logger.LogInformation("Using fallback provider '{Provider}' for scenario '{Scenario}'",
fallbackProvider, scenario);
return fallbackProvider;
}
}
// 4. No available provider can be found
throw new InvalidOperationException(
$"No available AI provider found for scenario '{scenario}'");
}
public async Task<bool> IsProviderAvailableAsync(
AIProviderType providerType,
CancellationToken cancellationToken = default)
{
var cacheKey = $"provider_available_{providerType}";
// Use caching to reduce Ping calls
if (_configuration.EnableCache &&
_cache.TryGetValue<bool>(cacheKey, out var cached))
{
return cached;
}
var provider = await _providerFactory.GetProviderAsync(providerType);
var isAvailable = provider != null;
if (_configuration.EnableCache && isAvailable)
{
_cache.Set(cacheKey, isAvailable,
TimeSpan.FromSeconds(_configuration.CacheExpirationSeconds));
}
return isAvailable;
}
}

Selector strategy:

  • Scenario mapping first: First check whether the business scenario has a specific provider mapping
  • Fallback to default provider: Use the default provider if scenario mapping fails
  • Fallback chain as a final safeguard: Try providers in the fallback chain one by one
  • Availability caching: Cache provider availability checks to reduce Ping calls

5. Claude Code CLI Provider Implementation

Section titled “5. Claude Code CLI Provider Implementation”
public class ClaudeCodeCliProvider : IAIProvider
{
private readonly ILogger<ClaudeCodeCliProvider> _logger;
private readonly IClaudeStreamManager _streamManager;
private readonly ProviderConfiguration _config;
public string Name => "ClaudeCodeCli";
public bool SupportsStreaming => true;
public ProviderCapabilities Capabilities { get; }
public async Task<AIResponse> ExecuteAsync(AIRequest request, CancellationToken cancellationToken = default)
{
_logger.LogInformation("Executing AI request with provider: {Provider}", Name);
var sessionOptions = ClaudeRequestMapper.MapToSessionOptions(request, _config);
var messages = _streamManager.SendMessageAsync(request.Prompt, sessionOptions, cancellationToken);
var responseBuilder = new StringBuilder();
ResultMessage? finalResult = null;
await foreach (var streamMessage in messages)
{
switch (streamMessage.Message)
{
case ResultMessage result:
finalResult = result;
responseBuilder.Append(result.Result);
break;
}
}
if (finalResult != null)
{
return ClaudeResponseMapper.MapToAIResponse(finalResult, Name);
}
return new AIResponse
{
Content = responseBuilder.ToString(),
FinishReason = FinishReason.Unknown,
Provider = Name
};
}
}

Characteristics of the Claude Code CLI provider:

  • Streaming manager integration: Use IClaudeStreamManager to communicate with the Claude CLI
  • CessionId session isolation: Use CessionId as the unique session identifier, distinct from the system sessionId
  • Working directory configuration: Support configuration of the working directory, permission mode, and more
  • Tool support: Support tool-permission settings such as AllowedTools and DisallowedTools
public class CodexCliProvider : IAIProvider
{
private readonly ILogger<CodexCliProvider> _logger;
private readonly CodexSettings _settings;
private readonly ConcurrentDictionary<string, string> _sessionThreadBindings;
public string Name => "CodexCli";
public bool SupportsStreaming => true;
public ProviderCapabilities Capabilities { get; }
public async IAsyncEnumerable<AIStreamingChunk> StreamAsync(
AIRequest request,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
_logger.LogInformation("Executing streaming AI request with provider: {Provider}", Name);
var codex = CreateCodexClient();
var thread = ResolveThread(codex, request);
var currentTurn = 0;
var activeToolCalls = new Dictionary<string, AIToolCallDelta>();
await foreach (var threadEvent in thread.RunStreamedAsync(BuildPrompt(request), cancellationToken))
{
if (threadEvent is TurnStartedEvent)
{
currentTurn++;
}
switch (threadEvent)
{
case ItemCompletedEvent { Item: AgentMessageItem message }:
var messageText = message.Text ?? string.Empty;
yield return new AIStreamingChunk
{
Content = messageText,
Type = StreamingChunkType.ContentDelta,
IsComplete = false
};
break;
case ItemStartedEvent or ItemUpdatedEvent or ItemCompletedEvent:
var toolChunk = BuildToolChunk(threadEvent, currentTurn);
if (toolChunk?.ToolCallDelta != null)
{
yield return toolChunk;
}
break;
case TurnCompletedEvent turnCompleted:
activeToolCalls.Clear();
yield return new AIStreamingChunk
{
Content = string.Empty,
Type = StreamingChunkType.Metadata,
IsComplete = true,
Usage = MapUsage(turnCompleted.Usage)
};
break;
}
}
BindSessionThread(request.SessionId, thread.Id);
}
private CodexThread ResolveThread(Codex codex, AIRequest request)
{
var sessionId = request.SessionId;
// Check whether there is already a bound thread
if (!string.IsNullOrWhiteSpace(sessionId) &&
_sessionThreadBindings.TryGetValue(sessionId, out var threadId) &&
!string.IsNullOrWhiteSpace(threadId))
{
_logger.LogInformation("Resuming Codex thread {ThreadId} for session {SessionId}", threadId, sessionId);
return codex.ResumeThread(threadId, threadOptions);
}
_logger.LogInformation("Starting new Codex thread for session {SessionId}", sessionId ?? "(none)");
return codex.StartThread(threadOptions);
}
}

Characteristics of the Codex CLI provider:

  • JSON event-stream handling: Parse Codex JSON event streams (TurnStarted, ItemStarted, TurnCompleted, and so on)
  • Session-thread binding: Persist the binding between sessions and threads with an SQLite database
  • Thread reuse: Support resuming existing threads to maintain session continuity
  • Tool-call tracking: Track active tool-call state and correctly handle the tool lifecycle

Codex CLI uses an SQLite database to persist the binding between sessions and threads:

public class CodexCliProvider : IAIProvider
{
private const int SessionThreadBindingRetentionDays = 30;
private readonly ConcurrentDictionary<string, string> _sessionThreadBindings;
private readonly string _sessionThreadBindingDatabaseConnectionString;
private readonly string _sessionThreadBindingDatabasePath;
private void BindSessionThread(string? sessionId, string? threadId)
{
if (string.IsNullOrWhiteSpace(sessionId) || string.IsNullOrWhiteSpace(threadId))
{
return;
}
// In-memory cache
_sessionThreadBindings.AddOrUpdate(sessionId, threadId, (_, _) => threadId);
// Persist to SQLite
PersistSessionThreadBinding(sessionId, threadId);
}
private void PersistSessionThreadBinding(string sessionId, string threadId)
{
try
{
using var connection = new SqliteConnection(_sessionThreadBindingDatabaseConnectionString);
connection.Open();
using var upsertCommand = connection.CreateCommand();
upsertCommand.CommandText =
"""
INSERT INTO SessionThreadBindings (SessionId, ThreadId, CreatedAtUtc, UpdatedAtUtc)
VALUES ($sessionId, $threadId, $createdAtUtc, $updatedAtUtc)
ON CONFLICT(SessionId) DO UPDATE SET
ThreadId = excluded.ThreadId,
UpdatedAtUtc = excluded.UpdatedAtUtc;
""";
var nowUtc = DateTimeOffset.UtcNow.ToString("O");
upsertCommand.Parameters.AddWithValue("$sessionId", sessionId);
upsertCommand.Parameters.AddWithValue("$threadId", threadId);
upsertCommand.Parameters.AddWithValue("$createdAtUtc", nowUtc);
upsertCommand.Parameters.AddWithValue("$updatedAtUtc", nowUtc);
upsertCommand.ExecuteNonQuery();
}
catch (Exception ex)
{
_logger.LogWarning(
ex,
"Failed to persist Codex session-thread binding for session {SessionId} to {DatabasePath}",
sessionId,
_sessionThreadBindingDatabasePath);
}
}
private void LoadPersistedSessionThreadBindings()
{
using var connection = new SqliteConnection(_sessionThreadBindingDatabaseConnectionString);
connection.Open();
using var loadCommand = connection.CreateCommand();
loadCommand.CommandText = "SELECT SessionId, ThreadId FROM SessionThreadBindings;";
using var reader = loadCommand.ExecuteReader();
while (reader.Read())
{
var sessionId = reader.GetString(0);
var threadId = reader.GetString(1);
_sessionThreadBindings[sessionId] = threadId;
}
}
}

Advantages of session-thread binding:

  • Session restoration: Previous sessions can be restored after a system restart
  • Thread reuse: The same session can reuse an existing Codex thread
  • Automatic cleanup: Bindings older than 30 days are cleaned up automatically

hagicode-desktop manages CLI selection through AgentCliManager:

export enum AgentCliType {
ClaudeCode = 'claude-code',
Codex = 'codex',
// Future extensions: other CLIs such as Aider and Cursor
}
export class AgentCliManager {
private static readonly STORE_KEY = 'agentCliSelection';
private static readonly EXECUTOR_TYPE_MAP: Record<AgentCliType, string> = {
[AgentCliType.ClaudeCode]: 'ClaudeCodeCli',
[AgentCliType.Codex]: 'CodexCli',
};
constructor(private store: any) {}
async saveSelection(cliType: AgentCliType): Promise<void> {
const selection: StoredAgentCliSelection = {
cliType,
isSkipped: false,
selectedAt: new Date().toISOString(),
};
this.store.set(AgentCliManager.STORE_KEY, selection);
}
loadSelection(): StoredAgentCliSelection {
return this.store.get(AgentCliManager.STORE_KEY, {
cliType: null,
isSkipped: false,
selectedAt: null,
});
}
getCommandName(cliType: AgentCliType): string {
switch (cliType) {
case AgentCliType.ClaudeCode:
return 'claude';
case AgentCliType.Codex:
return 'codex';
default:
return 'claude';
}
}
getExecutorType(cliType: AgentCliType | null): string {
if (!cliType) return 'ClaudeCodeCli';
return this.EXECUTOR_TYPE_MAP[cliType] || 'ClaudeCodeCli';
}
}

Example desktop-side IPC handler:

ipcMain.handle('llm:call-api', async (event, manifestPath, region) => {
if (!state.llmInstallationManager) {
return { success: false, error: 'LLM Installation Manager not initialized' };
}
try {
const prompt = await state.llmInstallationManager.loadPrompt(manifestPath, region);
// Determine the CLI command based on the user's selection
let commandName = 'claude';
if (state.agentCliManager) {
const selectedCliType = state.agentCliManager.getSelectedCliType();
if (selectedCliType) {
commandName = state.agentCliManager.getCommandName(selectedCliType);
}
}
// Execute with the selected CLI
const result = await state.llmInstallationManager.callApi(
prompt.filePath,
event.sender,
commandName
);
return result;
} catch (error) {
return {
success: false,
error: error instanceof Error ? error.message : 'Unknown error'
};
}
});

9. Codex’s Internal Model Provider System

Section titled “9. Codex’s Internal Model Provider System”

Codex itself also supports multiple model providers via ModelProviderInfo configuration:

pub const OPENAI_PROVIDER_NAME: &str = "OpenAI";
pub const OLLAMA_OSS_PROVIDER_ID: &str = "ollama";
pub const LMSTUDIO_OSS_PROVIDER_ID: &str = "lmstudio";
pub fn built_in_model_providers() -> HashMap<String, ModelProviderInfo> {
use ModelProviderInfo as P;
[
("openai", P::create_openai_provider()),
(OLLAMA_OSS_PROVIDER_ID, create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Responses)),
(LMSTUDIO_OSS_PROVIDER_ID, create_oss_provider(DEFAULT_LMSTUDIO_PORT, WireApi::Responses)),
]
.into_iter()
.map(|(k, v)| (k.to_string(), v))
.collect()
}
pub struct ModelProviderInfo {
pub name: String,
pub base_url: Option<String>,
pub env_key: Option<String>,
pub query_params: Option<HashMap<String, String>>,
pub http_headers: Option<HashMap<String, String>>,
pub request_max_retries: Option<u64>,
pub stream_max_retries: Option<u64>,
pub stream_idle_timeout_ms: Option<u64>,
pub requires_openai_auth: bool,
pub supports_websockets: bool,
}

Codex model-provider support includes:

  • Built-in providers: OpenAI, Ollama, and LM Studio
  • Custom providers: Users can add custom providers in config.toml
  • Retry strategy: Configurable retry counts for requests and streams
  • WebSocket support: Some providers support WebSocket transport

Configure multiple providers in appsettings.json:

{
"AI": {
"Providers": {
"DefaultProvider": "ClaudeCodeCli",
"Providers": {
"ClaudeCodeCli": {
"Type": "ClaudeCodeCli",
"Model": "claude-sonnet-4-20250514",
"WorkingDirectory": "/path/to/workspace",
"PermissionMode": "acceptEdits",
"AllowedTools": ["file-edit", "command-run", "bash"]
},
"CodexCli": {
"Type": "CodexCli",
"Model": "gpt-4.1",
"ExecutablePath": "codex",
"SandboxMode": "enabled",
"WebSearchMode": "auto",
"NetworkAccessEnabled": false
}
},
"ScenarioProviderMapping": {
"CodeAnalysis": "ClaudeCodeCli",
"CodeGeneration": "CodexCli",
"Refactoring": "ClaudeCodeCli",
"Debugging": "CodexCli"
},
"FallbackChain": ["CodexCli", "ClaudeCodeCli"]
},
"Selector": {
"EnableCache": true,
"CacheExpirationSeconds": 300
}
}
}
public class AIOrchestrator
{
private readonly IAIProviderFactory _providerFactory;
private readonly IAIProviderSelector _providerSelector;
private readonly ILogger<AIOrchestrator> _logger;
public AIOrchestrator(
IAIProviderFactory providerFactory,
IAIProviderSelector providerSelector,
ILogger<AIOrchestrator> logger)
{
_providerFactory = providerFactory;
_providerSelector = providerSelector;
_logger = logger;
}
public async Task<AIResponse> ProcessRequestAsync(
AIRequest request,
BusinessScenario scenario)
{
_logger.LogInformation("Processing request for scenario: {Scenario}", scenario);
try
{
// Select a provider intelligently
var providerType = await _providerSelector.SelectProviderAsync(scenario, request.CancellationToken);
// Get the provider instance
var provider = await _providerFactory.GetProviderAsync(providerType);
if (provider == null)
{
throw new InvalidOperationException($"Provider {providerType} not available");
}
_logger.LogInformation("Using provider: {Provider} for request", provider.Name);
// Execute the request
var response = await provider.ExecuteAsync(request, request.CancellationToken);
_logger.LogInformation("Request completed with provider: {Provider}, tokens used: {Tokens}",
provider.Name,
response.Usage?.TotalTokens ?? 0);
return response;
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to process request for scenario: {Scenario}", scenario);
throw;
}
}
}
public async IAsyncEnumerable<AIStreamingChunk> StreamResponseAsync(
AIRequest request,
BusinessScenario scenario)
{
var providerType = await _providerSelector.SelectProviderAsync(scenario);
var provider = await _providerFactory.GetProviderAsync(providerType);
if (provider == null)
{
throw new InvalidOperationException($"Provider {providerType} not available");
}
await foreach (var chunk in provider.StreamAsync(request))
{
// Process streaming chunks
switch (chunk.Type)
{
case StreamingChunkType.ContentDelta:
// Show text content in real time
await SendToClientAsync(chunk.Content);
break;
case StreamingChunkType.ToolCallDelta:
// Handle tool calls
await HandleToolCallAsync(chunk.ToolCallDelta);
break;
case StreamingChunkType.Metadata:
// Handle completion events and stats
if (chunk.IsComplete)
{
_logger.LogInformation("Stream completed, usage: {@Usage}", chunk.Usage);
}
break;
case StreamingChunkType.Error:
// Handle errors
_logger.LogError("Stream error: {Error}", chunk.ErrorMessage);
throw new InvalidOperationException(chunk.ErrorMessage);
}
}
}
public async Task<string> ExecuteOpenSpecCommandAsync(
string command,
string arguments,
BusinessScenario scenario)
{
var providerType = await _providerSelector.SelectProviderAsync(scenario);
var provider = await _providerFactory.GetProviderAsync(providerType);
// Build an embedded command prompt
var commandPrompt = $"""
Execute the following OpenSpec command:
Command: {command}
Arguments: {arguments}
Please execute this command and return the results.
""";
var request = new AIRequest
{
Prompt = "Process this command request",
EmbeddedCommandPrompt = commandPrompt,
WorkingDirectory = Directory.GetCurrentDirectory()
};
var response = await provider.SendMessageAsync(request, commandPrompt);
return response.Content;
}

Before switching providers, it is recommended to call PingAsync first to ensure the target provider is available:

public async Task<bool> IsProviderHealthyAsync(AIProviderType providerType)
{
var provider = await _providerFactory.GetProviderAsync(providerType);
if (provider == null) return false;
var testResult = await provider.PingAsync();
return testResult.Success &&
testResult.ResponseTimeMs < 5000; // A response within 5 seconds is considered healthy
}

Use CessionId (Claude) or ThreadId (Codex) to ensure session isolation:

  • Claude Code CLI: use CessionId as the unique session identifier
  • Codex CLI: use ThreadId as the session identifier
// Claude Code CLI session options
var claudeSessionOptions = new ClaudeSessionOptions
{
CessionId = CessionId.New(), // Generate a unique ID
WorkingDirectory = workspacePath,
AllowedTools = allowedTools,
PermissionMode = PermissionMode.acceptEdits
};
// Codex thread options
var codexThreadOptions = new ThreadOptions
{
Model = "gpt-4.1",
SandboxMode = "enabled",
WorkingDirectory = workspacePath
};

Fallback mechanisms must be robust when a provider is unavailable, ensuring that at least one provider remains usable:

public async Task<AIResponse> ExecuteWithFallbackAsync(
AIRequest request,
List<AIProviderType> preferredProviders)
{
Exception? lastException = null;
foreach (var providerType in preferredProviders)
{
try
{
var provider = await _providerFactory.GetProviderAsync(providerType);
if (provider == null) continue;
// Try execution
return await provider.ExecuteAsync(request);
}
catch (Exception ex)
{
_logger.LogWarning(ex, "Provider {ProviderType} failed, trying next", providerType);
lastException = ex;
}
}
// All providers failed
throw new InvalidOperationException(
"All preferred providers failed. Last error: " + lastException?.Message,
lastException);
}

Validate settings for all configured providers at startup to avoid runtime errors:

public void ValidateConfiguration(AIProviderOptions options)
{
foreach (var (providerType, config) in options.Providers)
{
// Validate executable paths (for CLI-based providers)
if (IsCliBasedProvider(providerType))
{
if (string.IsNullOrWhiteSpace(config.ExecutablePath))
{
throw new ConfigurationException(
$"Provider {providerType} requires ExecutablePath");
}
if (!File.Exists(config.ExecutablePath))
{
throw new ConfigurationException(
$"Executable not found for {providerType}: {config.ExecutablePath}");
}
}
// Validate API keys (for API-based providers)
if (IsApiBasedProvider(providerType))
{
if (string.IsNullOrWhiteSpace(config.ApiKey))
{
throw new ConfigurationException(
$"Provider {providerType} requires ApiKey");
}
}
// Validate model names
if (string.IsNullOrWhiteSpace(config.Model))
{
_logger.LogWarning("No model configured for {ProviderType}, using default", providerType);
}
}
}

Provider instances are cached, so pay attention to lifecycle management and memory usage:

// Clean up the cache periodically
public void ClearInactiveProviders(TimeSpan inactiveThreshold)
{
var now = DateTimeOffset.UtcNow;
var keysToRemove = new List<AIProviderType>();
foreach (var (type, instance) in _cache)
{
// Assume providers have a LastUsedTime property
if (instance.LastUsedTime.HasValue &&
now - instance.LastUsedTime.Value > inactiveThreshold)
{
keysToRemove.Add(type);
}
}
foreach (var key in keysToRemove)
{
_cache.TryRemove(key, out _);
_logger.LogInformation("Cleared inactive provider: {Provider}", key);
}
}

Log provider selection, switching, and execution in detail to make debugging easier:

public class AIProviderLogging
{
private readonly ILogger _logger;
public void LogProviderSelection(
BusinessScenario scenario,
AIProviderType selectedProvider,
SelectionReason reason)
{
_logger.LogInformation(
"[ProviderSelection] Scenario={Scenario}, Provider={Provider}, Reason={Reason}",
scenario,
selectedProvider,
reason);
}
public void LogProviderSwitch(
AIProviderType fromProvider,
AIProviderType toProvider,
string reason)
{
_logger.LogWarning(
"[ProviderSwitch] From={FromProvider} To={ToProvider}, Reason={Reason}",
fromProvider,
toProvider,
reason);
}
public void LogProviderError(
AIProviderType provider,
Exception error,
AIRequest request)
{
_logger.LogError(error,
"[ProviderError] Provider={Provider}, RequestLength={Length}, Error={Message}",
provider,
request.Prompt.Length,
error.Message);
}
}

Using concurrent collections such as ConcurrentDictionary ensures thread safety:

public class ThreadSafeProviderCache
{
private readonly ConcurrentDictionary<AIProviderType, IAIProvider> _cache;
private readonly ReaderWriterLockSlim _lock = new();
public IAIProvider? GetProvider(AIProviderType type)
{
// Read operations do not require a lock
if (_cache.TryGetValue(type, out var provider))
return provider;
// Creation requires a write lock
_lock.EnterWriteLock();
try
{
// Double-check
if (_cache.TryGetValue(type, out provider))
return provider;
var newProvider = CreateProvider(type);
if (newProvider != null)
{
_cache[type] = newProvider;
}
return newProvider;
}
finally
{
_lock.ExitWriteLock();
}
}
}

When the session-thread binding database schema changes, data migration must be considered:

public class SessionThreadMigration
{
public async Task MigrateAsync(string dbPath)
{
var version = await GetSchemaVersionAsync(dbPath);
if (version >= 2) return; // Already the latest version
using var connection = new SqliteConnection(dbPath);
connection.Open();
// Migrate to v2: add the CreatedAtUtc column
if (version < 2)
{
_logger.LogInformation("Migrating SessionThreadBindings to v2...");
using var addColumnCommand = connection.CreateCommand();
addColumnCommand.CommandText = "ALTER TABLE SessionThreadBindings ADD COLUMN CreatedAtUtc TEXT;";
addColumnCommand.ExecuteNonQuery();
using var backfillCommand = connection.CreateCommand();
backfillCommand.CommandText =
"""
UPDATE SessionThreadBindings
SET CreatedAtUtc = COALESCE(NULLIF(UpdatedAtUtc, ''), $nowUtc)
WHERE CreatedAtUtc IS NULL OR CreatedAtUtc = '';
""";
backfillCommand.Parameters.AddWithValue("$nowUtc", DateTimeOffset.UtcNow.ToString("O"));
backfillCommand.ExecuteNonQuery();
}
await UpdateSchemaVersionAsync(dbPath, 2);
_logger.LogInformation("Migration to v2 completed");
}
}

HagiCode combines the provider pattern, factory pattern, and selector pattern to implement a flexible and extensible multi-AI provider architecture:

  • Unified interface abstraction: The IAIProvider interface hides the differences between CLIs
  • Dynamic instance creation: AIProviderFactory supports runtime creation of provider instances
  • Intelligent selection strategy: AIProviderSelector implements scenario-driven provider selection
  • Session state persistence: Database bindings ensure session continuity
  • Desktop integration: AgentCliManager supports user selection and configuration

The advantages of this architecture are:

  1. Extensibility: Adding a new AI provider only requires implementing the IAIProvider interface
  2. Testability: Providers can be tested and mocked independently
  3. Maintainability: Each provider implementation is isolated and has a single responsibility
  4. User-friendliness: Support both scenario-based automatic selection and manual switching

With this design, HagiCode successfully enables seamless switching and interoperability between Claude Code CLI and Codex CLI, giving developers a flexible and powerful AI coding assistant experience.


Thank you for reading. If you found this article useful, please click the like button below 👍 so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by me, and reflects my own views and position.

From TypeScript to C#: Cross-Language Porting Practice for the Codex SDK

From TypeScript to C#: Cross-Language Porting Practice for the Codex SDK

Section titled “From TypeScript to C#: Cross-Language Porting Practice for the Codex SDK”

Put simply, this article is also a bit of a baby of ours: it records the full process of porting the official TypeScript Codex SDK to C#. Calling it a “port” almost makes it sound too easy - it was more like a long adventure, because these two languages have very different personalities, and we had to find a way to make them cooperate.

Codex is the AI Agent CLI tool released by OpenAI, and it is genuinely powerful. The official team provides a TypeScript SDK in the @openai/codex package. It interacts with the Codex CLI by calling the codex exec --experimental-json command and parsing a JSONL event stream.

The problem is that in the HagiCode project, we need to use it in a pure .NET environment - specifically in C# backend services and desktop applications. We could not reasonably introduce a Node.js runtime into a .NET project just to call a CLI tool. That would be far too cumbersome.

So we were left with two choices: maintain a complex Node.js bridge layer, or build a native C# SDK ourselves.

We chose the latter.

This article also comes directly from our hands-on experience in the HagiCode project. HagiCode is an open-source AI coding assistant project. In plain terms, it means maintaining multiple components at once: a VSCode extension on the frontend, AI services on the backend, and a cross-platform desktop client. That multi-language, multi-platform complexity is exactly why we needed a native C# SDK - we really did not want to run Node.js inside a .NET project.

If you find this article helpful, feel free to give us a star on GitHub: github.com/HagiCode-org/site. You can also visit the official website to learn more: hagicode.com. It is always encouraging when an open-source project receives support.

Before translating code one-to-one, we first had to understand the architectural design of both SDKs. You have to understand both sides before you can port them well.

The core architecture of the TypeScript SDK looks like this:

Codex (entry class)
└── CodexExec (executor, manages child processes)
└── Thread (conversation thread)
├── run() / runStreamed() (synchronous/asynchronous execution)
└── event stream parsing

The C# SDK keeps the same architectural layering, but adapts the implementation details. The overall idea is straightforward: preserve API consistency while fully leveraging C# language features in the implementation.

This is the most fundamental and also the most important part of the work. If the foundation is weak, everything that follows becomes harder.

TypeScript’s type system is more flexible than C#‘s, and that is simply a fact. We needed to find an appropriate mapping strategy:

TypeScriptC#Notes
interface / typerecordC# uses record for immutable data structures
string | nullstring?Nullable reference type
boolean | undefinedbool?Nullable Boolean
AsyncGeneratorIAsyncEnumerableAsync iterator

The event type system is a typical example. TypeScript uses union types to define events:

export type ThreadEvent =
| ThreadStartedEvent
| TurnStartedEvent
| TurnCompletedEvent
| ...

In C#, we use an inheritance hierarchy and pattern matching to achieve a similar effect:

public abstract record ThreadEvent(string Type);
public sealed record ThreadStartedEvent(string ThreadId) : ThreadEvent("thread.started");
public sealed record TurnStartedEvent() : ThreadEvent("turn.started");
public sealed record TurnCompletedEvent(Usage Usage) : ThreadEvent("turn.completed");
// ...

We chose record instead of class because event objects should be immutable, which matches the intent behind using plain objects in TypeScript. The sealed keyword also prevents additional inheritance and gives the compiler room to optimize.

Event parsing is the core of the entire SDK, because it determines whether we can correctly understand every message returned by the Codex CLI. If parsing is wrong, everything after that is wasted effort.

The TypeScript version uses JSON.parse() to parse each line of JSON:

export function parseEvent(line: string): ThreadEvent {
const data = JSON.parse(line);
// Handle different event types...
}

The C# version uses System.Text.Json.JsonDocument instead:

public static ThreadEvent Parse(string line)
{
using var document = JsonDocument.Parse(line);
var root = document.RootElement;
var type = GetRequiredString(root, "type", "event.type");
return type switch
{
"thread.started" => new ThreadStartedEvent(GetRequiredString(root, "thread_id", ...)),
"turn.started" => new TurnStartedEvent(),
"turn.completed" => new TurnCompletedEvent(ParseUsage(...)),
// ...
_ => new UnknownThreadEvent(type, root.Clone()),
};
}

There is one small but important trick here: root.Clone() is required, because elements from JsonDocument become invalid after the document is disposed. We need to retain a copy for unknown event types. That is simply one of the differences between C# JSON handling and JavaScript.

This is where the two SDKs differ the most. Node.js and .NET have different runtime conventions, so the implementation has to adapt.

TypeScript uses Node.js’s spawn() function:

const child = spawn(this.executablePath, commandArgs, { env, signal });

C# uses .NET’s System.Diagnostics.Process:

using var process = new Process { StartInfo = startInfo };
process.Start();
// stdin/stdout/stderr must be managed manually

More specifically, the C# version needs to configure the process like this:

var startInfo = new ProcessStartInfo
{
FileName = _executablePath,
RedirectStandardInput = true,
RedirectStandardOutput = true,
RedirectStandardError = true,
UseShellExecute = false,
CreateNoWindow = true,
};

The biggest difference is the cancellation mechanism. TypeScript uses AbortSignal, which is part of the Web API and very convenient to work with:

const child = spawn(cmd, args, { signal: cancellationSignal });

C# uses CancellationToken instead:

public async IAsyncEnumerable<string> RunAsync(
CodexExecArgs args,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
// Check cancellation status inside the loop
while (!cancellationToken.IsCancellationRequested)
{
// Process output...
}
// Terminate the process when cancellation is requested
if (cancellationToken.IsCancellationRequested)
{
try { process.Kill(entireProcessTree: true); } catch { }
}
}

At a high level, this is just another example of the difference between the Web API ecosystem and the .NET ecosystem.

Both SDKs implement the logic that converts JSON configuration into TOML configuration, because the Codex CLI accepts configuration overrides in TOML format. This part must remain completely consistent, otherwise the same configuration will behave differently in the two SDKs.

That is the kind of detail you cannot compromise on. Success or failure often comes down to details like this.

We created the following project structure:

CodexSdk/
├── CodexSdk.csproj
├── Codex.cs # Entry class
├── CodexThread.cs # Conversation thread
├── CodexExec.cs # Executor
├── Events.cs # Event type definitions
├── Items.cs # Item type definitions
├── EventParser.cs # Event parser
├── OutputSchemaTempFile.cs # Temporary file management
└── ...

It is a fairly clean structure, and that helped a lot during the port.

The basic usage remains consistent with the TypeScript SDK:

using CodexSdk;
// Create a Codex instance
var codex = new Codex();
var thread = codex.StartThread();
// Execute a query
var result = await thread.RunAsync("Summarize this repository.");
Console.WriteLine(result.FinalResponse);

Streaming event handling takes advantage of C# pattern matching:

await foreach (var @event in thread.RunStreamedAsync("Analyze the code."))
{
switch (@event)
{
case ItemCompletedEvent itemCompleted
when itemCompleted.Item is AgentMessageItem msg:
Console.WriteLine($"Assistant: {msg.Text}");
break;
case TurnCompletedEvent completed:
Console.WriteLine($"Tokens: in={completed.Usage.InputTokens}");
break;
case CommandExecutionItem command:
Console.WriteLine($"Command: {command.Command}");
break;
}
}

During implementation, we collected several practical lessons:

  1. Process management: The C# version must manage the full process lifecycle manually, including process termination during cancellation. Use Kill(entireProcessTree: true) to make sure child processes are also cleaned up.

  2. Error handling: We use InvalidOperationException to throw parsing errors, keeping the error handling style similar to the TypeScript SDK.

  3. Resource cleanup: OutputSchemaTempFile implements IAsyncDisposable to ensure temporary files are cleaned up correctly.

  4. Environment variables: The C# version supports fully overriding process environment variables through CodexOptions.Env. It is a small feature, but a very practical one.

  5. Platform differences: The C# version does not include the TypeScript version’s logic for automatically locating binaries inside npm packages. Since .NET projects typically do not depend on npm, the path to the codex executable must be specified via the CODEX_EXECUTABLE environment variable or CodexPathOverride.

Porting a mature TypeScript SDK to C# is not just a matter of syntax conversion - it also requires understanding the design philosophies of both languages. TypeScript’s flexibility and JavaScript ecosystem features such as AbortSignal need appropriate counterparts in C#.

The key takeaway is this: maintaining API consistency matters more than maintaining implementation-level consistency. Users care about whether the interface is easy to use, not whether the internal implementation is identical. That sounds simple, but making those trade-offs takes judgment.

If you are working on a similar cross-language port, our experience is to fully understand the original SDK architecture first, then translate it module by module, and finally use a complete test suite to ensure behavioral consistency. This kind of work cannot be rushed.

Everything will work out in the end.



If this article helped you:


Thank you for reading. If you found this article useful, please click the like button below so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by the author, and reflects the author’s own views and position.

Guide to Implementing Hotword Support for Doubao Speech Recognition

Guide to Implementing Hotword Support for Doubao Speech Recognition

Section titled “Guide to Implementing Hotword Support for Doubao Speech Recognition”

This article explains in detail how to implement hotword support for Doubao speech recognition in the HagiCode project. By using both custom hotwords and platform hotword tables, you can significantly improve recognition accuracy for domain-specific vocabulary.

Speech recognition technology has developed for many years, yet one problem has consistently bothered developers. General-purpose speech recognition models can cover everyday language, but they often fall short when it comes to professional terminology, product names, and personal names. Think about it: a voice assistant in the medical field needs to accurately recognize terms like “hypertension,” “diabetes,” and “coronary heart disease”; a legal system needs to precisely capture terms such as “cause of action,” “defense,” and “burden of proof.” In these scenarios, a general-purpose model is trying its best, but that is often not enough.

We ran into the same challenge in the HagiCode project. As a multifunctional AI coding assistant, HagiCode needs to handle speech recognition for a wide range of technical terminology. However, the Doubao speech recognition API, in its default configuration, could not fully meet our accuracy requirements for specialized terms. It is not that Doubao is not good enough; rather, every domain has its own terminology system. After some research and technical exploration, we found that the Doubao speech recognition API actually provides hotword support. With a straightforward configuration, it can significantly improve the recognition accuracy of specific vocabulary. In a sense, once you tell it which words to pay attention to, it listens for them more carefully.

What this article shares is the complete solution we used in the HagiCode project to implement Doubao speech recognition hotwords. Both modes, custom hotwords and platform hotword tables, are available, and they can also be combined. With this solution, developers can flexibly configure hotwords based on business scenarios so the speech recognition system can better “recognize” professional, uncommon, yet critical vocabulary.

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project with a modern technology stack, designed to provide developers with an intelligent programming assistance experience. As a complex multilingual, multi-platform project, HagiCode needs to handle speech recognition scenarios involving many technical terms, which in turn drove our research into and implementation of the hotword feature.

If you are interested in HagiCode’s technical implementation, you can visit the GitHub repository for more details, or check out our official documentation for the complete installation and usage guide.

The Doubao speech recognition API provides two ways to configure hotwords, and each one has its own ideal use cases and advantages.

Custom hotword mode lets us pass hotword text directly through the corpus.context field. This approach is especially suitable for scenarios where you need to quickly configure a small number of hotwords, such as temporarily recognizing a product name or a person’s name. In HagiCode’s implementation, we parse the multi-line hotword text entered by the user into a list of strings, then format it into the context_data array required by the Doubao API. This approach is very direct: you simply tell the system which words to pay attention to, and it does exactly that.

Platform hotword table mode uses the corpus.boosting_table_id field to reference a preconfigured hotword table in the Doubao self-learning platform. This approach is suitable for scenarios where you need to manage a large number of hotwords. We can create and maintain hotword tables on the Doubao self-learning platform, then reference them by ID. For a project like HagiCode, where specialized terms need to be continuously updated and maintained, this mode offers much better manageability. Once the number of hotwords grows, having a centralized place to manage them is far better than entering them manually every time.

Interestingly, these two modes can also be used together. The Doubao API supports including both custom hotwords and a platform hotword table ID in the same request, with the combination strategy controlled by the combine_mode parameter. This flexibility allows HagiCode to handle a wide range of complex professional terminology recognition needs. Sometimes, combining multiple approaches produces better results.

In HagiCode’s frontend implementation, we defined a complete set of hotword configuration types and validation logic. The first part is the type definition:

export interface HotwordConfig {
contextText: string; // Multi-line hotword text
boostingTableId: string; // Doubao platform hotword table ID
combineMode: boolean; // Whether to use both together
}

This simple interface contains all configuration items for the hotword feature. Among them, contextText is the part users interact with most directly: we allow users to enter one hotword phrase per line, which is very intuitive. Asking users to enter one term per line is much easier than making them understand a complicated configuration format.

Next comes the validation function. Based on the Doubao API limitations, we defined strict validation rules: at most 100 lines of hotword text, up to 50 characters per line, and no more than 5000 characters in total; boosting_table_id can be at most 200 characters and may contain only letters, numbers, underscores, and hyphens. These limits are not arbitrary; they come directly from the official Doubao documentation. API limits are API limits, and we have to follow them.

export function validateContextText(contextText: string): HotwordValidationResult {
if (!contextText || contextText.trim().length === 0) {
return { isValid: true, errors: [] };
}
const lines = contextText.split('\n').filter(line => line.trim().length > 0);
const errors: string[] = [];
if (lines.length > 100) {
errors.push(`Hotword line count cannot exceed 100 lines; current count is ${lines.length}`);
}
const totalChars = contextText.length;
if (totalChars > 5000) {
errors.push(`Total hotword character count cannot exceed 5000; current count is ${totalChars}`);
}
for (let i = 0; i < lines.length; i++) {
if (lines[i].length > 50) {
errors.push(`Hotword on line ${i + 1} exceeds the 50-character limit`);
}
}
return { isValid: errors.length === 0, errors };
}
export function validateBoostingTableId(boostingTableId: string): HotwordValidationResult {
if (!boostingTableId || boostingTableId.trim().length === 0) {
return { isValid: true, errors: [] };
}
const errors: string[] = [];
if (boostingTableId.length > 200) {
errors.push(`boosting_table_id cannot exceed 200 characters; current count is ${boostingTableId.length}`);
}
if (!/^[a-zA-Z0-9_-]+$/.test(boostingTableId)) {
errors.push('boosting_table_id can contain only letters, numbers, underscores, and hyphens');
}
return { isValid: errors.length === 0, errors };
}

These validation functions run immediately when the user configures hotwords, ensuring that problems are caught as early as possible. From a user experience perspective, this kind of instant feedback is very important. It is always better for users to know what is wrong while they are typing rather than after they submit.

In HagiCode’s frontend implementation, we chose to use the browser’s localStorage to store hotword configuration. There were several considerations behind this design decision. First, hotword configuration is highly personalized, and different users may have different domain-specific needs. Second, this approach simplifies the backend implementation because it does not require extra database tables or API endpoints. Finally, after users configure it once in the browser, the settings can be loaded automatically on subsequent uses, which is very convenient. Put simply, it is the easiest approach.

const HOTWORD_STORAGE_KEYS = {
contextText: 'hotword-context-text',
boostingTableId: 'hotword-boosting-table-id',
combineMode: 'hotword-combine-mode',
} as const;
export const DEFAULT_HOTWORD_CONFIG: HotwordConfig = {
contextText: '',
boostingTableId: '',
combineMode: false,
};
// Load hotword configuration
export function loadHotwordConfig(): HotwordConfig {
const contextText = localStorage.getItem(HOTWORD_STORAGE_KEYS.contextText) || '';
const boostingTableId = localStorage.getItem(HOTWORD_STORAGE_KEYS.boostingTableId) || '';
const combineMode = localStorage.getItem(HOTWORD_STORAGE_KEYS.combineMode) === 'true';
return { contextText, boostingTableId, combineMode };
}
// Save hotword configuration
export function saveHotwordConfig(config: HotwordConfig): void {
localStorage.setItem(HOTWORD_STORAGE_KEYS.contextText, config.contextText);
localStorage.setItem(HOTWORD_STORAGE_KEYS.boostingTableId, config.boostingTableId);
localStorage.setItem(HOTWORD_STORAGE_KEYS.combineMode, String(config.combineMode));
}

The logic in this code is straightforward and clear. We read from localStorage when loading configuration, and write to localStorage when saving it. We also provide a default configuration so the system can still work properly when no configuration exists yet. There has to be a sensible default, after all.

In HagiCode’s backend implementation, we needed to add hotword-related properties to the SDK configuration class. Taking C# language characteristics and usage patterns into account, we used List<string> to store custom hotword contexts:

public class DoubaoVoiceConfig
{
/// <summary>
/// App ID
/// </summary>
public string AppId { get; set; } = string.Empty;
/// <summary>
/// Access token
/// </summary>
public string AccessToken { get; set; } = string.Empty;
/// <summary>
/// Service URL
/// </summary>
public string ServiceUrl { get; set; } = string.Empty;
/// <summary>
/// Custom hotword context list
/// </summary>
public List<string>? HotwordContexts { get; set; }
/// <summary>
/// Doubao platform hotword table ID
/// </summary>
public string? BoostingTableId { get; set; }
}

The design of this configuration class follows HagiCode’s usual concise style. HotwordContexts is a nullable list type, and BoostingTableId is a nullable string, so when there is no hotword configuration, these properties have no effect on the request at all. If you are not using the feature, it should stay out of the way.

Payload construction is the core of the entire hotword feature. Once we have hotword configuration, we need to format it into the JSON structure required by the Doubao API. This process happens before the SDK sends the request:

private void AddCorpusToRequest(Dictionary<string, object> request)
{
var corpus = new Dictionary<string, object>();
// Add custom hotwords
if (Config.HotwordContexts != null && Config.HotwordContexts.Count > 0)
{
corpus["context"] = new Dictionary<string, object>
{
["context_type"] = "dialog_ctx",
["context_data"] = Config.HotwordContexts
.Select(text => new Dictionary<string, object> { ["text"] = text })
.ToList()
};
}
// Add platform hotword table ID
if (!string.IsNullOrEmpty(Config.BoostingTableId))
{
corpus["boosting_table_id"] = Config.BoostingTableId;
}
// Add corpus to the request only when it is not empty
if (corpus.Count > 0)
{
request["corpus"] = corpus;
}
}

This code shows how to dynamically construct the corpus field based on configuration. The key point is that we add the corpus field only when hotword configuration actually exists. This design ensures backward compatibility: when no hotwords are configured, the request structure remains exactly the same as before. Backward compatibility matters; adding a feature should not disrupt existing logic.

Between the frontend and backend, hotword parameters are passed through WebSocket control messages. HagiCode is designed so that when the frontend starts recording, it loads the hotword configuration from localStorage and sends it to the backend through a WebSocket message.

const controlMessage = {
type: 'control',
payload: {
command: 'StartRecognition',
contextText: '高血压\n糖尿病\n冠心病',
boosting_table_id: 'medical_table',
combineMode: false
}
};

There is one detail to note here: the frontend passes multi-line text separated by newline characters, and the backend needs to parse it. The backend WebSocket handler parses these parameters and passes them to the SDK:

private async Task HandleControlMessageAsync(
string connectionId,
DoubaoSession session,
ControlMessage message)
{
if (message.Payload is SessionControlRequest controlRequest)
{
// Parse hotword parameters
string? contextText = controlRequest.ContextText;
string? boostingTableId = controlRequest.BoostingTableId;
bool? combineMode = controlRequest.CombineMode;
// Parse multi-line text into a hotword list
if (!string.IsNullOrEmpty(contextText))
{
var hotwords = contextText
.Split('\n', StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim())
.Where(s => s.Length > 0)
.ToList();
session.HotwordContexts = hotwords;
}
session.BoostingTableId = boostingTableId;
}
}

With this design, passing hotword configuration from frontend to backend becomes clear and efficient. There is nothing especially mysterious about it; the data is simply passed through layer by layer.

In real usage, configuring custom hotwords is very simple. Open the speech recognition settings page in HagiCode and find the “Hotword Configuration” section. In the “Custom Hotword Text” input box, enter one hotword phrase per line.

For example, if you are developing a medical-related application, you could configure it like this:

高血压
糖尿病
冠心病
心绞痛
心肌梗死
心力衰竭

After you save the configuration, these hotwords are automatically passed to the Doubao API every time speech recognition starts. In our tests, once hotwords were configured, the recognition accuracy for related professional terms improved noticeably. The improvement is real, and clearly better than before.

If you need to manage a large number of hotwords, or if the hotwords need frequent updates, the platform hotword table mode is a better fit. First, create a hotword table on the Doubao self-learning platform and obtain the generated boosting_table_id, then enter this ID on the HagiCode settings page.

The Doubao self-learning platform provides capabilities such as bulk import and categorized management for hotwords, which is very practical for teams that need to manage large sets of specialized terminology. By managing hotwords on the platform, you can maintain them centrally and roll out updates consistently. Once the hotword list becomes large, having a single place to manage it is much more practical than manual entry every time.

In some complex scenarios, you may need to use both custom hotwords and a platform hotword table at the same time. In that case, simply configure both in HagiCode and enable the “Combination Mode” switch.

In combination mode, the Doubao API considers both hotword sources at the same time, so recognition accuracy is usually higher than using either source alone. However, it is worth noting that combination mode increases request complexity, so it is best to decide whether to enable it after practical testing. More complexity is only worth it if the real-world results justify it.

Integrating the hotword feature into the HagiCode project is very straightforward. Here are some commonly used code snippets:

import {
loadHotwordConfig,
saveHotwordConfig,
validateHotwordConfig,
parseContextText,
getEffectiveHotwordMode,
type HotwordConfig
} from '@/types/hotword';
// Load and validate configuration
const config = loadHotwordConfig();
const validation = validateHotwordConfig(config);
if (!validation.isValid) {
console.error('Hotword configuration validation failed:', validation.errors);
return;
}
// Parse hotword text
const hotwords = parseContextText(config.contextText);
console.log('Parsed hotwords:', hotwords);
// Get effective hotword mode
const mode = getEffectiveHotwordMode(config);
console.log('Current hotword mode:', mode);

Backend usage is similarly concise:

var config = new DoubaoVoiceConfig
{
AppId = "your_app_id",
AccessToken = "your_access_token",
ServiceUrl = "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async",
// Configure custom hotwords
HotwordContexts = new List<string>
{
"高血压",
"糖尿病",
"冠心病"
},
// Configure platform hotword table
BoostingTableId = "medical_table_v1"
};
var client = new DoubaoVoiceClient(config, logger);
await client.ConnectAsync();
await client.SendFullClientRequest();

There are several points that deserve special attention when implementing and using the hotword feature.

First is the character limit. The Doubao API has strict restrictions on hotwords, including line count, characters per line, and total character count. If any limit is exceeded, the API returns an error. In HagiCode’s frontend implementation, we check these constraints during user input through validation functions, which prevents invalid configurations from being sent to the backend. Catching problems early is always better than waiting for the API to fail.

Second is the format of boosting_table_id. This field allows only letters, numbers, underscores, and hyphens, and it cannot contain spaces or other special characters. When creating a hotword table on the Doubao self-learning platform, be sure to follow the naming rules. That kind of strict format validation is common for APIs.

Third is backward compatibility. Hotword parameters are entirely optional. If no hotwords are configured, the system behaves exactly as it did before. This design ensures that existing users are not affected in any way, and it also makes gradual migration and upgrades easier. Adding a feature should not disrupt the previous logic.

Finally, there is error handling. When hotword configuration is invalid, the Doubao API returns corresponding error messages. HagiCode’s implementation records detailed logs to help developers troubleshoot issues. At the same time, the frontend displays validation errors in the UI to help users correct the configuration. Good error handling naturally leads to a better user experience.

Through this article, we have provided a detailed introduction to the complete solution for implementing Doubao speech recognition hotwords in the HagiCode project. This solution covers the entire process from requirement analysis and technical selection to code implementation, giving developers a practical example they can use for reference.

The key points can be summarized as follows. First, the Doubao API supports both custom hotwords and platform hotword tables, and they can be used independently or in combination. Second, the frontend uses localStorage to store configuration in a simple and efficient way. Third, the backend passes hotword parameters by dynamically constructing the corpus field, preserving strong backward compatibility. Fourth, comprehensive validation logic ensures configuration correctness and avoids invalid requests. Overall, the solution is not complicated; it simply follows the API requirements carefully.

Implementing the hotword feature further strengthens HagiCode’s capabilities in the speech recognition domain. By flexibly configuring business-related professional terms, developers can help the speech recognition system better understand content from specific domains and therefore provide more accurate services. Ultimately, technology should serve real business needs, and solving practical problems is what matters most.

If you found this article helpful, feel free to give HagiCode a Star on GitHub. Your support motivates us to keep sharing technical practice and experience. In the end, writing and sharing technical content that helps others is a pleasure in itself.


Thank you for reading. If you found this article useful, click the like button below 👍 so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by me, and reflects my own views and positions.

Solving Browser WebSocket Authentication Challenges: A Practical Proxy Pattern for Doubao Speech Recognition

Solving Browser WebSocket Authentication Challenges: A Practical Proxy Pattern for Doubao Speech Recognition

Section titled “Solving Browser WebSocket Authentication Challenges: A Practical Proxy Pattern for Doubao Speech Recognition”

The browser WebSocket API does not support custom HTTP headers, which creates a challenge for speech recognition services that require authentication data in headers. This article shares how we solved that problem in the HagiCode project with a backend proxy pattern, and how the approach evolved from playground experiments to production use.

When we started building speech recognition for the HagiCode project, we confidently chose ByteDance’s Doubao speech recognition service. The initial design was straightforward: let the frontend connect directly to Doubao’s WebSocket service. How hard could that be? Just open a connection and send some data, right?

Then came the surprise: Doubao’s API requires authentication information to be passed through HTTP headers, including things like accessToken and secretKey. That immediately became awkward, because the browser WebSocket API simply does not support setting custom headers.

So what do you do when the browser will not let you send them?

At the time, we weighed two options:

  1. Put the credentials into URL query parameters - simple and blunt
  2. Add a proxy layer on the backend - more work at first glance

The first option exposes credentials directly in frontend code and local storage. Is that really safe? I was not comfortable with it. On top of that, some APIs require header-based verification, so this approach is not even viable.

In the end, we chose the second option: implement a WebSocket proxy on the backend. Coincidentally, this pattern was first validated in our playground environment, and only after we confirmed its stability did we move it into production. After all, nobody wants production to double as a lab experiment.

The approach shared in this article comes from our practical experience in the HagiCode project.

HagiCode is an AI coding assistant project with voice interaction support. Because we needed to call a speech recognition service from the frontend, we ran straight into this WebSocket authentication problem, which led us to the solution described here. Sometimes these technical roadblocks are frustrating, but they also force you to learn patterns that turn out to be useful later.

The standard WebSocket API looks wonderfully simple:

const ws = new WebSocket('wss://example.com/ws');

But that simplicity is exactly where the problem lies - it only passes parameters in the URL, and it cannot set headers the way an HTTP request can:

// This is not supported in the WebSocket API
const ws = new WebSocket('wss://example.com/ws', {
headers: {
'Authorization': 'Bearer token'
}
});

And that is the core issue. For services like Doubao speech recognition that depend on header-based authentication, this limitation is a hard blocker.

Once you accept that constraint, the architecture has to change.

When designing the solution, we compared the trade-offs carefully.

Decision 1: Choosing the proxy pattern

We compared two approaches:

OptionProsConsDecision
Native WebSocketLightweight, simple, direct forwardingConnection management must be handled manuallyChosen
SignalRAutomatic reconnection, strong typingOverly complex, extra dependenciesRejected

We ultimately chose native WebSocket. To be honest, it was the lightest option and a better fit for simple bidirectional binary stream forwarding. Pulling in SignalR would have felt like overengineering for this use case, and it could add extra latency.

Decision 2: Connection management strategy

We adopted a “one connection, one session” model - each frontend WebSocket connection maps to its own independent Doubao backend connection.

The benefits are straightforward:

  • Simple to implement and aligned with the common usage pattern
  • Easier to debug and troubleshoot
  • Good resource isolation, preventing interference between sessions

Put simply, the direct solution is sometimes the best one. Complexity does not automatically make a design better.

Decision 3: Storing authentication data

Credentials are stored in backend configuration files (appsettings.yml or environment variables) and loaded through dependency injection:

  • Simple configuration model that matches existing backend conventions
  • Sensitive data never reaches the frontend
  • Supports multi-environment setup for development, testing, and production

That level of separation matters. No one wants credentials floating around in places they should not be.

The overall data flow looks like this:

Frontend (browser)
│ ws://backend/api/voice/ws
│ WebSocket (binary)
Backend (proxy)
│ wss://openspeech.bytedance.com/
│ (with authentication headers)
Doubao API

The flow itself is not complicated:

  1. The frontend connects to the backend proxy through WebSocket
  2. The backend proxy receives audio data and connects to the Doubao API with authenticated headers
  3. The Doubao API returns recognition results, and the proxy forwards them to the frontend
  4. The whole process remains fully asynchronous with bidirectional streaming

Once the responsibilities are split clearly, the design becomes quite natural.

app.Map("/ws", async context =>
{
if (context.WebSockets.IsWebSocketRequest)
{
// Read configuration from query parameters
var appId = context.Request.Query["appId"];
var accessToken = context.Request.Query["accessToken"];
// Validate required parameters
if (string.IsNullOrEmpty(appId) || string.IsNullOrEmpty(accessToken))
{
context.Response.StatusCode = 400;
return;
}
// Accept the WebSocket connection
using var webSocket = await context.WebSockets.AcceptWebSocketAsync();
// Message handling loop
var buffer = new byte[4096];
while (!webSocket.CloseStatus.HasValue)
{
var result = await webSocket.ReceiveAsync(buffer, CancellationToken.None);
if (result.MessageType == WebSocketMessageType.Close)
{
await webSocket.CloseAsync(
result.CloseStatus.Value,
result.CloseStatusDescription,
CancellationToken.None);
break;
}
// Process audio data
await HandleAudioDataAsync(buffer, result.Count);
}
}
});
public class DoubaoSessionManager : IDoubaoSessionManager
{
private readonly ConcurrentDictionary<string, DoubaoSession> _sessions = new();
public DoubaoSession CreateSession(string connectionId)
{
var session = new DoubaoSession(connectionId);
_sessions[connectionId] = session;
return session;
}
public async Task SendAudioAsync(string connectionId, byte[] audioData)
{
if (_sessions.TryGetValue(connectionId, out var session))
{
await session.SendAudioAsync(audioData);
}
}
public void RemoveSession(string connectionId)
{
if (_sessions.TryRemove(connectionId, out var session))
{
session.Dispose();
}
}
}

Using ConcurrentDictionary for session management means thread safety is largely handled for us. Each incoming connection gets its own session, and cleanup happens automatically on disconnect.

public class ClientConfigDto
{
public string AppId { get; set; } = null!;
public string Access set; } =Token { get; null!;
public string? ServiceUrl { get; set; }
public string? ResourceId { get; set; }
public int? SampleRate { get; set; }
public int? BitsPerSample { get; set; }
public int? Channels { get; set; }
public void Validate()
{
if (string.IsNullOrWhiteSpace(AppId))
throw new ArgumentException("AppId is required");
if (string.IsNullOrWhiteSpace(AccessToken))
throw new ArgumentException("AccessToken is required");
}
}

Configuration validation helps surface problems during startup instead of letting them fail later at runtime. That safeguard is worth having.

The frontend and backend use JSON text messages for control, while binary messages carry audio data.

Example control message:

{
"type": "control",
"messageId": "msg_123",
"timestamp": "2026-03-03T10:00:00Z",
"payload": {
"command": "StartRecognition",
"parameters": {
"hotwordId": "hotword1",
"boosting_table_id": "table123"
}
}
}

Example recognition result:

{
"type": "result",
"timestamp": "2026-03-03T10:00:03Z",
"payload": {
"text": "Hello world",
"confidence": 0.95,
"duration": 1500,
"isFinal": true,
"utterances": [
{
"text": "Hello",
"startTime": 0,
"endTime": 800,
"definite": true
}
]
}
}

This design separates control signals from audio payloads, which makes the system easier to reason about and implement. Splitting responsibilities cleanly is often the right call.

class DoubaoVoiceClient {
constructor(config) {
this.config = config;
this.ws = null;
}
async connect() {
const url = new URL(this.config.wsUrl);
// Add query parameters
Object.entries(this.config.params).forEach(([key, value]) => {
url.searchParams.set(key, value);
});
this.ws = new WebSocket(url);
return new Promise((resolve, reject) => {
this.ws.onopen = () => {
console.log('[DoubaoVoice] Connected');
resolve();
};
this.ws.onmessage = (event) => {
this._handleMessage(JSON.parse(event.data));
};
this.ws.onerror = reject;
});
}
_handleMessage(message) {
switch (message.type) {
case 'status':
this._handleStatus(message.payload);
break;
case 'result':
this.onResult?.(message.payload);
break;
case 'error':
console.error('[DoubaoVoice] Error:', message.payload);
break;
}
}
}
// Usage example
const client = new DoubaoVoiceClient({
wsUrl: 'ws://localhost:5000/ws',
params: {
appId: 'your-app-id',
accessToken: 'your-access-token',
sampleRate: 16000,
bitsPerSample: 16,
channels: 1
}
});

Using AudioWorkletNode for audio processing gives better performance:

audio-worklet.js
class AudioProcessorWorklet extends AudioWorkletProcessor {
process(inputs, outputs, parameters) {
const input = inputs[0]?.[0];
if (!input) return true;
// Convert to 16-bit PCM
const pcm = new Int16Array(input.length);
for (let i = 0; i < input.length; i++) {
pcm[i] = Math.max(-32768, Math.min(32767, input[i] * 32767));
}
this.port.postMessage({
type: 'audioData',
data: pcm.buffer
}, [pcm.buffer]);
return true;
}
}
registerProcessor('audio-processor', AudioProcessorWorklet);
// Main thread code
async function startAudioRecording() {
const stream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true,
noiseSuppression: true,
autoGainControl: true,
sampleRate: 48000
}
});
const audioContext = new AudioContext();
const audioSource = audioContext.createMediaStreamSource(stream);
await audioContext.audioWorklet.addModule('/audio-worklet.js');
const audioWorkletNode = new AudioWorkletNode(audioContext, 'audio-processor');
audioWorkletNode.port.onmessage = (event) => {
if (event.data.type === 'audioData' && ws?.readyState === WebSocket.OPEN) {
ws.send(event.data.data); // Send binary data directly
}
};
audioSource.connect(audioWorkletNode);
}

AudioWorklet performs far better than ScriptProcessorNode and avoids the audio stutter problems that older processing paths often introduce.

{
"Serilog": {
"MinimumLevel": {
"Default": "Information",
"Override": {
"Microsoft": "Warning",
"System": "Warning"
}
},
"WriteTo": [
{ "Name": "Console" },
{
"Name": "File",
"Args": { "path": "logs/log-.txt", "rollingInterval": "Day" }
}
]
},
"Kestrel": {
"Urls": "http://0.0.0.0:5000"
}
}

Logging configuration matters because it makes problems much easier to trace. Serilog’s file sink can roll logs daily, which keeps individual log files at a manageable size.

  • Regularly log session state to trace the full connection lifecycle
  • Monitor the number and duration of audio segments to detect abnormal connections
  • Record connection status and reconnection behavior for the Doubao service

These are basic operational practices, but they make a real difference in production.

  • Capture and log all WebSocket exceptions
  • Use IAsyncDisposable to ensure resources are cleaned up
  • Implement graceful connection shutdown and timeout handling

In short, favor robustness.

  • Sample rate: 16000 Hz (recommended) or 8000 Hz
  • Bit depth: 16-bit
  • Channels: mono
  • Encoding: PCM (raw)

If the format is wrong, recognition may fail or quality may degrade significantly. These requirements matter.

  • Keep sensitive credentials only in backend configuration
  • Enforce connection limits to prevent resource exhaustion
  • Use HTTPS/WSS in production

Security is never a minor concern.

  • Use asynchronous operations to avoid blocking
  • Tune buffer sizes as needed (default: 4096 bytes)
  • Consider connection pooling and reuse strategies

Apply these optimizations where they make sense for your workload.

  1. Docker deployment: Package the proxy service as a container for easier scaling and management
  2. Load balancing: Use Nginx or Envoy as a reverse proxy for WebSocket traffic
  3. Health checks: Implement heartbeat-based checks to monitor service availability
  4. Log aggregation: Send logs to a centralized logging system such as ELK or Loki

Deployment can be simple or complex depending on the team and environment, so adjust accordingly.

The WebSocket proxy pattern solves the core problem that the browser WebSocket API does not support custom headers. In the HagiCode project, this pattern proved both feasible and stable as it moved from playground validation into production deployment.

Key takeaways:

  • A backend proxy can pass authentication information securely
  • Native WebSocket is lightweight and efficient for simple scenarios
  • The “one connection, one session” model simplifies both implementation and debugging
  • The frontend-backend protocol should separate control signals from audio data

If you are building a feature that also depends on WebSocket authentication, I hope this pattern gives you a useful starting point.

If you have questions, feel free to discuss them with us. Technical progress happens faster when people compare notes.


Thank you for reading. If you found this article helpful, please click the like button below 👍 so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by me, and reflects my own views and position.

AI Compose Commit: Using AI to Intelligently Refactor the Git Commit Workflow

AI Compose Commit: Using AI to Intelligently Refactor the Git Commit Workflow

Section titled “AI Compose Commit: Using AI to Intelligently Refactor the Git Commit Workflow”

In the software development process, committing code is a routine task every programmer faces every day. But have you ever run into this situation: at the end of a workday, you open Git, see dozens of unstaged modified files, and have no idea how to organize them into sensible commits?

The traditional approach is to manually stage files in batches, commit them one by one, and write commit messages. This process is both time-consuming and error-prone. We often waste quite a bit of time on this, and after all, nobody wants to worry about these tedious chores late at night when they are already tired.

In the HagiCode project, we introduced a new feature - AI Compose Commit - designed to completely transform this workflow. By using AI to intelligently analyze all uncommitted changes in the working tree, it automatically groups them into multiple logical commits and performs standards-compliant commit operations. In this article, we will take a deep dive into the implementation principles, technical architecture, and the challenges and solutions we encountered in practice.

The approach shared in this article comes from our practical experience in the HagiCode project.

As a version control system, Git gives developers powerful code management capabilities. But in real-world usage, committing often becomes a bottleneck in the development workflow:

  1. Manual grouping is time-consuming: When there are many file changes, developers need to inspect each file one by one and decide which changes belong to the same feature. That takes a lot of mental effort.
  2. Inconsistent commit message quality: Writing commit messages that follow the Conventional Commits specification requires experience and skill, and beginners often produce non-standard commits.
  3. Complex multi-repository management: In a monorepo environment, switching between different repositories adds operational complexity.
  4. Interrupted workflow: Committing code interrupts your train of thought and hurts coding efficiency.

These issues are especially obvious in large projects and collaborative team environments. A good development tool should let developers focus on core coding work instead of getting bogged down in a cumbersome commit workflow.

In recent years, AI has been used more and more widely in software development. From code completion and bug detection to automatic documentation generation, AI is gradually reaching every stage of the development process. In Git workflows, while some tools already support commit message generation, most are limited to single-commit scenarios and lack the ability to intelligently analyze and group changes across the entire working tree.

HagiCode encountered these pain points during development as well. We tried many tools, but each had one limitation or another. Either the functionality was incomplete, or the user experience was not good enough. That is why we ultimately decided to implement AI Compose Commit ourselves.

HagiCode’s AI Compose Commit feature was created to fill that gap. It does not just generate commit messages - it takes over the entire process from file analysis to commit execution.

While implementing AI Compose Commit, we faced several technical challenges:

  1. File semantic understanding: The AI needs to understand semantic relationships between file changes and decide which files belong to the same functional module. This requires deep analysis of file content, directory structure, and change context.

  2. Commit grouping strategy: How should a reasonable grouping standard be defined? By feature, by module, or by file type? Different projects may need different strategies.

  3. Real-time feedback and asynchronous processing: Git operations can take a long time, especially when handling a large number of files. How can we complete complex operations while preserving a good user experience?

  4. Multi-repository support: In a monorepo architecture, operations need to be routed correctly between the main repository and sub-repositories.

  5. Error handling and rollback: If one commit fails, how should already executed commits be handled? Do already staged files need to be rolled back?

  6. Commit message consistency: Generated commit messages need to match the project’s existing style and remain consistent with historical commits.

AI processing over a large number of file changes consumes significant time and compute resources. We needed to optimize in the following areas:

  • Reduce unnecessary AI calls
  • Optimize how file context is constructed
  • Implement efficient Git operation batching

These issues all appeared in real HagiCode usage, and we only arrived at a relatively complete solution through repeated iteration and optimization. If you are building a similar tool, we hope our experience gives you some inspiration.

We adopted a layered architecture to implement AI Compose Commit, ensuring good scalability and maintainability:

GitController provides the POST /api/git/auto-compose-commit endpoint as the entry point. To optimize user experience, we adopted a fire-and-forget asynchronous pattern:

  • After the client sends a request, the server immediately returns HTTP 202 Accepted
  • The actual AI processing runs asynchronously in the background
  • When processing finishes, the client is notified through SignalR

This design ensures that even if AI processing takes several minutes, users still get an immediate response and do not feel that the system is frozen.

2. Application Service Layer (Application Layer)

Section titled “2. Application Service Layer (Application Layer)”

GitAppService is responsible for the core business logic:

  • Repository detection: supports multi-repository management in a monorepo
  • Lock management: prevents conflicts caused by concurrent operations
  • File staging coordination: interacts with the AI processing flow
  • Error rollback: restores state when failures occur

3. Distributed Computing Layer (Orleans Grains)

Section titled “3. Distributed Computing Layer (Orleans Grains)”

AIGrain serves as the core execution unit for AI operations. It implements the AutoComposeCommitAsync method from the IAIGrain interface:

// Define the interface method for AI-powered automatic commit composition
// Parameter notes:
// - projectId: unique project identifier
// - unstagedFiles: list of unstaged files, including file paths and status information
// - projectPath: project root directory path (optional), used to access project context
// Return value: a response object containing execution results, including success/failure status and detailed information
[Alias("AutoComposeCommitAsync")]
[ResponseTimeout("00:20:00")] // 20-minute timeout, suitable for handling large change sets
Task<AutoComposeCommitResponseDto> AutoComposeCommitAsync(
string projectId,
GitFileStatusDto[] unstagedFiles,
string? projectPath = null);

This method sets a 20-minute timeout to handle large change sets. In real-world HagiCode usage, we found that some projects can involve hundreds of changed files in a single pass, requiring more processing time.

Through the abstract IAIService interface, we implemented a pluggable AI service architecture. We currently use the Claude Helper service, but it can be easily switched to other AI providers.

The AI needs to understand the state of each file before it can make intelligent decisions. We build file context through the BuildFileChangesXml method:

/// <summary>
/// Build an XML representation of file changes to provide the AI with complete file context information
/// </summary>
/// <param name="stagedFiles">List of staged files, including file path, status, and old path (for rename operations)</param>
/// <returns>A formatted XML string containing metadata for all files</returns>
private static string BuildFileChangesXml(GitFileStatusDto[] stagedFiles)
{
var sb = new StringBuilder();
sb.AppendLine("<files>");
foreach (var file in stagedFiles)
{
sb.AppendLine(" <file>");
// Use XML escaping to ensure special characters do not break the XML structure
sb.AppendLine($" <path>{System.Security.SecurityElement.Escape(file.Path)}</path>");
sb.AppendLine($" <status>{System.Security.SecurityElement.Escape(file.Status)}</status>");
// Handle file rename scenarios and record the old path so the AI can understand change relationships
if (!string.IsNullOrEmpty(file.OldPath))
{
sb.AppendLine($" <oldPath>{System.Security.SecurityElement.Escape(file.OldPath)}</oldPath>");
}
sb.AppendLine(" </file>");
}
sb.AppendLine("</files>");
return sb.ToString();
}

This XML-based context includes file paths, statuses, and old paths for rename operations, giving the AI complete metadata. With a structured XML format, we ensure that the AI can accurately understand the state and change type of each file.

To let the AI execute Git operations directly, we configured comprehensive tool permissions:

// Define the set of tools the AI can use, including file operations and Git command execution permissions
// Read/Write/Edit: file reading, writing, and editing capabilities
// Bash(git:*): permission to execute all Git commands
// Other Bash commands: used to inspect file contents and directory structure so the AI can understand context
var allowedTools = new[]
{
"Read", "Write", "Edit",
"Bash(git:*)", "Bash(cat:*)", "Bash(ls:*)", "Bash(find:*)",
"Bash(grep:*)", "Bash(head:*)", "Bash(tail:*)", "Bash(wc:*)"
};
// Build the complete AI request object
var request = new AIRequest
{
Prompt = prompt, // Complete prompt template, including task instructions and constraints
WorkingDirectory = projectPath ?? GetTempDirectory(), // Working directory, ensuring the AI runs in the correct project context
AllowedTools = allowedTools, // Allowed tool set
PermissionMode = PermissionMode.bypassPermissions, // Bypass permission checks so Git operations can run directly
LanguagePreference = languagePreference // Language preference setting, ensuring commit messages match user expectations
};

Here we use PermissionMode.bypassPermissions, which allows the AI to execute Git commands directly without user confirmation. This is central to the feature design, but it also requires strict input validation to prevent abuse. In HagiCode’s production deployment, we ensured the safety of this mechanism through backend parameter validation and log monitoring.

After the AI finishes execution, it returns structured results. We implemented a dual parsing strategy to ensure compatibility:

/// <summary>
/// Parse commit execution results returned by the AI, supporting both delimiter format and regex format
/// </summary>
/// <param name="aiResponse">Raw response content returned by the AI</param>
/// <returns>A parsed list of commit results, where each result includes the commit hash and execution status</returns>
private List<CommitResultDto> ParseCommitExecutionResults(string aiResponse)
{
var results = new List<CommitResultDto>();
// Prefer delimiter-based parsing (new format), which is more explicit and reliable
if (aiResponse.Contains("---"))
{
logger.LogDebug("Using delimiter-based parsing for AI response");
results = ParseDelimitedFormat(aiResponse);
if (results.Count > 0)
{
return results; // Successfully parsed, return the results directly
}
logger.LogWarning("Delimiter-based parsing produced no results, falling back to regex");
}
else
{
logger.LogDebug("No delimiter found, using legacy regex-based parsing");
}
// Fall back to regex parsing (old format) to ensure backward compatibility
return ParseLegacyFormat(aiResponse);
}

The delimiter format uses --- to separate commits, making the structure clear and easy to parse:

---
Commit 1: abc123def456
feat(auth): add user login functionality
Implement JWT-based authentication with login form and API endpoints.
Co-Authored-By: Hagicode <noreply@hagicode.com>
---
Commit 2: 789ghi012jkl
docs(readme): update installation instructions
Add new setup steps for Docker environment.
Co-Authored-By: Hagicode <noreply@hagicode.com>
---

This format makes parsing simple and reliable, while also remaining easy for humans to read.

To prevent state conflicts caused by concurrent operations, we implemented a repository lock mechanism:

// Acquire the repository lock to prevent concurrent operations
// Parameter notes:
// - fullPath: full repository path, used to identify different repository instances
// - requestedBy: requester identifier, used for tracking and logging
await _autoComposeLockService.AcquireLockAsync(fullPath, requestedBy);
try
{
// Execute the AI Compose Commit operation
// This section calls an Orleans Grain method to perform the actual AI processing and Git operations
await aiGrain.AutoComposeCommitAsync(projectId, unstagedFiles, projectPath);
}
finally
{
// Ensure the lock is released whether the operation succeeds or fails
// Using a finally block guarantees lock release even when exceptions occur, preventing deadlocks
await _autoComposeLockService.ReleaseLockAsync(fullPath);
}

The lock has a 20-minute timeout, matching the timeout used for AI operations. If the operation fails or times out, the system automatically releases the lock to avoid permanent blocking. In real HagiCode usage, we found this lock mechanism to be extremely important, especially in collaborative environments where multiple developers may trigger AI Compose Commit at the same time.

After processing completes, the system sends a notification to the frontend through SignalR:

/// <summary>
/// Send a notification when automatic commit composition is complete
/// </summary>
/// <param name="projectId">Project identifier, used to route the notification to the correct client</param>
/// <param name="totalCount">Total number of commits, including successes and failures</param>
/// <param name="successCount">Number of successful commits</param>
/// <param name="failureCount">Number of failed commits</param>
/// <param name="success">Whether the overall operation succeeded</param>
/// <param name="error">Error message (if the operation failed)</param>
private async Task SendAutoComposeCommitNotificationAsync(
string projectId,
int totalCount,
int successCount,
int failureCount,
bool success,
string? error)
{
try
{
// Build the notification DTO containing detailed execution results
var notification = new AutoComposeCommitCompletedDto
{
ProjectId = projectId,
TotalCount = totalCount,
SuccessCount = successCount,
FailureCount = failureCount,
Success = success,
Error = error
};
// Broadcast the notification to all connected clients through the SignalR Hub
await messageService.SendAutoComposeCommitCompletedAsync(notification);
logger.LogInformation(
"Auto compose commit notification sent for project {ProjectId}: {SuccessCount}/{TotalCount} succeeded",
projectId, successCount, totalCount);
}
catch (Exception ex)
{
// Log notification errors without affecting the main operation flow
// A notification failure should not cause the entire operation to fail
logger.LogError(ex, "Failed to send auto compose commit notification for project {ProjectId}", projectId);
}
}

After the frontend receives the notification, it can update the UI to show whether the commit succeeded or failed, improving the user experience. This real-time feedback mechanism received strong feedback from HagiCode users, who can clearly see when the operation finishes and what the outcome is.

AI behavior is entirely determined by the prompt, so we carefully designed the prompt template for Auto Compose Commit. Taking the Chinese version as an example (auto-compose-commit.zh-CN.hbs):

At the beginning of the prompt, we explicitly declare support for non-interactive execution mode, which is a critical requirement for CI/CD and automation scripts:

**Important Note**: This prompt may run in a non-interactive environment (such as CI/CD or automation scripts).
**Non-Interactive Mode**:
- Do not use AskUserQuestion or any interactive tools
- When user input is required:
- Use sensible defaults (for example, use feat as the commit type)
- Skip optional confirmation steps
- Record any assumptions made

This design ensures that AI Compose Commit can be used not only in interactive IDE environments, but also integrated into CI/CD pipelines to deliver a fully automated commit workflow.

To prevent the AI from executing dangerous operations, we added strict branch protection rules to the prompt:

**Branch Protection**:
- Do not perform any branch switching operations (git checkout, git switch)
- All `git commit` commands must run on the current branch
- Do not create, delete, or rename branches
- Do not modify untracked files or unstaged changes
- If branch switching is required to complete the operation, return an error instead of executing it

By constraining the AI’s tool usage scope, these rules ensure operational safety. In HagiCode’s practical testing, we verified the effectiveness of these constraints: when the AI encounters a situation that would require a branch switch, it safely returns an error instead of taking dangerous action.

The prompt defines the decision logic for file grouping in detail:

**File Grouping Decision Tree**:
├── Is it a configuration file (package.json, tsconfig.json, .env, etc.)?
│ ├── Yes -> separate commit (type: chore or build)
│ └── No -> continue
├── Is it a documentation file (README.md, *.md, docs/**)?
│ ├── Yes -> separate commit (type: docs)
│ └── No -> continue
├── Is it related to the same feature?
│ ├── Yes -> merge into the same commit
│ └── No -> commit separately
└── Is it a cross-module change?
├── Yes -> group by module
└── No -> group by feature

This decision tree gives the AI clear grouping logic, ensuring the generated commits remain semantically reasonable. In real HagiCode usage, we found that this decision tree can handle the vast majority of common scenarios, and the grouping results match developer expectations.

To keep commit messages consistent with project history, the prompt requires the AI to analyze recent commit history before generation:

**Historical Format Consistency**: Before generating commit messages, you **must** analyze the current repository's commit history to match the existing style.
1. Use `git log -n 15 --pretty=format:"%H|%s|%b%n---%n"` to get the recent commit history
2. Analyze the commits to identify:
- Structural patterns: does the project use multi-paragraph messages? Are there `Changes:` or `Capabilities:` sections?
- Language patterns: are commit messages in English, Chinese, or mixed?
- Common types: which commit types are most often used (`feat`, `fix`, `docs`, etc.)?
- Special formatting: are there `Co-Authored-By` lines? Any other project-specific conventions?
3. Generate commit messages that follow the detected patterns

This analysis ensures that AI-generated commit messages do not feel out of place, but instead remain stylistically aligned with the project’s history. In HagiCode’s multilingual projects, this feature is especially important because it can automatically choose the appropriate language and format based on commit history.

Every commit must include Co-Authored-By information:

**Important**: Every commit must include Co-Authored-By information
- Use the following format: `git commit -m "type(scope): subject" -m "" -m "Co-Authored-By: Hagicode <noreply@hagicode.com>"`
- Or include the `Co-Authored-By` line directly in the commit message

This is not only for contribution compliance, but also for tracing AI-assisted commit history. HagiCode treats this as a mandatory rule to ensure that all AI-generated commits carry a clear source marker.

The full AI Compose Commit workflow is as follows:

  1. User trigger: The user clicks the “AI Auto Compose Commit” button in the Git Status panel or Quick Actions Zone.
  2. API request: The frontend sends a POST request to the /api/git/auto-compose-commit endpoint.
  3. Immediate response: The server returns HTTP 202 Accepted without waiting for processing to finish.
  4. Background processing:
    • GitAppService acquires the repository lock
    • Calls AIGrain.AutoComposeCommitAsync
    • Builds the file context XML
    • Executes the AI prompt so the AI can analyze and perform commits
  5. AI execution:
    • Uses Git commands to obtain all unstaged changes
    • Reads file contents to understand the nature of the changes
    • Groups files by semantic relationship
    • Executes git add and git commit for each group
  6. Result parsing: Parses the execution results returned by the AI.
  7. Notification delivery: Notifies the frontend through SignalR.
  8. Lock release: Releases the repository lock whether the operation succeeds or fails.

This workflow is designed so that users can continue with other work immediately after initiating the operation, without waiting for the AI to finish. Feedback from HagiCode users shows that this asynchronous processing model greatly improves the workflow experience.

We implemented multi-layer error handling:

// Validate request parameters to prevent invalid requests from reaching backend processing logic
if (request.UnstagedFiles == null || request.UnstagedFiles.Count == 0)
{
return BadRequest(new
{
message = "No unstaged files provided. Please make changes in the working directory first.",
status = "validation_failed"
});
}

If an error occurs during AI processing, the system performs a rollback operation and unstages files that were already staged, preventing an inconsistent state from being left behind. In real HagiCode usage, this mechanism saved us from multiple unexpected interruptions and ensured repository state integrity.

The 20-minute timeout ensures that long-running operations do not block resources indefinitely. After a timeout, the system releases the lock and notifies the user that the operation failed. In real HagiCode usage, we found that most operations complete within 2 to 5 minutes, and only extremely large change sets approach the timeout limit.

Best Practices for Using AI Compose Commit

Section titled “Best Practices for Using AI Compose Commit”

AI Compose Commit is best suited for the following scenarios:

  • At the end of a workday, when you need to process changes across many files in one batch
  • After a refactoring operation, when several related files need to be committed separately
  • After a feature is completed, when related changes need to be grouped into commits

It is not suitable for the following scenarios:

  • Quick commits for a single file (a normal commit is faster)
  • Scenarios requiring precise control over commit content
  • Commits containing sensitive information that require human review

Although AI-powered intelligent grouping is powerful, developers should still review the generated commits:

  • Check whether the grouping matches expectations
  • Verify the accuracy of commit messages
  • Confirm that no files were omitted or incorrectly included

If you find an unreasonable grouping, you can use git reset --soft HEAD~N to undo it and regroup. HagiCode’s experience shows that even when AI grouping is smart, manual review is still valuable, especially for important feature commits.

Make sure your project’s Git configuration supports Conventional Commits:

Terminal window
# Install commitlint
npm install -g @commitlint/cli @commitlint/config-conventional
# Configure commitlint
echo "module.exports = {extends: ['@commitlint/config-conventional']}" > commitlint.config.js

This lets you validate commit message format in CI/CD workflows and keeps it aligned with the format generated by AI Compose Commit.

If you want to implement a similar AI-assisted commit feature in your own project, here are our suggestions:

Begin with single commit message generation, then gradually expand to multi-commit grouping. This makes it easier to validate and iterate. HagiCode followed the same path: early versions only supported single commits, and later expanded to intelligent grouping across multiple commits.

Do not implement AI invocation logic from scratch. Using an existing SDK reduces development time and potential bugs. We used the Claude Helper service, which provides a stable interface and robust error handling.

Prompt quality directly determines output quality. Spend time designing a detailed prompt, including:

  • Clear task descriptions
  • Specific output format requirements
  • Rules for handling edge cases
  • Illustrative examples

HagiCode invested heavily in prompt design, and this was one of the key reasons the feature succeeded.

AI operations can fail for many reasons, such as network issues, API rate limits, or content moderation. Make sure your system can handle these errors gracefully and provide meaningful error information.

Do not automate everything completely. Leave users in control. Provide options to review grouping results, adjust groups, and manually edit commit messages to balance automation and flexibility. Although HagiCode supports automatic execution, it still preserves preview and adjustment capabilities.

When constructing file context, filter out files that do not need AI analysis:

// Filter out generated files and excessively large files to reduce the AI processing burden
var relevantFiles = stagedFiles
.Where(f => !IsGeneratedFile(f.Path))
.Where(f => !IsLargeFile(f.Path))
.ToArray();

If multiple independent repositories are supported, commits in different repositories can be processed in parallel to improve overall efficiency.

Cache project commit history analysis results to avoid re-analyzing them every time. Historical format preferences can be stored in configuration files to reduce AI calls.

AI Compose Commit represents a deep application of AI technology in software development tools. By intelligently analyzing file changes, automatically grouping commits, and generating standards-compliant commit messages, it significantly improves the efficiency of Git workflows and allows developers to focus more on core coding work.

During implementation, we learned several important lessons:

  1. User feedback is critical: Early versions used synchronous waiting, and users reported a poor experience. After switching to a fire-and-forget model, satisfaction improved significantly.
  2. Prompt design determines quality: A carefully designed prompt does more to guarantee AI output quality than a complex algorithm.
  3. Safety always comes first: Granting the AI permission to execute Git commands directly improves efficiency, but it must be paired with strict constraints and validation.
  4. Progressive improvement works best: Starting with simple scenarios and gradually increasing complexity is more likely to succeed than trying to implement everything at once.

In the future, we plan to further optimize AI Compose Commit, including:

  • Supporting more commit grouping strategies (by time, by developer, and so on)
  • Integrating code review workflows to trigger review automatically before commits
  • Supporting custom commit message templates to meet the personalized needs of different projects

If you find the approach shared in this article valuable, give HagiCode a try and experience how this feature works in real development. After all, practice is the only criterion for testing truth.


Thank you for reading. If you found this article helpful, please click the like button below so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by me, and reflects my own views and positions.