How to Implement Automatic Retry for Agent CLIs Like Claude Code and Codex

The phrase automatic retry looks like a small toggle switch, but once you put it into a real engineering environment, it is nothing like that. Hello everyone, I am HagiCode creator Yu Kun. Today, I do not want to trade in empty talk. I want to talk about how automatic retry for Agent CLIs such as Claude Code and Codex should actually be done, so it can both recover from exceptions and avoid dragging the system into endless repeated execution.

If you have also been working on AI coding lately, you have probably already run into this kind of problem: the task does not fail immediately, but breaks halfway through execution.

In an ordinary HTTP request, that often just means sending it again, maybe with some exponential backoff. But Agent CLIs are different. Tools like Claude Code and Codex usually execute in a streaming manner, pushing output out chunk by chunk. During that process, they may also bind to a thread, session, or resume token. In other words, the question is not simply, “Did this request fail or not?” It becomes:

  • Does the content that was already emitted still count?
  • Can the current context continue running?
  • Should this failure be recovered automatically?
  • If it should be recovered, how long should we wait before retrying, what should we send during the retry, and should we still reuse the original context?

The first time many teams build this part, they instinctively write the most naive version: if an error occurs, try once more. That idea is perfectly natural, but once it reaches a real project, one problem after another starts surfacing.

  • Some errors are clearly temporary failures, yet get treated as final failures
  • Some errors are not worth retrying at all, yet the system replays them over and over
  • Requests with a thread and requests without a thread get treated exactly the same
  • The backoff strategy has no boundary, and background requests overload themselves

While integrating multiple Agent CLIs, HagiCode also stepped into these traps. On the Codex side in particular, the first issue we exposed was that a certain type of reconnect message was not recognized as a retryable terminal state, so the recovery mechanism we already had never got a chance to take effect. To put it plainly, it was not that the system lacked automatic retry. The system simply failed to recognize that this particular failure was worth retrying.

So the core point of this article is very clear: automatic retry is not a button, but a layered design.

The approach shared in this article comes from real practice in our HagiCode project. What HagiCode is trying to do is not just connect one model and call it a day. It is about unifying the streaming messages, tool calls, failure recovery, and session context of multiple Agent CLIs into one execution model that can be maintained over the long term.

One of the things I care about most is how to make AI coding truly land in real engineering work. Writing a demo is not hard. The hard part is turning that demo into something a team is genuinely willing to use for a long time. HagiCode takes automatic retry seriously not because the feature looks sophisticated, but because if long-running, streaming, resumable CLI execution is not stable, what users see is not an intelligent assistant, but a command wrapper that drops the connection halfway through every other run.

If you want to look at the project entry points first, here are two:

Taking it one step further, HagiCode is also on Steam now. If you use Steam, feel free to add it to your wishlist first:

Why Automatic Retry for Agent CLIs Is Harder Than Ordinary Retry


This is a very practical question, so let us go straight to the conclusion: the difficulty of automatic retry for Agent CLIs is not “try again after a few seconds,” but “can it still continue in the original context?”

You can think of it as a long conversation. Ordinary API retry is more like redialing when the phone line is busy. Agent CLI retry is more like the signal dropping while the other party is halfway through a sentence, and then you have to decide whether to call back, whether to start over when you do, and whether the other party still remembers where the conversation stopped. These are not the same kind of engineering problem at all.

More concretely, there are four especially typical difficulties.

Once output has already been sent to the user, you can no longer handle failure the way you would with an ordinary request, where you silently swallow it and quietly try again. That is because the earlier content has already been seen. If the replay strategy is wrong, the frontend can easily show duplicated text and inconsistent state, and the lifecycle of tool calls can become tangled as well. This is not metaphysics. It is engineering.

Providers like Codex bind to a thread, and implementations like Claude Code also have a continuation target or an equivalent resumable context. The real prerequisite for automatic retry is not just that the error looks like a temporary failure, but also that there is still a carrier that allows this execution to continue.

Network jitter, SSE idle timeout, and temporary upstream failures are usually worth another try. But if what you are facing is authentication failure, lost context, or a provider that has no resume capability at all, then retrying is usually not recovery. It is noise generation.
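As an illustration of that dividing line, a minimal classifier might look like this. The categories and the matched strings here are my assumptions for the sketch, not HagiCode's actual list:

```csharp
using System;

// Illustrative sketch: classify a terminal failure before deciding to retry.
public enum FailureDisposition
{
    Retryable, // transient: worth another attempt
    Fatal,     // permanent: retrying only generates noise
}

public static class FailureClassifier
{
    public static FailureDisposition Classify(string errorMessage)
    {
        // Transient signals: network jitter, idle-stream timeouts, dropped connections.
        if (errorMessage.Contains("timed out", StringComparison.OrdinalIgnoreCase) ||
            errorMessage.Contains("connection reset", StringComparison.OrdinalIgnoreCase) ||
            errorMessage.Contains("stream disconnected", StringComparison.OrdinalIgnoreCase))
        {
            return FailureDisposition.Retryable;
        }

        // Permanent signals: auth failures cannot be retried away.
        if (errorMessage.Contains("unauthorized", StringComparison.OrdinalIgnoreCase) ||
            errorMessage.Contains("invalid api key", StringComparison.OrdinalIgnoreCase))
        {
            return FailureDisposition.Fatal;
        }

        // Default to fatal: an unknown error should fail loudly, not loop quietly.
        return FailureDisposition.Fatal;
    }
}
```

Note the default branch: treating unknown errors as fatal is the conservative choice, because an unrecognized retry loop is worse than a visible failure.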

Unlimited automatic retry is almost always wrong. Technology trends can be noisy for a while, but engineering laws often remain stable for many years. One of them is that failure recovery must have boundaries. The system has to know how many times it can retry at most, how long it should wait each time, and when it should stop and admit that this one is really not going to recover.

Because of these characteristics, HagiCode ultimately did not implement automatic retry as a few lines of try/catch inside a specific provider. Instead, we extracted it into a shared capability layer. In the end, engineering problems still need to be solved with engineering methods.

HagiCode’s Approach: Pull Retry Out of the Provider


HagiCode’s current real-world implementation can be compressed into one sentence:

The shared layer manages the retry flow uniformly, and each concrete provider is only responsible for answering two questions: is this terminal state worth retrying, and can the current context still continue?

This is not complicated, but it is critical. Once responsibilities are separated this way, Claude Code, Codex, and even other Agent CLIs can all reuse the same skeleton. Models will change, tools will evolve, workflows will be upgraded, but the engineering foundation remains there.

Layer 1: Use a unified coordinator to manage the retry loop


The core implementation fragment in the project looks roughly like this:

internal static class ProviderErrorAutoRetryCoordinator
{
    public static async IAsyncEnumerable<CliMessage> ExecuteAsync(
        string prompt,
        ProviderErrorAutoRetrySettings? settings,
        Func<string, IAsyncEnumerable<CliMessage>> executeAttemptAsync,
        Func<bool> canRetryInSameContext,
        Func<TimeSpan, CancellationToken, Task> delayAsync,
        Func<CliMessage, bool> isRetryableTerminalMessage,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        var normalizedSettings = ProviderErrorAutoRetrySettings.Normalize(settings);
        var retrySchedule = normalizedSettings.Enabled
            ? normalizedSettings.GetRetrySchedule()
            : [];

        for (var attempt = 0; ; attempt++)
        {
            var attemptPrompt = attempt == 0
                ? prompt
                : ProviderErrorAutoRetrySettings.ContinuationPrompt;

            CliMessage? terminalFailure = null;

            await foreach (var message in executeAttemptAsync(attemptPrompt)
                .WithCancellation(cancellationToken))
            {
                if (isRetryableTerminalMessage(message))
                {
                    terminalFailure = message;
                    break;
                }

                yield return message;
            }

            if (terminalFailure is null)
            {
                yield break;
            }

            if (attempt >= retrySchedule.Count || !canRetryInSameContext())
            {
                yield return terminalFailure;
                yield break;
            }

            await delayAsync(retrySchedule[attempt], cancellationToken);
        }
    }
}

What this code does is actually very straightforward, but also very effective.

  • Do not pass intermediate failures through directly at first; the coordinator decides whether recovery is still possible
  • Only when the retry budget is exhausted does the final failure actually return to the upper layer
  • Starting from the second attempt, the original prompt is no longer sent; a continuation prompt is sent uniformly instead

That is why I kept stressing earlier that automatic retry is not simply “make the request again.” It is not just patching an exception branch. It is managing the life cycle of an execution. That may sound like product-manager language, but in engineering terms, that is exactly what it is.

Layer 2: Turn the retry policy into a snapshot that travels with the request

Another issue that is very easy to overlook is this: who decides whether automatic retry is enabled for this request?

HagiCode’s answer is not to depend on some “current global configuration,” but to turn the policy into a snapshot and let it travel together with this request. That way, session queuing, message persistence, execution forwarding, and provider adaptation will not lose the policy along the way. One successful run is not a system. Sustained success is a system.

The core structure can be simplified into this:

public sealed record ProviderErrorAutoRetrySnapshot
{
    public const string DefaultStrategy = "default";

    public bool Enabled { get; init; }
    public string Strategy { get; init; } = DefaultStrategy;

    public static ProviderErrorAutoRetrySnapshot Normalize(bool? enabled, string? strategy)
    {
        return new ProviderErrorAutoRetrySnapshot
        {
            Enabled = enabled ?? true,
            Strategy = string.IsNullOrWhiteSpace(strategy)
                ? DefaultStrategy
                : strategy.Trim()
        };
    }
}

Then on the execution side, it is mapped into the settings object actually consumed by the provider. The value of this approach is very direct:

  • The business layer decides whether retry should be allowed
  • The runtime decides how retry should be performed

Each side manages its own concern without colliding with the other. Many problems are not impossible to solve. Their cost simply has not been made explicit. Turning the policy into a snapshot is essentially a way of accounting for that cost in advance.
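To make that division concrete, here is a self-contained sketch of the mapping step. The record mirrors the snapshot above; the strategy names, the method name, and the "aggressive" schedule are illustrative, while the default 10/20/60 rhythm is the one described later in this article:

```csharp
using System;
using System.Collections.Generic;

// Sketch: the business layer hands over a snapshot (whether to retry),
// and the runtime owns the concrete schedule (how to retry).
public sealed record RetrySnapshot(bool Enabled, string Strategy);

public static class RetrySettingsMapper
{
    public static IReadOnlyList<TimeSpan> ToSchedule(RetrySnapshot snapshot)
    {
        if (!snapshot.Enabled)
        {
            // The business layer said no: a zero-length schedule means zero retries.
            return Array.Empty<TimeSpan>();
        }

        return snapshot.Strategy switch
        {
            // Hypothetical named strategy, just to show where variants would live.
            "aggressive" => new[] { TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(10) },
            // Anything else falls back to the default 10/20/60 rhythm.
            _ => new[]
            {
                TimeSpan.FromSeconds(10),
                TimeSpan.FromSeconds(20),
                TimeSpan.FromSeconds(60),
            },
        };
    }
}
```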

Layer 3: Providers only decide terminal state and context viability


Once we reach the concrete Claude Code or Codex provider, the responsibility here actually becomes very thin. You can think of it as enhancement, not replacement.

Taking Codex as an example, when it hooks into the shared coordinator, it really only needs to provide three things:

await foreach (var message in ProviderErrorAutoRetryCoordinator.ExecuteAsync(
    prompt,
    options.ProviderErrorAutoRetry,
    retryPrompt => ExecuteCodexAttemptAsync(...),
    () => !string.IsNullOrWhiteSpace(resolvedThreadId),
    DelayAsync,
    IsRetryableTerminalFailure,
    cancellationToken))
{
    yield return message;
}

You will notice that the provider-specific decisions are really only these two:

  • IsRetryableTerminalFailure
  • canRetryInSameContext

Codex checks whether the thread can still continue, while Claude Code checks whether the continuation target still exists. Backoff policy, retry count, and follow-up prompts should not be reinvented by every provider separately.

Once this layer is separated out, the cost of integrating more CLIs into HagiCode drops a lot. You do not have to duplicate an entire retry state machine. You only need to plug in the boundary conditions of that provider. Writing quickly is not the same as writing robustly. Being able to connect something is not the same as connecting it well. Getting it to run is also not the same as making it maintainable over time.
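As a sketch of what those two boundary conditions might look like for a Codex-style provider (the message shape and the marker strings here are hypothetical, not the real Codex protocol):

```csharp
using System;

// Hypothetical message envelope for illustration.
public sealed record CliMessage(string Type, string Text);

public static class CodexRetryPredicates
{
    // Terminal-state check: is this message a failure the coordinator may recover?
    public static bool IsRetryableTerminalFailure(CliMessage message) =>
        message.Type == "error" &&
        (message.Text.Contains("stream disconnected", StringComparison.OrdinalIgnoreCase) ||
         message.Text.Contains("reconnect", StringComparison.OrdinalIgnoreCase));

    // Context-viability check: without a thread to resume, a "retry" would really
    // be a restart from scratch, which is a different operation than recovery.
    public static bool CanRetryInSameContext(string? threadId) =>
        !string.IsNullOrWhiteSpace(threadId);
}
```

Everything else, including backoff, budget, and the continuation prompt, stays in the shared coordinator.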

An Easy Mistake to Make: Do Not Treat Every Error as Retryable


In this analysis, the point I most want to single out is not “how to implement retry,” but “how to avoid the wrong retries.”

The original entry point into the problem was that Codex failed to recognize one reconnect message. By intuition, many people would pick the smallest possible fix: add one more string prefix to the whitelist. That idea is not exactly wrong, but it feels more like a demo-stage solution than a long-term maintainable one.

From the current HagiCode implementation, the system has already taken a step in a more robust direction. It no longer stares only at one literal string. Instead, it hands recoverable terminal states over to the shared coordinator uniformly. The benefits are obvious:

  • It is less likely to fail completely because of a small wording change in one message
  • Test coverage can be built around the terminal-state envelope rather than a single hard-coded text line
  • Retry logic becomes more consistent within the same provider

Of course, there needs to be a firm boundary here: being more general does not mean being more permissive. If the current context cannot continue, then even if the error looks like a temporary failure, it should not be replayed blindly.

This point is critical. What really makes people trust a system is not that it occasionally works, but that it is reliable most of the time. If a flow can only be maintained by experts, then it is still a long way from real adoption.

The Three Most Valuable Lessons to Keep in Practice


At this point, it makes sense to start bringing the discussion back down to implementation practice. If you are planning to build a similar capability in your own project, these are the three rules I most strongly recommend protecting first.

1. Backoff must have boundaries

HagiCode’s current default backoff rhythm is:

  • 10 seconds
  • 20 seconds
  • 60 seconds

This rhythm may not fit every system, but the existence of boundaries must remain. Otherwise, automatic retry quickly stops being a recovery mechanism and turns into an incident amplifier. Do not rush to give it an impressive name. First make sure the thing can survive two iterations inside a real team.
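The rhythm above can be expressed as an explicit, finite schedule, which is the whole point: the array length is the retry budget. A sketch, with names of my own choosing:

```csharp
using System;

// Sketch: the default rhythm as an explicit, finite schedule. When attempts
// exceed the schedule length, the failure becomes final instead of looping.
public static class DefaultBackoff
{
    public static readonly TimeSpan[] Schedule =
    {
        TimeSpan.FromSeconds(10),
        TimeSpan.FromSeconds(20),
        TimeSpan.FromSeconds(60),
    };

    // True while budget remains; a coordinator stops as soon as this is false.
    public static bool HasBudget(int attempt) => attempt < Schedule.Length;

    // Worst case, one failing request adds at most 90 seconds of waiting in total.
    public static TimeSpan WorstCaseTotalDelay()
    {
        var total = TimeSpan.Zero;
        foreach (var delay in Schedule)
        {
            total += delay;
        }
        return total;
    }
}
```

The bounded total (90 seconds here) is what keeps background retries from turning into an incident amplifier.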

2. The continuation prompt should be unified


The project uses a fixed continuation prompt so that later attempts clearly follow the path of continuing the current context rather than starting a brand-new complete request. This capability is not flashy, but when you build a real project, you cannot do without it. Many things that look like magic are, once broken apart, just a polished engineering process.

3. Both the shared library and the adapter layer need mirrored tests


I especially want to say a little more about this point. Many teams will write one layer of tests in the shared runtime and think that is probably enough. It is not.

The reason I feel relatively confident about HagiCode’s implementation is that both layers have test coverage:

  • The shared provider tests whether automatic continuation really happened
  • The adapter layer tests whether final errors and streaming messages were preserved correctly

This time I also reran the two related test groups, and all 31 test cases passed. That result alone does not prove the design is perfect, but it proves at least one thing: the current automatic retry is not a paper design. It is a capability constrained by both code and tests. "Talk is cheap. Show me the code" fits perfectly here.

If the entire article had to be compressed into one sentence, it would be this:

For Agent CLIs such as Claude Code and Codex, automatic retry should not be implemented as a local trick hidden inside one provider. It should be built as a combination of a shared coordinator, policy snapshot, context viability checks, and mirrored tests.

The benefits of doing it this way are very practical:

  • The logic is written once and reused across multiple providers
  • Whether a request is allowed to retry can travel stably with the execution chain
  • Continue running when context exists, and stop in time when it does not
  • What the frontend ultimately sees is a stable completed state or failed state, not a pile of abandoned intermediate noise

This solution was polished little by little while HagiCode was integrating multiple Agent CLIs in real scenarios. Who says AI-assisted programming is not the new era of pair programming? Models help you get started, complete code, and branch out, but what often determines the upper bound of the experience is still context, process, and constraints.

If this article was helpful to you, you are also welcome to look at HagiCode’s public entry points:

HagiCode is already on Steam now. This is not vaporware, and I have put the link right here. If you use Steam, go ahead and add it to your wishlist. Clicking in to take a look yourself is more direct than hearing me say ten more lines about it here.

That is enough on this topic for now. We will keep meeting inside real projects.

Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

SQLite Sharding in Practice: An In-Depth Comparison of Three Sharding Strategies


When a single-file SQLite database hits concurrency bottlenecks, how do we break through? This article shares three SQLite sharding approaches from the HagiCode project across different scenarios, helping you understand how to choose the right sharding strategy.

Hello everyone, I am Yu Kun, the creator of HagiCode.

When building high-performance applications, single-file SQLite databases run into very practical problems. Once user count and data volume grow, these issues start lining up one after another:

  • Write operations start queueing up, and response times visibly increase
  • Query performance drops as data volume grows
  • Frequent database is locked errors appear under multithreaded access

Many people instinctively ask: should we just migrate directly to PostgreSQL or MySQL? That can solve the problem, but deployment complexity rises sharply. Is there a lighter-weight option?

The answer is sharding. In the end, engineering problems should still be solved with engineering methods. By distributing data across multiple SQLite files, we can significantly improve concurrency and query performance while preserving SQLite’s lightweight characteristics.

The approaches shared in this article come from our practical experience in the HagiCode project. As an AI coding assistant project, HagiCode needs to handle a large volume of conversation messages, state persistence, and event history records. It was through solving these real problems that we summarized three sharding approaches for different scenarios.

Good tools matter, but how you use them depends on the work you actually need to do.

Our code repository is at github.com/HagiCode-org/site. Feel free to take a deeper look if you are interested.

After analyzing the HagiCode codebase, we identified three SQLite sharding approaches for different business scenarios:

  1. Session Message sharded storage: storage for AI conversation messages, characterized by high-frequency writes and session-based isolated queries
  2. Orleans Grain sharded storage: state persistence for a distributed framework, characterized by cross-node access and the need for deterministic routing
  3. Hero History sharded storage: historical event records for a gamified system, characterized by event sourcing and the need for migration compatibility

Although their business scenarios differ, all three follow the same core design principles:

  • Deterministic routing: calculate the shard directly from the business ID, without a metadata table
  • Transparent access: upper layers use a unified interface and remain unaware of the underlying shards
  • Independent storage: each shard is a fully independent SQLite file
  • Concurrency optimization: WAL mode plus busy_timeout reduces lock contention

Many people ask: why not build one generic sharding solution? That is a very practical question, and the conclusion is straightforward: in engineering, there is no universal solution, only the one that best fits the current business scenario. Next, we will compare the concrete implementations of these three approaches in depth.

| Aspect | Session Message | Orleans Grain | Hero History |
| --- | --- | --- | --- |
| Shard count | 256 (16²) | 100 | 10 |
| Naming rule | Hexadecimal (00-ff) | Decimal (00-99) | Decimal (0-9) |
| Storage directory | DataDir/messages/ | DataDir/orleans/grains/ | DataDir/hero-history/ |
| Filename pattern | {shard}.db | grains-{shard}.db | {shard}.db |

Why is there such a large difference in shard counts? It depends on business characteristics. Put another way, models will change, tools will evolve, and workflows will be upgraded, but the engineering fundamentals remain the same: first understand the problem you are actually trying to solve.

  • Session Message uses 256 shards because conversation messages have the highest write frequency and need more shards to spread the load
  • Orleans Grain uses 100 shards, balancing concurrency performance and operational complexity
  • Hero History uses only 10 shards because historical event writes are less frequent and migration cost must be considered

The routing algorithm is the core of a sharding scheme. It determines how data is distributed across shards. The three approaches use different routing strategies:

// Session Message: last two hexadecimal characters of the GUID
var normalized = Guid.Parse(sessionId.Value).ToString("N").ToLowerInvariant();
return normalized[^2..];

// Orleans Grain: extract all digits, then take the last two digits modulo the shard count
var digits = ExtractDigits(grainId);
var lastTwoDigits = (digits[^2] * 10) + digits[^1];
return lastTwoDigits % shardCount;

// Hero History: ASCII value of the last character, modulo 10
return heroId[^1] % 10;

Design analysis:

  • Session Message IDs are GUIDs. After converting to hexadecimal, taking the last two characters gives an even distribution across 256 shards
  • Orleans Grain IDs do not have a consistent format and may contain both letters and digits, so all digits are extracted before taking the modulo
  • Hero History IDs are strings, so the ASCII value of the last character is used directly with modulo. It is simple, but the distribution may be less uniform

Key point: regardless of which algorithm you use, the same ID must always map to the same shard. This is one of the most fundamental requirements in distributed systems. Otherwise, data inconsistency is inevitable. If routing is unstable, every other effort collapses to zero.
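A cheap way to protect that invariant is a determinism check. This sketch mirrors the Session Message rule quoted above (the helper name is mine):

```csharp
using System;

public static class ShardRouting
{
    // Mirrors the Session Message rule: "N" format is 32 hexadecimal characters,
    // and the last two select one of 256 shards.
    public static string RouteSessionShard(Guid sessionId)
    {
        var normalized = sessionId.ToString("N").ToLowerInvariant();
        return normalized[^2..];
    }
}
```

In a test suite, the useful assertions are that the same ID routes identically on every call (and on every process run), and that known IDs land on the expected shard.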

| Aspect | Session Message | Orleans Grain | Hero History |
| --- | --- | --- | --- |
| Initialization timing | Lazy-loaded on demand | Full parallel initialization at startup | Lazy-loaded on demand |
| Concurrency control | Lazy&lt;Task&gt; prevents duplicate initialization | Parallel.ForEachAsync | Lazy&lt;Task&gt; prevents duplicate initialization |

Why does Orleans Grain choose full initialization at startup?

Because Orleans is a distributed framework, a Grain may be scheduled to any node. If a shard file is discovered to be missing only at runtime, requests can fail. Full initialization at startup extends startup time, but it guarantees runtime stability. Getting it running is only the beginning; keeping it maintainable is the real skill.

Advantages of lazy loading:

For Session Message and Hero History, lazy loading reduces startup time. Files and schema are created only when a shard is actually needed. Using Lazy<Task> also prevents race conditions during concurrent initialization. The design looks simple, but in real projects it saves a lot of unnecessary trouble.
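The pattern can be shown in isolation. A minimal sketch, where the initialization body is a stand-in for creating the shard file and schema:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Minimal sketch of the Lazy<Task> guard: concurrent callers for the same shard
// share a single initialization task instead of racing to create the file.
public sealed class LazyShardInit
{
    private readonly ConcurrentDictionary<string, Lazy<Task>> _inits = new();
    private int _initCount;

    // Number of times the real initialization actually ran.
    public int InitCount => _initCount;

    public Task EnsureInitializedAsync(string shardKey)
    {
        // GetOrAdd may build extra Lazy wrappers under contention, but only the
        // stored one ever has its Value accessed, so the factory runs once per key.
        var lazy = _inits.GetOrAdd(
            shardKey,
            _ => new Lazy<Task>(() => InitializeAsync(shardKey)));
        return lazy.Value;
    }

    private async Task InitializeAsync(string shardKey)
    {
        Interlocked.Increment(ref _initCount);
        await Task.Yield(); // stand-in for creating the shard file and schema
    }
}
```

Lazy&lt;T&gt;'s default thread-safety mode (ExecutionAndPublication) is what guarantees the factory body executes exactly once per key.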

The schema designs of the three approaches reflect their respective business characteristics:

Session Message:

  • Supports the Event Sourcing model (event table plus snapshot table)
  • Includes a child table for message content blocks (MessageContentBlocks)
  • Has compression and compression-flag fields to support future optimizations

Orleans Grain:

  • Minimalist design: a single GrainState table
  • Stores state as serialized JSON
  • Uses ETag-based optimistic concurrency control

Hero History:

  • Timeline query optimization indexes
  • A unique DedupeKey constraint prevents duplication
  • Supports multiple event types and statuses

These designs show that schema design should stay tightly aligned with business requirements rather than chasing genericity. Orleans Grain is simple precisely because it only needs to store serialized state and does not require complex query capabilities. This is not mysticism. It is engineering. Do not rush to give something a grand name before checking whether it can survive two iterations inside a real team.

All three approaches use the same SQLite concurrency optimization settings:

PRAGMA journal_mode=WAL; -- Write-ahead logging mode
PRAGMA synchronous=NORMAL; -- Reduce persistence overhead
PRAGMA busy_timeout=5000; -- 5-second busy wait
PRAGMA foreign_keys=ON; -- Foreign key constraints

Advantages of WAL mode:

Traditional rollback journal mode causes lock contention during writes, while WAL mode allows reads and writes to proceed concurrently. In large-data scenarios, this can significantly improve performance. Many developers overlook this setting, but it matters far more than they think.

The tradeoff of synchronous=NORMAL:

Setting it to FULL provides maximum safety, but it significantly reduces performance. NORMAL strikes a balance between safety and performance, making it the right choice for most applications. There is no need to overthink this one. NORMAL is enough.

Based on the analysis of HagiCode’s three approaches, we can summarize the following decision matrix:

  • High-throughput scenarios -> more shards (for example, Session Message uses 256)
  • Simple maintainability -> fewer shards (for example, Hero History uses 10)
  • Mostly numeric IDs -> modulo algorithm (Orleans Grain)
  • Mostly GUIDs -> hexadecimal suffix (Session Message)
  • String IDs -> ASCII modulo (Hero History)

Rules of thumb for choosing shard counts:

  • Too few (< 10): limited concurrency improvement, making sharding less meaningful
  • Too many (> 1000): file management becomes complex and connection-pool overhead rises
  • Rule of thumb: 10 to 100 shards fit most scenarios
  • Extremely high concurrency scenarios: 256 shards can be considered

If you only look at demos, it is easy to get carried away. But once you enter production, every cost has to be calculated carefully. Many things are not impossible, just not honestly priced.

public interface IShardResolver<TId>
{
    string ResolveShardKey(TId id);
}

// Hexadecimal sharding (for GUIDs)
public class HexSuffixShardResolver : IShardResolver<string>
{
    private readonly int _suffixLength;

    public HexSuffixShardResolver(int suffixLength = 2)
    {
        _suffixLength = suffixLength;
    }

    public string ResolveShardKey(string id)
    {
        var normalized = id.Replace("-", "").ToLowerInvariant();
        return normalized[^_suffixLength..];
    }
}

// Numeric modulo sharding (for purely numeric IDs)
public class NumericModuloShardResolver : IShardResolver<long>
{
    private readonly int _shardCount;

    public NumericModuloShardResolver(int shardCount)
    {
        _shardCount = shardCount;
    }

    public string ResolveShardKey(long id)
    {
        return (id % _shardCount).ToString("D2");
    }
}

// Options and schema-initializer contracts consumed by the factory below
public sealed class ShardedStorageOptions
{
    public string BaseDirectory { get; init; } = "";
}

public interface IShardSchemaInitializer
{
    Task InitializeAsync(string connectionString, CancellationToken ct);
}

public class ShardedConnectionFactory<TDbContext> where TDbContext : class
{
    private readonly ConcurrentDictionary<string, Lazy<Task>> _initializationTasks = new();
    private readonly ShardedStorageOptions _options;
    private readonly IShardSchemaInitializer _initializer;

    public ShardedConnectionFactory(
        ShardedStorageOptions options,
        IShardSchemaInitializer initializer)
    {
        _options = options;
        _initializer = initializer;
    }

    public async Task<TDbContext> CreateAsync(string shardKey, CancellationToken ct)
    {
        var connectionString = BuildConnectionString(shardKey);

        // Use Lazy<Task> to prevent concurrent initialization of the same shard
        var initTask = _initializationTasks.GetOrAdd(
            connectionString,
            _ => new Lazy<Task>(() => InitializeShardAsync(connectionString, ct)));
        await initTask.Value;

        return CreateDbContext(connectionString);
    }

    private async Task InitializeShardAsync(string connectionString, CancellationToken ct)
    {
        await _initializer.InitializeAsync(connectionString, ct);
    }

    private string BuildConnectionString(string shardKey)
    {
        var shardPath = Path.Combine(_options.BaseDirectory, $"{shardKey}.db");
        return $"Data Source={shardPath}";
    }

    private TDbContext CreateDbContext(string connectionString)
    {
        // Create the DbContext according to the specific ORM;
        // Activator is a placeholder for the real factory.
        return (TDbContext)Activator.CreateInstance(typeof(TDbContext), connectionString)!;
    }
}
public class SqliteShardInitializer : IShardSchemaInitializer
{
    public async Task InitializeAsync(string connectionString, CancellationToken ct)
    {
        await using var connection = new SqliteConnection(connectionString);
        await connection.OpenAsync(ct);

        // Concurrency optimization settings
        // (ExecuteAsync is Dapper's extension method on IDbConnection)
        await connection.ExecuteAsync("""
            PRAGMA journal_mode=WAL;
            PRAGMA synchronous=NORMAL;
            PRAGMA busy_timeout=5000;
            PRAGMA foreign_keys=ON;
            """);

        // Create the table schema
        await connection.ExecuteAsync("""
            CREATE TABLE IF NOT EXISTS Entities (
                Id TEXT PRIMARY KEY,
                CreatedAt TEXT NOT NULL,
                UpdatedAt TEXT NOT NULL,
                Data TEXT NOT NULL,
                ETag TEXT
            );
            """);

        // Create indexes
        await connection.ExecuteAsync("""
            CREATE INDEX IF NOT EXISTS IX_Entities_CreatedAt
                ON Entities(CreatedAt DESC);
            CREATE INDEX IF NOT EXISTS IX_Entities_UpdatedAt
                ON Entities(UpdatedAt DESC);
            """);
    }
}

1. Routing stability

The routing algorithm must guarantee that the same ID always maps to the same shard. Avoid random or time-dependent calculations, and do not introduce mutable parameters into the algorithm.

2. Choosing the shard count

The number of shards should be decided during the design phase. Changing it later is extremely difficult. Consider:

  • Current and future concurrency volume
  • The management cost of each shard
  • The complexity of data migration

3. Migration planning

The Hero History approach demonstrates a complete migration path:

  1. Build the new sharded storage infrastructure
  2. Implement a migration service to copy data from the primary database into the shards
  3. Verify query compatibility after migration
  4. Switch read and write paths to the shards
  5. Clean up legacy tables in the primary database
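Step 2 of the path above can be sketched roughly like this. The types and names are illustrative, not HagiCode's actual migration service; the routing rule is the ASCII-modulo one described earlier:

```csharp
using System;
using System.Collections.Generic;

// Hedged sketch of the planning half of step 2: group legacy rows by the shard
// they must land in, using the same routing rule as live traffic.
public static class HeroHistoryMigrationPlanner
{
    public static Dictionary<string, List<(string HeroId, string Payload)>> PlanByShard(
        IEnumerable<(string HeroId, string Payload)> legacyRows)
    {
        var plan = new Dictionary<string, List<(string HeroId, string Payload)>>();
        foreach (var row in legacyRows)
        {
            // ASCII value of the ID's last character, modulo 10 — identical to the
            // runtime routing rule, so migrated rows stay readable after cutover.
            var shard = (row.HeroId[^1] % 10).ToString();
            if (!plan.TryGetValue(shard, out var bucket))
            {
                plan[shard] = bucket = new List<(string HeroId, string Payload)>();
            }
            bucket.Add(row);
        }
        return plan;
    }
}
```

Reusing the exact live routing rule during migration is what makes step 3 (verifying query compatibility) meaningful: if the migration used a different rule, verification would pass on counts but fail on lookups.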

Future migration requirements need to be considered while designing the sharding scheme. Talk is cheap. Show me the code. But code alone is not enough. You also need a complete migration path. A one-time success is not a system; sustained success is.

4. Monitoring and operations

  • Monitor size distribution across shards to detect data skew early
  • Set alerts for shard hot spots to prevent a single shard from becoming the bottleneck
  • Regularly inspect WAL file sizes to avoid excessive disk usage
  • Establish shard health-check mechanisms

5. Test coverage

  • Test boundary conditions such as empty IDs, special characters, and overly long IDs
  • Verify routing determinism to ensure the same ID always maps to the same shard
  • Run concurrent write stress tests to confirm lock contention is effectively reduced
  • Run migration tests to ensure data integrity and consistency

By comparing the three SQLite sharding approaches in the HagiCode project, we can see that:

  1. There is no universal solution: different business scenarios need different sharding strategies
  2. The core principles are shared: deterministic routing, transparent access, independent storage, and concurrency optimization
  3. Design should face the future: consider migration paths and operational costs

If your project is using SQLite and has started hitting concurrency bottlenecks, I hope this article gives you some useful ideas. There is no need to rush into migrating to a heavyweight database. Sometimes the right sharding strategy is enough to solve the problem.

Of course, sharding is not a silver bullet. Before choosing a sharding strategy, first make sure that:

  • You have already optimized single-table query performance
  • You have already added appropriate indexes
  • You have already enabled WAL mode

Only after these optimizations are done, and a performance bottleneck still remains, should you consider introducing sharding. Doing simple things well is a capability in itself.

Sometimes doing the work once says more than explaining it ten times. From here, let the engineering results speak for themselves.

Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

How to Automate Steam Releases with GitHub Actions


This article shares the complete solution we implemented for automated Steam releases in the HagiCode Desktop project, covering the end-to-end automation flow from GitHub Release to the Steam platform, including key technical details such as Steam Guard authentication and multi-platform Depot uploads.

The release workflow on Steam is actually quite different from traditional application distribution. Steam has its own complete update delivery system. Developers need to use SteamCMD to upload build artifacts to Steam’s CDN network, rather than simply dropping in a download link like on other platforms.

The HagiCode Desktop project is preparing for a Steam release, which introduced a few new challenges to our release workflow:

  1. We needed to convert existing build artifacts into a Steam-compatible format.
  2. We had to upload them to Steam through SteamCMD.
  3. We also had to handle Steam Guard authentication.
  4. We needed to support multi-platform Depot uploads for Linux, Windows, and macOS.
  5. We wanted a fully automated flow from GitHub Release to Steam.

The project had already implemented “portable version mode,” which allows the application to detect fixed service payloads packaged in the extra directory. Our goal was to integrate that portable version mode seamlessly with Steam distribution.

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI coding assistant that supports desktop usage. Since we are actively working toward launching on Steam, we needed to establish a reliable automated release workflow.

The core of the entire Steam release process is a GitHub Actions workflow that divides the process into three main stages:

┌─────────────────────────────────────────────────────────────┐
│ GitHub Actions Workflow (Steam Release) │
├─────────────────────────────────────────────────────────────┤
│ 1. Preparation Stage: │
│ - Check out portable-version code │
│ - Download build artifacts from GitHub Release │
│ - Extract and prepare the Steam content directory │
│ │
│ 2. SteamCMD Setup: │
│ - Install or reuse SteamCMD │
│ - Authenticate with Steam Guard │
│ │
│ 3. Release Stage: │
│ - Generate Depot VDF configuration files │
│ - Generate App Build VDF configuration files │
│ - Invoke SteamCMD to upload to Steam │
└─────────────────────────────────────────────────────────────┘

This design offers several advantages:

  • It reuses existing GitHub Release artifacts and avoids rebuilding the same outputs.
  • It uses self-hosted runners for secure isolation.
  • It supports switching between preview mode and formal release branches.
  • It includes complete error handling and logging, which makes failures easier to diagnose.

Our workflow supports the following key parameters:

inputs:
  release: # Portable Version release tag
    description: 'Version tag to release (for example v1.0.0)'
    required: true
  steam_preview: # Whether to generate a preview build
    description: 'Whether to use preview mode'
    required: false
    default: 'false'
  steam_branch: # Steam branch to set live
    description: 'Target Steam branch'
    required: false
    default: 'preview'
  steam_description: # Build description override
    description: 'Build description'
    required: false

For security reasons, we use a self-hosted runner with the steam label:

runs-on:
  - self-hosted
  - Linux
  - X64
  - steam

This ensures that Steam releases run on a dedicated runner and keeps sensitive credentials safely isolated.

To prevent releases of the same version from interfering with each other, we configured concurrency control:

concurrency:
  group: portable-version-steam-${{ github.event.inputs.release }}
  cancel-in-progress: false

Notice that cancel-in-progress: false is set here because Steam releases can take a while, and we do not want a newly triggered run to cancel one that is already uploading.

The prepare-steam-release-input.mjs script is responsible for preparing the inputs required for release:

// Download the GitHub Release build manifest and artifact inventory
const buildManifest = await downloadBuildManifest(releaseTag);
const artifactInventory = await downloadArtifactInventory(releaseTag);

// Download compressed archives for each platform
for (const platform of ['linux-x64', 'win-x64', 'osx-universal']) {
  const artifactUrl = getArtifactUrl(artifactInventory, platform);
  await downloadArtifact(artifactUrl, platform);
}

// Extract into the Steam content directory structure
await extractToSteamContent(sources, contentRoot);

Steam requires accounts to be protected with Steam Guard, so we implemented a shared-secret-based code generation algorithm:

const crypto = require('crypto');

function generateSteamGuardCode(sharedSecret, timestamp = Date.now()) {
  const secret = decodeSharedSecret(sharedSecret);
  const time = Math.floor(timestamp / 1000 / 30); // 30-second TOTP window
  const timeBuffer = Buffer.alloc(8);
  timeBuffer.writeBigUInt64BE(BigInt(time));
  // Use HMAC-SHA1 to generate a time-based one-time code
  const hash = crypto.createHmac('sha1', secret)
    .update(timeBuffer)
    .digest();
  // Convert it into a 5-character Steam Guard code
  return steamGuardCode(hash);
}

This implementation is based on Steam Guard’s TOTP (Time-based One-Time Password) mechanism, generating a new verification code every 30 seconds.
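The excerpt above calls two helpers that are not shown. Based on the widely documented Steam Guard TOTP scheme (base64-encoded shared secret, a last-nibble offset into the HMAC digest, and a 26-character code alphabet), they might look roughly like this. Treat it as a sketch rather than the project's actual code:

```javascript
// Sketch of the helpers assumed by generateSteamGuardCode.
// The shared secret is distributed as a base64 string, so decoding it is
// a straight base64 -> bytes conversion.
function decodeSharedSecret(sharedSecret) {
  return Buffer.from(sharedSecret, 'base64');
}

// Steam Guard codes use a 26-character alphabet instead of plain digits.
// The last nibble of the HMAC digest selects a 4-byte window, which is
// mapped to 5 characters by repeated division.
function steamGuardCode(hash) {
  const alphabet = '23456789BCDFGHJKMNPQRTVWXY';
  const offset = hash[hash.length - 1] & 0x0f;
  let value = hash.readUInt32BE(offset) & 0x7fffffff;
  let code = '';
  for (let i = 0; i < 5; i++) {
    code += alphabet[value % alphabet.length];
    value = Math.floor(value / alphabet.length);
  }
  return code;
}
```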

VDF (Valve Data Format) is the configuration format used by Steam. We need to generate two types of VDF files:

Depot VDF is used to configure content for each platform:

function buildDepotVdf(depotId, contentRoot) {
  return [
    '"DepotBuildConfig"',
    '{',
    `  "DepotID" "${escapeVdf(depotId)}"`,
    `  "ContentRoot" "${escapeVdf(contentRoot)}"`,
    '  "FileMapping"',
    '  {',
    '    "LocalPath" "*"',
    '    "DepotPath" "."',
    '    "recursive" "1"',
    '  }',
    '}'
  ].join('\n');
}
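For an illustrative depot ID and content root (both values here are made up), the function produces VDF roughly along these lines, modulo the exact indentation baked into the string literals:

```
"DepotBuildConfig"
{
  "DepotID" "1000011"
  "ContentRoot" "./steam-content/linux-x64"
  "FileMapping"
  {
    "LocalPath" "*"
    "DepotPath" "."
    "recursive" "1"
  }
}
```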

App Build VDF is used to configure the entire application build:

function buildAppBuildVdf(appId, contentRoot, depotBuilds, description, setLive) {
  const vdf = [
    '"appbuild"',
    '{',
    `  "appid" "${appId}"`,
    `  "desc" "${escapeVdf(description)}"`,
    `  "contentroot" "${escapeVdf(contentRoot)}"`,
    '  "buildoutput" "build_output"',
    '  "depots"',
    '  {'
  ];
  for (const [depotId, depotVdfPath] of Object.entries(depotBuilds)) {
    vdf.push(`    "${depotId}" "${depotVdfPath}"`);
  }
  // Close the depots block unconditionally; "setlive" itself is optional
  vdf.push('  }');
  if (setLive) {
    vdf.push(`  "setlive" "${setLive}"`);
  }
  vdf.push('}');
  return vdf.join('\n');
}

Finally, the upload is performed by invoking SteamCMD:

await runCommand(steamcmdPath, [
  '+login', steamUsername, steamPassword, steamGuardCode,
  '+run_app_build', appBuildPath,
  '+quit'
]);

This is the final step in the whole workflow. Once it succeeds, the release is done.

Steam uses the Depot system to manage content for different platforms. We support three main Depots:

| Platform | Depot Identifier | Architecture Support |
| --- | --- | --- |
| Linux | linux-x64 | x86_64 |
| Windows | win-x64 | x86_64 |
| macOS | osx-universal | universal, x86_64, arm64 |

Each Depot has its own content directory and VDF configuration file. This ensures that users on different platforms only download the content they actually need.

First, create a GitHub Release in the portable-version repository that includes:

  • Compressed archives for each platform
  • Build manifest ({tag}.build-manifest.json)
  • Artifact inventory ({tag}.artifact-inventory.json)

Step 2: Trigger the Steam Release Workflow


Trigger the workflow manually through GitHub Actions and fill in the required parameters:

  • release: the version tag to publish, such as v1.0.0
  • steam_branch: the target branch, such as preview or public
  • steam_preview: whether to use preview mode

Step 3: Execute the Release Flow Automatically


The workflow automatically performs the following steps:

  1. Download and extract GitHub Release artifacts.
  2. Install or update SteamCMD.
  3. Generate Steam VDF configuration files.
  4. Authenticate with Steam Guard.
  5. Upload content to the Steam CDN.
  6. Set the specified branch live.

Once this sequence completes, the whole release path is covered.

Configure the following secrets in the GitHub repository settings:

| Secret Name | Description |
| --- | --- |
| STEAM_USERNAME | Steam account username |
| STEAM_PASSWORD | Steam account password |
| STEAM_SHARED_SECRET | Steam Guard shared secret (optional) |
| STEAM_GUARD_CODE | Steam Guard code (optional) |
| STEAM_APP_ID | Steam application ID |
| STEAM_DEPOT_ID_LINUX | Linux Depot ID |
| STEAM_DEPOT_ID_WINDOWS | Windows Depot ID |
| STEAM_DEPOT_ID_MACOS | macOS Depot ID |

There is nothing especially unusual about these settings. You simply need all of the expected values in place.

| Variable Name | Description | Default Value |
| --- | --- | --- |
| PORTABLE_VERSION_STEAMCMD_ROOT | SteamCMD installation directory | ~/.local/share/portable-version/steamcmd |

On the first run, you need to enter the Steam Guard code manually. After that, it is recommended to configure the shared secret so the code can be generated automatically. This avoids manual intervention on every release.

SteamCMD saves the login token and can reuse it on later runs. You still need to keep an eye on token expiration, because re-authentication is required after the token expires.

Make sure the Steam content directory structure is correct:

steam-content/
├── linux-x64/ # Linux platform content
├── win-x64/ # Windows platform content
└── osx-universal/ # macOS universal binary content

Each directory should contain the complete application files for its corresponding platform.

Preview mode does not set any branch live, so it is suitable for testing and validation:

if [ "$STEAM_PREVIEW_INPUT" = 'true' ]; then
  cmd+=(--preview)
fi

This lets you upload to Steam first for verification, then switch to the formal branch after everything checks out.

The scripts include complete error handling and logging:

  • Validate that the GitHub Release exists.
  • Check required metadata files.
  • Ensure platform content is present.
  • Generate a GitHub Actions summary report.

This information is highly valuable for both debugging and auditing.

The workflow generates two kinds of artifacts:

  • portable-steam-release-preparation-{tag}: release preparation metadata
  • portable-steam-build-metadata-{tag}: Steam build metadata

These artifacts can be used for later auditing and debugging. A retention period of 30 days is a practical default.

In the HagiCode project, this automated release workflow has already run successfully across multiple versions. The entire path from GitHub Release to the Steam platform is fully automated and requires no manual intervention.

This significantly improved both release efficiency and reliability. In the past, manually publishing one version took more than 30 minutes. Now the entire process finishes in just a few minutes.

More importantly, the automated workflow reduces the chance of human error. Every release follows the same standardized process, which makes the results more predictable.

With the approach described in this article, we achieved:

  1. Full automation from GitHub Release to the Steam platform.
  2. Multi-platform Depot uploads.
  3. Secure authentication based on Steam Guard.
  4. Flexible switching between preview mode and formal release.
  5. Complete error handling and logging.

This solution is not only suitable for the HagiCode project, but can also serve as a reference for other projects planning to launch on Steam. If you are also considering Steam release automation, I hope this practical experience is useful to you.

Technology can feel complex or simple depending on the path you choose. The key is finding a workflow that fits your needs.

If this article helped you, feel free to star the HagiCode GitHub repository or visit the official website to learn more.

Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

How to Build a Fast Download Distribution Station with Low-Cost Cloud Servers


Cloud storage bandwidth is absurdly expensive, cross-border access is painfully slow, and CDN pricing is enough to scare anyone away… If you handle file distribution, you probably know these problems well. In this post, I want to share a low-cost approach we worked out while building HagiCode: a cloud server plus an Nginx caching layer. The cost dropped by about half, while download speed improved quite a bit, which was at least a little comforting.

When it comes to the internet, download speed and stability are really part of the user experience. Whether you are running an open-source project or a commercial product, you still need to provide users with a reliable way to download files.

Downloading files directly from cloud storage, such as Azure Blob Storage or AWS S3, looks simple, but it comes with quite a few practical issues:

Network latency: Cross-border and cross-region access can be slow enough to make you want to smash your keyboard. If users have to wait forever, the experience is obviously not going to be great.

Bandwidth cost: Cloud storage egress traffic is painfully expensive. Accessing Azure Blob Storage from mainland China costs about CNY 0.5 per GB, which means 1 TB per month adds up to roughly CNY 500. For a small team, that is not an insignificant amount. After all, nobody’s money comes from the wind.

Access restrictions: In some regions, access to overseas cloud services is unstable, and sometimes it is simply unavailable. Users want to download the files but cannot, which is frustrating for everyone.

CDN cost: Commercial CDNs can solve these problems, but the price is just as real. Most small teams simply cannot justify it.

So is there a solution that is both affordable and practical? Yes. Use a cloud server, a reverse proxy, and a caching layer. It is a straightforward approach, but it works. The cost drops by about half, and the speed improves as well, which is a decent trade-off.

We did not come up with this architecture out of thin air. It came from our real-world experience working on HagiCode.

HagiCode is an AI coding assistant, and we need to provide downloads for both server-side and desktop-side distributions. Since it is a tool for developers, it is important that users around the world can download it quickly and reliably. That is exactly why we had to figure out a low-cost distribution strategy in the first place.

If you think this solution looks useful, then maybe our engineering is at least decent enough… and if that is the case, HagiCode itself might also be worth checking out.

Let us start with the full architecture:

User request
DNS resolution
┌─────────────────────────────────────┐
│ Reverse proxy layer (Traefik/Bunker Web) │ ← SSL termination, routing, security protection
├─────────────────────────────────────┤
│ Ports: 80/443 │
│ Features: Automatic Let's Encrypt certificates │
│ Host routing │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Cache layer (Nginx) │ ← File caching, Gzip compression
├─────────────────────────────────────┤
│ Ports: 8080(server) / 8081(desktop) │
│ Cache strategy: │
│ - index.json: 1 hour │
│ - other files: 7 days │
│ Cache size: 1GB │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Origin (Azure Blob Storage) │ ← File storage
└─────────────────────────────────────┘

The core idea of this architecture is simple: put a cache between users and cloud storage.

User requests first arrive at the reverse proxy layer on the cloud server, and then the Nginx cache layer takes over. If the requested file is already cached, it is returned immediately. If not, Nginx fetches it from cloud storage and stores a local copy at the same time. The next time someone requests the same file, cloud storage does not need to be involved again.

Advantages of cloud servers:

  • Predictable cost: providers like Alibaba Cloud offer low-cost cloud servers, with 1-2 vCPU and 2 GB RAM instances priced around CNY 50-100 per month
  • Flexible deployment: you can configure reverse proxy rules and caching policies freely
  • Flexible geography: you can choose server regions closer to your users
  • Good scalability: you can upgrade the server specification as traffic grows

Reverse proxy + cache architecture:

  • Reduce origin pressure: cache hot files to reduce direct access to cloud storage
  • Lower cost: cloud server traffic is much cheaper than cloud storage egress
  • Improve speed: nearby access and server bandwidth are usually better than direct cloud storage delivery

Why choose Nginx as the cache layer?

This was not a random choice. Nginx has several real advantages here:

  1. High performance: Nginx is widely recognized for excellent reverse proxy performance
  2. Mature caching: the built-in proxy_cache feature is stable and reliable
  3. Low resource usage: it can run with as little as 256 MB of memory
  4. Flexible configuration: you can apply different cache policies to different file types

Reverse Proxy Layer: Traefik vs Bunker Web


HagiCode’s deployment solution supports two reverse proxy options, and each one has its own strengths:

| Option | Characteristics | Suitable Scenarios |
| --- | --- | --- |
| Traefik | Lightweight, automatic SSL, simple configuration | Basic deployment, low-traffic scenarios |
| Bunker Web | Built-in WAF, DDoS protection, anti-bot protection | High-security, high-traffic scenarios |

Traefik is a modern HTTP reverse proxy and load balancer. Its biggest advantage is that configuration is simple, and it can obtain Let’s Encrypt certificates automatically.

For initial deployments or low-traffic scenarios, Traefik is often a very good choice:

  • It uses relatively few resources; 1.5 CPU and 512 MB memory is enough
  • SSL certificates are configured automatically, so you do not need to manage them yourself
  • Routing is configured through Docker labels, which is convenient enough

Bunker Web is an Nginx-based web application firewall with more comprehensive security protection.

When should you consider switching to Bunker Web? Usually in cases like these:

  • You are under DDoS attack
  • You need ModSecurity protection
  • You want anti-bot protection
  • You have stricter security requirements

HagiCode provides the switch-deployment.sh script so you can switch quickly between the two options:

Terminal window
# Switch to Bunker Web
./switch-deployment.sh bunkerweb
# Switch back to Traefik
./switch-deployment.sh traefik
# Check current status
./switch-deployment.sh status

The script performs pre-checks, health checks, and automatic rollback, so the switch process is fairly safe and reliable.

The cache layer is the core of the whole architecture, so Nginx configuration makes a huge difference in cache performance.

# Cache path configuration
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=azure_cache:10m
max_size=1g inactive=7d use_temp_path=off;

Parameter details:

  • levels=1:2: cache directory hierarchy with two levels to improve file access efficiency
  • keys_zone=azure_cache:10m: cache key storage zone; 10 MB is enough for a large number of keys
  • max_size=1g: maximum cache size is 1 GB
  • inactive=7d: delete cached files if they have not been accessed for 7 days
  • use_temp_path=off: write directly into the cache directory for better performance

Different file types need different cache strategies:

# Server download service
server {
    listen 8080;

    # Short-term cache for index.json (to allow timely updates)
    location /index.json {
        proxy_cache azure_cache;
        proxy_cache_valid 200 1h;
        proxy_cache_key "$scheme$server_port$request_uri";
        add_header X-Cache-Status $upstream_cache_status;
        add_header Cache-Control "public, max-age=3600";

        # Reverse proxy to Azure OSS
        proxy_pass https://${SERVER_DL_HOST}/${SERVER_DL_CONTAINER}$uri?${SERVER_DL_SAS_TOKEN};
        proxy_ssl_server_name on;
        proxy_ssl_protocols TLSv1.2 TLSv1.3;
    }

    # Long-term cache for static files such as installation packages
    location / {
        proxy_cache azure_cache;
        proxy_cache_valid 200 7d;
        proxy_cache_key "$scheme$server_port$request_uri";
        add_header X-Cache-Status $upstream_cache_status;
        add_header Cache-Control "public, max-age=604800";

        proxy_pass https://${SERVER_DL_HOST}/${SERVER_DL_CONTAINER}$uri?${SERVER_DL_SAS_TOKEN};
        proxy_ssl_server_name on;
        proxy_ssl_protocols TLSv1.2 TLSv1.3;
    }
}

Why is it designed this way?

index.json is the version check file, so it needs to update promptly. With a 1-hour cache window, users can detect a new release within at most one hour after publication.

Static files such as installation packages change infrequently, so caching them for 7 days greatly reduces origin access. When an update is needed, you can just clear the cache manually.

X-Cache-Status response header:

This header helps you inspect cache hit behavior:

  • HIT: cache hit
  • MISS: cache miss, fetched from origin
  • EXPIRED: cache expired, fetched from origin again
  • BYPASS: cache bypassed

How to check it:

Terminal window
curl -I https://server.dl.hagicode.com/app.zip

Assume 1 TB of download traffic per month. Let us do the math:

| Option | Traffic Cost | Server Cost | Total |
| --- | --- | --- | --- |
| Direct Azure OSS | About CNY 500 | CNY 0 | CNY 500 |
| Cloud server + OSS (80% cache hit ratio) | CNY 100 + CNY 80 | CNY 60 | CNY 240 |
| Commercial CDN | CNY 300-500 | CNY 0 | CNY 300-500 |

Conclusion: adding a cache layer can reduce distribution cost by roughly 50%.

This estimate assumes an 80% cache hit ratio. In practice, if files do not change often, the hit ratio may be even higher.
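The arithmetic behind the table can be sketched as follows. The per-GB rates are rough assumptions back-derived from the figures above (about CNY 0.5/GB for Azure egress and roughly CNY 0.1/GB for cloud server traffic), not quoted prices:

```javascript
// Rough cost model for the comparison table (all figures in CNY, per month).
// Only cache misses hit cloud storage egress; hits are served from the
// server's own, cheaper bandwidth. Rates are illustrative assumptions.
function monthlyCost({ trafficTB, hitRatio, egressPerGB, serverEgressPerGB, serverFee }) {
  const gb = trafficTB * 1024;
  const originEgress = gb * (1 - hitRatio) * egressPerGB;   // misses -> cloud storage
  const serverEgress = gb * hitRatio * serverEgressPerGB;   // hits -> cloud server
  return Math.round(originEgress + serverEgress + serverFee);
}

// 1 TB/month at an 80% hit ratio lands near the table's CNY 240 estimate
console.log(monthlyCost({
  trafficTB: 1, hitRatio: 0.8,
  egressPerGB: 0.5, serverEgressPerGB: 0.1, serverFee: 60,
})); // prints 244
```

Raising the hit ratio is the biggest lever in this model, which is why long cache windows for rarely changing installer files pay off.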

First, configure the environment variables:

Terminal window
cd /path/to/hagicode_aliyun_deployment/docker
cp .env.example .env
vi .env # Fill in the Azure OSS SAS URL and Lark Webhook URL

Important: the .env file contains sensitive information such as the SAS Token and Webhook URL. Never commit it to version control.

Add the following DNS A records:

  • server.dl.hagicode.com → server IP
  • desktop.dl.hagicode.com → server IP

Use Ansible to initialize the server automatically:

Terminal window
cd /path/to/hagicode_aliyun_deployment
ansible-playbook -i ./ansible/inventory/hosts.yml ./ansible/playbooks/init.yml

This playbook handles the following tasks automatically:

  • Create the deployment user
  • Install Docker and Docker Compose
  • Configure SSH keys
  • Set firewall rules

That is the main setup work, and automation saves a lot of time.

Terminal window
./deploy.sh

The deployment script helps you do the following:

  • Check environment configuration
  • Pull the latest code
  • Start Docker containers
  • Run health checks
  • Send deployment notifications (Lark)

One command is enough, which keeps the process convenient.

Terminal window
# Check container status
docker ps
# Test the download domains
curl -I https://server.dl.hagicode.com/index.json
curl -I https://desktop.dl.hagicode.com/index.json

Caches also need maintenance from time to time:

Check cache disk usage:

Terminal window
docker volume inspect docker_nginx-cache
du -sh /var/lib/docker/volumes/docker_nginx-cache/_data

Clear the cache manually:

Terminal window
./clear-cache.sh

Or run the manual commands directly if needed:

Terminal window
docker exec nginx sh -c "rm -rf /var/cache/nginx/*"
docker restart nginx

On a 1-core, 2 GB server, the resource limit configuration looks like this:

services:
  traefik:
    deploy:
      resources:
        limits:
          cpus: '1.50'
          memory: 512M
  nginx:
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 256M

To monitor resource usage, you can occasionally run:

Terminal window
docker stats

The SAS Token is the credential used to access Azure Blob Storage, so leaking it would be serious:

  • Do not commit the .env file to version control; it is already in .gitignore
  • Set an appropriate SAS Token expiration time, with 1 year recommended
  • Limit SAS Token permissions to read-only
  • Rotate SAS Tokens regularly

HagiCode integrates Lark/Feishu Webhook notifications, which can send alerts for the following events:

  • Deployment success or failure
  • Cache clearing status
  • Service exceptions

Notifications include server information, timestamps, and error details, making troubleshooting much faster.

When one server is no longer enough, you can consider the following:

  1. Horizontal scaling: deploy multiple nodes and distribute traffic with DNS round-robin or a load balancer
  2. CDN in front: put a CDN in front of the cloud servers for even faster access
  3. Cache warming: use scripts to preload hot files into the cache

There are a few things worth keeping in mind:

  1. SSL certificates: Let’s Encrypt has rate limits, so do not switch deployments too frequently or certificate issuance may fail
  2. Cache clearing: after updating important files, remember to clear the cache or users may still download the old version
  3. Log management: clean up Docker logs regularly, or the disk may fill up
  4. Backup strategy: back up files such as Traefik acme.json and Bunker Web configuration
  5. Monitoring and alerts: configure Feishu notifications so you can track deployment status and respond quickly to issues

A cloud server plus an Nginx caching layer is all it takes. HagiCode uses this solution with a fairly low monthly cost, around CNY 60-100 for the server, and the results have been very solid. The main advantages are:

  • Predictable cost: roughly 50% cheaper than using cloud storage directly or paying for a commercial CDN
  • Flexible deployment: choose Traefik or Bunker Web depending on your needs
  • Strong scalability: you can scale horizontally or add a CDN later if needed
  • Simple operations: Shell scripts plus Ansible make automated deployment straightforward

For small teams and independent developers who need file distribution, this is definitely a practical option worth trying.

HagiCode has been running this architecture stably in production for a while, and global user downloads have remained reliable. If you are looking for a similar solution, it is well worth a try.

To wrap up, here is a summary of the technologies involved:

| Component | Choice | Purpose |
| --- | --- | --- |
| Cloud server | Alibaba Cloud ECS | Base runtime environment |
| Reverse proxy | Traefik / Bunker Web | SSL termination, routing, security protection |
| Cache layer | Nginx | Reverse proxy caching, Gzip compression |
| File storage | Azure Blob Storage | File origin |
| Containerization | Docker Compose | Service orchestration |
| Automation | Ansible | Server configuration management |
| Notifications | Lark/Feishu Webhook | Deployment status notifications |

Here are the reference materials mentioned in this post:


If this post helped you, that already makes it worthwhile:


That is about it for this post. I hope this solution helps you. If you have better ideas, feel free to share them. Technology is always easier to improve when people learn from each other.

Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final version was reviewed and approved by the author.

Hermes Agent Integration Practice: From Protocol to Production


Sharing the complete HagiCode experience of integrating Hermes Agent, including core lessons around ACP protocol adaptation, session pool management, and front-end/back-end contract synchronization.

While building HagiCode, an AI-assisted coding platform, our team needed to integrate an Agent framework that could run locally and also scale to the cloud. After research, Hermes Agent from Nous Research was chosen as the underlying engine for our general-purpose Agent capabilities.

In truth, technology selection is neither especially hard nor especially easy. There are plenty of strong Agent frameworks on the market, but Hermes stood out because its ACP protocol and tool system fit HagiCode’s demanding requirements particularly well: local development, team collaboration, and cloud expansion. Still, bringing Hermes into a real production system meant solving a long list of engineering problems. This part was anything but trivial.

HagiCode’s stack uses Orleans to build a distributed system, while the front end is built with React + TypeScript. Integrating Hermes meant preserving architectural consistency while making Hermes a first-class executor alongside ClaudeCode and OpenCode. It sounds simple enough, but implementation always tells the real story.

This article shares our practical experience integrating Hermes Agent into HagiCode, and we hope it offers useful reference material for teams facing similar needs. After all, once you’ve fallen into a pit, there is no reason to let someone else fall into the same one.

The solution described in this article comes from our hands-on work in the HagiCode project. HagiCode is an AI-driven coding assistance platform that supports unified access to and management of multiple AI Providers. During the Hermes Agent integration, we designed a generic Provider abstraction layer so new Agent types could plug into the existing system seamlessly.

If you’re interested in HagiCode, feel free to visit GitHub to learn more. The more people who pay attention, the stronger the momentum.

HagiCode’s Hermes integration uses a clear layered architecture, with each layer focused on its own responsibilities:

Back-end core layer

  • HermesCliProvider: implements the IAIProvider interface as the unified AI Provider entry point
  • HermesPlatformConfiguration: manages Hermes executable path, arguments, authentication, and related settings
  • ICliProvider<HermesOptions>: the low-level CLI abstraction provided by HagiCode.Libs for handling subprocess lifecycles

Transport layer

  • StdioAcpTransport: communicates with the Hermes ACP subprocess through standard input and output
  • ACP protocol methods: initialize, authenticate, session/new, session/prompt

Runtime layer

  • HermesGrain: Orleans Grain implementation that handles distributed session execution
  • CliAcpSessionPool: session pool that reuses ACP subprocesses to avoid frequent startup overhead

Front-end layer

  • ExecutorAvatar: Hermes visual identity and icon
  • executorTypeAdapter: Provider type mapping logic
  • SignalR real-time messaging: maintains Hermes identity consistency throughout the message stream

This layered design allows each layer to evolve independently. For example, if we want to add a new transport mechanism in the future, such as WebSocket, we only need to modify the transport layer. There is no need to overhaul the whole system just because one transport changes.

All AI Providers implement the IAIProvider interface, which is one of the core design choices in HagiCode’s architecture:

public interface IAIProvider
{
    string Name { get; }

    ProviderCapabilities Capabilities { get; }

    IAsyncEnumerable<AIStreamingChunk> StreamAsync(
        AIRequest request,
        CancellationToken cancellationToken = default);

    Task<AIResponse> ExecuteAsync(
        AIRequest request,
        CancellationToken cancellationToken = default);
}

HermesCliProvider implements this interface and stands on equal footing with ClaudeCodeProvider, OpenCodeProvider, and others. The benefits of this design include:

  1. Replaceability: switching Providers does not affect upper-layer business logic
  2. Testability: Providers can be mocked easily for unit testing
  3. Extensibility: adding a new Provider only requires implementing the interface

In the end, interfaces are a lot like rules. Once the rules are in place, everyone can coexist harmoniously, play to their strengths, and avoid stepping on each other. There is a certain elegance in that.

HermesCliProvider is the core of the entire integration. It coordinates the various components needed to complete a single AI invocation:

public sealed class HermesCliProvider : IAIProvider, IVersionedAIProvider
{
    private readonly ICliProvider<LibsHermesOptions> _provider;
    private readonly ConcurrentDictionary<string, string> _sessionBindings;
    // _platformConfiguration and _responseMapper are injected via the constructor (omitted here)

    public ProviderCapabilities Capabilities { get; } = new()
    {
        SupportsStreaming = true,
        SupportsTools = true,
        SupportsSystemMessages = true,
        SupportsArtifacts = false
    };

    public async IAsyncEnumerable<AIStreamingChunk> StreamAsync(
        AIRequest request,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // 1. Resolve the session binding key
        var bindingKey = ResolveBindingKey(request.SessionId);

        // 2. Get or create a Hermes session through the session pool
        var options = new HermesOptions
        {
            ExecutablePath = _platformConfiguration.ExecutablePath,
            Arguments = _platformConfiguration.Arguments,
            SessionId = _sessionBindings.TryGetValue(bindingKey, out var sessionId) ? sessionId : null,
            WorkingDirectory = request.WorkingDirectory,
            Model = request.Model
        };

        // 3. Execute and collect the streaming response
        await foreach (var message in _provider.ExecuteAsync(options, request.Prompt, cancellationToken))
        {
            // 4. Map ACP messages to AIStreamingChunk
            if (_responseMapper.TryConvertToStreamingChunk(message, out var chunk))
            {
                yield return chunk;
            }
        }
    }
}

Several design points are especially important here:

  1. Session binding: uses the SessionId to bind multiple requests to the same Hermes subprocess, preserving context continuity across multi-turn conversations
  2. Response mapping: converts Hermes ACP message format into the unified AIStreamingChunk format
  3. Streaming support: uses IAsyncEnumerable to support true streaming responses

Session binding is a bit like human relationships. Once a connection is established, future communication has context, so you do not need to start from zero each time. Of course, that relationship still has to be maintained.
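As a rough TypeScript sketch (class and method names are hypothetical, not the actual HagiCode implementation), the binding logic boils down to a map from a conversation key to the Hermes session id returned by session/new:

```typescript
// Hypothetical sketch of session binding: map an external conversation key
// to the Hermes session id returned by session/new, so later turns reuse it.
class SessionBindings {
  private bindings = new Map<string, string>();

  // Returns the bound Hermes session id, or null if a fresh session is needed.
  resolve(conversationKey: string): string | null {
    return this.bindings.get(conversationKey) ?? null;
  }

  // Called after session/new succeeds, binding the new id to the key.
  bind(conversationKey: string, hermesSessionId: string): void {
    this.bindings.set(conversationKey, hermesSessionId);
  }

  // Dropped when the subprocess dies, so the next turn starts cleanly.
  unbind(conversationKey: string): void {
    this.bindings.delete(conversationKey);
  }
}
```

The real provider also has to unbind when the subprocess exits unexpectedly; otherwise stale ids would be sent to sessions that no longer exist.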

Hermes uses ACP (Agent Communication Protocol), which differs from a traditional HTTP API. ACP is a protocol based on standard input and output, and it has several characteristics:

  1. Startup marker: after the Hermes process starts, it outputs the //ready marker
  2. Dynamic authentication: authentication methods are not fixed and must be negotiated through the protocol
  3. Session reuse: established sessions are reused through SessionId
  4. Fragmented responses: a complete response may be split across multiple session/update notifications

HagiCode handles these characteristics through StdioAcpTransport:

public class StdioAcpTransport
{
    public async Task InitializeAsync(CancellationToken cancellationToken)
    {
        // Wait for the //ready marker
        var readyLine = await _outputReader.ReadLineAsync(cancellationToken);
        if (readyLine != "//ready")
        {
            throw new InvalidOperationException("Hermes did not send ready signal");
        }

        // Send the initialize request
        await SendRequestAsync(new
        {
            jsonrpc = "2.0",
            id = 1,
            method = "initialize",
            @params = new
            {
                protocolVersion = "2024-11-05",
                capabilities = new { },
                clientInfo = new { name = "HagiCode", version = "1.0.0" }
            }
        }, cancellationToken);
    }
}

Protocols are a bit like mutual understanding between people. Once that understanding is there, communication flows much more smoothly. Building it just takes time.
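To make the fragmented-response behavior concrete, here is a simplified TypeScript sketch (the message shape is an assumption, reduced to a text delta per notification) of aggregating session/update fragments until the final JSON-RPC result for the request arrives:

```typescript
// Simplified sketch (hypothetical message shape): accumulate text deltas from
// session/update notifications until the final result for the request id arrives.
interface AcpMessage {
  method?: string;               // "session/update" for notifications
  params?: { delta?: string };   // assumed fragment payload
  id?: number;                   // present on the final JSON-RPC result
  result?: { stopReason?: string };
}

function aggregateResponse(messages: AcpMessage[], requestId: number): string {
  const parts: string[] = [];
  for (const msg of messages) {
    if (msg.method === "session/update" && msg.params?.delta) {
      parts.push(msg.params.delta);  // fragment: keep accumulating
    } else if (msg.id === requestId) {
      break;                         // final result: the response is complete
    }
  }
  return parts.join("");
}
```

The production code streams chunks out as they arrive rather than buffering; the point here is only that nothing is "the answer" until the matching result message shows up.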

Starting Hermes subprocesses frequently is expensive, so we implemented a session pool mechanism:

services.AddSingleton(static _ =>
{
    var registry = new CliProviderPoolConfigurationRegistry();
    registry.Register("hermes", new CliPoolSettings
    {
        MaxActiveSessions = 50,
        IdleTimeout = TimeSpan.FromMinutes(10)
    });
    return registry;
});

Key session pool parameters:

  • MaxActiveSessions: controls the concurrency limit to avoid exhausting resources
  • IdleTimeout: idle timeout that balances startup cost against memory usage

In practice, we found that:

  1. If the idle timeout is too short, sessions restart frequently; if it is too long, memory remains occupied
  2. The concurrency limit must be tuned according to actual load, because setting it too high can make the system sluggish
  3. Session pool utilization needs monitoring so parameters can be adjusted in time

This is much like many choices in life: being too aggressive creates problems, while being too conservative misses opportunities. The goal is simply to find the right balance.
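As an illustration of the idle-timeout trade-off, here is a minimal TypeScript sketch (names and the eviction policy are simplifications, not HagiCode's actual pool): a session is reused while it is still fresh and evicted once it has sat idle past the limit.

```typescript
// Minimal idle-timeout pool sketch (hypothetical names). Time is injected
// so the eviction behavior is easy to reason about and test.
interface PooledSession { id: string; lastUsedMs: number }

class IdlePool {
  private idle: PooledSession[] = [];
  constructor(private idleTimeoutMs: number, private maxIdle: number) {}

  // Reuse a fresh idle session if one exists; null means a cold start is needed.
  acquire(nowMs: number): PooledSession | null {
    this.evictExpired(nowMs);
    return this.idle.pop() ?? null;
  }

  release(session: PooledSession, nowMs: number): void {
    session.lastUsedMs = nowMs;
    if (this.idle.length < this.maxIdle) this.idle.push(session);
  }

  private evictExpired(nowMs: number): void {
    this.idle = this.idle.filter(s => nowMs - s.lastUsedMs < this.idleTimeoutMs);
  }
}
```

With a ten-minute timeout, a session reused every five minutes lives indefinitely, while one untouched for fifteen minutes is dropped, which is exactly the "too short restarts, too long occupies memory" balance described above.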

The front end needs to correctly identify the Hermes Provider and display the corresponding visual elements:

executorTypeAdapter.ts
export const resolveExecutorVisualTypeFromProviderType = (
  providerType: PCode_Models_AIProviderType | null | undefined
): ExecutorVisualType => {
  switch (providerType) {
    case PCode_Models_AIProviderType.HERMES_CLI:
      return 'Hermes';
    default:
      return 'Unknown';
  }
};

Hermes has its own icon and color identity:

ExecutorAvatar.tsx
const renderExecutorGlyph = (executorType: ExecutorVisualType, iconSize: number) => {
  switch (executorType) {
    case 'Hermes':
      return (
        <svg viewBox="0 0 24 24" fill="none" className="h-4 w-4">
          <rect x="4" y="4" width="16" height="16" rx="4" fill="currentColor" opacity="0.16" />
          <path d="M8 7v10M16 7v10M8 12h8" stroke="currentColor" strokeWidth="2" strokeLinecap="round" />
        </svg>
      );
    default:
      return <DefaultAvatar />;
  }
};

After all, beautiful things deserve beautiful presentation. Making sure that beauty is actually visible still depends on front-end craftsmanship.

The front end and back end keep their contract aligned through OpenAPI generation. The back end defines the AIProviderType enum:

public enum AIProviderType
{
    Unknown,
    ClaudeCode,
    OpenCode,
    HermesCli // Newly added
}

The front end generates the corresponding TypeScript types through OpenAPI, which keeps enum values consistent. This is the key to preventing the front end from displaying Unknown.

A contract is a lot like a promise. Once agreed, it has to be honored, otherwise you end up in awkward situations like Unknown.
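To make that failure mode concrete, here is a small TypeScript sketch (type and mapping names are hypothetical, not HagiCode's actual generated code) of why an enum value missing from the generated types surfaces as "Unknown":

```typescript
// Sketch: the adapter's default branch catches any provider type the
// generated TypeScript types do not cover, so a stale contract shows "Unknown".
type ProviderType = "ClaudeCode" | "OpenCode" | "HermesCli";

const visualTypeByProvider: Record<ProviderType, string> = {
  ClaudeCode: "Claude",
  OpenCode: "OpenCode",
  HermesCli: "Hermes",
};

function resolveVisualType(providerType: string | null | undefined): string {
  if (providerType && providerType in visualTypeByProvider) {
    return visualTypeByProvider[providerType as ProviderType];
  }
  return "Unknown"; // stale generated types land here until regenerated
}
```

Regenerating the OpenAPI client after every backend enum change is what keeps requests out of that default branch.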

Hermes configuration is managed through appsettings.json:

{
  "Providers": {
    "HermesCli": {
      "ExecutablePath": "hermes",
      "Arguments": "acp",
      "StartupTimeoutMs": 10000,
      "ClientName": "HagiCode",
      "Authentication": {
        "PreferredMethodId": "api-key",
        "MethodInfo": {
          "api-key": "your-api-key-here"
        }
      },
      "SessionDefaults": {
        "Model": "claude-sonnet-4-20250514",
        "ModeId": "default"
      }
    }
  }
}

This configuration-driven design brings flexibility:

  • executable paths can be overridden, which is convenient for development and testing
  • startup arguments can be customized to match different Hermes versions
  • authentication information can be configured to support multiple authentication methods

Configuration is a bit like multiple-choice questions in life. If enough options are available, there is usually one that fits. That said, too many options can create decision fatigue of their own.

Building a reliable Provider requires comprehensive health checks:

public async Task<ProviderTestResult> PingAsync(CancellationToken cancellationToken = default)
{
    var stopwatch = Stopwatch.StartNew();

    var response = await ExecuteAsync(new AIRequest
    {
        Prompt = "Reply with exactly PONG.",
        SessionId = null,
        AllowedTools = Array.Empty<string>(),
        WorkingDirectory = ResolveWorkingDirectory(null)
    }, cancellationToken);

    var success = string.Equals(response.Content.Trim(), "PONG", StringComparison.OrdinalIgnoreCase);

    return new ProviderTestResult
    {
        ProviderName = Name,
        Success = success,
        ResponseTimeMs = stopwatch.ElapsedMilliseconds,
        ErrorMessage = success ? null : $"Unexpected Hermes ping response: '{response.Content}'."
    };
}

Points to watch in health checks:

  1. Use simple test cases and avoid overly complex scenarios
  2. Set reasonable timeout values
  3. Record response time to support performance analysis

Just as people need physical checkups, systems need health checks too. The sooner issues are found, the easier they are to fix.
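The "set reasonable timeout values" point can be sketched in TypeScript as racing the ping against a timer, so a hung subprocess cannot stall the health check forever (the ping function here is a hypothetical stand-in, not the actual Provider API):

```typescript
// Sketch: bound a health-check ping with a timeout so a wedged subprocess
// produces a fast, explicit failure instead of an indefinitely pending check.
async function pingWithTimeout(
  ping: () => Promise<string>,
  timeoutMs: number,
): Promise<{ success: boolean; error?: string }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("health check timed out")), timeoutMs);
  });
  try {
    const reply = await Promise.race([ping(), timeout]);
    return { success: reply.trim().toUpperCase() === "PONG" };
  } catch (e) {
    return { success: false, error: (e as Error).message };
  } finally {
    clearTimeout(timer); // always clear the timer so nothing leaks
  }
}
```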

HagiCode provides a dedicated console for validating Hermes integration:

Terminal window
# Basic validation
HagiCode.Libs.Hermes.Console --test-provider
# Full suite (including repository analysis)
HagiCode.Libs.Hermes.Console --test-provider-full --repo .
# Custom executable
HagiCode.Libs.Hermes.Console --test-provider-full --executable /path/to/hermes

This tool is extremely useful during development because it lets us quickly verify whether the integration is correct. After all, no one wants to wait until a problem surfaces before remembering to test.

Authentication failure

  • Check whether Authentication.PreferredMethodId matches the authentication method Hermes actually supports
  • Confirm the authentication information format is correct, such as API Key or Bearer Token

Session timeout

  • Increase the StartupTimeoutMs value
  • Check MCP server reachability
  • Review system resource utilization

Incomplete response

  • Ensure session/update notifications and the final result are aggregated correctly
  • Check cancellation logic in streaming handling
  • Verify error handling is complete

Front end displays Unknown

  • Confirm OpenAPI generation already includes the HermesCli enum value
  • Check whether type mapping is correct
  • Clear browser cache and regenerate types

Problems will always exist. When they appear, the important thing is not to panic. Trace the cause step by step, and in the end, most of them can be solved.

  1. Use the session pool: reuse ACP subprocesses to reduce startup overhead
  2. Set timeouts appropriately: balance memory use against startup cost
  3. Reuse session IDs: use the same SessionId for batch tasks
  4. Configure MCP on demand: avoid unnecessary tool invocations

Performance is a lot like efficiency in daily life. When you get it right, you achieve more with less; when you get it wrong, effort multiplies while results shrink. Finding that “right” point takes both experience and luck.

Integrating Hermes Agent into a production system requires considering problems across multiple dimensions:

  1. Architecture: design a unified Provider interface and implement a replaceable component architecture
  2. Protocol: correctly handle ACP-specific behavior such as startup markers and dynamic authentication
  3. Performance: reuse resources through the session pool and balance startup cost against memory usage
  4. Front end: ensure contract synchronization and provide a consistent visual experience

HagiCode’s experience shows that with good layered design and configuration-driven implementation, a complex Agent system can be integrated seamlessly into an existing architecture.

These principles sound simple when described in words, but actual implementation always introduces many different kinds of problems. That is fine. If a problem gets solved, it becomes experience. If it does not, it becomes a lesson. Either way, it still has value.

Beautiful things or people do not need to be possessed; as long as they remain beautiful, it is enough to quietly appreciate that beauty. Technology is much the same. If it helps make the system better, then the specific framework or protocol matters far less than people sometimes think.

Thank you for reading. If you found this article helpful, feel free to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

How to Install and Use Hermes: A Quick Start from the Local CLI to Feishu Integration

If you want to install Hermes and start using it, the shortest path is really just three steps:

  1. Run the official installation command
  2. Start the CLI in your terminal with hermes
  3. If you want to keep using it in Feishu, run hermes gateway setup to configure it

This article does not try to explain every Hermes capability all at once. Instead, it helps you complete the most important beginner loop first: install it, get it running, start using it, and then connect it to one of the most common messaging-platform scenarios.

Hermes Agent is an AI agent that you can use either from a local terminal or through a messaging-platform gateway.

For most developers, it has two common entry points:

  • CLI: Type hermes in your terminal to enter the interactive interface directly.
  • Messaging Gateway: Run hermes gateway, then chat with it from platforms such as Feishu, Telegram, Discord, and Slack.

If your goal right now is simply to get started quickly, do not reverse the order. Start with this path instead:

  • Install Hermes first
  • Verify it works from the CLI first
  • Then decide whether you want to connect a messaging platform

This makes problems easier to diagnose and is more suitable for people using Hermes for the first time.

According to the Hermes README, the official quick-install path supports these environments:

  • Linux
  • macOS
  • WSL2
  • Android via Termux

Hermes does not currently support running directly on native Windows. If you are using Windows, the recommended approach is to install WSL2 first and then run the installation command inside WSL2.

It is best to make this clear at the beginning, because many installation failures are not caused by the command itself, but by using an unsupported runtime environment.

The quick installation command provided in the Hermes README is:

Terminal window
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

This command runs the official installation script and handles platform-specific initialization steps.

Once installation finishes, reload your shell environment first. The most common command is:

Terminal window
source ~/.bashrc

If you use zsh, you can use:

Terminal window
source ~/.zshrc

How to confirm Hermes is installed correctly

The most direct way to check is to run:

Terminal window
hermes

If you want additional confirmation that your configuration and dependencies are working, you can also run:

Terminal window
hermes doctor

hermes doctor is especially useful in these situations:

  • The command behaves abnormally after installation
  • Model configuration fails
  • The gateway fails to start
  • You are not sure whether your environment dependencies are complete

How to start using Hermes for the first time

If you just want to confirm as quickly as possible that Hermes works, the simplest method is:

Terminal window
hermes

This launches the interactive Hermes CLI. For first-time Hermes users, it is also the most recommended starting point, because you can verify the most essential things first:

  • Whether the command is actually available
  • Whether the current model configuration works properly
  • Whether the terminal toolchain is working correctly
  • Whether the interaction style matches what you need

These commands are enough for your first round of setup

The Hermes README lists several high-frequency commands, and together they form a practical first-use path:

Terminal window
hermes model
hermes tools
hermes config set
hermes setup
hermes update
hermes doctor

If you are not sure what each one does, remember them like this:

  • hermes model: choose or switch models
  • hermes tools: view and configure currently available tools
  • hermes config set: change specific configuration items
  • hermes setup: run the full initialization wizard once
  • hermes update: update Hermes
  • hermes doctor: troubleshoot problems

For beginners, the most practical order is usually:

  1. Run hermes model first
  2. If you want to configure all common options at once, then run hermes setup

1. Use Hermes in the terminal as a daily development assistant

CLI mode is a good fit for these scenarios:

  • Ask questions directly while writing code locally
  • Inspect projects, edit files, and run commands
  • Do one-off debugging or review work
  • Collaborate continuously in the current working directory

Its biggest advantage is that it is the shortest path: no extra platform integration, no bot configuration to handle up front, and it is the best way to build your first set of usage habits.

2. Use Hermes through a messaging platform

If you want to chat with Hermes on platforms such as Feishu, Telegram, or Discord, you need to use the messaging gateway.

The most common entry commands are:

Terminal window
hermes gateway setup
hermes gateway

Specifically:

  • hermes gateway setup is used for interactive platform configuration
  • hermes gateway is used to start the gateway process

According to the official documentation, the gateway is a unified background process that connects your configured platforms, manages sessions, and handles features such as cron jobs.

Using Feishu as an example: how to connect Hermes to a messaging platform

If most of your daily work happens in Feishu, then Feishu/Lark is a very natural way to use Hermes.

The official documentation recommends this entry command for Feishu/Lark:

Terminal window
hermes gateway setup

After you run it, simply choose Feishu / Lark in the wizard.

The Feishu documentation describes two connection modes:

  • websocket: recommended
  • webhook: optional

If Hermes runs on your laptop, workstation, or private server, using websocket first is usually simpler because you do not need to expose a public callback URL.

If you configure it manually, at least know these variables

If you are not using the wizard and are writing the configuration manually, the Feishu documentation lists these core variables:

Terminal window
FEISHU_APP_ID=cli_xxx
FEISHU_APP_SECRET=***
FEISHU_DOMAIN=feishu
FEISHU_CONNECTION_MODE=websocket
FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy
FEISHU_HOME_CHANNEL=oc_xxx

Two of them deserve special attention:

  • FEISHU_ALLOWED_USERS: recommended, so not everyone who can reach the bot can use it directly
  • FEISHU_HOME_CHANNEL: lets you predefine a home chat to receive cron results or default notifications

Why Hermes sometimes does not reply in Feishu group chats

This detail is easy to miss: in Feishu group chats, Hermes does not respond to every message by default.

The official documentation clearly states:

  • In direct messages, Hermes responds to messages
  • In group chats, you must explicitly @ the bot before it will process the message

If you want to set a Feishu conversation as the home channel, you can also use this in the chat:

/set-home

Or define it in the configuration ahead of time:

Terminal window
FEISHU_HOME_CHANNEL=oc_xxx

The Hermes commands beginners should remember first

Whether you use Hermes in the CLI or on a messaging platform, remembering the following commands is already enough to get started:

  • /new or /reset: start a new session
  • /model: view or switch the model
  • /retry: retry the previous turn
  • /undo: undo the previous interaction
  • /compress: manually compress the context
  • /help: view help

If you mainly use Hermes on a messaging platform, remember one more:

  • /sethome or /set-home: set the current chat as the home channel

These commands cover the most common beginner-stage operations: restarting, adjusting, rolling back, checking, and continuing.

Does Hermes support native Windows?

No. The current official documentation clearly states that native Windows is not supported, and WSL2 is recommended.

What should I do if typing hermes does nothing after installation?

It is best to troubleshoot in this order:

  1. Reload your shell first, for example with source ~/.bashrc
  2. Run hermes again
  3. If it is still abnormal, run hermes doctor

Why does the bot not reply in a Feishu group?

Check these three things first:

  • Whether you @ mentioned Hermes in the group
  • Whether FEISHU_ALLOWED_USERS restricts the current user
  • Whether the current group-chat policy allows handling group messages

According to the official Feishu documentation, explicitly using an @mention is required in group-chat scenarios.

If you simply want to start using Hermes as quickly as possible, this is the most recommended order:

  1. Run the installation command first
  2. Start with hermes in the local CLI first
  3. Use hermes model and hermes setup to complete the basic configuration
  4. If you want to keep using it in Feishu, run hermes gateway setup to configure it

If this article is the first part of a series, its best role is not to explain every advanced feature all at once, but to get users in the door first.

The following topics are better split into follow-up articles:

  • A complete Hermes Feishu integration guide
  • A guide to common Hermes slash commands
  • A guide to Hermes gateway configuration and troubleshooting

If you plan to keep creating Hermes content, this article can serve as the starting point for later posts, while you gradually build out the internal link structure.

VSCode and code-server: Choosing a Browser-Based Code Editing Solution

When building browser-based code editing capabilities, developers face a key choice: use VSCode’s official code serve-web feature, or adopt the community-driven code-server solution? This decision affects not only the technical architecture, but also license compliance and deployment flexibility.

Technical selection is a lot like choosing a path in life. Once you pick one, you usually have to keep walking it, and switching later can become very expensive.

In the era of AI-assisted programming, browser-based code editing is becoming increasingly important. Users expect that after an AI assistant finishes analyzing code, they can immediately open an editor in the same browser session and make changes without switching applications. That kind of seamless experience should simply be there when you need it.

However, when implementing this feature, developers face a critical technical choice: should they use VSCode’s official code serve-web feature, or the community-driven code-server solution?

Each option has its own strengths and trade-offs, and choosing poorly can create a lot of trouble later. Licensing is one example: if you only discover after launch that your product is not license-compliant, it is already too late. Deployment is another: a solution might work perfectly in development, then run into all kinds of problems once moved into containers. These are exactly the kinds of pitfalls teams want to avoid.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI-driven coding assistant. While implementing browser-based code editing, we studied both solutions in depth and ultimately designed our architecture to support both, while choosing code-server as the default.

Project repository: github.com/HagiCode-org/site

Licensing is the most fundamental difference between the two solutions, and it was the first factor we considered during evaluation. When making a technical choice, it is important to understand the legal risks up front.

code-server

  • MIT license, fully open source
  • Maintained by Coder.com with an active community
  • Free to use commercially, modify, and distribute
  • No restrictions on usage scenarios

VSCode code serve-web

  • Part of the Microsoft VSCode product
  • Uses Microsoft’s license (the VS Code license includes restrictions on commercial use)
  • Primarily intended for individual developers
  • Enterprise deployment may require additional commercial licensing review

From a licensing perspective, code-server is more friendly to commercial projects. This is something you need to think through during product planning, because migrating later can become very costly.

Once licensing is settled, the next issue is deployment. That directly affects your operations cost and architectural design.

code-server

  • A standalone Node.js application that can be deployed independently
  • Supports multiple runtime sources:
    • Directly specifying the executable path
    • Looking it up through the system PATH
    • Automatic detection of an NVM Node.js 22.x environment
  • No need to install the VSCode desktop application on the server
  • Easier to deploy in containers

VSCode code serve-web

  • Must depend on a locally installed VSCode CLI
  • Requires an available code command on the host machine
  • The system filters out VS Code Remote CLI wrappers
  • Primarily designed for local development scenarios

code-server is better suited for server and container deployment scenarios. If your product needs to run in Docker, or your users do not have VSCode installed, code-server is usually the right choice.

The two solutions also differ in a few feature parameters. The differences are not huge, but they can create integration friction in real-world usage.

| Feature          | code-server                                  | code serve-web                                  |
| ---------------- | -------------------------------------------- | ----------------------------------------------- |
| Public base path | / (configurable)                             | /vscode-server (fixed)                          |
| Authentication   | --auth parameter with multiple modes         | --connection-token / --without-connection-token |
| Data directory   | {DataDir}/code-server                        | {DataDir}/vscode-serve-web                      |
| Telemetry        | Disabled by default with --disable-telemetry | Depends on VSCode settings                      |
| Update checks    | Can be disabled with --disable-update-check  | Depends on VSCode settings                      |

These differences need special attention during integration. For example, different URL paths mean your frontend code needs dedicated handling.

When implementing editor switching, the availability detection logic also differs.

code-server

  • Always returned as a visible implementation
  • Still shown even when unavailable, with an install-required status
  • Supports automatic detection of an NVM Node.js 22.x environment

code serve-web

  • Only visible when a local code CLI is detected
  • If unavailable, the frontend automatically hides this option
  • Depends on the local VSCode installation state

This difference directly affects the user experience. code-server is more transparent: users can see the option and understand that installation is still required. code serve-web is more hidden: users may not even realize the option exists. Which approach is better depends on the product positioning.

HagiCode’s Dual-Implementation Architecture

After in-depth analysis, the HagiCode project adopted a dual-implementation architecture that supports both solutions at the architectural level.

// The default active implementation is code-server
// If an explicit activeImplementation is saved, try that implementation first
// If the requested implementation is unavailable, the resolver tries the other one
// If a fallback occurs, return fallbackReason

We default to code-server mainly because of licensing and deployment flexibility. However, for users who already have a local VSCode environment, code serve-web is also a solid option.
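A minimal TypeScript sketch of that fallback rule (names are illustrative, not the actual CodeServerImplementationResolver API) might look like this:

```typescript
// Sketch of the resolver rule: prefer the saved implementation, fall back to
// the other one if it is unavailable, and record why the fallback happened.
type Impl = "code-server" | "serve-web";

function resolveImplementation(
  saved: Impl | null,
  available: Record<Impl, boolean>,
): { active: Impl | null; fallbackReason?: string } {
  const preferred: Impl = saved ?? "code-server"; // default is code-server
  const other: Impl = preferred === "code-server" ? "serve-web" : "code-server";
  if (available[preferred]) return { active: preferred };
  if (available[other]) {
    return { active: other, fallbackReason: `${preferred} unavailable` };
  }
  return { active: null, fallbackReason: "no implementation available" };
}
```

Surfacing fallbackReason to the caller is what lets the UI explain the switch instead of silently opening a different editor.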

CodeServerImplementationResolver is responsible for unifying:

  • Implementation selection during startup warm-up
  • Implementation selection when reading status
  • Implementation selection when opening projects
  • Implementation selection when opening Vaults

This design allows the system to respond flexibly to different scenarios, and users can choose the implementation that best matches their environment.

// When localCodeAvailable=false, do not show code serve-web
// When localCodeAvailable=true, show the code serve-web configuration

The frontend automatically shows available options based on the environment, so users are not confused by features they cannot use.
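A sketch of that visibility rule in TypeScript (prop and type names are hypothetical, not HagiCode's actual components):

```typescript
// Sketch: code-server is always listed (even when installation is still
// required), while serve-web is hidden unless a local `code` CLI was detected.
interface EditorOption { id: "code-server" | "serve-web"; label: string }

function visibleEditorOptions(localCodeAvailable: boolean): EditorOption[] {
  const options: EditorOption[] = [
    { id: "code-server", label: "code-server" }, // visible even if not installed yet
  ];
  if (localCodeAvailable) {
    options.push({ id: "serve-web", label: "VSCode serve-web" });
  }
  return options;
}
```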

After all that theory, what should you pay attention to during actual deployment? In the end, implementation is what matters.

For containerized deployment, code-server is the better choice:

# Use the official code-server image directly
FROM codercom/code-server:latest
# Or install through npm
RUN npm install -g code-server

This solves the problem in a single layer without requiring an additional VSCode installation.

code-server configuration

{
  "vscodeServer": {
    "enabled": true,
    "activeImplementation": "code-server",
    "codeServer": {
      "host": "0.0.0.0",
      "port": 8080,
      "executablePath": "",
      "authMode": "none"
    }
  }
}

code serve-web configuration

{
  "vscodeServer": {
    "enabled": true,
    "activeImplementation": "serve-web",
    "serveWeb": {
      "host": "0.0.0.0",
      "port": 8080,
      "executablePath": "/usr/local/bin/code"
    }
  }
}

Configuration can be a bit tedious the first time, but once it is in place, things become much easier to maintain.

code-server

http://localhost:8080/?folder=/path/to/project&vscode-lang=zh-CN

code serve-web

http://localhost:8080/vscode-server/?folder=/path/to/project&tkn=xxx&vscode-lang=zh-CN

Pay attention to the differences in paths and parameters. You need to handle them separately during integration.
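One way to absorb those differences is a small frontend helper; this TypeScript sketch (function name and parameters are assumptions, based on the URL shapes shown above) builds the right URL for each implementation:

```typescript
// Sketch: encapsulate the base-path and token differences between the two
// implementations so the rest of the frontend never branches on them.
function buildEditorUrl(
  impl: "code-server" | "serve-web",
  base: string,
  folder: string,
  opts: { token?: string; lang?: string } = {},
): string {
  const path = impl === "serve-web" ? "/vscode-server/" : "/"; // fixed vs configurable base path
  const params = new URLSearchParams({ folder });
  if (impl === "serve-web" && opts.token) params.set("tkn", opts.token);
  if (opts.lang) params.set("vscode-lang", opts.lang);
  return `${base}${path}?${params.toString()}`;
}
```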

The system supports runtime switching and automatically stops the previous implementation when switching:

// VsCodeServerManager automatically handles mutual exclusion
// When switching activeImplementation, the old implementation will not keep running in the background

This design lets users try different implementations at any time and find the option that works best for them.

const { settings, runtime } = await getVsCodeServerSettings();
// runtime.activeImplementation: "code-server" | "serve-web"
// runtime.fallbackReason: reason for switching
// runtime.status: "running" | "starting" | "stopped" | "unhealthy"

When status is visible, users can quickly determine whether a problem comes from the server side or from their own operation.

| Comparison Dimension   | code-server               | code serve-web                     | Recommendation      |
| ---------------------- | ------------------------- | ---------------------------------- | ------------------- |
| License                | MIT (commercial-friendly) | Microsoft (restricted)             | code-server         |
| Deployment flexibility | Independent deployment    | Depends on local VSCode            | code-server         |
| Server suitability     | Designed for servers      | Mainly for local development       | code-server         |
| Containerization       | Native support            | Requires VSCode installation       | code-server         |
| Feature completeness   | Close to desktop edition  | Official complete version          | code serve-web      |
| Maintenance activity   | Active community          | Officially maintained by Microsoft | Both have strengths |

Recommended strategy: Use code-server first, and consider code serve-web when you need full official functionality and already have a local VSCode environment.

The approach shared in this article is distilled from HagiCode’s real development experience. If you find this solution valuable, that is also a good sign that HagiCode itself is worth paying attention to.


If this article helped you:

Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it. This content was created with AI-assisted collaboration, with the final version reviewed and approved by the author.

Fast Code Editing in the Browser: VSCode Web Integration in Practice

Section titled “Fast Code Editing in the Browser: VSCode Web Integration in Practice”

After AI finishes analyzing code, how do you immediately open an editor in the browser and start making changes? This article shares our practical experience integrating code-server in the HagiCode project to create a seamless bridge between the AI assistant and the code editing experience.

In the era of AI-assisted programming, developers often need to inspect and edit code quickly. The traditional workflow is simple: open the project in a desktop IDE, locate the file, edit it, and save. But in some situations, that flow always feels slightly off.

Scenario one: remote development. When using an AI assistant like HagiCode, the backend may be running on a remote server or inside a container, and local machines cannot directly access the project files. Every time you need to inspect or modify code, you have to connect through SSH or another method, and the experience feels fragmented. It is like wanting to meet someone through a thick pane of glass: you can see them, but you cannot reach them.

Scenario two: quick previews. After the AI assistant analyzes the code, the user may only want to quickly browse a file or make a small change. Launching a full desktop IDE feels heavy, while a lightweight in-browser editor better fits the need for a “quick look.” After all, who wants to mobilize an entire toolchain just to take a glance?

Scenario three: cross-device collaboration. When working across different devices, a browser-based editor provides a unified access point without requiring every machine to be configured with a development environment. That alone saves a lot of trouble. Life is short; why repeat the same setup work over and over?

To solve these pain points, we integrated VSCode Web into the HagiCode project. This lets the AI assistant and the code editing experience connect seamlessly: after AI analyzes the code, users can immediately open an editor and make changes in the same browser session, without switching applications. It is the kind of experience where, when you need it, it is simply there.

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI-driven coding assistant designed to improve development efficiency through natural language interaction. During development, we found that users often need to switch quickly between AI analysis and code editing, which pushed us to explore how to integrate the editor directly into the browser.

Project repository: github.com/HagiCode-org/site

Among the many VSCode Web solutions available, we chose code-server. There were a few concrete reasons behind that decision.

Feature completeness. code-server is the web version of VSCode and supports most desktop features, including the extension system, intelligent suggestions, debugging, and more. That means users can get an editing experience in the browser that is very close to the desktop version. After all, who really wants to compromise on functionality?

Flexible deployment. code-server can run as an independent service and also supports Docker-based deployment, which fits well with HagiCode’s architecture. Our backend is written in C#, the frontend uses React, and the two communicate with the code-server service through REST APIs. It is like building with blocks: every piece has its place.

Secure authentication. code-server includes a built-in connection-token mechanism to prevent unauthorized access. Each session has a unique token so that only authorized users can open the editor. Security is one of those things you only fully appreciate once you have it.

HagiCode’s VSCode Web integration uses a front-end/back-end separated architecture.

The frontend wraps interactions with the backend through vscodeServerService.ts:

// Open project
export async function openProjectInCodeServer(
  id: string,
  currentInterfaceLanguage?: string,
): Promise<VsCodeServerLaunchResponseDto>

// Open vault
export async function openVaultInCodeServer(
  id: string,
  path?: string,
  currentInterfaceLanguage?: string,
): Promise<VsCodeServerLaunchResponseDto>

The difference between these two methods is straightforward: openProjectInCodeServer opens the entire project, while openVaultInCodeServer opens a specific path inside a Vault. For MonoSpecs multi-repository projects, the system automatically creates a workspace file. Responsibilities stay clear when each function does one job well.

The backend VaultAppService.cs implements the core logic:

public async Task<VsCodeServerLaunchResponseDto> OpenInCodeServerAsync(
    string id,
    string? relativePath = null,
    string? currentInterfaceLanguage = null,
    CancellationToken cancellationToken = default)
{
    // 1. Get settings and check whether the feature is enabled
    var settings = await _vsCodeServerSettingsService.GetResolvedSettingsAsync(cancellationToken);
    if (!settings.Enabled) {
        throw new BusinessException(VsCodeServerErrorCodes.Disabled, "VSCode Server is disabled.");
    }

    // 2. Get vault and resolve the launch directory
    var vault = await RequireVaultAsync(id, cancellationToken);
    var launchDirectory = ResolveLaunchDirectory(vault, relativePath);

    // 3. Ensure code-server is running and get runtime info
    var runtime = await _vsCodeServerManager.EnsureStartedAsync(settings, cancellationToken);

    // 4. Resolve language settings
    var language = _vsCodeServerSettingsService.ResolveLaunchLanguage(
        settings.Language,
        currentInterfaceLanguage);

    // 5. Build launch URL
    return new VsCodeServerLaunchResponseDto {
        LaunchUrl = AppendQueryString(runtime.BaseUrl, new Dictionary<string, string?> {
            ["folder"] = launchDirectory,
            ["tkn"] = runtime.ConnectionToken,
            ["vscode-lang"] = language,
        }),
        ConnectionToken = runtime.ConnectionToken,
        OpenMode = "folder",
        Runtime = VsCodeServerSettingsService.MapRuntime(
            await _vsCodeServerManager.GetRuntimeSnapshotAsync(cancellationToken)),
    };
}

This method has a very clear responsibility: check settings, resolve paths, start the service, and build the URL. Among them, the ResolveLaunchDirectory method performs path security checks to prevent path traversal attacks. Code can feel a little like poetry when every line has a purpose.

The backend manages the code-server process through VsCodeServerManager:

  • Check process status
  • Automatically start stopped services
  • Return runtime snapshots such as port, process ID, and start time

This design lets the system automatically handle the code-server lifecycle, so users do not need to manage service processes manually. Life is already complicated enough; anything that can be automated should be.

HagiCode supports a multilingual interface, and code-server needs to follow that setting. The system supports three language modes:

  • follow: follow the current interface language
  • zh-CN: fixed to Chinese
  • en-US: fixed to English

The setting is passed to code-server through the vscode-lang URL parameter so that the editor language stays consistent with the HagiCode interface. Language feels best when it is unified.
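As a rough sketch of how the three modes resolve (the function name mirrors the backend's ResolveLaunchLanguage, but this body is an assumption based on the description above):

```typescript
// "follow" inherits the current HagiCode interface language; the two
// fixed modes always win. Defaulting to en-US when no interface
// language is set is an assumption of this sketch.
type LanguageMode = "follow" | "zh-CN" | "en-US";

function resolveLaunchLanguage(
  mode: LanguageMode,
  interfaceLanguage?: string,
): string {
  if (mode === "follow") {
    return interfaceLanguage ?? "en-US";
  }
  return mode;
}
```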

For MonoSpecs projects, which contain multiple sub-repositories inside one monorepo, the system automatically creates a .code-workspace file:

private async Task<string> CreateWorkspaceFileAsync(Project project, Guid projectId)
{
    var folders = await ResolveWorkspaceFoldersAsync(project.Path);
    var workspaceDocument = new {
        folders = folders.Select(path => new { path }).ToArray(),
    };
    // Generate workspace file...
}

This makes it possible to edit multiple sub-repositories in the same code-server instance, which is especially practical for large monorepo projects. Multiple repositories in one window can feel like multiple stories gathered in the same book.
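The generated file uses VS Code's standard .code-workspace format. For a monorepo with two sub-repositories it might look like this (the folder paths are illustrative):

```json
{
  "folders": [
    { "path": "repos/web" },
    { "path": "repos/hagicode-core" }
  ]
}
```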

The HagiCode frontend uses React + TypeScript, and integrating code-server is not especially complicated.

Add a Code Server button to the project card:

QuickActionsZone.tsx
<Button
  size="sm"
  variant="default"
  onClick={() => onAction({ type: 'open-code-server' })}
>
  <Globe className="h-3 w-3 mr-1" />
  <span className="text-xs">{t('project.openCodeServer')}</span>
</Button>

This button triggers the open action and calls the backend API to obtain the launch URL. One button, one action, direct and simple.

const handleAction = async (action: ProjectAction) => {
  if (action.type === 'open-code-server') {
    const response = await openProjectInCodeServer(project.id, i18n.language);
    window.open(response.launchUrl, '_blank', 'noopener,noreferrer');
  }
};

Use window.open to open code-server in a new tab. The noopener,noreferrer parameters provide extra security. When it comes to security, there is no such thing as being too careful.

Add a similar edit button in the Vault list:

const handleEditVault = async (vault: VaultItemDto) => {
  const response = await openVaultInCodeServer(vault.id);
  window.open(response.launchUrl, '_blank', 'noopener,noreferrer');
};

Projects and Vaults use the same open mechanism, which keeps the interaction consistent. Consistency matters almost as much as the feature itself.

The URL format for code-server has a few details worth noting.

Folder mode:

http://{host}:{port}/?folder={path}&tkn={token}&vscode-lang={lang}

Workspace mode:

http://{host}:{port}/?workspace={workspacePath}&tkn={token}&vscode-lang={lang}

Here, tkn is the connection token. It is generated automatically every time code-server starts, ensuring secure access. The vscode-lang parameter controls the editor UI language. Every one of these parameters has a role to play.
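Putting the two modes together, here is a hedged TypeScript sketch of the URL construction (the helper is illustrative; HagiCode's actual AppendQueryString lives in the C# backend):

```typescript
// Build a launch URL in either folder or workspace mode, carrying the
// tkn and vscode-lang parameters described above.
function buildLaunchUrl(
  baseUrl: string,
  target: { folder: string } | { workspace: string },
  token: string,
  lang: string,
): string {
  const params = new URLSearchParams();
  if ("folder" in target) {
    params.set("folder", target.folder);
  } else {
    params.set("workspace", target.workspace);
  }
  params.set("tkn", token);
  params.set("vscode-lang", lang);
  return `${baseUrl}/?${params.toString()}`;
}
```

Using URLSearchParams also takes care of percent-encoding the path, which is easy to get wrong when concatenating strings by hand.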

The user talks with HagiCode, the AI analyzes the project code and finds a potential issue, and then the user clicks the “Open in Code Server” button to open the editor directly in the browser, inspect the affected file, fix it, and return to HagiCode to continue the conversation. The entire flow happens in the browser without switching applications. It feels smooth in the way running water feels smooth.

Scenario Two: Editing Study Materials in a Vault

Section titled “Scenario Two: Editing Study Materials in a Vault”

A user creates a Vault for studying an open source project and wants to add study notes under the docs/ directory. With code-server, they can edit Markdown files directly in the browser, save them, and let HagiCode immediately read the updated notes. This is especially useful for building a personal knowledge base. Knowledge only becomes more valuable the more you accumulate it.

Scenario Three: MonoSpecs Multi-Repository Development

Section titled “Scenario Three: MonoSpecs Multi-Repository Development”

A MonoSpecs project contains multiple sub-repositories, and code-server automatically creates a multi-folder workspace. In the browser, users can edit code across several repositories at once and then commit changes back to their respective Git repositories. This workflow is particularly well suited for changes that need to span multiple repositories. Editing several repositories together takes a bit of technique, just like handling multiple tasks at the same time.

When implementing code-server integration, security deserves special attention. If security goes wrong, you always notice too late.

The connection-token is generated randomly and should not be exposed. It is best used under HTTPS to prevent the token from being intercepted by a man-in-the-middle. Sensitive information is worth protecting properly.

The backend implements path traversal checks:

private static string ResolveLaunchDirectory(VaultRegistryEntry vault, string? relativePath)
{
    var vaultRoot = EnsureTrailingSeparator(Path.GetFullPath(vault.PhysicalPath));
    var combinedPath = Path.GetFullPath(Path.Combine(vaultRoot, relativePath ?? "."));
    if (!combinedPath.StartsWith(vaultRoot, StringComparison.OrdinalIgnoreCase))
    {
        throw new BusinessException(VaultRelativePathTraversalCode, "Relative path traversal detected.");
    }
    return combinedPath;
}

This code ensures that users cannot use ../ or similar patterns to access files outside the Vault directory. Boundary checks are always better done than skipped.

The code-server process should run with appropriate user permissions so that it cannot access sensitive system files. It is best to run the code-server service under a dedicated user. Permission control is one of those fundamentals you should always keep in place.

code-server consumes server resources, so here are a few optimization suggestions:

  • Monitor CPU and memory usage, and adjust resource limits when necessary
  • Large projects may require longer timeouts
  • Implement automatic session timeout cleanup to release resources
  • Consider caching to reduce repeated computation
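The session-timeout idea can be sketched as a small idle tracker (illustrative, not HagiCode's actual implementation): every user interaction calls touch(), and a periodic check stops the service once the idle window is exceeded.

```typescript
// Illustrative idle-timeout tracker: stop code-server after a period
// with no activity to release CPU and memory.
class IdleStopper {
  private lastActivity = Date.now();

  constructor(
    private readonly stop: () => void,
    private readonly idleMs: number,
  ) {}

  // Record activity, e.g. on every editor request.
  touch(now = Date.now()): void {
    this.lastActivity = now;
  }

  // Call periodically (e.g. from setInterval); returns true if stopped.
  check(now = Date.now()): boolean {
    if (now - this.lastActivity >= this.idleMs) {
      this.stop();
      return true;
    }
    return false;
  }
}
```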

HagiCode provides a runtime status monitoring API, and the frontend can call getVsCodeServerSettings() to retrieve the current state:

const { settings, runtime } = await getVsCodeServerSettings();
// runtime.status: 'disabled' | 'stopped' | 'starting' | 'running' | 'unhealthy'
// runtime.baseUrl: "http://localhost:8080"
// runtime.processId: 12345

This design allows users to clearly understand the health status of code-server and quickly locate problems when something goes wrong. When the status is visible, people feel more in control.

During implementation, we discovered a few details that noticeably affect the user experience and deserve extra attention.

Opening code-server for the first time may require waiting for startup, and that delay can range from a few seconds to half a minute. It is a good idea to show a loading state in the frontend so users know the system is still working. Waiting is easier when there is feedback.
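A minimal polling sketch for that loading state (names and timings are illustrative; the real status values come from getVsCodeServerSettings()):

```typescript
// Poll a status callback until code-server reports "running", giving up
// early on "unhealthy" or when the timeout elapses. The caller keeps a
// loading spinner visible until this resolves.
type RuntimeStatus = "stopped" | "starting" | "running" | "unhealthy";

async function waitUntilRunning(
  getStatus: () => Promise<RuntimeStatus>,
  timeoutMs = 30_000,
  intervalMs = 1_000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await getStatus();
    if (status === "running") return true;
    if (status === "unhealthy") return false; // give up early on failure
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // timed out; surface an error to the user
}
```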

Browsers may block the popup, so users should be prompted to allow it manually. On first launch, HagiCode displays guidance that explains how to grant the necessary browser permissions. User experience often lives in exactly these small details.

It is also a good idea to display runtime status such as starting, running, or error, so that when problems occur, users can quickly tell whether the issue is on the server side or in their own operation. Knowing where the problem is at least gives you a place to start.

The configuration for code-server is not complicated:

{
  "vscodeServer": {
    "enabled": true,
    "host": "0.0.0.0",
    "port": 8080,
    "language": "follow"
  }
}

enabled controls whether the feature is turned on, host and port define the listening address, and language sets the language mode. These settings can be modified through the UI and take effect immediately. Simple things are often the easiest to use.

HagiCode’s VSCode Web integration provides an elegant solution: it lets the AI assistant and the code editing experience connect seamlessly. By integrating code-server into the browser, users can quickly act on AI analysis results and complete the full flow from analysis to editing in the same browser session.

This solution brings several key advantages: a unified experience, because projects and Vaults use the same open mechanism; multi-repository support, because MonoSpecs projects automatically create workspaces; and controllable security, thanks to runtime status monitoring and path safety checks.

The approach shared in this article is something HagiCode distilled from real development work. If you find this solution valuable, that suggests our engineering practice is doing something right, and HagiCode itself may be worth a closer look. Good tools deserve to be seen by more people.

  • HagiCode GitHub: github.com/HagiCode-org/site
  • HagiCode official website: hagicode.com
  • code-server official website: coder.com/code-server
  • Related code files:
    • repos/web/src/services/vscodeServerService.ts
    • repos/hagicode-core/src/PCode.Application/Services/VaultAppService.cs
    • repos/hagicode-core/src/PCode.Application/ProjectAppService.VsCodeServer.cs


Guide to Creating a Border Light Sweep Animation Effect

Section titled “Guide to Creating a Border Light Sweep Animation Effect”

How do you build that important element users notice at a glance using pure CSS? It is actually not that hard. The trick is just taking a slightly roundabout path. In this article, I will walk you through how to build a border light sweep animation from scratch, and also share a few of the pitfalls we ran into while building HagiCode.

If you work on the frontend, you have probably had this experience before: a product manager walks over with that “this is definitely simple” expression and says, “Can we add some kind of special effect to this running task so users can spot it immediately?”

You say sure, we can just change the border color. Then they shake their head with that look that says, “You do not get it.” They reply, “That is not obvious enough. I want the kind of effect where light runs around the border, like in a sci-fi movie.”

At that point you might start wondering how to build it. Canvas? SVG? Or can CSS handle it on its own? After all, nobody wants to admit they do not know how.

In modern web applications, border light sweep animations are actually very common. They are mainly used in a few scenarios like these:

  • Status indicators: Marking tasks in progress or active items
  • Visual focus: Highlighting important content areas
  • Brand enhancement: Creating a sleek, modern, tech-forward visual style
  • Seasonal themes: Building a celebratory atmosphere for special occasions

We ran into exactly this requirement while building HagiCode. Users needed to see at a glance which sessions were running and which proposals were currently being processed. We tried several different approaches. Some paths were smoother, some were a bit more winding. In the end, we settled on a fairly mature implementation strategy.

The approach shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is an AI-driven coding assistant project, and the interface makes extensive use of border light animations to indicate different runtime states. Examples include the running state of items in the session list, status transitions in the proposal flow diagram, and intensity indicators for throughput.

These effects are not especially complicated in principle, but we definitely ran into plenty of pitfalls while implementing them. If you want to see the real thing, you can visit our GitHub repository or head to the official website. In the end, what matters most is what actually works.

After analyzing the HagiCode codebase, we summarized several core implementation patterns below. Each one fits a different scenario, or in other words, each one exists for a reason.

1. Rotating glow with a conic gradient (most common)

Section titled “1. Rotating glow with a conic gradient (most common)”

This is the classic way to implement a border light sweep effect. The core idea is to use CSS conic-gradient to create a conic gradient, then rotate it continuously. Like a streetlight turning in the night, it just keeps circling.

Key elements:

  • Use the ::before pseudo-element to create the glow layer
  • Use conic-gradient to define the gradient color distribution
  • Use the ::after pseudo-element to mask the center area (optional)
  • Use @keyframes to implement the rotation animation

This works well for status indicators in list items. You create a thin glowing line on one side of the element instead of animating the entire border. Sometimes a little light is enough. You do not need to illuminate the whole world.

Key elements:

  • A thin absolutely positioned line element
  • Use box-shadow to create the glow effect
  • Use scale and opacity for a breathing animation

If you do not need the full sweep effect and just want a soft background glow, layering multiple box-shadow values is enough. Sometimes the simpler option is the better one.
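For reference, a static glow along those lines is just a couple of stacked shadows (the class name and colors here are illustrative):

```css
/* A minimal static glow: a tight inner shadow plus a wider, fainter
   halo. Swap the colors for your theme variable if you have one. */
.soft-glow {
  border-radius: 8px;
  box-shadow:
    0 0 8px rgba(59, 130, 246, 0.45),
    0 0 24px rgba(59, 130, 246, 0.25);
}
```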

This part is easy to overlook, but it is extremely important. Every animation should account for the prefers-reduced-motion media query and provide a static alternative for users who do not want animation. Not everyone enjoys constant motion, and respecting that preference matters.

Option 1: Rotating conic-gradient border (recommended)

Section titled “Option 1: Rotating conic-gradient border (recommended)”

This is the most complete implementation of the sweeping border light effect, and it is also the option we use most often in HagiCode. After all, if something works well, why replace it?

/* Parent container */
.glow-border-container {
  position: relative;
  overflow: hidden;
}

/* Rotating glow layer */
.glow-border-container::before {
  content: '';
  position: absolute;
  top: -50%;
  left: -50%;
  width: 200%;
  height: 200%;
  background: conic-gradient(
    transparent 0deg,
    rgba(59, 130, 246, 0.6) 60deg,
    rgba(59, 130, 246, 0.3) 120deg,
    rgba(59, 130, 246, 0.6) 180deg,
    transparent 240deg
  );
  animation: border-rotate 3s linear infinite;
  z-index: -1;
}

/* Mask layer (optional, for creating a hollow border effect) */
.glow-border-container::after {
  content: '';
  position: absolute;
  inset: 2px;
  background: inherit;
  border-radius: inherit;
  z-index: -1;
}

@keyframes border-rotate {
  from {
    transform: rotate(0deg);
  }
  to {
    transform: rotate(360deg);
  }
}

The principle behind this option is fairly simple: create a pseudo-element larger than the parent container, draw a conic gradient on it, and rotate it continuously. The parent container uses overflow: hidden, so only the light passing around the border remains visible. It is a bit like watching a streetlight through a window. You only ever see the small slice that passes by.

Option 2: Simplified rotating light border

Section titled “Option 2: Simplified rotating light border”

If you do not need the full effect, HagiCode also includes a lighter utility-class version. Sometimes the simpler approach really is better.

/* Rotating light border utility class */
.running-light-border {
  position: absolute;
  inset: -2px;
  background: conic-gradient(
    from 0deg,
    transparent 0deg 270deg,
    var(--theme-running-color) 270deg 360deg
  );
  border-radius: inherit;
  animation: lightRayRotate 3s linear infinite;
  will-change: transform;
  z-index: 0;
}

@keyframes lightRayRotate {
  from {
    transform: rotate(0deg);
  }
  to {
    transform: rotate(360deg);
  }
}

/* Accessibility support */
@media (prefers-reduced-motion: reduce) {
  .running-light-border {
    animation: none;
  }
}

Notice the will-change: transform here. It tells the browser, “This element is going to keep changing,” so the browser can prepare some optimizations ahead of time and keep the animation smoother. Preparing in advance is usually better than scrambling at the last minute.

This is especially suitable for list-item status indicators, and it is exactly what the HagiCode session list uses. One thin line can still stand out among many items. That feels like a life lesson in its own way.

.side-glow {
  position: relative;
  isolation: isolate;
}

.side-glow::before {
  content: '';
  position: absolute;
  left: 0;
  top: 14px;
  bottom: 14px;
  width: 1px;
  border-radius: 999px;
  background: var(--theme-running-color);
  box-shadow:
    0 0 16px var(--theme-running-color),
    0 0 28px var(--theme-running-color);
  z-index: 1;
  pointer-events: none;
  animation: sidePulse 2.6s ease-in-out infinite;
}

.side-glow > * {
  position: relative;
  z-index: 2;
}

@keyframes sidePulse {
  0%, 100% {
    opacity: 0.55;
    transform: scaleY(0.96);
  }
  50% {
    opacity: 0.95;
    transform: scaleY(1);
  }
}

This uses isolation: isolate to create a new stacking context, then relies on z-index to control the display order of each layer. pointer-events: none is also essential. Otherwise the pseudo-element would block user clicks. Some things can look nice, but they still should not get in the way.

If your project uses React, you can wrap this logic in a component, especially the accessibility handling. Write it once, use it many times. That is the whole point.

import React from 'react';
import { useReducedMotion } from 'framer-motion';
import styles from './GlowBorder.module.css';

interface GlowBorderProps {
  isActive: boolean;
  children: React.ReactNode;
  className?: string;
}

export const GlowBorder = React.memo<GlowBorderProps>(
  ({ isActive, children, className = '' }) => {
    const prefersReducedMotion = useReducedMotion();

    if (!isActive) {
      return <div className={className}>{children}</div>;
    }
    if (prefersReducedMotion) {
      return (
        <div className={`${styles.glowStatic} ${className}`}>
          {children}
        </div>
      );
    }
    return (
      <div className={`${styles.glowAnimated} ${className}`}>
        {children}
      </div>
    );
  }
);

The matching CSS module:

GlowBorder.module.css
/* Animated version */
.glowAnimated {
  position: relative;
  overflow: hidden;
}

.glowAnimated::before {
  content: '';
  position: absolute;
  top: -50%;
  left: -50%;
  width: 200%;
  height: 200%;
  background: conic-gradient(
    from 0deg,
    transparent,
    rgba(59, 130, 246, 0.6),
    transparent,
    rgba(59, 130, 246, 0.6),
    transparent
  );
  animation: rotateGlow 3s linear infinite;
  z-index: -1;
}

.glowAnimated::after {
  content: '';
  position: absolute;
  inset: 2px;
  background: inherit;
  border-radius: inherit;
  z-index: -1;
}

/* Static version (accessibility) */
.glowStatic {
  position: relative;
  border: 1px solid rgba(59, 130, 246, 0.5);
  box-shadow: 0 0 15px rgba(59, 130, 246, 0.3);
}

@keyframes rotateGlow {
  from {
    transform: rotate(0deg);
  }
  to {
    transform: rotate(360deg);
  }
}

The useReducedMotion hook from framer-motion automatically detects the user’s system preference. If the user has enabled reduced motion, it returns true, and the component shows the static version instead. Respecting the user’s preference matters more than forcing a flashy effect.

These are some of the lessons we learned while building HagiCode. You could also call them battle scars. Hopefully they help you avoid some detours.

CSS variables make multi-theme support especially convenient. Nobody wants to edit a pile of code every time the theme changes.

:root {
  --glow-color-light: rgb(16, 185, 129);
  --glow-color-dark: rgb(16, 185, 129);
  --theme-glow-color: var(--glow-color-light);
}

html.dark {
  --theme-glow-color: var(--glow-color-dark);
}

/* Usage */
.glow-effect {
  background: var(--theme-glow-color);
  box-shadow: 0 0 20px var(--theme-glow-color);
}

That way, switching themes only requires changing the class on the html element, and every animation color updates automatically. One codebase, two styles. That is exactly what we want.

Use will-change to hint the browser to optimize:

.animated-glow {
  will-change: transform, opacity;
}

Tell the browser in advance, and it will help you optimize. A lot of things in life work better with a little preparation.

Avoid using complex box-shadows on large elements:

/* Not ideal - using a blurred shadow on a large element */
.large-card {
  box-shadow: 0 0 50px rgba(0, 0, 0, 0.5);
}

/* Better - use a pseudo-element to limit the glowing area */
.large-card::before {
  content: '';
  position: absolute;
  inset: 0;
  border-radius: inherit;
  box-shadow: 0 0 20px var(--glow-color);
  pointer-events: none;
}

We tested this in HagiCode. Adding a blurry shadow directly to a large card dropped scrolling frame rates below 30fps. Switching to a pseudo-element brought things back to a steady 60fps. Users can absolutely feel that difference.

You really should not skip this. Some users find animation dizzying or distracting, and respecting their preferences is part of building a good product. Beautiful things do not need to be imposed on everyone.

CSS media query:

@media (prefers-reduced-motion: reduce) {
  .glow-animation {
    animation: none;
  }
  .glow-animation::before {
    /* Provide a static fallback */
    opacity: 1;
  }
}

Detect user preference in React:

import { useReducedMotion } from 'framer-motion';

const Component = () => {
  const prefersReducedMotion = useReducedMotion();
  return (
    <div className={prefersReducedMotion ? 'static-glow' : 'animated-glow'}>
      Content
    </div>
  );
};

The Token throughput indicator in HagiCode shows different glow colors based on real-time throughput, and this is implemented dynamically. Different states should be expressed differently.

const colors = [
  null,      // Level 0 - no color
  '#3b82f6', // Level 1 - Blue
  '#34d399', // Level 2 - Emerald
  '#facc15', // Level 3 - Yellow
  '#fbbf24', // Level 4 - Amber
  '#f97316', // Level 5 - Orange
  '#22d3ee', // Level 6 - Cyan
  '#d946ef', // Level 7 - Fuchsia
  '#f43f5e', // Level 8 - Rose
];

const IntensityGlow = ({ intensity }) => {
  const glowColor = colors[Math.min(intensity, colors.length - 1)];
  return (
    <div
      className="glow-effect"
      style={{
        '--glow-color': glowColor,
        opacity: 0.6 + (intensity * 0.08),
      }}
    />
  );
};

There are still a few details worth paying attention to, because by the time you discover these problems the hard way, it is already too late.

| Things to watch out for | Explanation |
| --- | --- |
| z-index management | The glow layer should use an appropriate z-index so it does not interfere with content interaction |
| pointer-events | The glow pseudo-element should set pointer-events: none |
| Boundary overflow | The parent container needs overflow: hidden, or you need to adjust pseudo-element sizing |
| Performance impact | Complex animations can hurt performance on mobile devices, so test carefully |
| Dark mode | Make sure the glow color remains clearly visible on dark backgrounds |
| Theme switching | Use CSS variables so animation colors update correctly when the theme changes |

Pseudo-elements can be a little hard to locate in developer tools, so you can temporarily add a border to check the position.

/* Temporarily show pseudo-element boundaries for debugging */
.glow-effect::before {
  /* debug: border: 1px solid red; */
}

After you finish positioning it, remember to comment out or remove that line. Otherwise production can get awkward pretty quickly. Some things are better left in development.

Border light sweep animations are neither especially hard nor truly trivial. At the core, the formula is conic-gradient plus rotation, but if you want good performance, maintainability, and accessibility, there are still plenty of implementation details to handle carefully.

HagiCode hit a lot of these pitfalls and gradually distilled a set of best practices. That is just how projects go: you experiment, make mistakes, and improve one step at a time. If you are building something similar, I hope this article helps you avoid a few unnecessary detours.

Some things only become clear once you build them yourself.


Building a Cross-Project Knowledge Base for the AI Era with the Vault System


Learning by studying and reproducing real projects is becoming mainstream, but scattered learning materials and broken context make it hard for AI assistants to deliver their full value. This article introduces the Vault system design in the HagiCode project: through a unified storage abstraction layer, AI assistants can understand and access all learning resources, enabling true cross-project knowledge reuse.

In fact, in the AI era, the way we learn new technologies is quietly changing. Traditional approaches like reading books and watching videos still matter, but “studying and reproducing projects” - deeply researching and learning from the code, architecture, and design patterns of excellent open source projects - is clearly becoming more efficient. Running and modifying high-quality open source projects directly is one of the fastest ways to understand real-world engineering practice.

But this approach also brings new challenges.

Learning materials are too scattered. Notes might live in Obsidian, code repositories may be spread across different folders, and an AI assistant’s conversation history becomes a separate data island. Every time you need AI help analyzing a project, you have to manually copy code snippets and organize context, which is quite tedious.

Context keeps getting lost. AI assistants cannot directly access local learning resources, so every conversation starts with re-explaining background information. The code repositories you study update quickly, and manual synchronization is error-prone. Worse still, knowledge is hard to share across multiple learning projects - the design patterns learned in project A are completely unknown to the AI when it works on project B.

At the core, these issues are all forms of “data islands.” If there were a unified storage abstraction layer that let AI assistants understand and access all learning resources, the problem would be solved.

To address these pain points, we made a key design decision while developing HagiCode: build a Vault system as a unified knowledge storage abstraction layer. The impact of that decision may be even greater than you expect - more on that shortly.

The approach shared in this article comes from practical experience in the HagiCode project. HagiCode is an AI coding assistant based on the OpenSpec workflow. Its core idea is that AI should not only be able to “talk,” but also be able to “do” - directly operate on code repositories, execute commands, and run tests. GitHub: github.com/HagiCode-org/site

During development, we found that AI assistants need frequent access to many kinds of user learning resources: code repositories, notes, configuration files, and more. If users had to provide everything manually each time, the experience would be terrible. That led us to design the Vault system.

HagiCode’s Vault system supports four types, each corresponding to different usage scenarios:

  • folder: general-purpose folder type, typically for temporary learning materials and drafts
  • coderef: designed specifically for studying code projects, typically for systematically learning an open source project
  • obsidian: integrates with Obsidian note-taking software, typically for reusing an existing notes library
  • system-managed: managed automatically by the system, typically for project configuration, prompt templates, and more
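As a rough sketch, the four types can be modeled as a TypeScript union. The field names below are assumptions for illustration, not HagiCode's actual schema:

```typescript
// Illustrative model of the four vault types; field names are assumptions.
type VaultType = 'folder' | 'coderef' | 'obsidian' | 'system-managed';

interface VaultRegistration {
  id: string;
  name: string;
  type: VaultType;
  physicalPath: string;
}

function describeVault(vault: VaultRegistration): string {
  const purposes: Record<VaultType, string> = {
    folder: 'general-purpose folder',
    coderef: 'code-study project',
    obsidian: 'Obsidian notes library',
    'system-managed': 'system-managed configuration',
  };
  return `${vault.name} (${purposes[vault.type]}) at ${vault.physicalPath}`;
}
```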

Among them, the coderef type is the most commonly used in HagiCode. It provides a standardized directory structure and AI-readable metadata descriptions for code-study projects. Why design this type specifically? Because studying an open source project is not as simple as “downloading code.” You also need to manage the code itself, learning notes, configuration files, and other content at the same time, and coderef standardizes all of that.

The Vault registry is persisted to the file system as JSON:

_registryFilePath = Path.Combine(absoluteDataDir, "personal-data", "vaults", "registry.json");

This design may look simple, but it was carefully considered:

Simple and reliable. JSON is human-readable, making it easy to debug and modify manually. When something goes wrong, you can open the file directly to inspect the state or even repair it by hand - especially useful during development.

Reduced dependencies. File system storage avoids the complexity of a database. There is no need to install and configure an extra database service, which reduces system complexity and maintenance cost.

Concurrency-safe. SemaphoreSlim is used to guarantee thread safety. In an AI coding assistant scenario, multiple operations may access the Vault registry at the same time, so concurrency control is necessary.
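For illustration, the same serialize-the-writes idea can be sketched in TypeScript. The real backend uses C#'s SemaphoreSlim; the AsyncLock class below is a hypothetical stand-in, not HagiCode code:

```typescript
// A minimal async mutex sketch, analogous to SemaphoreSlim(1, 1) in the C# backend.
class AsyncLock {
  private tail: Promise<void> = Promise.resolve();

  // Queue fn behind every previously queued operation.
  run<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain alive even if fn rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

const registryLock = new AsyncLock();

// All registry mutations funnel through the lock, so they run one at a time.
async function updateRegistry(registry: string[], mutate: (r: string[]) => void): Promise<void> {
  await registryLock.run(async () => {
    mutate(registry);
  });
}
```

The point of the pattern is that a read-modify-write of registry.json can never interleave with another one, which is exactly the guarantee SemaphoreSlim provides on the C# side.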

The system’s core capability is that it can automatically inject Vault information into the context of AI proposals:

export function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');
  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);
  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}

This allows the AI assistant to automatically understand which learning resources are available, without requiring the user to provide context manually every time. It makes the HagiCode experience feel especially natural - tell the AI, “Help me analyze React concurrent rendering,” and it can automatically find the previously registered React learning Vault instead of asking you to paste code over and over again.

The system divides Vaults into two access types:

  • reference (read-only): AI can only use the content for analysis and understanding, without modifying it
  • editable (modifiable): AI can modify the content as needed for the task

This distinction tells the AI which content is “read-only reference” and which content it is allowed to modify, reducing the risk of accidental changes. For example, if you register an open source project’s Vault as learning material, you definitely do not want AI casually editing the code inside it - so mark it as reference. But if it is your own project Vault, you can mark it as editable and let AI help modify the code.
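To make the two access types concrete, here is a small runnable sketch. The vault shape and buildVaultSection helper are simplified stand-ins for the real HagiCode implementation:

```typescript
// Simplified stand-ins; the real VaultForText and buildVaultSection live in HagiCode.
interface VaultForText {
  name: string;
  accessType: 'read' | 'write';
}

function buildVaultSection(vaults: VaultForText[], label: string): string {
  if (vaults.length === 0) return '';
  return `**${label}**\n${vaults.map((v) => `- ${v.name}`).join('\n')}`;
}

function buildVaultsPrompt(vaults: VaultForText[]): string {
  // 'read' vaults become reference material; 'write' vaults may be modified.
  const sections = [
    buildVaultSection(vaults.filter((v) => v.accessType === 'read'), 'Reference (read-only)'),
    buildVaultSection(vaults.filter((v) => v.accessType === 'write'), 'Editable'),
  ].filter(Boolean);
  return `\n\n### Target Vaults\n\n${sections.join('\n')}`;
}

const vaultPromptText = buildVaultsPrompt([
  { name: 'React Learning Vault', accessType: 'read' },
  { name: 'My Project Vault', accessType: 'write' },
]);
```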

Standardized Structure for a CodeRef Vault

For coderef Vaults, the system provides a standardized directory structure:

my-coderef-vault/
├── index.yaml # vault metadata description
├── AGENTS.md # operating guide for AI assistants
├── docs/ # stores learning notes and documentation
└── repos/ # manages referenced code repositories through Git submodules

What is the design philosophy behind this structure?

docs/ stores learning notes, using Markdown to record your understanding of the code, architecture analysis, and lessons from debugging. These notes are not only for you - AI can understand them too, and will automatically reference them when handling related tasks.

repos/ manages the studied repositories through Git submodules rather than by copying code directly. This has two benefits: first, it stays in sync with upstream, and a single git submodule update fetches the latest code; second, it saves space, because multiple Vaults can reference different versions of the same repository.

index.yaml contains Vault metadata so the AI assistant can quickly understand its purpose and contents. It is essentially a “self-introduction” for the Vault: the first time the AI sees it, it knows what the Vault is for.

AGENTS.md is a guide written specifically for AI assistants, explaining how to handle the content inside the Vault. You can tell the AI things like: “When analyzing this project, focus on code related to performance optimization” or “Do not modify test files.”

Creating a CodeRef Vault is simple:

const createCodeRefVault = async () => {
  const response = await VaultService.postApiVaults({
    requestBody: {
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      gitUrl: "https://github.com/facebook/react.git"
    }
  });
  // The system will automatically:
  // 1. Clone the React repository to vault/repos/react
  // 2. Create the docs/ directory for notes
  // 3. Generate index.yaml metadata
  // 4. Create the AGENTS.md guide file
  return response;
};

Then reference this Vault in an AI proposal:

const proposal = composeProposalChiefComplaint({
  chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
  repositories: [
    { id: "react", gitUrl: "https://github.com/facebook/react.git" }
  ],
  vaults: [
    {
      id: "react-learning",
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/vaults/react-learning",
      accessType: "read" // AI can only read, not modify
    }
  ],
  quickRequestText: "Pay special attention to the Fiber architecture and scheduler implementation"
});

Scenario 1: Systematically studying open source projects

Create a CodeRef Vault, manage the target repository through Git submodules, and record learning notes in the docs/ directory. AI can access both the code and the notes at the same time, providing more accurate analysis. Notes written while studying a module are automatically referenced by the AI when it later analyzes related code - like having an “assistant” that remembers your previous thinking.

Scenario 2: Reusing an Obsidian notes library

If you are already using Obsidian to manage notes, just register your existing Vault in HagiCode directly. AI can access your knowledge base without manual copy-paste. This feature is especially practical because many people have years of accumulated notes, and once connected, AI can “read” and understand that knowledge system.

Scenario 3: Cross-project knowledge reuse

Multiple AI proposals can reference the same Vault, enabling knowledge reuse across projects. For example, you can create a “design patterns learning Vault” that contains notes and code examples for many design patterns. No matter which project the AI is analyzing, it can refer to the content in that Vault - knowledge does not need to be accumulated repeatedly.

The system strictly validates paths to prevent path traversal attacks:

private static string ResolveFilePath(string vaultRoot, string relativePath)
{
    var rootPath = EnsureTrailingSeparator(Path.GetFullPath(vaultRoot));
    var combinedPath = Path.GetFullPath(Path.Combine(rootPath, relativePath));
    if (!combinedPath.StartsWith(rootPath, StringComparison.OrdinalIgnoreCase))
    {
        throw new BusinessException(VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root.");
    }
    return combinedPath;
}

This ensures all file operations stay within the Vault root directory and prevents malicious path access. Security is not something to take lightly. If an AI assistant is going to operate on the file system, the boundaries must be clearly defined.
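The same guard translates almost line for line to other runtimes. Here is a TypeScript sketch of the check (the shipped version is the C# above; this port is illustrative):

```typescript
import * as path from 'node:path';

// Resolve relativePath against vaultRoot and reject anything that escapes the root.
function resolveFilePath(vaultRoot: string, relativePath: string): string {
  const rootPath = path.resolve(vaultRoot) + path.sep; // trailing separator, like EnsureTrailingSeparator
  const combinedPath = path.resolve(rootPath, relativePath);
  if (!combinedPath.startsWith(rootPath)) {
    throw new Error('Vault file paths must stay inside the registered vault root.');
  }
  return combinedPath;
}
```

The trailing separator matters: without it, a root of /vaults/react would also accept /vaults/react-evil, because plain prefix matching cannot tell the two apart.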

When using the HagiCode Vault system, there are several things to pay special attention to:

  1. Path safety: Make sure custom paths stay within the allowed scope, otherwise the system will reject the operation. This prevents accidental misuse and potential security risks.

  2. Git submodule management: CodeRef Vaults are best managed with Git submodules instead of directly copying code. The benefits were covered earlier - keeping in sync and saving space. That said, submodules have their own workflow, so first-time users may need a little time to get familiar with them.

  3. File preview limits: The system limits file size (256KB) and quantity (500 files), so oversized files need to be handled in batches. This limit exists for performance reasons. If you run into very large files, you can split them manually or process them another way.

  4. Diagnostic information: Creating a Vault returns diagnostic information that can be used for debugging on failure. Check the diagnostics first when you run into issues - in most cases, that is where you will find the clue.

The HagiCode Vault system is fundamentally solving a simple but profound problem: how to let AI assistants understand and use local knowledge resources.

Through a unified storage abstraction layer, a standardized directory structure, and automated context injection, it delivers a knowledge management model of “register once, reuse everywhere.” Once a Vault is created, AI can automatically access and understand learning notes, code repositories, and documentation resources.

The experience improvement from this design is obvious. There is no longer any need to manually copy code snippets or repeatedly explain background information - the AI assistant becomes more like a teammate who truly understands the project and can provide more valuable help based on existing knowledge.

The Vault system shared in this article is a solution shaped through real trial and error and real optimization during HagiCode development. If you think this design is valuable, that says something about the engineering behind it - and HagiCode itself is worth checking out as well.

If this article helped you, give it a try yourself: the public beta has started, and you are welcome to install it.

Thank you for reading. If you found this article useful, please like, save, and share it. This content was created with AI-assisted collaboration, and the final version was reviewed and confirmed by the author.

Edit DESIGN.md Directly in the Web Interface: From Idea to Implementation


In the MonoSpecs project management system, DESIGN.md carries the architectural design and technical decisions of a project. But the traditional editing workflow forces users to jump out to an external editor. That fragmented experience is like being interrupted in the middle of reading a poem: the inspiration is gone, and so is the mood. This article shares the solution we put into practice in the HagiCode project: editing DESIGN.md directly in the web interface, with support for importing templates from an online design site. After all, who does not enjoy the feeling of completing everything in one flow?

As the core carrier of project design documents, DESIGN.md holds key information such as architecture design, technical decisions, and implementation guidance. However, the traditional editing approach requires users to switch to an external editor such as VS Code, manually locate the physical path, and then edit the file. It is not especially complicated, but after repeating the process a few times, it becomes tiring.

The problems mainly show up in the following ways:

  • Fragmented workflow: users must constantly switch between the web management interface and a local editor, breaking the continuity of their workflow, much like having the music cut out in the middle of a song.
  • Hard to reuse: the design site already publishes a rich library of design templates, but they cannot be integrated directly into the project editing workflow. The good stuff exists, but you still cannot use it where you need it.
  • Missing experience loop: there is no closed loop for “preview-select-import,” so users must copy and paste manually, which increases the risk of mistakes.
  • Collaboration friction: keeping design documents and code implementation in sync becomes a high-friction process, which hurts team efficiency.

To solve these pain points, we decided to add direct editing support for DESIGN.md in the web interface and allow one-click template import from an online design site. It was not some earth-shaking decision. We simply wanted to make the development experience smoother.

The solution shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is an AI-driven coding assistant project, and during development we frequently need to maintain project design documents. To help the team collaborate more efficiently, we explored and implemented this online editing and import solution. There is nothing mysterious about it. We ran into a problem and worked out a way to solve it.

This solution uses a frontend-backend separated architecture with a same-origin proxy, mainly composed of the following layers. In practice, the design can be summed up as “each part doing its own job”:

1. Frontend editor layer

repos/web/src/components/project/DesignMdManagementDrawer.tsx
// Core component: DesignMdManagementDrawer
// Responsibility: handle editing, saving, version conflict detection, and import flow

2. Backend service layer

ProjectAppService.DesignMd
// Location: repos/hagicode-core/src/PCode.Application/ProjectAppService.DesignMd.cs
// Responsibility: path resolution, file read/write, and version management

3. Same-origin proxy layer

ProjectAppService.DesignMdSiteIndex
// Location: repos/hagicode-core/src/PCode.Application/ProjectAppService.DesignMdSiteIndex.cs
// Responsibility: proxy design site resources, preview image caching, and security validation

We use a single global drawer instead of local pop-up layers, with state managed through layoutSlice, which gives users a consistent experience across views (classic and kanban). No matter which view the user opens the editor from, they get the same interaction model. A consistent experience makes people feel more at ease instead of getting disoriented when they switch views.

We mounted DESIGN.md-related endpoints under ProjectController, reusing the existing project permission boundary and avoiding the complexity of adding a separate controller. This makes permission handling clearer and also aligns with RESTful resource organization. Sometimes reuse is more meaningful than creating something new from scratch.

We derive an opaque version from the file system’s LastWriteTimeUtc, which gives us lightweight optimistic concurrency control. When multiple users edit the same file at once, the system can detect conflicts and prompt the user to refresh. This design does not block editing, while still protecting data consistency.

We use IHttpClientFactory to proxy external design-site resources, avoiding both cross-origin issues and SSRF risks. This keeps the system secure while also simplifying frontend calls. You can hardly be too careful with security.

The backend is mainly responsible for path resolution, file read/write, and version management. These tasks are basic, but indispensable, like the foundation of a house:

// Path resolution and security validation
private Task<string> ResolveDesignDocumentDirectoryAsync(string projectPath, string? repositoryPath)
{
    if (string.IsNullOrWhiteSpace(repositoryPath))
    {
        return Task.FromResult(Path.GetFullPath(projectPath));
    }
    return ValidateSubPathAsync(projectPath, repositoryPath);
}

// Version generation (based on file system timestamp and size)
private static string BuildDesignDocumentVersion(string path)
{
    var fileInfo = new FileInfo(path);
    fileInfo.Refresh();
    return string.Create(
        CultureInfo.InvariantCulture,
        $"{fileInfo.LastWriteTimeUtc.Ticks:x}-{fileInfo.Length:x}");
}

The version design is interesting in its simplicity: we use the file’s last modified time and size to generate a unique version identifier. It is lightweight and reliable, with no extra version database to maintain. Simple solutions are often the most effective.
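For intuition, the same scheme can be sketched in TypeScript; the shipped implementation is the C# above, and this port is only illustrative:

```typescript
import * as fs from 'node:fs';

// Version string: hex(last-modified milliseconds) + '-' + hex(file size).
// Any write changes the mtime or the size, so the version changes too.
function buildDesignDocumentVersion(filePath: string): string {
  const stat = fs.statSync(filePath);
  return `${Math.trunc(stat.mtimeMs).toString(16)}-${stat.size.toString(16)}`;
}
```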

On the frontend, we implement dirty-state detection and save logic. This design helps users understand whether their changes have been saved and reduces the anxiety of “what if I lose it?”:

// Dirty-state detection and save logic
const [draft, setDraft] = useState('');
const [savedDraft, setSavedDraft] = useState('');
const isDirty = draft !== savedDraft;

const handleSave = useCallback(async () => {
  const result = await saveProjectDesignMdDocument({
    ...activeTarget,
    content: draft,
    expectedVersion: document.version, // optimistic concurrency control
  });
  setSavedDraft(draft); // update saved state
}, [activeTarget, document, draft]);

In this implementation, we maintain two pieces of state: draft is the content currently being edited, while savedDraft is the saved content. Comparing them tells us whether there are unsaved changes. The design is simple, but it gives people peace of mind. Nobody wants the thing they worked hard on to disappear.

2. Import Design Files from an Online Source

repos/index/
└── src/data/public/design.json # Design template index
repos/awesome-design-md-site/
├── vendor/awesome-design-md/ # Upstream design templates
│ └── design-md/
│ ├── clickhouse/
│ │ └── DESIGN.md
│ ├── linear/
│ │ └── DESIGN.md
│ └── ...
└── src/lib/content/
└── awesomeDesignCatalog.ts # Content pipeline

The index file on the design site defines all available templates. With this index, users can choose the template they want as easily as ordering from a menu:

{
  "entries": [
    {
      "slug": "linear.app",
      "title": "Linear Inspired Design System",
      "summary": "AI Product / Dark Feel",
      "detailUrl": "/designs/linear.app/",
      "designDownloadUrl": "/designs/linear.app/DESIGN.md",
      "previewLightImageUrl": "...",
      "previewDarkImageUrl": "..."
    }
  ]
}

Each entry includes the template’s basic information and download links. The backend reads the list of available templates from this index and presents them for the user to choose from. That makes selection intuitive instead of forcing people to feel their way around in the dark.

To keep things secure, the backend performs strict validation on access to the design site. You cannot be too cautious about security:

// Safe slug validation
private static readonly Regex SafeDesignSiteSlugRegex =
    new("^[A-Za-z0-9](?:[A-Za-z0-9._-]{0,127})$", RegexOptions.Compiled);

private static string NormalizeDesignSiteSlug(string slug)
{
    var normalizedSlug = slug?.Trim() ?? string.Empty;
    if (!IsSafeDesignSiteSlug(normalizedSlug))
    {
        throw new BusinessException(
            ProjectDesignSiteIndexErrorCodes.InvalidSlug,
            "Design site slug must be a single safe path segment.");
    }
    return normalizedSlug;
}

// Preview image caching (OS temp directory)
private static string ComputePreviewCacheKey(string slug, string theme, string previewUrl)
{
    var raw = $"{slug}|{theme}|{previewUrl}";
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(raw));
    return Convert.ToHexString(bytes).ToLowerInvariant();
}

We do two things here: first, we validate the slug format strictly with a regular expression to prevent path traversal attacks; second, we cache preview images to reduce pressure on the external site. The former is protection, the latter is optimization, and both matter.
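The slug check ports directly to the frontend as well, which lets the client reject bad input before it ever reaches the proxy. A TypeScript sketch, with the regex copied from the C# version above:

```typescript
// Same pattern as the C# SafeDesignSiteSlugRegex: one safe path segment,
// starting with an alphanumeric character, up to 128 characters total.
const SAFE_DESIGN_SITE_SLUG = /^[A-Za-z0-9](?:[A-Za-z0-9._-]{0,127})$/;

function isSafeDesignSiteSlug(slug: string): boolean {
  return SAFE_DESIGN_SITE_SLUG.test(slug.trim());
}
```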

// 1. Open the import drawer
const handleRequestImportDrawer = useCallback(() => {
  setIsImportDrawerOpen(true);
}, []);

// 2. Select and import
const handleImportRequest = useCallback((entry) => {
  if (isDirty) {
    setPendingImportEntry(entry);
    setConfirmMode('import'); // overwrite confirmation
    return;
  }
  void executeImport(entry);
}, [isDirty]);

// 3. Execute import
const executeImport = useCallback(async (entry) => {
  const result = await getProjectDesignMdSiteImportDocument(
    activeTarget.projectId,
    entry.slug
  );
  setDraft(result.content); // replace editor text only, do not save automatically
  setIsImportDrawerOpen(false);
}, [activeTarget?.projectId]);

The import flow follows a “user confirmation” principle: after import, only the editor content is updated, and nothing is saved automatically. Users can inspect the imported content and save it manually only after confirming it looks right. The final decision should stay in the hands of the user.

Scenario 1: Creating DESIGN.md in the Project Root


When DESIGN.md does not exist, the backend returns a virtual document state. This lets the frontend avoid special handling for the “file does not exist” case, and a unified API simplifies the code logic:

return new ProjectDesignDocumentDto
{
    Path = targetPath,
    Exists = false, // virtual document state
    Content = string.Empty,
    Version = null
};

// Automatically create the file on first save
public async Task<SaveProjectDesignDocumentResultDto> SaveDesignDocumentAsync(...)
{
    Directory.CreateDirectory(targetDirectory);
    await File.WriteAllTextAsync(targetPath, input.Content);
    return new SaveProjectDesignDocumentResultDto { Created = !exists };
}

By hiding the “file does not exist” complexity in the backend, the frontend can focus on the user experience.

Scenario 2: Import a Template from the Design Site


After the user selects the “Linear” design template in the import drawer, the system fetches the DESIGN.md content through the backend proxy. The whole process is transparent to the user: they only choose a template, and the system handles the network requests and data transformation automatically.

// 1. The system fetches DESIGN.md content through the backend proxy
GET /api/project/{id}/design-md/site-index/linear.app

// 2. The backend validates the slug and fetches content from upstream
var entry = FindDesignSiteEntry(catalog, "linear.app");
using var upstreamResponse = await httpClient.SendAsync(request);
var content = await upstreamResponse.Content.ReadAsStringAsync();

// 3. The frontend replaces the editor text
setDraft(result.content);
// The user reviews it and then saves it manually to disk

That is the experience we want: simple, but powerful.

When multiple users edit the same DESIGN.md at the same time, the system detects version conflicts. This optimistic concurrency control mechanism preserves data consistency without blocking the user’s edits:

if (!string.Equals(currentVersion, expectedVersion, StringComparison.Ordinal))
{
    throw new BusinessException(
        ProjectDesignDocumentErrorCodes.VersionConflict,
        $"DESIGN.md at '{targetPath}' changed on disk.");
}

The frontend catches this error and prompts the user:

// Frontend prompts the user to refresh and retry
<Alert>
  <AlertTitle>Version conflict</AlertTitle>
  <AlertDescription>
    The file was modified by another process. Please refresh to get the latest version and try again.
  </AlertDescription>
</Alert>

Conflicts are unavoidable, but users should at least know what happened instead of silently losing their changes.
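The conflict flow can be sketched as a small save helper. All names below are illustrative stand-ins, not the real HagiCode API client:

```typescript
// Hypothetical document shape and error type for illustration.
interface DesignDoc { content: string; version: string }

class VersionConflictError extends Error {}

// Try to save against an expected version; on conflict, reload the latest
// document and report the conflict instead of overwriting silently.
async function saveWithConflictCheck(
  load: () => Promise<DesignDoc>,
  save: (content: string, expectedVersion: string) => Promise<void>,
  content: string,
  expectedVersion: string,
): Promise<'saved' | 'conflict'> {
  try {
    await save(content, expectedVersion);
    return 'saved';
  } catch (error) {
    if (error instanceof VersionConflictError) {
      await load(); // refresh so the user can see what changed on disk
      return 'conflict';
    }
    throw error;
  }
}
```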

Always validate repositoryPath to prevent path traversal attacks. You can never do too much when it comes to security:

// Always validate repositoryPath to prevent path traversal attacks
return ValidateSubPathAsync(projectPath, repositoryPath);
// Reject dangerous inputs such as "../" and absolute paths

Cache preview images for 24 hours, with a maximum of 160 files. Moderate caching improves performance, but balance still matters:

// Cache preview images for 24 hours, with a maximum of 160 files
private static readonly TimeSpan PreviewCacheTtl = TimeSpan.FromHours(24);
private const int PreviewCacheMaxFiles = 160;
// Periodically clean up expired cache
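The cleanup step the comment above alludes to could look like the following TypeScript sketch. It assumes a flat cache directory; the function name and structure are illustrative, not the shipped C# implementation:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

const PREVIEW_CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
const PREVIEW_CACHE_MAX_FILES = 160;

function cleanPreviewCache(cacheDir: string, now: number = Date.now()): void {
  const survivors = fs
    .readdirSync(cacheDir)
    .map((name) => {
      const filePath = path.join(cacheDir, name);
      return { filePath, mtimeMs: fs.statSync(filePath).mtimeMs };
    })
    // First pass: delete entries older than the TTL.
    .filter((entry) => {
      if (now - entry.mtimeMs > PREVIEW_CACHE_TTL_MS) {
        fs.unlinkSync(entry.filePath);
        return false;
      }
      return true;
    })
    // Second pass: newest first, then evict everything beyond the cap.
    .sort((a, b) => b.mtimeMs - a.mtimeMs);
  for (const entry of survivors.slice(PREVIEW_CACHE_MAX_FILES)) {
    fs.unlinkSync(entry.filePath);
  }
}
```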

Gracefully degrade when the upstream site is unavailable. This design ensures that even if an external dependency fails, the core editing functionality still works normally:

// Gracefully degrade when the upstream site is unavailable
try {
  const catalog = await getProjectDesignMdSiteImportCatalog(projectId);
} catch (error) {
  toast.error(t('project.designMd.siteImport.feedback.catalogLoadFailed'));
  // The main editing drawer remains available
}

A system should be resilient instead of collapsing the moment something goes wrong.

Confirm overwrites before importing, and do not save automatically after import. Users should stay in control of their own actions:

// Confirm overwrite before import
if (isDirty) {
  setConfirmMode('import');
  return;
}

// Do not save automatically after import; let the user confirm
setDraft(result.content); // update draft only
// The content is written to disk only after the user reviews it and clicks Save

Use an HTTP client factory to avoid creating too many connections. Resource management may seem small, but doing it well can make a big difference:

// Use an HTTP client factory to avoid creating too many connections
private const string DesignSiteProxyClientName = "ProjectDesignSiteProxy";
private static readonly TimeSpan DesignSiteProxyTimeout = TimeSpan.FromSeconds(8);
Looking ahead, there is still room to improve:

  1. Markdown enhancement: we currently use a basic Textarea, but we could upgrade to CodeMirror for syntax highlighting and keyboard shortcuts. When the editor feels better, writing documentation feels better too.
  2. Preview mode: add real-time Markdown preview to improve the editing experience. What-you-see-is-what-you-get always gives people more confidence.
  3. Diff merge: implement an intelligent merge algorithm instead of simple full-text replacement. Conflicts are inevitable, but the conflict-resolution process does not have to be painful.
  4. Local caching: cache design.json in the database to reduce dependency on the external site. The fewer dependencies a system has, the more stable it tends to be.

In the HagiCode project, we implemented a complete online editing and import solution for DESIGN.md through frontend-backend collaboration. The core value of this solution lies in the following points:

  • Higher efficiency: no need to switch tools; editing and importing design documents can happen in one unified web interface.
  • Lower barrier to entry: one-click design template import helps new projects get started quickly.
  • Secure and reliable: path validation, version conflict detection, and graceful degradation mechanisms keep the system stable.
  • Better user experience: the global drawer, dirty-state detection, and confirmation dialogs refine the overall interaction experience.

This solution is already running in the HagiCode project and has solved the team’s pain points around design document management. If you are facing similar problems, I hope this article gives you some useful ideas. There is no particularly profound theory here, only the practical work of running into a problem and finding a way to solve it.

If this article helped you, feel free to give the project a Star on GitHub. The public beta has already started, and you can join the experience right after installing it. Open-source projects always need more feedback and encouragement, and if you found this useful, it is worth helping more people discover it.


“Beautiful things or people do not have to belong to you. As long as they remain beautiful, it is enough to quietly appreciate that beauty.”

The same goes for a DESIGN.md editor. It does not need to be overly complex. If it helps you work efficiently, that is already enough.

Thank you for reading. If you found this article useful, please consider liking, bookmarking, and sharing it. This content was created with AI-assisted collaboration, and the final version was reviewed and approved by the author.

Design.md: A Solution for Consistent AI-Driven Frontend UI Design


In the era of AI-assisted frontend development, how can you keep AI-generated UIs consistent? This article shares our hands-on experience building a design gallery site based on awesome-design-md, along with how to create a structured design.md to guide AI toward standardized UI design.

Anyone who has used AI to write frontend code has probably had a similar experience: ask AI to generate the same page several times, and each result comes out in a different style. Sometimes corners are rounded, sometimes they are sharp. Sometimes spacing is 8px, and other times it becomes 16px. Even the same button can look different across different conversations.

This is not an isolated issue. As AI-assisted development becomes more common, the lack of consistency in AI-generated frontend UI has become a widespread problem. Different AI assistants, different prompts, and even the same assistant across different conversations can produce dramatically different interface designs. That creates a huge maintenance cost during product iteration.

The root cause is actually simple: there is no authoritative design reference document. Traditional CSS stylesheets can tell developers “how to implement” something, but they cannot fully communicate “why it is designed this way” or “which design pattern should be used in which scenario.” For AI, a clear and structured description is even more important for understanding design conventions.

At the same time, the open-source community already offers some excellent resources. The VoltAgent/awesome-design-md project collects design system documentation from many well-known companies. Each directory contains a README.md, a DESIGN.md, and preview HTML. However, all of that is scattered across the upstream repository, making it hard to browse and compare quickly.

So, can we consolidate those resources into an easy-to-browse design gallery, while also distilling a structured design.md for AI to use?

The answer is yes. Next, let me walk through our approach.

The solution shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is an AI-assisted development platform, and during development we ran into the same problem of inconsistent AI-generated UIs. To solve it, we built a design gallery site and created a standardized design.md. This article is a summary of that solution.

GitHub - HagiCode-org/site

First, take a look at the final homepage. It brings together the design gallery entry point, the site repository, the upstream repository, and background information about HagiCode in a single interface, making it easy for the team to establish a shared context before diving into specific entries.

Awesome Design MD Gallery homepage overview

Before writing code, let us break down the technical challenges behind this problem.

Source Content Management: How Do You Unify Scattered Design Resources?

The upstream awesome-design-md repository contains a large number of design documents, but we needed a way to bring them into our own project.

Solution: use git submodule

awesome-design-md-site
└── vendor/awesome-design-md # Upstream resources (git submodule)

This gives us several benefits:

  • Version control: we can pin a specific upstream version
  • Offline builds: no need to request external APIs during the build
  • Content review: specific changes are visible in PRs

Data Normalization: How Do You Standardize Different Document Structures?

Different companies structure their design documents differently. Some are missing preview files, and some use inconsistent naming. We need to normalize them during the build process.

Solution: scan and generate normalized entries at build time

The core module is awesomeDesignCatalog.ts, responsible for:

  1. Scanning the vendor/awesome-design-md/design-md/* directory
  2. Validating whether each entry contains the required files (README.md, DESIGN.md, and at least one preview file)
  3. Extracting and rendering Markdown content into HTML
  4. Generating normalized entry data

src/lib/content/awesomeDesignCatalog.ts

export interface DesignEntry {
  slug: string;
  title: string;
  summary: string;
  readmeHtml: string;
  designHtml: string;
  previewLight?: string;
  previewDark?: string;
  searchText: string;
}

export async function scanSourceEntries() {
  // Scan vendor/awesome-design-md/design-md/*
  // Validate file completeness
  // Generate normalized entries
}

export async function normalizeDesignEntry(dir: string) {
  // Extract README.md and DESIGN.md
  // Parse preview files
  // Render Markdown to HTML
}

Static Site Architecture: How Do You Provide Dynamic Search While Staying Fully Static?

Since this is a design gallery, search is a must-have. But Astro is a static site generator, so how do you implement real-time search?

Solution: React island + URL query parameter sync

src/components/gallery/SearchToolbar.tsx

export function SearchToolbar() {
  const [query, setQuery] = useState('');

  // Sync with the URL
  useEffect(() => {
    const params = new URLSearchParams(window.location.search);
    setQuery(params.get('q') || '');
  }, []);

  // Filter in real time
  const filtered = entries.filter(entry =>
    entry.searchText.includes(query)
  );

  return <input value={query} onChange={e => {
    setQuery(e.target.value);
    updateURL(e.target.value);
  }} />;
}

The advantage of this approach is that it keeps the deployability of a static site intact, meaning it can be deployed to any static hosting service, while still delivering an instant filtering experience.

Design Documentation: How Do You Help AI Understand and Follow Design Standards?

This is the core of the entire solution. We need to create a structured design.md that AI can understand and apply.

Solution: borrow the structure of ClickHouse DESIGN.md

ClickHouse’s DESIGN.md is an excellent reference. It includes:

  • Visual Theme & Atmosphere
  • Color Palette & Roles
  • Typography Rules
  • Component Stylings
  • Layout Principles
  • Depth & Elevation
  • Do’s and Don’ts
  • Responsive Behavior
  • Agent Prompt Guide

Our approach is: reuse the structure, rewrite the content. We keep the section structure of ClickHouse DESIGN.md, but replace the content with the actual design tokens and component conventions used in our own project.

Based on the analysis above, our solution consists of four core modules.

This is the foundation of the whole system, responsible for extracting and normalizing content from upstream resources.

src/lib/content/awesomeDesignCatalog.ts

import fs from 'node:fs/promises';
import path from 'node:path';

export async function scanSourceEntries(): Promise<DesignEntry[]> {
  const designDir = 'vendor/awesome-design-md/design-md';
  const entries: DesignEntry[] = [];
  for (const dir of await fs.readdir(designDir)) {
    const entryPath = path.join(designDir, dir);
    if (await isValidDesignEntry(entryPath)) {
      const entry = await normalizeDesignEntry(entryPath);
      entries.push(entry);
    }
  }
  return entries;
}

async function isValidDesignEntry(dir: string): Promise<boolean> {
  const requiredFiles = ['README.md', 'DESIGN.md'];
  for (const file of requiredFiles) {
    if (!(await fileExists(path.join(dir, file)))) {
      return false;
    }
  }
  return true;
}

// Helper: treat "accessible" as "exists" for our purposes.
async function fileExists(file: string): Promise<boolean> {
  return fs.access(file).then(() => true, () => false);
}

The gallery interface includes three main parts:

Homepage: displays a card grid of all design entries, and each card includes:

  • Design entry title and summary
  • Preview image, if available
  • Quick-search highlighting

Detail page: aggregates the full information for a single design entry:

  • README document
  • DESIGN document
  • Preview with light/dark theme switching
  • Navigation to adjacent entries

Navigation: supports returning to the gallery and browsing adjacent entries

The homepage gallery uses a high-density card layout, flattening design.md entries from different sources into a unified visual framework so that teams can quickly compare brand styles, button patterns, and typographic rhythm.

Awesome Design MD Gallery design card grid

After opening a specific entry, the detail page places the design summary and live preview on the same page, reducing the cost of switching back and forth among documentation, previews, and source code.

Awesome Design MD Gallery design detail preview page

The search feature is based on client-side filtering, with state preserved through URL query parameters:

src/components/gallery/SearchToolbar.tsx

function SearchToolbar({ entries }: { entries: DesignEntry[] }) {
  const [query, setQuery] = useState('');
  const [results, setResults] = useState(entries);

  useEffect(() => {
    const params = new URLSearchParams(window.location.search);
    const initialQuery = params.get('q') || '';
    setQuery(initialQuery);
    filterEntries(initialQuery);
  }, []);

  const filterEntries = (searchQuery: string) => {
    const filtered = entries.filter(entry =>
      entry.searchText.toLowerCase().includes(searchQuery.toLowerCase())
    );
    setResults(filtered);
  };

  const handleChange = (e: React.ChangeEvent<HTMLInputElement>) => {
    const value = e.target.value;
    setQuery(value);
    filterEntries(value);
    // Update the URL without triggering a page refresh
    const newUrl = value
      ? `${window.location.pathname}?q=${encodeURIComponent(value)}`
      : window.location.pathname;
    window.history.replaceState({}, '', newUrl);
  };

  return (
    <div className="search-toolbar">
      <input
        type="text"
        value={query}
        onChange={handleChange}
        placeholder="Search design entries..."
      />
      <span className="result-count">{results.length} results</span>
    </div>
  );
}

This is the core deliverable of the whole solution. We create a design.md in the project root with the following structure:

In addition to the raw design.md content consumed by AI, we also place both the README and DESIGN documents into the same reading interface, making it easier for people to proofread, copy snippets, and compare them against the preview results.

Awesome Design MD Gallery README and DESIGN document page

# Design Reference for [Project Name]
## 1. Visual Theme & Atmosphere
- Overall style description
- Design philosophy and principles
## 2. Color Palette & Roles
- Primary and supporting colors
- Semantic colors (`success`, `warning`, `error`)
- CSS variable definitions
## 3. Typography Rules
- Font families
- Type scale (`h1-h6`, `body`, `small`)
- Line height and font weight
## 4. Component Stylings
- Button style conventions
- Form component styles
- Card and container styles
## 5. Layout Principles
- Spacing system
- Grid and breakpoints
- Alignment principles
## 6. Depth & Elevation
- Shadow levels
- `z-index` conventions
## 7. Do's and Don'ts
- Common mistakes and correct approaches
## 8. Responsive Behavior
- Breakpoint definitions
- Responsive adaptation rules
## 9. Agent Prompt Guide
- How to use this document in AI prompts
- Example prompt templates

Now that we have covered the solution, how do you actually implement it?

Step 1: Initialize the submodule

Terminal window
# Add the upstream repository as a submodule
git submodule add https://github.com/VoltAgent/awesome-design-md.git vendor/awesome-design-md
# Initialize and update the submodule
git submodule update --init --recursive

Step 2: Create the content pipeline

Implement awesomeDesignCatalog.ts, including:

  • File scanning and validation logic
  • Markdown rendering using Astro’s built-in renderer
  • Entry data extraction

Step 3: Build the gallery UI

Use Astro + React Islands to create:

  • Homepage gallery layout (card grid)
  • Design card components
  • Search toolbar
  • Detail page layout

Step 4: Write the design document

Based on the structure of ClickHouse DESIGN.md, fill in the actual design tokens from your own project. Update README.md and add a link to design.md.

Security: Markdown rendering requires filtering unsafe HTML. Astro’s built-in renderer filters script tags by default, but you still need to watch for XSS risks.

Performance: A large number of iframe previews may affect first-paint performance. It is recommended to use loading="lazy" to lazy-load preview content.

Maintainability: design.md needs to stay in sync with the code implementation. It is recommended to add CI checks to ensure that CSS variables remain consistent between documentation and code.
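One way to implement such a CI check is a small script that extracts every `--token` mentioned in design.md and verifies it is defined in the stylesheet. The following is only a sketch of that idea; the file paths, script name, and token-matching regex are assumptions, not the actual HagiCode implementation:

```typescript
// Hypothetical CI check: every CSS variable referenced in design.md must
// also appear in the project stylesheet. File paths below are assumptions.
import { readFileSync } from 'node:fs';

// Collect every `--token` occurrence in a text blob.
function extractCssVars(text: string): Set<string> {
  return new Set(text.match(/--[a-z0-9-]+/gi) ?? []);
}

// Variables mentioned in the doc but never seen in the CSS.
function findUndefinedVars(docText: string, cssText: string): string[] {
  const defined = extractCssVars(cssText);
  return [...extractCssVars(docText)].filter(v => !defined.has(v));
}

// Example CI usage: `node check-design-tokens.mjs design.md src/styles/tokens.css`
if (process.argv[2] && process.argv[3]) {
  const missing = findUndefinedVars(
    readFileSync(process.argv[2], 'utf8'),
    readFileSync(process.argv[3], 'utf8'),
  );
  if (missing.length > 0) {
    console.error(`Referenced in docs but undefined in CSS: ${missing.join(', ')}`);
    process.exit(1);
  }
}
```

Note that this naive version also counts `var(--x)` usages in CSS as definitions; a stricter check would only scan `:root` declarations.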

Accessibility: Make sure color contrast meets the WCAG AA standard (at least 4.5:1).
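The 4.5:1 threshold is straightforward to verify programmatically with the WCAG relative-luminance formula. A minimal sketch that only handles 6-digit hex colors:

```typescript
// WCAG 2.x contrast ratio between two hex colors, e.g. '#1a1a1a' vs '#ffffff'.
function channel(c: number): number {
  // Gamma-expand one sRGB channel (0-255) per the WCAG definition.
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

function luminance(hex: string): number {
  const n = parseInt(hex.replace('#', ''), 16);
  const [r, g, b] = [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

function contrastRatio(a: string, b: string): number {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// contrastRatio('#000000', '#ffffff') is 21, the maximum possible ratio;
// a combination passes WCAG AA body-text requirements when it is >= 4.5.
```

A check like this can run in the same CI job that validates the design tokens, failing the build when a documented color pair drops below 4.5:1.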

After creating design.md, how do you get AI to actually use it? Here are a few practical tips:

Tip 1: Reference it explicitly in the prompt

Please refer to the design.md file in the project root and use the design conventions defined there to implement the following components:
- Buttons: use the primary color with an 8px border radius
- Cards: use the elevation-2 shadow level

Tip 2: Require AI to reference specific CSS variables

Implement a navigation bar with the following requirements:
- Use --color-bg-primary for the background color
- Use --color-border-subtle for borders
- Use --text-color-primary for text

Tip 3: Include design.md content in the system prompt

If your AI tool supports custom system prompts, you can add the core content of design.md directly to it.

Content pipeline testing:

  • Missing-file scenarios (missing README.md or DESIGN.md)
  • Format error scenarios (Markdown parsing failure)
  • Empty-directory scenarios

Search feature testing:

  • Empty result handling
  • Special characters such as Chinese and emoji
  • URL sync verification

UI component testing:

  • Light/dark theme switching
  • Responsive layout
  • Preview loading states
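The missing-file scenarios become trivial to unit-test if the validation rule is factored into a pure function over a directory listing, so no real file system is needed. A sketch (the function name is an assumption, not the project's actual API):

```typescript
// Pure version of the entry check described earlier: an entry is valid when
// its directory listing contains both required documents and at least one
// preview file.
const REQUIRED_FILES = ['README.md', 'DESIGN.md'];

function isValidEntryListing(files: string[]): boolean {
  const hasRequired = REQUIRED_FILES.every(f => files.includes(f));
  const hasPreview = files.some(f => /^preview.*\.html$/i.test(f));
  return hasRequired && hasPreview;
}
```

With this shape, each checklist scenario (missing README.md, missing DESIGN.md, empty directory) is a one-line assertion rather than a fixture-heavy integration test.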
Terminal window
# 1. Update the submodule to the latest version
git submodule update --remote
# 2. Rebuild the site
npm run build
# 3. Deploy static assets
npm run deploy

It is recommended to automate submodule updates plus build and deployment, so that CI can be triggered automatically whenever the upstream repository is updated.
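One possible shape for that automation is a scheduled GitHub Actions job that bumps the submodule and rebuilds. The cron interval and step details below are assumptions to adapt to your own setup:

```yaml
# Hypothetical workflow: refresh the upstream submodule daily and rebuild.
name: Update upstream gallery
on:
  schedule:
    - cron: '0 3 * * *'   # once per day; adjust to taste
  workflow_dispatch:
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Update submodule to latest upstream
        run: git submodule update --remote
      - name: Rebuild site
        run: |
          npm ci
          npm run build
```

If you want the update to land as a reviewable change rather than a direct deploy, have the job commit the submodule bump to a branch and open a pull request instead.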

The inconsistency in AI-generated UIs that HagiCode encountered during development was, at its core, caused by the lack of a structured design reference document. By building a design gallery site and creating a standardized design.md, we successfully solved this problem.

The core value of this solution lies in:

  • Unified resources: consolidating scattered design system documentation
  • Structured standards: expressing design conventions in a format AI can understand
  • Continuous maintenance: keeping content up to date through git submodule

If you are also using AI for frontend development, this approach is worth trying. Creating a structured design.md not only improves the consistency of AI-generated code, but also helps your team maintain unified design standards internally.


If this article helped you:

Thank you for reading. If you found this article useful, feel free to like, save, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and approved by the author.

Why Use Skillsbase to Maintain Your Own Skills Collection Repository

It is kind of funny when you think about it: the era of AI programming has arrived, and the Agent Skills we keep on hand are becoming more and more numerous. But along with that comes more and more hassle. This article is about how we used skillsbase to solve those problems.

In the age of AI programming, developers need to maintain an increasing number of Agent Skills - reusable instruction sets that extend the capabilities of coding assistants such as Claude Code, OpenCode, and Cursor. However, as the number of skills grows, a practical problem gradually emerges:

It is not exactly a major problem, but once you have too many things, managing them becomes troublesome.

Skills are scattered across different locations, making management costly

  • Local skills are scattered in multiple places: ~/.agents/skills/, ~/.claude/skills/, ~/.codex/skills/.system/, and so on
  • Different locations may have naming conflicts, for example skill-creator existing in both the user directory and the system directory
  • There is no unified management entry point, which makes backup and migration difficult

This part is genuinely annoying. Sometimes you do not even know where a certain skill actually is. It feels like losing something and then struggling to find it.

Lack of a standardized maintenance workflow

  • Manually copying skills is error-prone and makes it difficult to trace their origins
  • Without a unified validation mechanism, there is no guarantee that the skill repository remains complete
  • During team collaboration, synchronizing and sharing a skill collection is difficult

Manual work is always prone to mistakes. Human memory is limited, after all. Who can remember where every single thing came from?

Failing to meet reproducibility requirements

  • When switching development machines, all skills need to be configured again
  • In CI/CD environments, the skill repository cannot be validated and synchronized automatically

Changing to a different computer means doing everything all over again. It feels, in a way, just like moving house - troublesome every single time. You have to adapt to the new environment and reconfigure everything again.

To address these pain points, we tried many different approaches: from manual copying to scripted automation, from directly managing directories to globally installing and then recovering files. Each approach had its own flaws. Some could not guarantee consistency, some polluted the environment, and some were hard to use in CI.

We definitely took quite a few detours.

In the end, we found a more elegant solution: skillsbase. The core idea behind this approach is to install and validate locally first, then convert the structure and write it into the repository, and finally uninstall the temporary files. This ensures that the repository contents match the actual installation result while avoiding pollution of the global environment.

It sounds simple when you put it that way, but we only figured it out after stepping into quite a few pitfalls.

The solution shared in this article comes from our hands-on experience in the HagiCode project.

HagiCode is an AI coding assistant project. During development, we need to maintain a large number of Agent Skills to extend various coding capabilities. These real-world needs are exactly what pushed us to build the skillsbase toolset for standardized management of skill repositories.

This was not invented out of thin air. We were pushed into it by real needs. Once the number of skills grows, management naturally becomes necessary. When problems appear during management, solutions become necessary too. Step by step, that is how we got here.

If you are interested in HagiCode, you can visit the official website to learn more or check the source code on GitHub.

To build a maintainable skills collection repository, the following core problems need to be solved:

  1. Unified namespace conflicts: when multiple sources contain skills with the same name, how do we avoid overwriting them?
  2. Source traceability: how do we record the source of each skill for future updates and audits?
  3. Synchronization and validation: how do we ensure that repository contents stay consistent with the actual installation results?
  4. Automation integration: how do we integrate with CI/CD workflows to enable automatic synchronization and validation?

These problems may look simple, but every single one of them is a headache. Then again, what worthwhile work is ever easy?

Option 1: Copy directories directly

Pros: simple to implement
Cons: cannot guarantee consistency with the actual installation result of the skills CLI

We did think about this approach. Later, however, we realized that the CLI may apply some preprocessing logic during installation. Direct copying skips that step. As a result, what you copy is not the same as what is actually installed, and that becomes a problem.

Option 2: Install globally and then recover

Pros: the installation process can be validated
Cons: pollutes the execution environment, and it is hard to keep CI and local results consistent

This approach is even worse. A global installation pollutes the environment. More importantly, it is difficult to keep the CI environment consistent with the local environment, which leads to the classic “works on my machine, fails in CI” problem. Anyone who has dealt with that knows how painful it is.

Option 3: Local install -> convert -> uninstall (final solution)

This is the approach adopted by skillsbase:

  • First install skills into a temporary location with npx skills
  • Convert the directory structure and add source metadata
  • Write the result into the target repository
  • Finally uninstall the temporary files

This approach ensures that repository contents are consistent with the actual installation results seen by consumers, avoids polluting the global environment, standardizes the conversion process, and supports idempotent operations.
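The convert step in the middle is essentially a renaming-plus-metadata pass. A rough sketch of what it might look like (skillsbase's real internals are not published here, so every name and shape below is an assumption):

```typescript
// Hypothetical conversion step: map an installed skill to its target name
// according to the source's naming strategy, and build the source metadata
// that gets written alongside it.
type Naming = 'original' | 'prefix-system';

function targetName(skillName: string, naming: Naming): string {
  return naming === 'prefix-system' ? `system-${skillName}` : skillName;
}

function sourceMetadata(
  source: string,
  originalPath: string,
  skillName: string,
  naming: Naming,
) {
  return {
    source,
    originalPath,
    originalName: skillName,
    targetName: targetName(skillName, naming),
    syncedAt: new Date().toISOString(),
    version: 'unknown',
  };
}
```

Because the mapping is deterministic, running the conversion twice over the same temporary install produces the same repository contents, which is what makes the whole pipeline idempotent.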

This solution was not obvious from the beginning either. We simply learned through enough trial and error what works and what does not.

| Decision Item | Choice | Reason |
| --- | --- | --- |
| Runtime | Node.js ESM | No build step required; .mjs is enough to orchestrate the file system |
| Configuration format | YAML (sources.yaml) | Highly readable and suitable for manual maintenance |
| Naming strategy | Namespace prefix | User skills keep their original names, while system skills receive the system- prefix |
| Workflow | add updates the manifest -> sync executes synchronization | A single synchronization engine avoids implementing the same rules twice |
| File management | Managed file markers | Add a comment header to support safe overwrites |

These decisions all come down to one goal: making things simple. Simplicity wins in the end.

The skillsbase CLI provides four core commands:

skillsbase
├── init # Initialize repository structure
├── sync # Synchronize skill content
├── add # Add new skills
└── github_action # Generate GitHub Actions configuration

There are not many commands, but they are enough. A tool only needs to be useful.

┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ init │───▶│ add │───▶│ sync │───▶│github_action│
│ initialize │ │ add source │ │ sync content│ │ generate CI │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘

Take it one step at a time. No need to rush.

sources.yaml -> parse sources -> npx skills install -> convert structure -> write to skills/ -> uninstall temporary files
.skill-source.json (source metadata)

This workflow is fairly clear. At least when I look at it, I can understand what each step is doing.

repos/skillsbase/
├── sources.yaml                 # Source manifest (single source of truth)
├── skills/                      # Skills directory
│   ├── frontend-design/         # User skill
│   ├── skill-creator/           # User skill
│   └── system-skill-creator/    # System skill (with prefix)
├── scripts/
│   ├── sync-skills.mjs          # Synchronization script
│   └── validate-skills.mjs      # Validation script
├── docs/
│   └── maintainer-workflow.md   # Maintainer documentation
└── .github/
    ├── workflows/
    │   └── skills-sync.yml      # CI workflow
    └── actions/
        └── skillsbase-sync/
            └── action.yml       # Reusable Action

There are quite a few files, but that is fine. Once the structure is organized clearly, maintenance becomes much easier.

Terminal window
# 1. Create an empty repository
mkdir repos/myskills && cd repos/myskills
git init
# 2. Initialize it with skillsbase
npx skillsbase init
# Output:
# [1/4] create manifest ................. done
# [2/4] create scripts .................. done
# [3/4] create docs ..................... done
# [4/4] create github workflow .......... done
#
# next: skillsbase add <skill-name>

This step generates a lot of files, but there is no need to worry - they are all generated automatically. After that, you can start adding skills.

Terminal window
# Add a single skill (this automatically triggers synchronization)
npx skillsbase add frontend-design --source vercel-labs/agent-skills
# Add from a local source
npx skillsbase add documentation-writer --source /home/user/.agents/skills
# Output:
# source: first-party ......... updated
# target: skills/frontend-design ... synced
# status: 1 skill added, 0 removed

Adding a skill is very simple. One command is enough. Sometimes, though, you may hit unexpected issues such as poor network conditions or permission problems. Those are manageable - just take them one at a time.

Terminal window
# Perform synchronization (reconcile all sources)
npx skillsbase sync
# Only check for drift (do not modify files)
npx skillsbase sync --check
# Allow missing sources (CI scenario)
npx skillsbase sync --allow-missing-sources

During synchronization, the system checks every source defined in sources.yaml and reconciles them with the contents under the skills/ directory. If differences exist, it updates them; if there are no differences, it skips them. This prevents the “configuration changed but files did not” problem.
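At its core, that reconciliation is a set diff between what sources.yaml declares and what sits under skills/. A minimal sketch of the diff (a simplification of whatever skillsbase actually does internally):

```typescript
// Desired vs. actual skill names -> what a sync run must add, remove, or keep.
function reconcile(desired: string[], actual: string[]) {
  const want = new Set(desired);
  const have = new Set(actual);
  return {
    toAdd: desired.filter(s => !have.has(s)),
    toRemove: actual.filter(s => !want.has(s)),
    unchanged: desired.filter(s => have.has(s)),
  };
}
```

A `--check` run would simply report `toAdd`/`toRemove` and exit non-zero if either is non-empty, instead of touching the file system.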

Terminal window
# Generate workflow
npx skillsbase github_action --kind workflow
# Generate action
npx skillsbase github_action --kind action
# Generate everything
npx skillsbase github_action --kind all

The CI configuration is generated automatically as well. You still need to adjust some details yourself, such as trigger conditions and runtime environments, but that is not difficult.

# Skills root directory configuration
skillsRoot: skills/
metadataFile: .skill-source.json

# Source definitions
sources:
  # First-party: local user skills
  first-party:
    type: local
    path: /home/user/.agents/skills
    naming: original # Keep original name
    includes:
      - documentation-writer
      - frontend-design
      - skill-creator

  # System: skills provided by the system
  system:
    type: local
    path: /home/user/.codex/skills/.system
    naming: prefix-system # Add system- prefix
    includes:
      - imagegen
      - openai-docs
      - skill-creator # Becomes system-skill-creator

  # Remote: third-party repository
  vercel:
    type: remote
    url: vercel-labs/agent-skills
    naming: original
    includes:
      - web-design-guidelines

This configuration file is the core of the entire system. All sources are defined here. Change this file, and the next synchronization will apply the new state. In that sense, it is truly a “single source of truth.”

{
  "source": "first-party",
  "originalPath": "/home/user/.agents/skills/documentation-writer",
  "originalName": "documentation-writer",
  "targetName": "documentation-writer",
  "syncedAt": "2026-04-07T00:00:00.000Z",
  "version": "unknown"
}

Every skill directory contains this file, recording its source information. That way, when something goes wrong later, you can quickly locate where it came from and when it was synchronized.

Terminal window
# Validate repository structure
node scripts/validate-skills.mjs
# Validate with the skills CLI
npx skills add . --list
# Check for updates
npx skills check

Validation is one of those things that can feel both important and optional. Still, for the sake of safety, it never hurts to run it from time to time. After all, you never know when something unexpected might happen.

.github/workflows/skills-sync.yml

name: Skills Sync
on:
  push:
    paths:
      - 'sources.yaml'
      - 'skills/**'
  workflow_dispatch:
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Validate repository
        run: |
          npx skills add . --list
          node scripts/validate-skills.mjs
      - name: Sync check
        run: npx skillsbase sync --check

Once CI integration is in place, every change to sources.yaml or the skills/ directory automatically triggers validation. That prevents the situation where changes were made locally but synchronization was forgotten.

  1. Handle naming conflicts: add the system- prefix to system skills consistently. This keeps every skill available while avoiding naming conflicts.
  2. Idempotent operations: all commands support repeated execution, and running sync multiple times does not produce side effects. This is especially important in CI.
  3. Managed files: generated files include the # Managed by skillsbase CLI comment, making them easy to identify and manage. These files can be safely overwritten, and manual modifications are not preserved.
  4. Non-interactive mode: CI environments use deterministic behavior by default, so interactive prompts do not interrupt execution. All configuration is declared through sources.yaml.
  5. Source traceability: every skill has a .skill-source.json file recording its source information, making troubleshooting much faster.
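Point 3 is cheap to enforce in code: before overwriting a file, look for the marker comment and refuse to touch anything that lacks it. A sketch, using the marker text quoted in the list above:

```typescript
// Only files carrying the skillsbase marker are safe to overwrite;
// anything else is treated as hand-written and left alone.
const MANAGED_MARKER = '# Managed by skillsbase CLI';

function isManagedFile(content: string): boolean {
  // The marker lives in the header, so only the first few lines are scanned.
  return content
    .split('\n')
    .slice(0, 5)
    .some(line => line.includes(MANAGED_MARKER));
}
```

A sync run would call this guard before every write, skipping (or erroring on) unmarked files so that local hand edits are never silently destroyed.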
Terminal window
# Team members install the shared skills repository
npx skills add your-org/myskills -g --all
# Clone locally and validate
git clone https://github.com/your-org/myskills.git
cd myskills
npx skills add . --list

By managing the skills repository with Git, team members can easily synchronize their skill collection and ensure that everyone uses the same versions of tools and configuration.

This is especially useful in team collaboration. You no longer run into situations where “it works for me but not for you.” Once the environment is unified, half the problems disappear.

The core value of using skillsbase to maintain a skills collection repository lies in the following:

  • Security: source validation, conflict detection, and managed file protection
  • Maintainability: a unified entry point, idempotent operations, and configuration-as-documentation
  • Standardization: a unified directory structure, naming conventions, and metadata format
  • Automation: CI/CD integration, automatic synchronization, and automatic validation

With this approach, developers can manage their own Agent Skills the same way they manage npm packages, building a reproducible, shareable, and maintainable skills repository system.

The tools and workflow shared in this article are exactly what we refined through real mistakes and real optimization while building HagiCode. If you find this approach valuable, that is a good sign that our engineering direction is the right one - and that HagiCode itself is worth your attention as well.

After all, good tools deserve to be used by more people.

If this article helped you:


This article was first published on the HagiCode Blog.

Thank you for reading. If you found this article useful, you are welcome to like it, save it, and share it in support. This content was created with AI-assisted collaboration, and the final version was reviewed and confirmed by the author.

How to Reproduce Projects in the AI Era: Vault, a Cross-Project Persistent Storage System

In the era of AI-assisted development, how can we help AI assistants better understand our learning resources? The HagiCode project built the Vault system as a unified knowledge storage abstraction layer that AI can understand, greatly improving the efficiency of learning through project reproduction.

In the AI era, the way developers learn new technologies and architectures is changing profoundly. “Reproducing projects” - that is, deeply studying and learning from the code, architecture, and design patterns of excellent open source projects - has become an efficient way to learn. Compared with traditional methods like reading books or watching videos, directly reading and running high-quality open source projects helps you understand real-world engineering practices much faster.

Still, this learning method comes with quite a few challenges.

Learning materials are too scattered. Your notes may live in Obsidian, code repositories may be scattered across different folders, and your AI assistant’s conversation history becomes yet another isolated data island. When you want AI to help analyze a project, you have to manually copy code snippets and organize context, which is rather tedious.

What is even more troublesome is the broken context. AI assistants cannot directly access your local learning resources, so you have to provide background information again in every conversation. On top of that, reproduced code repositories update quickly, manual syncing is error-prone, and knowledge is hard to share across multiple learning projects.

At the root, all of these problems come from “data islands.” If there were a unified storage abstraction layer that allowed AI assistants to understand and access all your learning resources, the problem would be solved neatly.

The Vault system shared in this article is exactly the solution we developed while building HagiCode. HagiCode is an AI coding assistant project, and in our daily development work we often need to study and refer to many different open source projects. To help AI assistants better understand these learning resources, we designed Vault, a cross-project persistent storage system.

This solution has already been validated in HagiCode in real use. If you are facing similar knowledge management challenges, I hope these experiences can offer some inspiration. After all, once you’ve fallen into a few pits yourself, you should leave something behind for the next person.

The core idea of the Vault system is simple: create a unified knowledge storage abstraction layer that AI can understand. From an implementation perspective, the system has several key characteristics.

The system supports four vault types, each corresponding to a different usage scenario:

// folder: general-purpose folder type
export const DEFAULT_VAULT_TYPE = 'folder';
// coderef: a type specifically for reproduced code projects
export const CODEREF_VAULT_TYPE = 'coderef';
// obsidian: integrated with Obsidian note-taking software
export const OBSIDIAN_VAULT_TYPE = 'obsidian';
// system-managed: vault automatically managed by the system
export const SYSTEM_MANAGED_VAULT_TYPE = 'system-managed';

Among them, the coderef type is the most commonly used in HagiCode. It is specifically designed for reproduced code projects, providing a standardized directory structure and AI-readable metadata descriptions.

The Vault registry is stored persistently in JSON format, ensuring that the configuration remains available after the application restarts:

public class VaultRegistryStore : IVaultRegistryStore
{
    private readonly string _registryFilePath;

    public VaultRegistryStore(IConfiguration configuration, ILogger<VaultRegistryStore> logger)
    {
        var dataDir = configuration["DataDir"] ?? "./data";
        var absoluteDataDir = Path.IsPathRooted(dataDir)
            ? dataDir
            : Path.GetFullPath(Path.Combine(Directory.GetCurrentDirectory(), dataDir));
        _registryFilePath = Path.Combine(absoluteDataDir, "personal-data", "vaults", "registry.json");
    }
}

The advantage of this design is that it is simple and reliable. JSON is human-readable, which makes debugging and manual editing easier; filesystem storage avoids the complexity of a database and reduces system dependencies. After all, sometimes the simplest option really is the best one.

Most importantly, the system can automatically inject vault information into the context of AI proposals:

export function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');
  if (readOnlyVaults.length === 0 && editableVaults.length === 0) {
    return '';
  }
  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);
  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}

This enables an important capability: AI assistants can automatically understand the available learning resources without users manually providing context. You could say that counts as a kind of tacit understanding.
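To make the injection concrete, here is a simplified, self-contained sketch of the same idea. The section headings, line format, and the `buildVaultLines` helper are illustrative assumptions; the real `buildVaultSection` and template structure in HagiCode may format things differently.

```typescript
// Illustrative sketch only - heading text and line format are assumptions,
// not the exact output of HagiCode's buildVaultSection.
interface VaultForText {
  id: string;
  name: string;
  physicalPath: string;
  accessType: 'read' | 'write';
}

function buildVaultLines(vaults: VaultForText[], heading: string): string {
  if (vaults.length === 0) return '';
  const lines = vaults.map(
    (v) => `- ${v.name} (${v.accessType}) at ${v.physicalPath}`,
  );
  return `#### ${heading}\n${lines.join('\n')}`;
}

function buildTargetVaultsText(vaults: VaultForText[]): string {
  const readOnly = vaults.filter((v) => v.accessType === 'read');
  const editable = vaults.filter((v) => v.accessType === 'write');
  if (readOnly.length === 0 && editable.length === 0) return '';
  const sections = [
    buildVaultLines(readOnly, 'Reference vaults'),
    buildVaultLines(editable, 'Editable vaults'),
  ].filter(Boolean);
  return `\n\n### Target Vaults\n\n${sections.join('\n')}`;
}
```

The resulting Markdown fragment is appended to the proposal prompt, which is all "context injection" really means here: turning registry entries into a few lines of text the model can read.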

The standardized structure of CodeRef Vault

Section titled “The standardized structure of CodeRef Vault”

For the coderef type of vault, HagiCode provides a standardized directory structure:

my-coderef-vault/
├── index.yaml # vault metadata description
├── AGENTS.md # operating guide for AI assistants
├── docs/ # stores study notes and documents
└── repos/ # manages reproduced code repositories through Git submodules

When creating a vault, the system automatically initializes this structure:

private async Task EnsureCodeRefStructureAsync(
    string vaultName,
    string physicalPath,
    ICollection<VaultBootstrapDiagnosticDto> diagnostics,
    CancellationToken cancellationToken)
{
    Directory.CreateDirectory(physicalPath);
    var indexPath = Path.Combine(physicalPath, CodeRefIndexFileName);
    var docsPath = Path.Combine(physicalPath, CodeRefDocsDirectoryName);
    var reposPath = Path.Combine(physicalPath, CodeRefReposDirectoryName);
    // Create the standard directory structure
    if (!Directory.Exists(docsPath))
    {
        Directory.CreateDirectory(docsPath);
    }
    if (!Directory.Exists(reposPath))
    {
        Directory.CreateDirectory(reposPath);
    }
    // Create the AGENTS.md guide
    await EnsureCodeRefAgentsDocumentAsync(physicalPath, cancellationToken);
    // Create the index.yaml metadata (mergedDocument is built earlier in the
    // full method; that preparation is omitted from this excerpt)
    await WriteCodeRefIndexDocumentAsync(indexPath, mergedDocument, cancellationToken);
}

This structure is carefully designed as well:

  • docs/ stores your study notes, where you can record your understanding of the code, architecture analysis, lessons learned, and so on in Markdown
  • repos/ manages reproduced repositories through Git submodules instead of copying code directly, which keeps the code in sync and saves space
  • index.yaml contains the vault metadata so AI assistants can quickly understand the purpose and contents of the vault
  • AGENTS.md is a guide written specifically for AI assistants, explaining how to handle the contents of the vault

Organized this way, perhaps AI can understand what you have in mind a little more easily.
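For a sense of what the metadata might contain, here is a hypothetical `index.yaml` sketch. The article does not show the actual schema, so every field name below is illustrative only:

```yaml
# Hypothetical index.yaml sketch - the real schema used by HagiCode
# is not shown in this article; field names here are illustrative.
name: react-learning
type: coderef
description: Notes and reproduced source for studying React internals
repos:
  - path: repos/react
    url: https://github.com/facebook/react.git
docs:
  - docs/fiber-architecture.md
```

The point is not the exact fields but that the file is machine-readable: an AI assistant can read one small YAML document instead of crawling the whole directory tree.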

Automatic initialization for system-managed vaults

Section titled “Automatic initialization for system-managed vaults”

In addition to manually created vaults, HagiCode also supports system-managed vaults:

public async Task<IReadOnlyList<VaultRegistryEntry>> EnsureAllSystemManagedVaultsAsync(
    CancellationToken cancellationToken = default)
{
    var definitions = GetAllResolvedDefinitions();
    var entries = new List<VaultRegistryEntry>(definitions.Count);
    foreach (var definition in definitions)
    {
        entries.Add(await EnsureResolvedSystemManagedVaultAsync(definition, cancellationToken));
    }
    return entries;
}

The system automatically creates and manages the following vaults:

  • hagiprojectdata: project data storage used to save project configuration and state
  • personaldata: personal data storage used to save user preferences
  • hbsprompt: a prompt template library used to manage commonly used AI prompts

These vaults are initialized automatically when the system starts, so users do not need to configure them manually. Some things are simply better left to the system instead of humans worrying about them.

An important part of the design is access control. The system divides vaults into two access types:

export interface VaultForText {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
  accessType: 'read' | 'write'; // Key: distinguish read-only from editable
}

  • reference (read-only): AI can analyze and read the content but cannot modify it. Suitable for referenced open source projects, documents, and similar materials
  • editable (read-write): AI can modify content as needed for the task. Suitable for your notes, drafts, and similar materials

This distinction matters. It tells AI which content is “read-only reference” and which content is “safe to edit,” reducing the risk of accidental changes. After all, nobody wants their hard work to disappear because of an unintended edit.

Now that we’ve covered the ideas, let’s look at how it works in practice.

Here is a complete frontend call example:

const createCodeRefVault = async () => {
  const response = await VaultService.postApiVaults({
    requestBody: {
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      gitUrl: "https://github.com/facebook/react.git"
    }
  });
  // The system will automatically:
  // 1. Clone the React repository into vault/repos/react
  // 2. Create the docs/ directory for notes
  // 3. Generate the index.yaml metadata
  // 4. Create the AGENTS.md guide file
  return response;
};

This API call completes a series of actions: creating the directory structure, initializing Git submodules, generating metadata files, and more. You only need to provide the basic information and let the system handle the rest. It is honestly a fairly worry-free approach.

After creating the vault, you can reference it in an AI proposal:

const proposal = composeProposalChiefComplaint({
  chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
  repositories: [
    { id: "react", gitUrl: "https://github.com/facebook/react.git" }
  ],
  vaults: [
    {
      id: "react-learning",
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/vaults/react-learning",
      accessType: "read" // AI can only read, not modify
    }
  ],
  quickRequestText: "Focus on the Fiber architecture and scheduler implementation"
});

The system automatically injects vault information into the AI context, letting AI know which learning resources are available. When AI can understand what you have in mind, that kind of tacit understanding is hard to come by.

While using the Vault system, we have summarized a few lessons learned.

The system strictly validates paths to prevent path traversal attacks:

private static string ResolveFilePath(string vaultRoot, string relativePath)
{
    var rootPath = EnsureTrailingSeparator(Path.GetFullPath(vaultRoot));
    var combinedPath = Path.GetFullPath(Path.Combine(rootPath, relativePath));
    if (!combinedPath.StartsWith(rootPath, StringComparison.OrdinalIgnoreCase))
    {
        throw new BusinessException(VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root.");
    }
    return combinedPath;
}

This is important. If you customize a vault path, make sure it stays within the allowed range, otherwise the system will reject the operation. You really cannot overemphasize security.
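The same guard translates directly to other stacks. Here is a minimal TypeScript sketch of the idea (resolve the combined path, then verify the prefix); unlike the C# version above it uses a case-sensitive comparison, and the error message is just illustrative:

```typescript
import * as path from 'path';

// Sketch of the traversal guard: resolve the combined path first,
// then verify it still lives under the vault root.
function resolveFilePath(vaultRoot: string, relativePath: string): string {
  const rootPath = path.resolve(vaultRoot) + path.sep;
  const combinedPath = path.resolve(vaultRoot, relativePath);
  if (!combinedPath.startsWith(rootPath)) {
    throw new Error('Vault file paths must stay inside the registered vault root.');
  }
  return combinedPath;
}
```

The crucial ordering is resolve-then-check: a naive `startsWith` on the raw input string would let `../../etc/passwd` slip through, because the escape only becomes visible after the path is normalized.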

CodeRef Vault recommends Git submodules instead of directly copying code:

private static string BuildCodeRefAgentsContent()
{
    return """
        # CodeRef Vault Guide
        Repositories under `repos/` should be maintained through Git submodules
        rather than copied directly into the vault root.
        Keep this structure stable so assistants and tools can understand the vault quickly.
        """ + Environment.NewLine;
}

This brings several advantages: keeping code synchronized with upstream, saving disk space, and making it easier to manage multiple versions of the code. After all, who wants to download the same thing again and again?

To prevent performance problems, the system limits file size and type:

private const int FileEnumerationLimit = 500;
private const int PreviewByteLimit = 256 * 1024; // 256KB

If your vault contains a large number of files or very large files, preview performance may be affected. In that case, you can consider processing files in batches or using specialized search tools. Sometimes when something gets too large, it becomes harder to handle, not easier.
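To show how such limits are typically applied, here is a small sketch. The constant names mirror the C# ones above, but the helpers themselves are illustrative, not HagiCode's actual implementation:

```typescript
// Illustrative helpers showing how the limits above would be enforced.
const FILE_ENUMERATION_LIMIT = 500;
const PREVIEW_BYTE_LIMIT = 256 * 1024; // 256KB

// Truncate oversized file content instead of loading it whole into a preview
function clampPreview(content: Uint8Array): Uint8Array {
  return content.length > PREVIEW_BYTE_LIMIT
    ? content.slice(0, PREVIEW_BYTE_LIMIT)
    : content;
}

// Cap directory listings so one huge vault cannot stall the UI
function clampEnumeration<T>(files: T[]): T[] {
  return files.slice(0, FILE_ENUMERATION_LIMIT);
}
```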

When creating a vault, the system returns diagnostic information to help with debugging:

List<VaultBootstrapDiagnosticDto> bootstrapDiagnostics = [];
if (IsCodeRefVaultType(normalizedType))
{
    bootstrapDiagnostics = await EnsureCodeRefBootstrapAsync(
        normalizedName,
        normalizedPhysicalPath,
        normalizedGitUrl,
        cancellationToken);
}

If creation fails, you can inspect the diagnostic information to understand the specific cause. When something goes wrong, checking the diagnostics is often the most direct way forward.

Through a unified storage abstraction layer, the Vault system solves several core pain points of reproducing projects in the AI era:

  • Centralized knowledge management: all learning resources are gathered in one place instead of scattered everywhere
  • Automatic AI context injection: AI assistants can automatically understand the available learning resources without manual context setup
  • Cross-project knowledge reuse: knowledge can be shared and reused across multiple learning projects
  • Standardized directory structure: a consistent directory layout lowers the learning curve

This solution has already been validated in the HagiCode project. If you are also building tools related to AI-assisted development, or facing similar knowledge management problems, I hope these experiences can serve as a useful reference.

In truth, the value of a technical solution does not lie in how complicated it is, but in whether it solves real problems. The core idea of the Vault system is very simple: build a unified knowledge storage layer that AI can understand. Yet it is precisely this simple abstraction that improved our development efficiency quite a bit.

Sometimes the simple approach really is the best one. After all, complicated things often hide even more pitfalls…


If this article helped you, feel free to give the project a Star on GitHub, or visit the official website to learn more about HagiCode. The public beta has already started, and you can experience the full AI coding assistant features as soon as you install it.

Maybe you should give it a try as well…


Progressive Disclosure: Improving Human-Computer Interaction in AI Products with the “Less Is More” Philosophy

Section titled “Progressive Disclosure: Improving Human-Computer Interaction in AI Products with the 'Less Is More' Philosophy”

In AI product design, the quality of user input often determines the quality of the output. This article shares a “progressive disclosure” interaction approach we practiced in the HagiCode project. Through step-by-step guidance, intelligent completion, and instant feedback, it turns users’ brief and vague inputs into structured technical proposals, significantly improving human-computer interaction efficiency.

Anyone building AI products has probably seen this situation: a user opens your app and enthusiastically types a one-line request, only for the AI to return something completely off target. It is not that the AI is not smart enough. The user simply did not provide enough information. Mind-reading is hard for anyone.

This issue became especially obvious while we were building HagiCode. HagiCode is an AI-driven coding assistant where users describe requirements in natural language to create technical proposals and sessions. In actual use, we found that user input often has these problems:

  • Inconsistent input quality: some users type only a few words, such as “optimize login” or “fix bug”, without the necessary context
  • Inconsistent technical terminology: different users use different terms for the same thing; some say “frontend” while others say “FE”
  • Missing structured information: there is no project background, repository scope, or impact scope, even though these are critical details
  • Repeated problems: the same types of requests appear again and again, and each time they need to be explained from scratch

The direct result is predictable: the AI has a harder time understanding the request, proposal quality becomes unstable, and the user experience suffers. Users think, “This AI is not very good,” while we feel unfairly blamed. If you give me only one sentence, how am I supposed to guess what you really want?

In truth, this is understandable. Even people need time to understand one another, and machines are no exception.

To solve these pain points, we made a bold decision: introduce the design principle of “progressive disclosure” to improve human-computer interaction. The changes this brought were probably larger than you would imagine. To be honest, we did not expect it to be this effective at the time.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant designed to help developers complete tasks such as code writing, technical proposal generation, and code review through natural language interaction. Project link: github.com/HagiCode-org/site.

We developed this progressive disclosure approach through multiple rounds of iteration and optimization during real product development. If you find it valuable, that at least suggests our engineering is doing something right. In that case, HagiCode itself may also be worth a look. Good tools are meant to be shared.

“Progressive disclosure” is a design principle from the field of HCI (human-computer interaction). Its core idea is simple: do not show users all information and options at once. Instead, reveal only what is necessary step by step, based on the user’s actions and needs.

This principle is especially suitable for AI products because AI interaction is naturally progressive. The user says a little, the AI understands a little, then the user adds more, and the AI understands more. It is very similar to how people communicate with each other: understanding usually develops gradually.

In HagiCode’s scenario, we applied progressive disclosure in four areas:

1. Description optimization mechanism: let AI help you say things more clearly

Section titled “1. Description optimization mechanism: let AI help you say things more clearly”

When a user enters a short description, we do not send it directly to the AI for interpretation. Instead, we first trigger a “description optimization” flow. The core of this flow is “structured output”: converting the user’s free text into a standard format. It is like stringing loose pearls into a necklace so everything becomes easier to understand.

The optimized description must include the following standard sections:

  • Background: the problem background and context
  • Analysis: technical analysis and reasoning
  • Solution: the solution and implementation steps
  • Practice: concrete code examples and notes

At the same time, we automatically generate a Markdown table showing information such as the target repository, paths, and edit permissions, making subsequent AI operations easier. A clear directory always makes things easier to find.
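One way to keep this format honest is a small structural check on the optimized output. The four section names come from the list above; the validator itself is a hypothetical sketch, not part of HagiCode:

```typescript
// Hypothetical validator: confirms an optimized description contains
// the four standard Markdown sections described above.
const REQUIRED_SECTIONS = ['Background', 'Analysis', 'Solution', 'Practice'];

function missingSections(markdown: string): string[] {
  return REQUIRED_SECTIONS.filter(
    (section) => !new RegExp(`^##\\s+${section}\\b`, 'm').test(markdown),
  );
}
```

A check like this can run after the AI call and trigger a retry or a warning when the model drifts from the required structure.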

Here is the actual implementation:

// Core method in ProposalDescriptionMemoryService.cs
public async Task<string> OptimizeDescriptionAsync(
    string title,
    string description,
    string locale = "zh-CN",
    DescriptionOptimizationMemoryContext? memoryContext = null,
    CancellationToken cancellationToken = default)
{
    // Build query parameters
    var queryContext = BuildQueryContext(title, description);
    // Retrieve historical context (unless the caller already supplied one)
    memoryContext ??= await RetrieveHistoricalContextAsync(queryContext, cancellationToken);
    // Generate a structured prompt
    var prompt = await BuildOptimizationPromptAsync(
        title,
        description,
        memoryContext,
        cancellationToken);
    // Call AI for optimization
    return await _aiService.CompleteAsync(prompt, cancellationToken);
}

The key to this flow is “memory injection”. We inject historical context such as project conventions, similar cases, and negative patterns into the prompt, allowing the AI to reference past experience during optimization. Experience should not go to waste.

Notes:

  • Make sure the current input takes priority over historical memory, so explicitly specified user information is not overridden
  • HagIndex references must be treated as factual sources and must not be altered by historical cases
  • Low-confidence correction suggestions should not be injected as strong constraints

2. Voice input capability: speaking is more natural than typing

Section titled “2. Voice input capability: speaking is more natural than typing”

In addition to text input, we also support voice input. This is especially useful for describing complex requirements. Typing a technical request can take minutes, while saying it out loud may take only a few dozen seconds.

The key design focus for voice input is “state management”. Users must clearly know what state the system is currently in. We defined the following states:

  • Idle: the system is ready and recording can start
  • Waiting upstream: the system is connecting to the backend service
  • Recording: the user’s voice is being recorded
  • Processing: speech is being converted to text
  • Error: an error occurred and needs user attention

The frontend state model looks roughly like this:

interface VoiceInputState {
  status: 'idle' | 'waiting-upstream' | 'recording' | 'processing' | 'error';
  duration: number;
  startTime?: number;
  error?: string;
  deletedSet: Set<string>; // Fingerprint set of deleted results
}

// State transition when recording starts
const handleVoiceInputStart = async () => {
  // Enter waiting state first and show a loading animation
  setState({ status: 'waiting-upstream' });
  // Wait for backend readiness confirmation
  const isReady = await waitForBackendReady();
  if (!isReady) {
    setState({ status: 'error', error: 'Backend service is not ready' });
    return;
  }
  // Start recording
  setState({ status: 'recording', startTime: Date.now() });
};

// Handle recognition results
const handleRecognitionResult = (result: RecognitionResult) => {
  const fingerprint = normalizeFingerprint(result.text);
  // Check whether it has already been deleted
  if (state.deletedSet.has(fingerprint)) {
    return; // Skip deleted content
  }
  // Merge the result into the text box
  appendResult(result);
};

There is an important detail here: we use a “fingerprint set” to manage deletion synchronization. When speech recognition returns multiple results, users may delete some of them. We store the fingerprints of deleted content so that if the same content appears again later, it is skipped automatically. It is essentially a way to remember what the user has already rejected.
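The mechanism can be sketched in a few lines. The normalization below (trim, lowercase, collapse whitespace) is an assumption for illustration; HagiCode's actual fingerprint function is not shown in this article:

```typescript
// Sketch of the fingerprint idea: normalize recognized text so that
// trivially different repeats map to the same key. The normalization
// rules here are illustrative assumptions.
function normalizeFingerprint(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, ' ');
}

const deletedSet = new Set<string>();

// Record that the user rejected this piece of recognized text
function markDeleted(text: string): void {
  deletedSet.add(normalizeFingerprint(text));
}

// Later results are only appended if their fingerprint was never rejected
function shouldAppend(text: string): boolean {
  return !deletedSet.has(normalizeFingerprint(text));
}
```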

3. Prompt management system: externalize the AI’s “brain”

Section titled “3. Prompt management system: externalize the AI's 'brain'”

HagiCode has a flexible prompt management system in which all prompts are stored as files:

prompts/
├── metadata/
│ ├── optimize-description.zh-CN.json
│ └── optimize-description.en-US.json
└── templates/
├── optimize-description.zh-CN.hbs
└── optimize-description.en-US.hbs

Each prompt consists of two parts:

  • Metadata file (.json): defines information such as the prompt scenario, version, and parameters
  • Template file (.hbs): the actual prompt content, written with Handlebars syntax

The metadata file format looks like this:

{
  "scenario": "optimize-description",
  "locale": "zh-CN",
  "version": "1.0.0",
  "syntax": "handlebars",
  "syntaxVersion": "1.0",
  "parameters": [
    {
      "name": "title",
      "type": "string",
      "required": true,
      "description": "Proposal title"
    },
    {
      "name": "description",
      "type": "string",
      "required": true,
      "description": "Original description"
    }
  ],
  "author": "HagiCode Team",
  "description": "Optimize the user's technical proposal description",
  "lastModified": "2026-04-05",
  "tags": ["optimization", "nlp"]
}

The template file uses Handlebars syntax and supports parameter injection:

You are a technical proposal expert.
<task>
Generate a structured technical proposal description based on the following information.
</task>
<input>
<title>{{title}}</title>
<description>{{description}}</description>
{{#if memoryContext}}
<memory_context>
{{memoryContext}}
</memory_context>
{{/if}}
</input>
<output_format>
## Background
[Describe the problem background and context, including project information, repository scope, and so on]
## Analysis
[Provide the technical analysis and reasoning process, and explain why this change is needed]
## Solution
[Provide the solution and implementation steps, listing the key code locations]
## Practice
[Provide concrete code examples and notes]
</output_format>

The benefits of this design are clear:

  • prompts can be version-controlled just like code
  • multiple languages are supported and can be switched automatically based on user preference
  • the parameterized design allows context to be injected dynamically
  • completeness can be validated at startup, avoiding runtime errors

If knowledge stays only in someone’s head, it is easy to lose. Recording it in a structured way from the beginning is much safer.
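To make the parameter-injection step concrete, here is a hand-rolled sketch of template rendering. The real system uses Handlebars; this simplified version only handles plain `{{name}}` placeholders, not `#if` blocks or helpers:

```typescript
// Minimal stand-in for Handlebars rendering: substitute {{name}}
// placeholders from a parameter map, leaving unknown ones untouched.
// Illustrative only - the actual system uses the Handlebars engine.
function renderTemplate(template: string, params: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in params ? params[name] : match,
  );
}
```

Feeding the metadata's declared parameters (`title`, `description`) through a renderer like this is what turns one stored template into many concrete prompts.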

4. Progressive wizard: split complex tasks into small steps

Section titled “4. Progressive wizard: split complex tasks into small steps”

For complex tasks, such as first-time installation and configuration, we use a multi-step wizard design. Each step requests only the necessary information and provides clear progress indicators. Large tasks become much more manageable when handled one step at a time.

The wizard state model:

interface WizardState {
  currentStep: number; // 0-3, corresponding to 4 steps
  steps: WizardStep[];
  canGoNext: boolean;
  canGoBack: boolean;
  isLoading: boolean;
  error: string | null;
}

interface WizardStep {
  id: number;
  title: string;
  description: string;
  completed: boolean;
}

// Step navigation logic
const goToNextStep = () => {
  if (wizardState.currentStep < wizardState.steps.length - 1) {
    // Validate input for the current step
    if (validateCurrentStep()) {
      wizardState.currentStep++;
      wizardState.steps[wizardState.currentStep - 1].completed = true;
    }
  }
};

const goToPreviousStep = () => {
  if (wizardState.currentStep > 0) {
    wizardState.currentStep--;
  }
};

Each step has its own validation logic, and completed steps receive clear visual markers. Canceling opens a confirmation dialog to prevent users from losing progress through accidental actions.

Looking back at HagiCode’s progressive disclosure practice, we can summarize several core principles:

  1. Step-by-step guidance: break complex tasks into smaller steps and request only the necessary information at each stage
  2. Intelligent completion: use historical context and project knowledge to fill in information automatically
  3. Instant feedback: give every action clear visual feedback and status hints
  4. Fault-tolerance mechanisms: allow users to undo and reset so mistakes do not lead to irreversible loss
  5. Input diversity: support multiple input methods such as text and voice

In HagiCode, the practical result of this approach was clear: the average length of user input increased from fewer than 20 characters to structured descriptions of 200-300 characters, the quality of AI-generated proposals improved significantly, and user satisfaction increased along with it.

This is not surprising. The more information users provide, the more accurately the AI can understand them, and the better the results it can return. In that sense, it is not very different from communication between people.

If you are also building AI-related products, I hope these experiences offer some useful inspiration. Remember: users do not necessarily refuse to provide information. More often, the product has not yet asked the right questions in the right way. The core of progressive disclosure is finding the best timing and form for those questions.



AI Output Token Optimization: Practicing an Ultra-Minimal Classical Chinese Mode

Section titled “AI Output Token Optimization: Practicing an Ultra-Minimal Classical Chinese Mode”

In AI application development, token consumption directly affects cost. In the HagiCode project, we implemented an “ultra-minimal Classical Chinese output mode” through the SOUL system. Without sacrificing information density, it reduces output tokens by roughly 30-50%. This article shares the implementation details of that approach and the lessons we learned using it.

In AI application development, token consumption is an unavoidable cost issue. This becomes especially painful in scenarios where the AI needs to produce large amounts of content. How do you reduce output tokens without sacrificing information density? The more you think about it, the more frustrating the problem can get.

Traditional optimization ideas mostly focus on the input side: trimming system prompts, compressing context, or using more efficient encoding. But these methods eventually hit a ceiling. Push compression too far, and you start hurting the AI’s comprehension and output quality. That is basically just deleting content, which is not very meaningful.

So what about the output side? Could we get the AI to express the same meaning more concisely?

The question sounds simple, but there is quite a bit hidden beneath it. If you directly ask the AI to “be concise,” it may really give you only a few words. If you add “keep the information complete,” it may drift back to the original verbose style. Constraints that are too strong hurt usability; constraints that are too weak do nothing. Where exactly is the balance point? No one can say for sure.

To solve these pain points, we made a bold decision: start from language style itself and design a configurable, composable constraint system for expression. The impact of that decision may be even larger than you expect. I will get into the details shortly, and the result may surprise you a little.

The approach shared in this article comes from our practical experience in the HagiCode project.

HagiCode is an open-source AI coding assistant that supports multiple AI models and custom configuration. During development, we discovered that AI output token usage was too high, so we designed a solution for it. If you find this approach valuable, that probably says something good about our engineering work. And if that is the case, HagiCode itself may also be worth your attention. Code does not lie.

The full name of the SOUL system is Soul Oriented Universal Language. It is the configuration system used in the HagiCode project to define the language style of an AI Hero. Its core idea is simple: by constraining how the AI expresses itself, it can output content in a more concise linguistic form while preserving informational completeness.

It is a bit like putting a linguistic mask on the AI… though honestly, it is not quite that mystical.

The SOUL system uses a frontend-backend separated architecture:

Frontend (Soul Builder):

  • Built with React + TypeScript + Vite
  • Located in the repos/soul/ directory
  • Provides a visual Soul building interface
  • Supports bilingual use (zh-CN / en-US)

Backend:

  • Built on .NET (C#) + the Orleans distributed runtime
  • The Hero entity includes a Soul field (maximum 8000 characters)
  • Injects Soul into the system prompt through SessionSystemMessageCompiler

Agent Templates generation:

  • Generated from reference materials
  • Output to the /agent-templates/soul/templates/ directory
  • Includes 50 main Catalog groups and 10 orthogonal dimensions

When a Session executes for the first time, the system reads the Hero’s Soul configuration and injects it into the system prompt:

sequenceDiagram
    participant UI as User Interface
    participant Session as SessionGrain
    participant Hero as Hero Repository
    participant AI as AI Executor
    UI->>Session: Send message (bind Hero)
    Session->>Hero: Read Hero.Soul
    Session->>Session: Cache Soul snapshot
    Session->>AI: Build AIRequest (inject Soul)
    AI-->>Session: Execution result
    Session-->>UI: Stream response

The injected system prompt format is:

<hero_soul>
[User-defined Soul content]
</hero_soul>

This injection mechanism is implemented in SessionSystemMessageCompiler.cs:

internal static string? BuildSystemMessage(
    string? existingSystemMessage,
    string? languagePreference,
    IReadOnlyList<HeroTraitDto>? traits,
    string? soul)
{
    var segments = new List<string>();
    // ... language preference and Traits handling ...
    var normalizedSoul = NormalizeSoul(soul);
    if (!string.IsNullOrWhiteSpace(normalizedSoul))
    {
        segments.Add($"<hero_soul>\n{normalizedSoul}\n</hero_soul>");
    }
    // ... other system messages ...
    return segments.Count == 0 ? null : string.Join("\n\n", segments);
}

Once you have seen the code and understood the principle, that is really all there is to it.

Ultra-minimal Classical Chinese mode is the most representative token-saving strategy in the SOUL system. Its core principle is to use the high semantic density of Classical Chinese to compress output length while preserving complete information.

Classical Chinese has several natural advantages:

  1. Semantic compression: the same meaning can be expressed with fewer characters.
  2. Redundancy removal: Classical Chinese naturally omits many conjunctions and particles common in modern Chinese.
  3. Concise structure: each sentence carries high information density, making it well suited as a vehicle for AI output.

Here is a concrete example:

Modern Chinese output (about 80 characters):

Based on your code analysis, I found several issues. First, on line 23, the variable name is too long and should be shortened. Second, on line 45, you did not handle null values and should add conditional logic. Finally, the overall code structure is acceptable, but it can be further optimized.

Ultra-minimal Classical Chinese output (about 35 characters, saving 56%):

Code reviewed: line 23 variable name verbose, abbreviate; line 45 lacks null handling, add checks. Overall structure acceptable; minor tuning suffices.

The gap is large enough to make you stop and think.
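To sanity-check the claimed saving, the percentage can be computed directly from the character counts. This is an illustrative helper, not project code:

```typescript
// Illustrative helper: percentage saved between an original and a
// compressed output, here measured in characters as a proxy for tokens.
function savingsPercent(originalLen: number, compressedLen: number): number {
  if (originalLen <= 0) throw new Error("originalLen must be positive");
  return Math.round((1 - compressedLen / originalLen) * 100);
}

// The example above: roughly 80 characters compressed to roughly 35.
console.log(savingsPercent(80, 35)); // 56
```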

The complete Soul configuration for ultra-minimal Classical Chinese mode is as follows:

{
  "id": "soul-orth-11-classical-chinese-ultra-minimal-mode",
  "name": "Ultra-Minimal Classical Chinese Output Mode",
  "summary": "Use relatively readable Classical Chinese to compress semantic density, convey the meaning with as few words as possible, and retain only conclusions, judgments, and necessary actions, thereby significantly reducing output tokens.",
  "soul": "Your persona core comes from the \"Ultra-Minimal Classical Chinese Output Mode\": use relatively readable Classical Chinese to compress semantic density, convey the meaning with as few words as possible, and retain only conclusions, judgments, and necessary actions, thereby significantly reducing output tokens.\nMaintain the following signature language traits: 1. Prefer concise Classical Chinese sentence patterns such as \"can\", \"should\", \"do not\", \"already\", \"however\", and \"therefore\", while avoiding obscure and difficult wording;\n2. Compress each sentence to 4-12 characters whenever possible, removing preamble, pleasantries, repeated explanation, and ineffective modifiers;\n3. Do not expand arguments unless necessary; if the user does not ask a follow-up, provide only conclusions, steps, or judgments;\n4. Do not alter the core persona of the main Catalog; only compress the expression into restrained, classical, ultra-minimal short sentences."
}

There are several key points in this template design:

  1. Clear constraints: 4-12 characters per sentence, remove redundancy, prioritize conclusions.
  2. Avoid obscurity: use concise Classical Chinese sentence patterns and avoid rare, difficult wording.
  3. Preserve persona: only change the mode of expression, not the core persona.

When you keep adjusting configuration, it all comes down to a few parameters in the end.

Besides the Classical Chinese mode, the HagiCode SOUL system also provides several other token-saving modes:

Telegraph-style ultra-minimal output mode (soul-orth-02):

  • Keep every sentence strictly within 10 characters
  • Prohibit decorative adjectives
  • No modal particles, exclamation marks, or reduplication throughout

Short fragmented muttering mode (soul-orth-01):

  • Keep sentences within 1-5 characters
  • Simulate fragmented self-talk
  • Weaken explicit logic and prioritize emotional transmission

Guided Q&A mode (soul-orth-03):

  • Use questions to guide the user’s thinking
  • Reduce direct output content
  • Lower token usage through interaction

Each of these modes emphasizes a different design direction, but the core goal is the same: reduce output tokens while preserving information quality. There are many roads to Rome; some are simply easier to walk than others.

One powerful feature of the SOUL system is support for cross-combining main Catalogs and orthogonal dimensions:

  • 50 main Catalog groups: define the base persona (such as healing style, top-student style, aloof style, and so on)
  • 10 orthogonal dimensions: define the mode of expression (such as Classical Chinese, telegraph-style, Q&A style, and so on)
  • Combination effect: can generate 500+ unique language-style combinations

For example, you can combine “Professional Development Engineer” with “Ultra-Minimal Classical Chinese Output Mode” to create an AI assistant that is both professional and concise. This flexibility allows the SOUL system to adapt to many different scenarios. You can mix and match however you like; there are more combinations than you are likely to exhaust.
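The 500+ figure is simply the cross product of the two axes. A quick sketch with illustrative ids:

```typescript
// Illustrative sketch: crossing base personas (main Catalogs) with
// expression dimensions (orthogonal dimensions). Ids are made up.
const catalogs = Array.from({ length: 50 }, (_, i) => `catalog-${i + 1}`);
const dimensions = Array.from({ length: 10 }, (_, i) => `orth-${i + 1}`);

const combinations = catalogs.flatMap((c) =>
  dimensions.map((d) => `${c}+${d}`)
);

console.log(combinations.length); // 500
```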

Visit soul.hagicode.com and follow these steps:

  1. Select a main Catalog (for example, “Professional Development Engineer”)
  2. Select an orthogonal dimension (for example, “Ultra-Minimal Classical Chinese Output Mode”)
  3. Preview the generated Soul content
  4. Copy the generated Soul configuration

It is mostly just point-and-click, so there is probably not much more to say.

Apply the Soul configuration to a Hero through the web interface or API:

// Hero Soul update example
const heroUpdate = {
  soul: "Your persona core comes from the \"Ultra-Minimal Classical Chinese Output Mode\": ...",
  soulCatalogId: "soul-orth-11-classical-chinese-ultra-minimal-mode",
  soulDisplayName: "Ultra-Minimal Classical Chinese Output Mode",
  soulStyleType: "orthogonal-dimension",
  soulSummary: "Use relatively readable Classical Chinese to compress semantic density..."
};

await updateHero(heroId, heroUpdate);

Users can fine-tune a preset template or write one from scratch. Here is a custom example for a code review scenario:

You are a code reviewer who pursues extreme concision.
All output must follow these rules:
1. Only point out specific problems and line numbers
2. Each issue must not exceed 15 characters
3. Use concise terms such as "should", "must", and "do not"
4. Do not provide extra explanation
Example output:
- Line 23: variable name too long, should abbreviate
- Line 45: null not handled, must add checks
- Line 67: logic redundant, can simplify

You can revise the template however you like. A template is only a starting point anyway.

Compatibility:

  • Classical Chinese mode works with all 50 main Catalog groups
  • Can be combined with any base persona
  • Does not change the core persona of the main Catalog

Caching mechanism:

  • Soul is cached when the Session executes for the first time
  • The cache is reused within the same SessionId
  • Modifying Hero configuration does not affect Sessions that have already started
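The caching rule above can be sketched as follows. Names here are hypothetical; the diagram earlier shows the real snapshot happening in SessionGrain:

```typescript
// Hypothetical sketch of the snapshot rule: the Soul is read once on a
// session's first execution; later Hero edits do not affect sessions
// that have already started.
const soulCache = new Map<string, string>();

function resolveSoul(sessionId: string, readHeroSoul: () => string): string {
  const cached = soulCache.get(sessionId);
  if (cached !== undefined) return cached; // reuse within the same SessionId
  const snapshot = readHeroSoul();         // read once, on first execution
  soulCache.set(sessionId, snapshot);
  return snapshot;
}
```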

Constraints and limits:

  • The maximum length of the Soul field is 8000 characters
  • Heroes without a Soul field in historical data can still be used normally
  • Soul and style equipment slots are independent and do not overwrite each other

According to real test data from the project, the results after enabling ultra-minimal Classical Chinese mode are as follows:

Scenario                Original output tokens    Classical Chinese mode    Savings
Code review             850                       420                       51%
Technical Q&A           620                       380                       39%
Solution suggestions    1100                      680                       38%
Average                 -                         -                         30-50%

The data comes from actual usage statistics in the HagiCode project, and exact results vary by scenario. Still, the saved tokens add up, and your wallet will appreciate it.
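The savings column follows directly from the token counts in the table. A quick recomputation, illustrative only:

```typescript
// Illustrative helper: percentage of tokens saved.
function savings(original: number, compressed: number): number {
  return Math.round(((original - compressed) / original) * 100);
}

console.log(savings(850, 420));  // 51 (code review)
console.log(savings(620, 380));  // 39 (technical Q&A)
console.log(savings(1100, 680)); // 38 (solution suggestions)
```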

The HagiCode SOUL system offers an innovative way to optimize AI output: reduce token consumption by constraining expression rather than compressing the information itself. As its most representative approach, ultra-minimal Classical Chinese mode has delivered 30-50% token savings in real-world use.

The core value of this approach lies in the following:

  1. Preserve information quality: instead of simply truncating output, it expresses the same content more efficiently.
  2. Flexible and composable: supports 500+ combinations of personas and expression styles.
  3. Easy to use: Soul Builder provides a visual interface, so no coding is required.
  4. Production-grade stability: validated in the project and capable of large-scale use.

If you are also building AI applications, or if you are interested in the HagiCode project, feel free to reach out. The meaning of open source lies in progressing together, and we also look forward to seeing your own innovative uses. The saying may be old, but it remains true: one person may go fast, but a group goes farther.


If this article helped you:

Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final version was reviewed and confirmed by the author.

From CLI Calls to SDK Integration: Best Practices for GitHub Copilot in .NET Projects


The upgrade path from command-line invocation to official SDK integration has been quite a journey. Today, I want to share the pitfalls we ran into and what we learned while building HagiCode.

After the GitHub Copilot SDK was officially released in 2025, we started integrating it into our AI capability layer. Before that, the project mainly used GitHub Copilot by directly invoking the Copilot CLI command-line tool, but that approach had several obvious issues:

  • Complex process management: We had to manually manage the CLI process lifecycle, startup timeouts, and process cleanup. Processes can crash without warning.
  • Incomplete event handling: Raw CLI invocation makes it hard to capture fine-grained events from model reasoning and tool execution. It is like seeing only the result without the thinking process.
  • Difficult session management: There was no effective mechanism for session reuse and recovery, so every interaction had to start over.
  • Compatibility problems: CLI arguments changed frequently, which meant we constantly had to maintain compatibility logic for those parameters.

These issues became increasingly apparent in day-to-day development, especially when we needed to track model reasoning (thinking) and tool execution status in real time. At that point, it became clear that we needed a lower-level and more complete integration approach.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project, and during development we needed deep integration with GitHub Copilot capabilities, from basic code completion to complex multi-turn conversations and tool calling. Those real-world requirements pushed us to move from CLI invocation to the official SDK.

If this implementation sounds useful to you, our engineering experience may help, and the HagiCode project itself may also be worth checking out. You can find more project information and links at the end of this article.

The project uses a layered architecture to address the limitations of CLI invocation:

┌─────────────────────────────────────────────────────────┐
│ hagicode-core (Orleans Grains + AI Provider Layer) │
│ - CopilotAIProvider: Converts AIRequest to CopilotOptions │
│ - GitHubCopilotGrain: Orleans distributed execution interface │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ HagiCode.Libs (Shared Provider Layer) │
│ - CopilotProvider: CLI Provider interface implementation │
│ - ICopilotSdkGateway: SDK invocation abstraction │
│ - GitHubCopilotSdkGateway: SDK session management and event dispatch │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ GitHub Copilot SDK (Official .NET SDK) │
│ - CopilotClient: SDK client │
│ - CopilotSession: Session management │
│ - SessionEvent: Event stream │
└─────────────────────────────────────────────────────────┘

This layered design brings several practical technical advantages:

  1. Separation of concerns: Core business logic is decoupled from SDK implementation details.
  2. Testability: The ICopilotSdkGateway interface makes unit testing straightforward.
  3. Reusability: HagiCode.Libs can be referenced by multiple projects.
  4. Maintainability: SDK upgrades only require changes in the Gateway layer, while upper layers remain untouched.
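To make the testability point concrete, here is a hypothetical TypeScript analogue of the gateway idea. The real interface is the C# ICopilotSdkGateway and is asynchronous; this sketch uses a synchronous iterator to stay compact:

```typescript
// Hypothetical analogue: upper layers depend on this interface, so the
// real SDK can be swapped for a fake in unit tests.
interface CopilotSdkGateway {
  sendPrompt(sessionId: string, prompt: string): IterableIterator<string>;
}

// A fake implementation for tests: emits a "thinking" chunk, then an echo.
class FakeGateway implements CopilotSdkGateway {
  *sendPrompt(_sessionId: string, prompt: string): IterableIterator<string> {
    yield "thinking...";
    yield `echo: ${prompt}`;
  }
}

const chunks = [...new FakeGateway().sendPrompt("s1", "hi")];
console.log(chunks);
```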

Authentication is the first and most important step of SDK integration. If authentication fails, nothing else matters. We designed a flexible authentication configuration that supports multiple authentication sources:

// CopilotProvider.cs - Authentication source configuration
public class CopilotOptions
{
    public bool UseLoggedInUser { get; set; } = true;
    public string? GitHubToken { get; set; }
    public string? CliUrl { get; set; }
}

// Convert to SDK request
return new CopilotSdkRequest(
    GitHubToken: options.AuthSource == CopilotAuthSource.GitHubToken
        ? options.GitHubToken
        : null,
    UseLoggedInUser: options.AuthSource != CopilotAuthSource.GitHubToken
);

The benefits of this design are fairly clear:

  • It supports logged-in user mode without requiring a token, which fits desktop scenarios well.
  • It supports GitHub Token mode, which is suitable for server-side deployments and centralized management.
  • It supports overriding the Copilot CLI URL, which helps with enterprise proxy configuration.

In practice, this flexible authentication model greatly simplified configuration across different deployment scenarios. The desktop client can use each user’s own Copilot login state, while the server side can manage authentication centrally through tokens.
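The selection rule in the C# snippet above can be restated as a small hypothetical TypeScript sketch: an explicitly configured token wins, otherwise the logged-in user is used.

```typescript
// Hypothetical sketch of the auth-source selection rule; names are
// illustrative, not the project's API.
type AuthSource =
  | { kind: "loggedInUser" }
  | { kind: "gitHubToken"; token: string };

function toSdkAuth(source: AuthSource): {
  gitHubToken: string | null;
  useLoggedInUser: boolean;
} {
  return source.kind === "gitHubToken"
    ? { gitHubToken: source.token, useLoggedInUser: false }
    : { gitHubToken: null, useLoggedInUser: true };
}
```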

One of the most powerful capabilities of the SDK is complete event stream capture. We implemented an event dispatch system that can handle all kinds of SDK events in real time:

// GitHubCopilotSdkGateway.cs - Core logic for event dispatch
internal static SessionEventDispatchResult DispatchSessionEvent(
    SessionEvent evt, bool sawDelta)
{
    switch (evt)
    {
        case AssistantReasoningEvent reasoningEvent:
            // Capture the model reasoning process
            events.Add(new CopilotSdkStreamEvent(
                CopilotSdkStreamEventType.ReasoningDelta,
                Content: reasoningEvent.Data.Content));
            break;
        case ToolExecutionStartEvent toolStartEvent:
            // Capture the start of a tool invocation
            events.Add(new CopilotSdkStreamEvent(
                CopilotSdkStreamEventType.ToolExecutionStart,
                ToolName: toolStartEvent.Data.ToolName,
                ToolCallId: toolStartEvent.Data.ToolCallId));
            break;
        case ToolExecutionCompleteEvent toolCompleteEvent:
            // Capture tool completion and its result
            events.Add(new CopilotSdkStreamEvent(
                CopilotSdkStreamEventType.ToolExecutionEnd,
                Content: ExtractToolExecutionContent(toolCompleteEvent)));
            break;
        default:
            // Preserve unhandled events as RawEvent
            events.Add(new CopilotSdkStreamEvent(
                CopilotSdkStreamEventType.RawEvent,
                RawEventType: evt.GetType().Name));
            break;
    }
}

The value of this implementation is significant:

  • Complete capture of the model reasoning process (thinking): users can see how the AI is reasoning, not just the final answer.
  • Real-time tracking of tool execution status: we know which tools are running, when they finish, and what they return.
  • Zero event loss: by falling back to RawEvent, we ensure every event is recorded.

In HagiCode, these fine-grained events help users understand how the AI works internally, especially when debugging complex tasks.

After migrating from CLI invocation to the SDK, we found that some existing CLI parameters no longer applied in the SDK. To preserve backward compatibility, we implemented a parameter filtering system:

// CopilotCliCompatibility.cs - Argument filtering
private static readonly Dictionary<string, string> RejectedFlags = new()
{
    ["--headless"] = "Unsupported startup argument",
    ["--model"] = "Passed through an SDK-native field",
    ["--prompt"] = "Passed through an SDK-native field",
    ["--interactive"] = "Interaction is managed by the provider",
};

public static CopilotCliArgumentBuildResult BuildCliArgs(CopilotOptions options)
{
    // Filter out unsupported arguments and keep compatible ones
    // Generate diagnostic information
}

This gives us several benefits:

  • It automatically filters incompatible CLI arguments to prevent runtime errors.
  • It generates clear diagnostic messages to help developers locate problems quickly.
  • It keeps the SDK stable and insulated from changes in CLI parameters.

During the upgrade, this compatibility mechanism helped us transition smoothly. Existing configuration files could still be used and only needed gradual adjustments based on the diagnostic output.
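The body of BuildCliArgs is elided above; a hypothetical sketch of the filtering idea looks like this (names are illustrative, not the project's API):

```typescript
// Hypothetical sketch: drop known-incompatible flags, emit a diagnostic
// for each one, and keep everything else.
const rejectedFlags: Record<string, string> = {
  "--headless": "Unsupported startup argument",
  "--model": "Passed through an SDK-native field",
};

function filterCliArgs(args: string[]): {
  accepted: string[];
  diagnostics: string[];
} {
  const accepted: string[] = [];
  const diagnostics: string[] = [];
  for (const arg of args) {
    const reason = rejectedFlags[arg];
    if (reason !== undefined) diagnostics.push(`${arg}: ${reason}`);
    else accepted.push(arg);
  }
  return { accepted, diagnostics };
}
```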

Creating Copilot SDK sessions is relatively expensive, and creating and destroying sessions too frequently hurts performance. To solve that, we implemented a session pool management system:

// CopilotProvider.cs - Session pool management
await using var lease = await _poolCoordinator.AcquireCopilotRuntimeAsync(
    request,
    async ct => await _gateway.CreateRuntimeAsync(sdkRequest, ct),
    cancellationToken);

if (lease.IsWarmLease)
{
    // Reuse an existing session
    yield return CreateSessionReusedMessage();
}

await foreach (var eventData in lease.Entry.Resource.SendPromptAsync(...))
{
    yield return MapEvent(eventData);
}

The benefits of session pooling include:

  • Session reuse: requests with the same sessionId can reuse existing sessions and reduce startup overhead.
  • Session recovery support: after a network interruption, previous session state can be restored.
  • Automatic pooling management: expired sessions are cleaned up automatically to avoid resource leaks.

In HagiCode, session pooling noticeably improved responsiveness, especially for continuous conversations.
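The warm-lease behavior can be sketched minimally. This is illustrative only; the real pool also handles async resource creation and expiry cleanup:

```typescript
// Hypothetical sketch: the same sessionId yields a warm lease (reuse);
// unknown ids create a new (cold) entry via the supplied factory.
class SessionPool<T> {
  private entries = new Map<string, T>();

  acquire(sessionId: string, create: () => T): { resource: T; warm: boolean } {
    const existing = this.entries.get(sessionId);
    if (existing !== undefined) return { resource: existing, warm: true };
    const created = create();
    this.entries.set(sessionId, created);
    return { resource: created, warm: false };
  }
}
```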

HagiCode uses Orleans as its distributed framework, and we integrated the Copilot SDK into Orleans Grains:

// GitHubCopilotGrain.cs - Distributed execution
public async IAsyncEnumerable<GitHubCopilotResponse> ExecuteCommandStreamAsync(
    string command,
    CancellationToken token = default)
{
    var provider = await aiProviderFactory.GetProviderAsync(AIProviderType.GitHubCopilot);
    await foreach (var chunk in provider.SendMessageAsync(request, null, token))
    {
        // Map to the unified response format
        yield return BuildChunkResponse(chunk, startedAt);
    }
}

The advantages of Orleans integration are substantial:

  • Unified AI Provider abstraction: it becomes easy to switch between different AI providers.
  • Multi-tenant isolation: Copilot sessions for different users remain isolated from one another.
  • Persistent session state: session state can be restored even after server restarts.

For scenarios that need to handle a large number of concurrent requests, Orleans provides strong scalability.

Here is a complete configuration example:

{
  "AI": {
    "Providers": {
      "Providers": {
        "GitHubCopilot": {
          "Enabled": true,
          "ExecutablePath": "copilot",
          "Model": "gpt-5",
          "WorkingDirectory": "/path/to/project",
          "Timeout": 7200,
          "StartupTimeout": 30,
          "UseLoggedInUser": true,
          "NoAskUser": true,
          "Permissions": {
            "AllowAllTools": false,
            "AllowedTools": ["Read", "Bash", "Grep"],
            "DeniedTools": ["Edit"]
          }
        }
      }
    }
  }
}

In real-world usage, we summarized several points worth paying attention to:

Startup timeout configuration: The first startup of Copilot CLI can take a relatively long time, so we recommend setting StartupTimeout to at least 30 seconds. If this is the first login, it may take even longer.

Permission management: In production environments, avoid using AllowAllTools: true. Use the AllowedTools allowlist to control which tools are available, and use the DeniedTools denylist to block dangerous operations. This effectively prevents the AI from executing risky commands.

Session management: Requests with the same sessionId automatically reuse sessions. Session state is persisted through ProviderSessionId. Cancellation is propagated via CancellationTokenSource.

Diagnostic output: Incompatible CLI arguments generate messages of type diagnostic. Raw SDK events are preserved as event.raw. Error messages include categories such as startup timeout and argument incompatibility to make troubleshooting easier.

Based on our practical experience, here are a few best practices:

1. Use a tool allowlist

var request = new AIRequest
{
    Prompt = "Analyze this file",
    AllowedTools = new[] { "Read", "Grep", "Bash(git:*)" }
};

Explicitly specifying the allowed tools through an allowlist helps prevent unexpected AI actions. This is especially important for tools with write permissions, such as Edit, which should be handled with extra care.

2. Set reasonable timeouts

options.Timeout = 3600; // 1 hour
options.StartupTimeout = 60; // 1 minute

Set appropriate timeout values based on task complexity. If the value is too short, tasks may be interrupted. If it is too long, resources may be wasted waiting on unresponsive requests.

3. Enable session reuse

options.SessionId = "my-session-123";

Using the same sessionId for related tasks lets you reuse prior session context and improve response speed.

4. Handle streaming responses

await foreach (var chunk in provider.StreamAsync(request))
{
    switch (chunk.Type)
    {
        case StreamingChunkType.ThinkingDelta:
            // Handle the reasoning process
            break;
        case StreamingChunkType.ToolCallDelta:
            // Handle tool invocation
            break;
        case StreamingChunkType.ContentDelta:
            // Handle text output
            break;
    }
}

Streaming responses let you show AI processing progress in real time, which improves the user experience. This is especially valuable for long-running tasks.

5. Error handling and retries

try
{
    await foreach (var chunk in provider.StreamAsync(request))
    {
        // Handle the response
    }
}
catch (CopilotSessionException ex)
{
    // Handle session exceptions
    logger.LogError(ex, "Copilot session failed");
    // Decide whether to retry based on the exception type
}

Proper error handling and retry logic improve overall system stability.
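The snippet above leaves the retry decision open. One common shape, sketched here with hypothetical names (isTransient is a classifier you would write for your own error types), is exponential backoff applied only to transient failures:

```typescript
// Illustrative exponential backoff: attempt 1 waits baseMs, attempt 2
// waits 2x baseMs, attempt 3 waits 4x baseMs, and so on.
function backoffDelay(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** (attempt - 1);
}

// Hypothetical retry wrapper: permanent failures and exhausted attempts
// are rethrown; transient failures are retried after a backoff delay.
async function withRetry<T>(
  run: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      if (attempt >= maxAttempts || !isTransient(err)) throw err;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```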

Upgrading from CLI invocation to SDK integration delivered substantial value to the HagiCode project:

  • Improved stability: the SDK provides a more stable interface that is not affected by CLI version changes.
  • More complete functionality: it captures the full event stream, including reasoning and tool execution status.
  • Higher development efficiency: the type-safe SDK interface makes development more efficient and reduces runtime errors.
  • Better user experience: real-time event feedback gives users a clearer understanding of what the AI is doing.

This upgrade was not just a replacement of technical implementation. It was also an architectural optimization of the entire AI capability layer. Through layered design and abstraction interfaces, we gained better maintainability and extensibility.

If you are considering integrating GitHub Copilot into your own .NET project, I hope these practical lessons help you avoid unnecessary detours. The official SDK is indeed more stable and complete than CLI invocation, and it is worth the time to understand and adopt it.




That brings this article to a close. Technical writing never really ends, because technology keeps evolving and we keep learning. If you have questions or suggestions while using HagiCode, feel free to contact us anytime.


The Hallucination Problem in AI Coding Assistants: How to Achieve Specification-driven Development with OpenSpec


AI coding assistants are powerful, but they often generate code that does not match real requirements or violates project conventions. This article shares how the HagiCode project uses the OpenSpec workflow to implement specification-driven development and significantly reduce the risk of AI hallucinations through a structured proposal mechanism.

Anyone who has used GitHub Copilot or ChatGPT to write code has probably had this experience: the code generated by AI looks polished, but once you actually use it, problems show up everywhere. Maybe it uses the wrong component from the project, maybe it ignores the team’s coding standards, or maybe it writes a large chunk of logic based on assumptions that do not even exist.

This is the so-called “AI hallucination” problem. In programming, it appears as code that seems reasonable on the surface but does not actually fit the real state of the project.

There is also something a bit frustrating about this. As AI coding assistants become more widespread, the problem becomes more serious. After all, AI lacks an understanding of project history, architectural decisions, and coding conventions, and when given too much freedom it can “creatively” generate code that does not match reality. It is a bit like writing an article: without structure, it is easy to wander off into imagination, even though the real situation is far more grounded.

To solve these pain points, we made a bold decision: instead of trying to make AI smarter, we put it inside a “specification” cage. The change this decision brought was probably bigger than you might expect, and I will explain that shortly.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project dedicated to solving real problems in AI programming through structured engineering practices.

Before diving into the solution, let us first look at where the problem actually comes from. After all, if you understand both yourself and your opponent, you can fight a hundred battles without defeat. Applied to AI, that saying is still surprisingly fitting.

AI models are trained on public code repositories, but your project has its own history, conventions, and architectural decisions. AI cannot directly access this kind of “implicit knowledge,” so the code it generates is often disconnected from the actual project.

This is not entirely the AI’s fault. It has never lived inside your project, so how could it know all of your unwritten rules? Like a brand-new intern, not understanding the local customs is normal. The only issue is that the cost can be rather high.

When you ask AI, “Help me implement a user authentication feature,” it may generate code in almost any form. Without clear constraints, AI will implement things in the way it “thinks” is reasonable instead of following your project’s requirements.

That is like asking someone who has never learned your project standards to improvise freely. How could that not cause trouble? It is not even that the AI is being irresponsible; it just has no idea what responsibility means in this context.

After AI generates code, if there is no structured review process, code based on false assumptions can go directly into the repository. By the time the problem is discovered in testing or even in production, the cost is already far too high.

That is like mending the sheep pen after the sheep are already gone. The principle is obvious, but in practice people often find the extra work bothersome; before anything goes wrong, few are willing to spend more time up front.

OpenSpec: The Answer to Specification-driven Development


HagiCode chose OpenSpec as the solution. The core idea is simple: all code changes must go through a structured proposal workflow, turning abstract ideas into executable implementation plans.

That may sound grand, but in plain terms it just means making AI write the requirements document before writing the code. As the old saying goes, preparation leads to success, and lack of preparation leads to failure.

OpenSpec is an npm-based command-line tool (@fission-ai/openspec) that defines a standard proposal file structure and validation mechanism. Put simply, it makes AI “write the requirements document” before it writes code.

A three-step workflow to prevent hallucinations


OpenSpec ensures proposal quality through a three-step workflow:

Step 1: Initialize the proposal - Set the session state to Openspecing
Step 2: Intermediate processing - Keep the Openspecing state while gradually refining the artifacts
Step 3: Complete the proposal - Transition to the Reviewing state

There is a clever detail in this design: the first step uses the ProposalGenerationStart type, and completing it does not trigger a state transition. This ensures that the review stage is not entered too early before the entire multi-step workflow is finished.

This detail is actually quite interesting. It is like cooking: if you lift the lid before the heat is right, nothing will turn out well. Only by moving step by step with a bit of patience can you end up with a good dish.

// Implementation in the HagiCode project
public enum MessageAssociationType
{
    ProposalGeneration = 2,
    ProposalExecution = 3,
    /// <summary>
    /// Marks the start of the three-step proposal generation workflow
    /// Does not transition to the Reviewing state when completed
    /// </summary>
    ProposalGenerationStart = 5
}
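The rule the enum encodes can be sketched as a hypothetical transition function (state and type names simplified for illustration):

```typescript
// Hypothetical sketch: completing the "start" step keeps the session in
// Openspecing; completing a regular generation step moves it to Reviewing.
type SessionState = "Openspecing" | "Reviewing";
type AssociationType = "ProposalGeneration" | "ProposalGenerationStart";

function nextState(
  current: SessionState,
  completed: AssociationType
): SessionState {
  if (current === "Openspecing" && completed === "ProposalGeneration") {
    return "Reviewing";
  }
  return current; // ProposalGenerationStart never triggers a transition
}
```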

Every OpenSpec proposal follows the same directory structure:

openspec/
├── changes/              # Active and archived changes
│   ├── {change-name}/
│   │   ├── proposal.md   # Proposal description
│   │   ├── design.md     # Design document
│   │   ├── specs/        # Technical specifications
│   │   └── tasks.md      # Executable task list
│   └── archive/          # Archived changes
└── specs/                # Standalone specification library

According to statistics from the HagiCode project, there are already more than 4,000 archived changes and over 150,000 lines of specification files. This historical accumulation not only gives AI clear guidance to follow, but also provides the team with a valuable knowledge base.

It is a bit like the classics left behind by earlier generations. Read enough of them and patterns begin to emerge. The only difference is that these classics are stored in files instead of written on bamboo slips.

The system implements multiple layers of validation to ensure proposal quality:

// Validate that required files exist
ValidateProposalFiles()
// Validate prerequisites before execution
ValidateExecuteAsync()
// Validate start conditions
ValidateStartAsync()
// Validate archive conditions
ValidateArchiveAsync()
// Validate proposal name format (kebab-case)
ValidateNameFormat()

These validations are like gatekeepers at multiple checkpoints. Only truly qualified proposals can pass through. It may look tedious, but it is still much better than letting poor code enter the repository.
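As one concrete illustration of the name-format check, a kebab-case validator might look like this (a sketch, not the project's implementation):

```typescript
// Illustrative kebab-case check: lowercase alphanumeric words separated
// by single hyphens, e.g. "add-user-auth".
function isKebabCase(name: string): boolean {
  return /^[a-z0-9]+(-[a-z0-9]+)*$/.test(name);
}

console.log(isKebabCase("add-user-auth")); // true
console.log(isKebabCase("AddUserAuth"));   // false
```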

When AI runs inside HagiCode, it uses predefined Handlebars templates. These templates contain explicit step-by-step instructions and protective guardrails. For example:

  • Do not continue before understanding the user’s intent
  • Do not generate unvalidated code
  • Require the user to provide the name again if it is invalid
  • If the change already exists, suggest using the continue command instead of recreating it

This way of “dancing in shackles” actually helps AI focus more on understanding requirements and generating code that follows standards. Constraints are not always a bad thing. Sometimes too much freedom is exactly what creates chaos.

Practice: How to Use OpenSpec in a Project

Section titled “Practice: How to Use OpenSpec in a Project”
npm install -g @fission-ai/openspec@1
openspec --version # Verify the installation

The openspec/ folder structure will be created automatically in the project root.

There is not much mystery in this step. It is just tool installation, which everyone understands. Just remember to use @fission-ai/openspec@1; newer versions may have pitfalls, and stability matters most.

In the HagiCode conversation interface, use the shortcut command:

/opsx:new

Or specify a change name and target repository:

/opsx:new "add-user-auth" --repos "repos/web"

Creating a proposal is like outlining an article before writing it. Once you have an outline, the rest becomes much easier. Many people prefer to jump straight into writing, only to realize halfway through that the idea does not hold together. That is when the real headache begins.

Use /opsx:continue to generate the required artifacts step by step:

proposal.md - Describes the purpose and scope of the change

# Proposal: Add User Authentication
## Why
The current system lacks user authentication and cannot protect sensitive APIs.
## What Changes
- Add JWT authentication middleware
- Implement login/registration APIs
- Update frontend integration

design.md - Detailed technical design

# Design: Add User Authentication
## Context
The system currently uses public APIs, so anyone can access them...
## Decisions
1. Choose JWT instead of Session...
2. Use the HS256 algorithm...
## Risks
- Risk of token leakage...
- Mitigation measures...

specs/ - Technical specifications and test scenarios

# user-auth Specification
## Requirements
### Requirement: JWT Token Generation
The system SHALL use the HS256 algorithm to generate JWT tokens.
#### Scenario: Valid login
- WHEN the user provides valid credentials
- THEN the system SHALL return a valid JWT token

tasks.md - Executable task list

# Tasks: Add User Authentication
## 1. Backend Changes
- [ ] 1.1 Create AuthController
- [ ] 1.2 Implement JWT middleware
- [ ] 1.3 Add unit tests

These artifacts are a lot like drafts for an article. Once the draft is complete, the main text flows naturally. Many people dislike writing drafts because they think it wastes time, but in reality that is often where the clearest thinking happens.

After all artifacts are complete:

/opsx:apply

AI will read all context files and execute tasks step by step according to the checklist in tasks.md. At this point, because the specification is already clear, the quality of the generated code is much higher.

By this stage, half the work is already done. Once there is a clear task list, the rest is simply executing it step by step. The problem is that many people skip the earlier steps and jump straight here, and then quality naturally becomes hard to guarantee.

After the change is completed:

/opsx:archive

Move the completed change into the archive/ directory so it can be reviewed and reused later.

Archiving matters. It is like carefully storing away a finished article. When a similar problem appears in the future, looking back through old records may provide the answer. Many people find it troublesome, but these accumulated materials are often the most valuable assets.

Use kebab-case, start with a letter, and include only lowercase letters, numbers, and hyphens:

  • add-user-auth ✓ valid
  • AddUserAuth ✗ invalid (not lowercase kebab-case)
  • add--user-auth ✗ invalid (consecutive hyphens)

Naming rules may seem minor, but consistency is always worth something. In software, consistency matters even when people do not always pay attention to it.
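The rule above (start with a letter; only lowercase letters, numbers, and hyphens; no consecutive hyphens) can be captured in one regular expression. A minimal sketch; the regex and function name are illustrative, not HagiCode's actual validator:

```typescript
// kebab-case rule: a letter first, then lowercase letters/digits,
// with hyphen-separated segments and no doubled hyphens.
const KEBAB_CASE = /^[a-z][a-z0-9]*(-[a-z0-9]+)*$/;

function isValidChangeName(name: string): boolean {
  return KEBAB_CASE.test(name);
}
```

For example, `isValidChangeName('add-user-auth')` accepts, while `AddUserAuth` and `add--user-auth` are rejected.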

  1. Using the wrong type in step 1 of the three-step workflow - This causes the state to transition too early
  2. Forgetting to trigger the state transition in the final step - This leaves the workflow stuck in the Openspecing state
  3. Skipping review and executing directly - You should validate that all artifacts are complete first

These mistakes are all common for beginners. Experienced people naturally know how to avoid them. Still, everyone becomes experienced eventually, and taking a few detours is part of the process. The only hope is to avoid taking too many.

OpenSpec supports managing multiple proposals at the same time, which is especially useful for large features:

# View all active changes
openspec list
# Switch to a specific change
openspec apply "add-user-auth"
# View change status
openspec status --change "add-user-auth"

Managing multiple changes is like writing several articles at once. It takes some technique and patience, but once you get used to it, it becomes natural enough.

Understanding state transitions helps with troubleshooting:

Init → Drafting → Openspecing → Reviewing → Executing → ExecutionCompleted → Completed → Archived
  • Openspecing: Generating the plan
  • Reviewing: Under review (artifacts can be revised repeatedly)
  • Executing: In execution (applying tasks.md)

A state machine is, in the end, just a set of rules. Rules can feel annoying at times, but more often they are useful. As the saying goes, without rules, nothing can be accomplished properly.
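The linear lifecycle above can be encoded as a simple transition table. This is a sketch of the idea only; the real HagiCode state machine may permit transitions (such as re-entering Reviewing) that this assumed linear table does not:

```typescript
// Assumed linear proposal lifecycle as a transition table.
type ProposalState =
  | 'Init' | 'Drafting' | 'Openspecing' | 'Reviewing'
  | 'Executing' | 'ExecutionCompleted' | 'Completed' | 'Archived';

const next: Record<ProposalState, ProposalState | null> = {
  Init: 'Drafting',
  Drafting: 'Openspecing',
  Openspecing: 'Reviewing',
  Reviewing: 'Executing',
  Executing: 'ExecutionCompleted',
  ExecutionCompleted: 'Completed',
  Completed: 'Archived',
  Archived: null, // terminal state
};

function canTransition(from: ProposalState, to: ProposalState): boolean {
  return next[from] === to;
}
```

A table like this makes illegal jumps (for example, Init straight to Executing) detectable at a glance, which is most of what you need for troubleshooting.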

Through the OpenSpec workflow, the HagiCode project has achieved significant results in addressing the AI hallucination problem:

  1. Fewer hallucinations - AI must follow a structured specification instead of generating code arbitrarily
  2. Higher quality - Multi-layer validation ensures changes comply with project standards
  3. Faster collaboration - Archived changes provide references for future development
  4. Traceability - Every change has a complete record of proposal, design, specification, and tasks

This approach does not make AI smarter. It puts AI inside a “specification” cage. Practice has shown that dancing in shackles can actually lead to a better performance.

The principle is simple. Constraints are not necessarily bad. Like writing, having a format to follow often makes it easier to produce good work. Many people dislike constraints because they think constraints limit creativity, but creativity also needs the right soil to grow.

If you are also using AI coding assistants and have run into similar problems, give OpenSpec a try. Specification-driven development may seem to add extra steps, but that early investment pays back many times over in code quality and maintenance efficiency.

Sometimes slowing down a little is actually the fastest way forward. Many people just do not realize it yet.


If this article helped you, feel free to give us a Star on GitHub. The HagiCode public beta has already started, and you can join the experience by installing it now.


That is about enough for this article. There is nothing especially profound here, just a summary of a few practical lessons. I hope it is useful to everyone. Sharing is a good thing: you learn something yourself, and others learn something too.

Still, an article is only an article. Practice is what really matters. Knowledge from the page always feels shallow until you apply it yourself.

Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and approved by the author.


Building Elegant New User Onboarding in React Projects: HagiCode’s driver.js Practice

Section titled “Building Elegant New User Onboarding in React Projects: HagiCode’s driver.js Practice”

When users open your product for the first time, do they really know where to start? In this article, I want to talk a bit about how we used driver.js for new user onboarding in the HagiCode project. Consider it a small practical example to get the conversation started.

Have you ever run into this situation? A new user signs up for your product, opens the page, and immediately looks lost. They scan around, unsure what to click or what to do next. As developers, we often assume users will just “explore on their own” because, after all, human curiosity is limitless. But reality is different: if they cannot find the right entry point, most users quietly leave within minutes, arriving suddenly and leaving just as quietly.

New user onboarding is an important way to solve this problem, but building it well is not that simple. A good onboarding system needs to:

  • Precisely locate page elements and highlight them
  • Support multi-step onboarding flows
  • Remember the user’s choice (complete or skip)
  • Avoid affecting page performance and normal interaction
  • Keep the code structure clear and easy to maintain

While building HagiCode, we ran into the same challenge. HagiCode is an AI coding assistant, and its core workflow is an OpenSpec workflow that looks like this: “the user creates a Proposal -> the AI generates a plan -> the user reviews it -> the AI executes it.” For users encountering this concept for the first time, the workflow is completely new, so they need solid onboarding to get started quickly. New things always take a little time to get used to.

The approach shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is a Claude-based AI coding assistant that helps developers complete coding tasks more efficiently through the OpenSpec workflow. You can view our open-source code on GitHub.

During the technical evaluation phase, we looked at several mainstream onboarding libraries. Each one had its own strengths:

  • Intro.js: Powerful, but relatively large, and style customization is somewhat complex
  • Shepherd.js: Well-designed API, but a bit too “heavy” for our use case
  • driver.js: Lightweight, concise, intuitive API, and works well in the React ecosystem

In the end, we chose driver.js. There was no especially dramatic reason. The choice mainly came down to these considerations:

  1. Lightweight: The core library is small and does not significantly increase bundle size
  2. Simple API: The configuration is clear and intuitive, so it is easy to pick up
  3. Flexible: Supports custom positioning, styling, and interaction behavior
  4. Dynamic import: Can be loaded on demand without affecting first-screen performance

With technology selection, there is rarely a universally best answer. Usually, there is only the option that fits best.

driver.js has a very intuitive configuration model. Here is the core configuration we use in the HagiCode project:

import { driver } from 'driver.js';
import 'driver.js/dist/driver.css';

const newConversationDriver = driver({
  allowClose: true,                  // Allow users to close the guide
  animate: true,                     // Enable animations
  overlayClickBehavior: 'close',     // Close the guide when the overlay is clicked
  disableActiveInteraction: false,   // Keep elements interactive
  showProgress: false,               // Do not show the progress bar (we manage progress ourselves)
  steps: guideSteps                  // Array of guide steps
});

The reasoning behind these settings is:

  • allowClose: true - Respect the user’s choice and do not force them to finish the guide
  • disableActiveInteraction: false - Some steps require real user actions, such as typing input, so interaction cannot be disabled
  • overlayClickBehavior: 'close' - Give users a quick way to exit

Persisting onboarding state is critical. We do not want to restart the guide every time the page refreshes, because that gets annoying fast. HagiCode uses localStorage to manage guide state:

export type GuideState = 'pending' | 'dismissed' | 'completed';

export interface UserGuideState {
  session: GuideState;
  detailGuides: Record<string, GuideState>;
}

// Read state
export const getUserGuideState = (): UserGuideState => {
  const state = localStorage.getItem('userGuideState');
  return state ? JSON.parse(state) : { session: 'pending', detailGuides: {} };
};

// Update state
export const setUserGuideState = (state: UserGuideState) => {
  localStorage.setItem('userGuideState', JSON.stringify(state));
};

We define three states:

  • pending: The guide is still in progress, and the user has not completed or skipped it
  • dismissed: The user closed the guide proactively
  • completed: The user completed all steps

For Proposal detail page onboarding, we also support more fine-grained state tracking through the detailGuides map, because one Proposal can go through multiple stages - draft, review, and execution complete - and each stage needs different guidance. The state of things is always changing, and onboarding should reflect that.
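The stage-aware lookup can be sketched as a pure function over the `detailGuides` map. The key shape `${proposalId}:${stage}` is an assumption for illustration; HagiCode's actual keying scheme may differ:

```typescript
// Hedged sketch: decide which detail guide (if any) to show for a proposal,
// based on its stage and what the user has already completed or dismissed.
type GuideState = 'pending' | 'dismissed' | 'completed';
type DetailGuides = Record<string, GuideState>;
type ProposalStage = 'drafting' | 'reviewing' | 'executionCompleted';

function guideToShow(
  proposalId: string,
  stage: ProposalStage,
  detailGuides: DetailGuides,
): string | null {
  // Assumed key shape for this example.
  const key = `${proposalId}:${stage}`;
  const state = detailGuides[key] ?? 'pending';
  // Only show the guide while it is still pending for this stage.
  return state === 'pending' ? key : null;
}
```

Because each stage has its own entry, finishing the drafting guide does not suppress the reviewing guide later.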

driver.js uses CSS selectors to locate target elements. HagiCode follows a simple convention: use a custom data-guide attribute to mark onboarding targets:

const steps = [
  {
    element: '[data-guide="launch"]',
    popover: {
      title: 'Start a New Conversation',
      description: 'Click here to create a new conversation session...'
    }
  }
];

In components, it looks like this:

<button data-guide="launch" onClick={handleLaunch}>
  New Conversation
</button>

The benefits of this approach are:

  • Avoid conflicts with business styling class names
  • Clear semantics, so you can immediately tell the element is related to onboarding
  • Easier to manage and maintain consistently

Because the onboarding feature is only needed in specific scenarios, such as a user’s first visit, we use dynamic imports to optimize initial loading performance:

const initNewUserGuide = async () => {
  // Dynamically import driver.js and its stylesheet
  const { driver } = await import('driver.js');
  await import('driver.js/dist/driver.css');

  // Initialize the guide
  const newConversationDriver = driver({
    // ...configuration
  });
  newConversationDriver.drive();
};

This way, driver.js and its stylesheet are only loaded when needed and do not affect first-screen performance. Not many people enjoy waiting for something they do not even need yet.

HagiCode implements two onboarding paths that cover the user’s core scenarios.

This onboarding path helps users complete the entire flow from creating a conversation to submitting their first complete Proposal:

  1. launch - Start the guide and introduce the “New Conversation” button
  2. compose - Guide the user to type a request in the input box
  3. send - Guide the user to click the send button
  4. proposal-launch-readme - Guide the user to create a README Proposal
  5. proposal-compose-readme - Guide the user to edit the README request content
  6. proposal-submit-readme - Guide the user to submit the README Proposal
  7. proposal-launch-agents - Guide the user to create an AGENTS.md Proposal
  8. proposal-compose-agents - Guide the user to edit the AGENTS.md request
  9. proposal-submit-agents - Guide the user to submit the AGENTS.md Proposal
  10. proposal-wait - Explain that the AI is processing and the user should wait a moment

The idea behind this path is to let users experience HagiCode’s core workflow firsthand through two real Proposal creation tasks, one for README and one for AGENTS.md. There is a big difference between hearing about a workflow and going through it yourself.

The following screenshots correspond to a few key points in the session onboarding flow:

Session onboarding: starting from creating a conversation session

The first step of session onboarding takes the user to the entry point for creating a new Conversation session.

Session onboarding: entering the first request

Next, the guide prompts the user to type their first request into the input box, lowering the barrier to getting started.

Session onboarding: sending the first message

After the input is complete, the guide clearly prompts the user to send the first message so the action flow feels more connected.

Session onboarding: waiting in the session list for continued execution

Once both Proposals have been created, the guide returns to the session list and lets the user know that the next step is simply to wait for the system to continue execution and refresh.

When users enter the Proposal detail page, HagiCode triggers the corresponding guide based on the Proposal’s current state:

  1. drafting (draft stage) - Guide the user to review the AI-generated plan
  2. reviewing (review stage) - Guide the user to execute the plan
  3. executionCompleted (completed stage) - Guide the user to archive the plan

The defining characteristic of this guide is that it is state-driven: it dynamically decides which onboarding step to show based on the Proposal’s actual state. Things change, and onboarding should change with them.

The screenshot below shows the Proposal detail page in its onboarding state during the drafting phase:

Proposal detail onboarding: generate the plan first during drafting

At this stage, the guide focuses the user’s attention on the key action of generating a plan, so they do not wonder what to do first when entering the detail page for the first time.

In React applications, the target onboarding element may not have finished rendering yet, for example because asynchronous data is still loading. To handle this, HagiCode implements a retry mechanism:

const waitForElement = (selector: string, maxRetries = 10, interval = 100) => {
  let retries = 0;
  return new Promise<HTMLElement>((resolve, reject) => {
    const checkElement = () => {
      const element = document.querySelector(selector) as HTMLElement;
      if (element) {
        resolve(element);
      } else if (retries < maxRetries) {
        retries++;
        setTimeout(checkElement, interval);
      } else {
        reject(new Error(`Element not found: ${selector}`));
      }
    };
    checkElement();
  });
};

Call this function before initializing the guide to make sure the target element already exists. Sometimes waiting a little longer is worth it.

Based on HagiCode’s practical experience, here are a few key best practices:

Do not force users to complete onboarding. Some users are explorers by nature and prefer to figure things out on their own. Provide a clear “Skip” button, remember their choice, and do not interrupt them again next time.

2. Keep Onboarding Content Short and Sharp

Section titled “2. Keep Onboarding Content Short and Sharp”

Each onboarding step should focus on a single goal:

  • Title: Short and clear, ideally no more than a few words
  • Description: Get straight to the point and tell the user what this is and why it matters

Avoid long-winded explanations. User attention is limited during onboarding, and the more you say, the less likely they are to read it.

Use stable element markers that do not change often. The custom data-guide attribute is a good choice. Avoid depending on class names or DOM structure, because those are easy to change during refactors. Code changes all the time, but some anchors should stay stable when possible.

HagiCode includes complete test cases for the onboarding feature:

describe('NewUserConversationGuide', () => {
  it('should initialize guide state correctly', () => {
    const state = getUserGuideState();
    expect(state.session).toBe('pending');
  });

  it('should update guide state correctly', () => {
    setUserGuideState({ session: 'completed', detailGuides: {} });
    const state = getUserGuideState();
    expect(state.session).toBe('completed');
  });
});

Tests help ensure that refactoring does not accidentally break onboarding behavior. Nobody wants a small code change to quietly damage previously working functionality.

  • Use dynamic imports to lazy-load the onboarding library
  • Avoid initializing onboarding logic after the user has already completed the guide
  • Consider the performance impact of animations, and disable them on lower-end devices if needed

Performance, like many things in life, deserves a bit of careful budgeting.

New user onboarding is an important part of improving product user experience. In the HagiCode project, we used driver.js to build a complete onboarding system that covers the full workflow from session creation to Proposal execution.

The core points we hope to share through this article are:

  1. Technical choices should match actual needs: driver.js is not the most powerful option, but it is the best fit for us
  2. State management is critical: Use localStorage to persist onboarding state and avoid repeatedly interrupting users
  3. Onboarding design should stay focused: Each step should solve one problem, and no more
  4. Code structure should stay clear: Separate onboarding configuration, state management, and UI logic to make maintenance easier

If you are adding new user onboarding to your own project, I hope the practical experience in this article helps. There is nothing especially mystical about this kind of technology. Keep trying, keep summarizing what you learn, and things gradually get easier.

Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.


Typing Is Slower Than Talking, and Talking Is Slower Than a Screenshot - Multimodal Input Practices for AI Coding Assistants

Section titled “Typing Is Slower Than Talking, and Talking Is Slower Than a Screenshot - Multimodal Input Practices for AI Coding Assistants”

Writing code has a speed limit no matter how fast you type. Sometimes something you could say in one sentence takes forever to type out; sometimes one screenshot explains everything, yet you still have to describe it with a pile of text. This article talks about what we ran into while building HagiCode, from speech recognition to image uploads. In the end, we just wanted to make an AI coding assistant a little easier to use.

While building HagiCode, we noticed a problem - or rather, a problem that naturally surfaced once people started using it heavily: relying on typing alone can be tiring.

Think about it: interaction between users and the Agent is a core scenario. But if every exchange requires nonstop typing at the keyboard, the efficiency is not great:

  1. Typing is too slow: For complicated issues, like error messages or UI problems, typing everything out can take half a minute, while saying it aloud might take ten seconds. That gap is real.

  2. Images are more direct: Sometimes the UI throws an error, sometimes you want to compare a design draft, and sometimes you need to show a code structure. “A picture is worth a thousand words” may be an old saying, but it still holds up. Letting AI directly “see” the problem is much clearer than describing it for ages.

  3. Interaction should feel natural: Modern AI assistants should support text, voice, and images. Users should be able to choose whichever input method feels most natural.

So we decided to add speech recognition and image upload support to HagiCode to make Agent interactions more convenient. If users can type a little less, that is already a win.

The solutions shared in this article come from our hands-on work in the HagiCode project - or, more accurately, from lessons learned while stumbling through quite a few pitfalls.

HagiCode is an open-source AI coding assistant project with a simple goal: use AI to improve development efficiency. As we kept building, it became clear that users strongly wanted multimodal input. Sometimes speaking one sentence is faster than typing a long paragraph, and sometimes a screenshot is far clearer than a long explanation.

Those needs pushed us forward, and that is how features like speech recognition and image uploads eventually took shape. Users can now interact with AI in the most natural way available to them, and that feels good.

Technical Challenges in Speech Recognition

Section titled “Technical Challenges in Speech Recognition”

When building speech recognition, we ran into a tricky issue: the browser WebSocket API does not support custom HTTP headers.

The speech recognition service we chose was ByteDance’s Doubao Speech Recognition API. Unfortunately, this API requires authentication information such as accessToken and secretKey to be passed through HTTP headers. That created an immediate technical conflict:

// The browser WebSocket API does not support this approach
const ws = new WebSocket('wss://api.com/ws', {
  headers: {
    'Authorization': 'Bearer token' // Not supported
  }
});

We basically had two options:

  1. URL query parameter approach: put the authentication info in the URL

    • The advantage is that it is simple to implement
    • The downside is that credentials are exposed to the frontend, which is insecure; some APIs also require header-based authentication
  2. Backend proxy approach: implement a WebSocket proxy on the backend

    • The advantage is that credentials remain securely stored on the backend and the solution is fully compatible with API requirements
    • The downside is that implementation is a bit more complex

In the end, we chose the backend proxy approach. Security is not something you compromise on.

Our requirements for image uploads were actually pretty straightforward:

  1. Multiple upload methods: click to select a file, drag and drop, and paste from the clipboard
  2. File validation: type restrictions (PNG, JPG, WebP, GIF) and size limits (5-10 MB) are basic requirements
  3. User experience: upload progress, previews, and error messages so users always know what is happening
  4. Security: server-side validation and protection against malicious file uploads are essential

Speech Recognition: WebSocket Proxy Architecture

Section titled “Speech Recognition: WebSocket Proxy Architecture”

We designed a three-layer architecture for speech recognition and found a path that worked:

Browser WebSocket
      |
      |  ws://backend/api/voice/ws
      |  (binary audio)
      v
Backend Proxy
      |
      |  wss://openspeech.bytedance.com/  (with auth header)
      v
Doubao API

Core component implementations:

  1. Frontend AudioWorklet processor:

class AudioProcessorWorklet extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const input = inputs[0]?.[0];
    if (!input) return true;
    // Resample to 16 kHz (required by the Doubao API)
    const samples = this.resampleAudio(input, 48000, 16000);
    // Accumulate samples into 500 ms chunks (16 kHz × 0.5 s = 8,000 samples)
    this.accumulatedSamples.push(...samples);
    if (this.accumulatedSamples.length >= 8000) {
      // Convert to 16-bit PCM and send to the main thread
      const pcm = this.floatToPcm16(this.accumulatedSamples);
      this.port.postMessage({ type: 'audioData', data: pcm.buffer }, [pcm.buffer]);
      this.accumulatedSamples = [];
    }
    return true;
  }
}
  2. Backend WebSocket handler (C#):

[HttpGet("ws")]
public async Task GetWebSocket()
{
    if (HttpContext.WebSockets.IsWebSocketRequest)
    {
        await _webSocketHandler.HandleAsync(HttpContext);
    }
    else
    {
        // Reject plain HTTP requests to the WebSocket endpoint
        HttpContext.Response.StatusCode = StatusCodes.Status400BadRequest;
    }
}
  3. Frontend VoiceTextArea component:

export const VoiceTextArea = forwardRef<HTMLTextAreaElement, VoiceTextAreaProps>(
  ({ value, onChange, onTextRecognized, maxDuration }, ref) => {
    const { isRecording, interimText, volume, duration, startRecording, stopRecording } =
      useVoiceRecording({ onTextRecognized, maxDuration });

    return (
      <div className="flex gap-2">
        {/* Voice button */}
        <button onClick={handleButtonClick}>
          {isRecording ? <VolumeWaveform volume={volume} /> : <Mic />}
        </button>
        {/* Text input area */}
        <textarea value={displayValue} onChange={handleChange} />
      </div>
    );
  }
);
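The worklet excerpt above calls two helpers, resampleAudio and floatToPcm16, without showing them. Here is a minimal sketch of what they might look like; this is an assumption for illustration, not HagiCode's actual implementation, and the naive sample-skipping resampler omits the low-pass filtering a production pipeline would apply first:

```typescript
// Naive resampling by sample skipping (48 kHz -> 16 kHz keeps every 3rd sample).
function resampleAudio(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  const ratio = fromRate / toRate;
  const out = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < out.length; i++) {
    out[i] = input[Math.floor(i * ratio)];
  }
  return out;
}

// Convert float samples in [-1, 1] to 16-bit signed PCM.
function floatToPcm16(samples: number[] | Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range samples
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to the Int16 range
  }
  return pcm;
}
```

The clamp matters: microphone input occasionally exceeds [-1, 1], and without it the PCM values would wrap around and produce audible clicks.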

Image Uploads: Multi-Method Upload Component

Section titled “Image Uploads: Multi-Method Upload Component”

We built a full-featured image upload component with support for all three upload methods, covering the most common scenarios users run into.

Core features:

  1. Three upload methods:

// Click to upload
const handleClick = () => fileInputRef.current?.click();

// Drag-and-drop upload
const handleDrop = (e: React.DragEvent) => {
  const file = e.dataTransfer.files?.[0];
  if (file) uploadFile(file);
};

// Clipboard paste
const handlePaste = (e: ClipboardEvent) => {
  for (const item of Array.from(e.clipboardData?.items || [])) {
    if (item.type.startsWith('image/')) {
      const file = item.getAsFile();
      if (file) uploadFile(file);
    }
  }
};
  2. Frontend validation:

const validateFile = (file: File): { valid: boolean; error?: string } => {
  if (!acceptedTypes.includes(file.type)) {
    return { valid: false, error: 'Only PNG, JPG, JPEG, WebP, and GIF images are allowed' };
  }
  if (file.size > maxSize) {
    return { valid: false, error: `Maximum file size is ${(maxSize / 1024 / 1024).toFixed(1)}MB` };
  }
  return { valid: true };
};
  3. Backend upload handler (TypeScript):

export const Route = createFileRoute('/api/upload')({
  server: {
    handlers: {
      POST: async ({ request }) => {
        const formData = await request.formData();
        const file = formData.get('file') as File;

        // Validation
        const validation = validateFile(file);
        if (!validation.isValid) {
          return Response.json({ error: validation.error }, { status: 400 });
        }

        // Save file (uploadDir, extension, buffer, and today are prepared earlier; elided here)
        const uuid = uuidv4();
        const filePath = join(uploadDir, `${uuid}${extension}`);
        await writeFile(filePath, buffer);
        return Response.json({ url: `/uploaded/${today}/${uuid}${extension}` });
      }
    }
  }
});
  1. Configure the speech recognition service:

    • Open the speech recognition settings page
    • Configure the Doubao Speech AppId and AccessToken
    • Optionally configure hotwords to improve recognition accuracy for domain-specific terms
  2. Use it in the input box:

    • Click the microphone icon on the left side of the input box
    • Start speaking after the waveform animation appears
    • Click the icon again to stop recording
    • The recognized text is automatically inserted at the cursor position
  3. Hotword configuration example:

  • TypeScript
  • React
  • useState
  • useEffect
  1. Upload methods:

    • Click the upload button to choose a file
    • Drag an image directly into the upload area
    • Use Ctrl+V to paste a screenshot from the clipboard
  2. Supported formats: PNG, JPG, JPEG, WebP, GIF

  3. Size limit: 5 MB by default (configurable)

  1. Speech recognition:

    • Microphone permission is required
    • Use in a quiet environment when possible
    • The maximum supported recording duration is 300 seconds by default (configurable)
  2. Image uploads:

    • Only common image formats are supported
    • Pay attention to file size limits
    • Uploaded images automatically receive a preview URL
  3. Security considerations:

    • Speech recognition credentials are stored on the backend
    • Image uploads go through strict server-side validation
    • HTTPS/WSS is recommended in production environments

After adding speech recognition and image uploads, the HagiCode user experience improved noticeably. Users can now interact with AI in a more natural way - speaking instead of typing, and sharing screenshots instead of describing everything manually. It feels like finally finding a more comfortable way to communicate.

While building this feature, we ran into the problem that browser WebSocket APIs do not support custom headers. In the end, we solved it with a backend proxy approach. That solution not only preserved security, but also laid the groundwork for integrating other authenticated WebSocket services later on.

The image upload component also benefits from supporting multiple upload methods, letting users choose whatever is most convenient in the moment. Clicking, dragging, or pasting all work, and each path gets the job done quickly.

“Typing is slower than talking, and talking is slower than a screenshot” fits the theme here quite well. If you are building a similar AI assistant product, I hope these experiences help, even if only a little.


If this article helped you, feel free to give us a Star on GitHub.

Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final version was reviewed and confirmed by the author.