
CLI

5 posts with the tag “CLI”

How to Install and Use Hermes: A Quick Start from the Local CLI to Feishu Integration

If you want to install Hermes and start using it, the shortest path is really just three steps:

  1. Run the official install command
  2. Launch the CLI by typing hermes in your terminal
  3. If you want to keep using it from Feishu, configure hermes gateway setup afterwards

This article does not try to cover everything Hermes can do at once. Instead, it helps you close the most important onboarding loop first: install it, get it running, start using it, and then connect the most common messaging platform scenario.

Hermes Agent is an AI agent that can be used both in a local terminal and through a messaging platform gateway.

For most developers, it has two main entry points:

  • CLI: type hermes in your terminal to enter the interactive interface
  • Messaging Gateway: run hermes gateway, then talk to it from platforms such as Feishu, Telegram, Discord, or Slack

If your goal right now is simply to get started quickly, do not reverse the order. Follow this route first:

  • Install Hermes
  • Verify it works from the CLI
  • Then decide whether to connect a messaging platform

This makes problems easier to locate and is a better fit for first-time Hermes users.

According to the Hermes README, the official quick-install path supports these environments:

  • Linux
  • macOS
  • WSL2
  • Android via Termux

Hermes does not currently run on native Windows. If you are on Windows, install WSL2 first and run the install command inside it.

This is worth stating up front, because many installation failures are caused not by the command itself but by an unsupported environment.

The quick-install command from the Hermes README is:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

This command runs the official install script, which handles platform-specific initialization.

After installation, reload your shell environment first. The most common way is:

source ~/.bashrc

If you use zsh, run this instead:

source ~/.zshrc

The most direct way to check the installation is to run:

hermes

If you want to further confirm that configuration and dependencies are in order, run:

hermes doctor

hermes doctor is useful in situations like these:

  • The command misbehaves after installation
  • Model configuration fails
  • The gateway fails to start
  • You are unsure whether the environment dependencies are complete

If you just want to confirm as quickly as possible that Hermes works, the simplest approach is:

hermes

This starts Hermes's interactive CLI. For first-time users, this is also the recommended starting point, because it lets you verify the most essential things first:

  • Whether the command actually works
  • Whether the current model configuration is correct
  • Whether the terminal toolchain works properly
  • Whether the interaction style is what you need

These commands are enough for your first round of configuration

The handful of high-frequency commands listed in the Hermes README basically make up the first-round onboarding path:

hermes model
hermes tools
hermes config set
hermes setup
hermes update
hermes doctor

If you do not yet know what each one does, remember them like this for now:

  • hermes model: pick or switch models
  • hermes tools: view and configure the currently available tools
  • hermes config set: change a specific configuration item
  • hermes setup: run the full initialization wizard once
  • hermes update: update Hermes
  • hermes doctor: troubleshoot problems

The most practical order for newcomers is usually:

  1. Run hermes model first
  2. If you want to configure all the common items in one pass, run hermes setup next

1. Use Hermes as a daily development assistant in the terminal

CLI mode fits scenarios like these:

  • Asking questions directly while writing code locally
  • Inspecting a project, editing files, and running commands
  • One-off debugging or reviews
  • Ongoing collaboration inside the current working directory

Its biggest advantage is the short path: no extra platform to integrate, no bot configuration to handle first, and it is the best way to build your first round of usage habits.

2. Talk to Hermes through a messaging platform gateway

If you want to talk to Hermes on platforms such as Feishu, Telegram, or Discord, you need the messaging gateway.

The most common entry commands are:

hermes gateway setup
hermes gateway

Where:

  • hermes gateway setup runs the interactive platform configuration
  • hermes gateway starts the gateway process

According to the official documentation, the gateway is a unified background process that connects your configured platforms, manages sessions, and handles features such as scheduled tasks.

Connecting Hermes to a messaging platform, using Feishu as the example

If your daily work happens mostly in Feishu, then Feishu/Lark is a natural way to connect Hermes.

The officially recommended entry point for Feishu/Lark is:

hermes gateway setup

After running it, select Feishu / Lark in the wizard.

The Feishu documentation describes two connection modes:

  • websocket: recommended
  • webhook: optional

If Hermes runs on your laptop, workstation, or a private server, websocket is the simpler choice because it does not require exposing a public callback URL.

If you configure things manually, at least know these variables

If you write the configuration by hand instead of using the wizard, the core variables listed in the Feishu documentation include:

FEISHU_APP_ID=cli_xxx
FEISHU_APP_SECRET=***
FEISHU_DOMAIN=feishu
FEISHU_CONNECTION_MODE=websocket
FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy
FEISHU_HOME_CHANNEL=oc_xxx

The two most noteworthy entries are:

  • FEISHU_ALLOWED_USERS: recommended, so that not everyone who can reach the bot can use it directly
  • FEISHU_HOME_CHANNEL: lets you designate a home chat in advance for receiving cron results or default notifications

This is easy to overlook: in Feishu group chats, Hermes does not respond to every message it sees by default.

The official documentation states explicitly:

  • In direct messages, Hermes responds to your messages
  • In group chats, you must explicitly @ the bot before it processes a message

If you want to set a particular Feishu conversation as the home channel, you can also run this in the chat:

/set-home

Or declare it in the configuration ahead of time:

FEISHU_HOME_CHANNEL=oc_xxx

Whether you are in the CLI or on a messaging platform, the commands below are enough to remember at first:

  • /new or /reset: start a new session
  • /model: view or switch models
  • /retry: retry the last turn
  • /undo: undo the last interaction
  • /compress: manually compress the context
  • /help: show help

If you mainly work from a messaging platform, remember one more:

  • /sethome or /set-home: set the current chat as the home channel

These commands cover the most common beginner operations: restarting, adjusting, rolling back, inspecting, and carrying on.

Can Hermes run on native Windows?

No. The official documentation currently states that native Windows is not supported; WSL2 is recommended instead.

What if typing hermes after installation does nothing?

Troubleshoot in this order:

  1. Reload your shell first, for example source ~/.bashrc
  2. Run hermes again
  3. If it still misbehaves, run hermes doctor

Why doesn't Hermes respond in a Feishu group chat?

Check these three things first:

  • Whether you actually @-mentioned Hermes in the group
  • Whether FEISHU_ALLOWED_USERS excludes the current user
  • Whether the current group chat policy allows processing group messages

According to the official Feishu documentation, an explicit @mention is a hard requirement in group chats.

If you just want to start using Hermes as quickly as possible, the recommended order is:

  1. Run the install command
  2. Start in the local CLI with hermes
  3. Fill in the basic configuration with hermes model and hermes setup
  4. If you want to keep using it in Feishu, configure hermes gateway setup

If this article is the first in a series, its most fitting role is not to cover every advanced feature at once, but to get users through the door.

Follow-ups are better split into these topics:

  • A complete guide to connecting Hermes to Feishu
  • A guide to common Hermes slash commands
  • A guide to configuring and troubleshooting the Hermes gateway

If you plan to keep producing Hermes content, this article can serve as the starting point for later posts, with the internal link structure built out gradually.

Why Use Skillsbase to Maintain Your Own Skills Collection Repository

It is kind of funny when you think about it: the era of AI programming has arrived, and the Agent Skills we keep on hand are becoming more and more numerous. But along with that comes more and more hassle. This article is about how we used skillsbase to solve those problems.

In the age of AI programming, developers need to maintain an increasing number of Agent Skills - reusable instruction sets that extend the capabilities of coding assistants such as Claude Code, OpenCode, and Cursor. However, as the number of skills grows, a practical problem gradually emerges:

It is not exactly a major problem, but once you have too many things, managing them becomes troublesome.

Skills are scattered across different locations, making management costly

  • Local skills are scattered in multiple places: ~/.agents/skills/, ~/.claude/skills/, ~/.codex/skills/.system/, and so on
  • Different locations may have naming conflicts, for example skill-creator existing in both the user directory and the system directory
  • There is no unified management entry point, which makes backup and migration difficult

This part is genuinely annoying. Sometimes you do not even know where a certain skill actually is. It feels like losing something and then struggling to find it.

Lack of a standardized maintenance workflow

  • Manually copying skills is error-prone and makes it difficult to trace their origins
  • Without a unified validation mechanism, there is no guarantee that the skill repository remains complete
  • During team collaboration, synchronizing and sharing a skill collection is difficult

Manual work is always prone to mistakes. Human memory is limited, after all. Who can remember where every single thing came from?

Failing to meet reproducibility requirements

  • When switching development machines, all skills need to be configured again
  • In CI/CD environments, the skill repository cannot be validated and synchronized automatically

Changing to a different computer means doing everything all over again. It feels, in a way, just like moving house - troublesome every single time. You have to adapt to the new environment and reconfigure everything again.

To address these pain points, we tried many different approaches: from manual copying to scripted automation, from directly managing directories to globally installing and then recovering files. Each approach had its own flaws. Some could not guarantee consistency, some polluted the environment, and some were hard to use in CI.

We definitely took quite a few detours.

In the end, we found a more elegant solution: skillsbase. The core idea behind this approach is to install and validate locally first, then convert the structure and write it into the repository, and finally uninstall the temporary files. This ensures that the repository contents match the actual installation result while avoiding pollution of the global environment.

It sounds simple when you put it that way, but we only figured it out after stepping into quite a few pitfalls.

The solution shared in this article comes from our hands-on experience in the HagiCode project.

HagiCode is an AI coding assistant project. During development, we need to maintain a large number of Agent Skills to extend various coding capabilities. These real-world needs are exactly what pushed us to build the skillsbase toolset for standardized management of skill repositories.

This was not invented out of thin air. We were pushed into it by real needs. Once the number of skills grows, management naturally becomes necessary. When problems appear during management, solutions become necessary too. Step by step, that is how we got here.

If you are interested in HagiCode, you can visit the official website to learn more or check the source code on GitHub.

To build a maintainable skills collection repository, the following core problems need to be solved:

  1. Unified namespace conflicts: when multiple sources contain skills with the same name, how do we avoid overwriting them?
  2. Source traceability: how do we record the source of each skill for future updates and audits?
  3. Synchronization and validation: how do we ensure that repository contents stay consistent with the actual installation results?
  4. Automation integration: how do we integrate with CI/CD workflows to enable automatic synchronization and validation?

These problems may look simple, but every single one of them is a headache. Then again, what worthwhile work is ever easy?

Option 1: Copy directories directly

Pros: simple to implement
Cons: cannot guarantee consistency with the actual installation result of the skills CLI

We did think about this approach. Later, however, we realized that the CLI may apply some preprocessing logic during installation. Direct copying skips that step. As a result, what you copy is not the same as what is actually installed, and that becomes a problem.

Option 2: Install globally and then recover

Pros: the installation process can be validated
Cons: pollutes the execution environment, and it is hard to keep CI and local results consistent

This approach is even worse. A global installation pollutes the environment. More importantly, it is difficult to keep the CI environment consistent with the local environment, which leads to the classic “works on my machine, fails in CI” problem. Anyone who has dealt with that knows how painful it is.

Option 3: Local install -> convert -> uninstall (final solution)

This is the approach adopted by skillsbase:

  • First install skills into a temporary location with npx skills
  • Convert the directory structure and add source metadata
  • Write the result into the target repository
  • Finally uninstall the temporary files

This approach ensures that repository contents are consistent with the actual installation results seen by consumers, avoids polluting the global environment, standardizes the conversion process, and supports idempotent operations.

This solution was not obvious from the beginning either. We simply learned through enough trial and error what works and what does not.
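To make the flow concrete, here is a minimal TypeScript sketch of the three phases. It is illustrative only: the exact npx skills invocation and the helper layout are assumptions, not the real skillsbase internals.

import { execSync } from 'node:child_process';
import { cpSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical sketch of the install -> convert -> uninstall flow.
function syncSkill(source: string, skillName: string, repoRoot: string): void {
  // 1. Install into a temporary location via the skills CLI, so we capture
  //    exactly what a consumer would get. (Invocation details are assumptions.)
  const tempDir = mkdtempSync(join(tmpdir(), 'skillsbase-'));
  execSync(`npx skills add ${source}`, { cwd: tempDir, stdio: 'inherit' });

  // 2. Convert the structure and record source metadata alongside the files.
  const targetDir = join(repoRoot, 'skills', skillName);
  cpSync(join(tempDir, skillName), targetDir, { recursive: true });
  writeFileSync(
    join(targetDir, '.skill-source.json'),
    JSON.stringify({ source, targetName: skillName, syncedAt: new Date().toISOString() }, null, 2),
  );

  // 3. Remove the temporary install so the global environment stays clean.
  rmSync(tempDir, { recursive: true, force: true });
}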

| Decision Item | Choice | Reason |
| --- | --- | --- |
| Runtime | Node.js ESM | No build step required; .mjs is enough to orchestrate the file system |
| Configuration format | YAML (sources.yaml) | Highly readable and suitable for manual maintenance |
| Naming strategy | Namespace prefix | User skills keep their original names, while system skills receive the system- prefix |
| Workflow | add updates the manifest -> sync executes synchronization | A single synchronization engine avoids implementing the same rules twice |
| File management | Managed file markers | Add a comment header to support safe overwrites |
These decisions all come down to one goal: making things simple. Simplicity wins in the end.

The skillsbase CLI provides four core commands:

skillsbase
├── init # Initialize repository structure
├── sync # Synchronize skill content
├── add # Add new skills
└── github_action # Generate GitHub Actions configuration

There are not many commands, but they are enough. A tool only needs to be useful.

┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ init │───▶│ add │───▶│ sync │───▶│github_action│
│ initialize │ │ add source │ │ sync content│ │ generate CI │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘

Take it one step at a time. No need to rush.

sources.yaml -> parse sources -> npx skills install -> convert structure -> write to skills/ -> uninstall temporary files
(each synced skill also receives a .skill-source.json file with its source metadata)

This workflow is fairly clear. At least when I look at it, I can understand what each step is doing.

repos/skillsbase/
├── sources.yaml               # Source manifest (single source of truth)
├── skills/                    # Skills directory
│   ├── frontend-design/       # User skill
│   ├── skill-creator/         # User skill
│   └── system-skill-creator/  # System skill (with prefix)
├── scripts/
│   ├── sync-skills.mjs        # Synchronization script
│   └── validate-skills.mjs    # Validation script
├── docs/
│   └── maintainer-workflow.md # Maintainer documentation
└── .github/
    ├── workflows/
    │   └── skills-sync.yml    # CI workflow
    └── actions/
        └── skillsbase-sync/
            └── action.yml     # Reusable Action

There are quite a few files, but that is fine. Once the structure is organized clearly, maintenance becomes much easier.

# 1. Create an empty repository
mkdir repos/myskills && cd repos/myskills
git init
# 2. Initialize it with skillsbase
npx skillsbase init
# Output:
# [1/4] create manifest ................. done
# [2/4] create scripts .................. done
# [3/4] create docs ..................... done
# [4/4] create github workflow .......... done
#
# next: skillsbase add <skill-name>

This step generates a lot of files, but there is no need to worry - they are all generated automatically. After that, you can start adding skills.

# Add a single skill (this automatically triggers synchronization)
npx skillsbase add frontend-design --source vercel-labs/agent-skills
# Add from a local source
npx skillsbase add documentation-writer --source /home/user/.agents/skills
# Output:
# source: first-party ......... updated
# target: skills/frontend-design ... synced
# status: 1 skill added, 0 removed

Adding a skill is very simple. One command is enough. Sometimes, though, you may hit unexpected issues such as poor network conditions or permission problems. Those are manageable - just take them one at a time.

# Perform synchronization (reconcile all sources)
npx skillsbase sync
# Only check for drift (do not modify files)
npx skillsbase sync --check
# Allow missing sources (CI scenario)
npx skillsbase sync --allow-missing-sources

During synchronization, the system checks every source defined in sources.yaml and reconciles them with the contents under the skills/ directory. If differences exist, it updates them; if there are no differences, it skips them. This prevents the “configuration changed but files did not” problem.

# Generate workflow
npx skillsbase github_action --kind workflow
# Generate action
npx skillsbase github_action --kind action
# Generate everything
npx skillsbase github_action --kind all

The CI configuration is generated automatically as well. You still need to adjust some details yourself, such as trigger conditions and runtime environments, but that is not difficult.

# Skills root directory configuration
skillsRoot: skills/
metadataFile: .skill-source.json

# Source definitions
sources:
  # First-party: local user skills
  first-party:
    type: local
    path: /home/user/.agents/skills
    naming: original # Keep original name
    includes:
      - documentation-writer
      - frontend-design
      - skill-creator

  # System: skills provided by the system
  system:
    type: local
    path: /home/user/.codex/skills/.system
    naming: prefix-system # Add system- prefix
    includes:
      - imagegen
      - openai-docs
      - skill-creator # Becomes system-skill-creator

  # Remote: third-party repository
  vercel:
    type: remote
    url: vercel-labs/agent-skills
    naming: original
    includes:
      - web-design-guidelines

This configuration file is the core of the entire system. All sources are defined here. Change this file, and the next synchronization will apply the new state. In that sense, it is truly a “single source of truth.”

{
  "source": "first-party",
  "originalPath": "/home/user/.agents/skills/documentation-writer",
  "originalName": "documentation-writer",
  "targetName": "documentation-writer",
  "syncedAt": "2026-04-07T00:00:00.000Z",
  "version": "unknown"
}

Every skill directory contains this file, recording its source information. That way, when something goes wrong later, you can quickly locate where it came from and when it was synchronized.
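As a small illustration of how this metadata pays off, the following TypeScript sketch (assuming Node and the repository layout above) lists every skill together with its source and sync time:

import { existsSync, readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Walk skills/ and print where each skill came from.
const skillsRoot = 'skills';
for (const entry of readdirSync(skillsRoot, { withFileTypes: true })) {
  if (!entry.isDirectory()) continue;
  const metaPath = join(skillsRoot, entry.name, '.skill-source.json');
  if (!existsSync(metaPath)) {
    console.warn(`${entry.name}: missing .skill-source.json`);
    continue;
  }
  const meta = JSON.parse(readFileSync(metaPath, 'utf8'));
  console.log(`${entry.name} <- ${meta.source} (synced ${meta.syncedAt})`);
}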

# Validate repository structure
node scripts/validate-skills.mjs
# Validate with the skills CLI
npx skills add . --list
# Check for updates
npx skills check

Validation is one of those things that can feel both important and optional. Still, for the sake of safety, it never hurts to run it from time to time. After all, you never know when something unexpected might happen.

.github/workflows/skills-sync.yml:

name: Skills Sync
on:
  push:
    paths:
      - 'sources.yaml'
      - 'skills/**'
  workflow_dispatch:
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Validate repository
        run: |
          npx skills add . --list
          node scripts/validate-skills.mjs
      - name: Sync check
        run: npx skillsbase sync --check

Once CI integration is in place, every change to sources.yaml or the skills/ directory automatically triggers validation. That prevents the situation where changes were made locally but synchronization was forgotten.

  1. Handle naming conflicts: add the system- prefix to system skills consistently. This keeps every skill available while avoiding naming conflicts.
  2. Idempotent operations: all commands support repeated execution, and running sync multiple times does not produce side effects. This is especially important in CI.
  3. Managed files: generated files include the # Managed by skillsbase CLI comment, making them easy to identify and manage. These files can be safely overwritten, and manual modifications are not preserved (see the sketch after this list).
  4. Non-interactive mode: CI environments use deterministic behavior by default, so interactive prompts do not interrupt execution. All configuration is declared through sources.yaml.
  5. Source traceability: every skill has a .skill-source.json file recording its source information, making troubleshooting much faster.
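The marker check in point 3 can be sketched in a few lines of TypeScript. The marker string comes from the article; the guard function itself is a hypothetical illustration of how a tool can decide whether overwriting is safe:

import { existsSync, readFileSync, writeFileSync } from 'node:fs';

// Marker that skillsbase writes into generated files.
const MANAGED_MARKER = '# Managed by skillsbase CLI';

// Hypothetical guard: only overwrite files the tool generated itself.
function safeWrite(path: string, content: string): void {
  if (existsSync(path) && !readFileSync(path, 'utf8').startsWith(MANAGED_MARKER)) {
    throw new Error(`${path} is not managed by skillsbase; refusing to overwrite`);
  }
  writeFileSync(path, `${MANAGED_MARKER}\n${content}`);
}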
For team collaboration, members can install and validate the shared repository like this:
# Team members install the shared skills repository
npx skills add your-org/myskills -g --all
# Clone locally and validate
git clone https://github.com/your-org/myskills.git
cd myskills
npx skills add . --list

By managing the skills repository with Git, team members can easily synchronize their skill collection and ensure that everyone uses the same versions of tools and configuration.

This is especially useful in team collaboration. You no longer run into situations where “it works for me but not for you.” Once the environment is unified, half the problems disappear.

The core value of using skillsbase to maintain a skills collection repository lies in the following:

  • Security: source validation, conflict detection, and managed file protection
  • Maintainability: a unified entry point, idempotent operations, and configuration-as-documentation
  • Standardization: a unified directory structure, naming conventions, and metadata format
  • Automation: CI/CD integration, automatic synchronization, and automatic validation

With this approach, developers can manage their own Agent Skills the same way they manage npm packages, building a reproducible, shareable, and maintainable skills repository system.

The tools and workflow shared in this article are exactly what we refined through real mistakes and real optimization while building HagiCode. If you find this approach valuable, that is a good sign that our engineering direction is the right one - and that HagiCode itself is worth your attention as well.

After all, good tools deserve to be used by more people.

This article was first published on the HagiCode Blog.

Thank you for reading. If you found this article useful, you are welcome to like it, save it, and share it in support. This content was created with AI-assisted collaboration, and the final version was reviewed and confirmed by the author.

Hagicode and GLM-5.1 Multi-CLI Integration Guide

In the Hagicode project, users can choose from multiple CLI tools to drive AI programming assistants, including Claude Code CLI, GitHub Copilot, OpenCode CLI, Codebuddy CLI, Hermes CLI, and more. These CLI tools are general-purpose AI programming tools on their own, but through Hagicode’s abstraction layer, they can flexibly connect to different AI model providers.

Zhipu AI (ZAI) provides an interface compatible with the Anthropic Claude API, allowing these CLI tools to directly use domestic GLM series models. Among them, GLM-5.1 is Zhipu’s latest large language model release, with significant improvements over GLM-5.0.

Hagicode defines 11 CLI provider types through the AIProviderType enum, covering mainstream AI programming CLI tools:

public enum AIProviderType
{
    ClaudeCodeCli = 0,  // Claude Code CLI
    CodexCli = 1,       // GitHub Copilot Codex
    GitHubCopilot = 2,  // GitHub Copilot
    CodebuddyCli = 3,   // Codebuddy CLI
    OpenCodeCli = 4,    // OpenCode CLI
    IFlowCli = 5,       // IFlow CLI
    HermesCli = 6,      // Hermes CLI
    QoderCli = 7,       // Qoder CLI
    KiroCli = 8,        // Kiro CLI
    KimiCli = 9,        // Kimi CLI
    GeminiCli = 10,     // Gemini CLI
}

Each CLI has corresponding model parameter configuration and supports the model and reasoning parameters:

private static readonly IReadOnlyDictionary<AIProviderType, IReadOnlyList<string>> ManagedModelParameterKeysByProvider =
    new Dictionary<AIProviderType, IReadOnlyList<string>>
    {
        [AIProviderType.ClaudeCodeCli] = ["model", "reasoning"],
        [AIProviderType.CodexCli] = ["model", "reasoning"],
        [AIProviderType.OpenCodeCli] = ["model", "reasoning"],
        [AIProviderType.HermesCli] = ["model", "reasoning"],
        [AIProviderType.CodebuddyCli] = ["model", "reasoning"],
        [AIProviderType.QoderCli] = ["model", "reasoning"],
        [AIProviderType.KiroCli] = ["model", "reasoning"],
        [AIProviderType.GeminiCli] = ["model"], // Gemini does not support the reasoning parameter
        // ...
    };

Hagicode’s Secondary Professions Catalog defines complete support for the GLM model series:

| Model ID | Name | Default Reasoning | Compatible CLI Families |
| --- | --- | --- | --- |
| glm-4.7 | GLM 4.7 | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5 | GLM 5 | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5-turbo | GLM 5 Turbo | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5.0 | GLM 5.0 (Legacy) | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5.1 | GLM 5.1 | high | claude, codebuddy, hermes, qoder, kiro |

Key differences between GLM-5.1 and GLM-5.0


From the implementation in AcpSessionModelBootstrapper.cs, we can clearly see the differences between GLM-5.1 and GLM-5.0:

GLM-5.1 is a standalone new model identifier with no legacy handling logic:

private const string Glm51ModelValue = "glm-5.1";

Definition in the Secondary Professions Catalog:

{
  "id": "secondary-glm-5-1",
  "name": "GLM 5.1",
  "family": "anthropic",
  "summary": "hero.professionCopy.secondary.glm51.summary",
  "sourceLabel": "hero.professionCopy.sources.aiSharedAnthropicModel",
  "sortOrder": 64,
  "supportsImage": true,
  "compatiblePrimaryFamilies": [
    "claude",
    "codebuddy",
    "hermes",
    "qoder",
    "kiro"
  ],
  "defaultParameters": {
    "model": "glm-5.1",
    "reasoning": "high"
  }
}

Zhipu AI provides the most complete GLM model support:

{
  "providerId": "zai",
  "name": "智谱 AI",
  "description": "智谱 AI 提供的 Claude API 兼容服务",
  "category": "china-providers",
  "apiUrl": {
    "codingPlanForAnthropic": "https://open.bigmodel.cn/api/anthropic"
  },
  "recommended": true,
  "region": "cn",
  "defaultModels": {
    "sonnet": "glm-4.7",
    "opus": "glm-5",
    "haiku": "glm-4.5-air"
  },
  "supportedModels": [
    "glm-4.7",
    "glm-5",
    "glm-4.5-air",
    "qwen3-coder-next",
    "qwen3-coder-plus"
  ],
  "features": ["experimental-agent-teams"],
  "authTokenEnv": "ANTHROPIC_AUTH_TOKEN",
  "referralUrl": "https://www.bigmodel.cn/claude-code?ic=14BY54APZA",
  "documentationUrl": "https://open.bigmodel.cn/dev/api"
}

Features:

  • Supports the widest variety of GLM model variants
  • Provides default mapping across the Sonnet/Opus/Haiku tiers
  • Supports the experimental-agent-teams feature

Claude Code CLI is one of Hagicode’s core CLIs and is configured through the Hero configuration system:

{
  "primaryProfessionId": "profession-claude-code",
  "secondaryProfessionId": "secondary-glm-5-1",
  "model": "glm-5.1",
  "reasoning": "high"
}

Corresponding HeroEquipmentCatalogItem configuration:

{
  id: 'secondary-glm-5-1',
  name: 'GLM 5.1',
  family: 'anthropic',
  kind: 'model',
  primaryFamily: 'claude',
  compatiblePrimaryFamilies: ['claude', 'codebuddy', 'hermes', 'qoder', 'kiro'],
  defaultParameters: {
    model: 'glm-5.1',
    reasoning: 'high'
  }
}

OpenCode CLI is the most flexible CLI and supports specifying any model in the provider/model format:

Method 1: Use the ZAI provider prefix

{
  "primaryProfessionId": "profession-opencode",
  "model": "zai/glm-5.1",
  "reasoning": "high"
}

Method 2: Use the model ID directly

{
  "model": "glm-5.1"
}

Method 3: Frontend configuration UI

In HeroModelEquipmentForm.tsx, OpenCode CLI has a dedicated placeholder hint:

const OPEN_CODE_MODEL_PLACEHOLDER = 'myprovider/glm-4.7';
const modelPlaceholder = primaryProviderType === PCode_Models_AIProviderType.OPEN_CODE_CLI
  ? OPEN_CODE_MODEL_PLACEHOLDER
  : 'gpt-5.4';

Users can enter:

zai/glm-5.1
glm-5.1

OpenCode CLI model parsing logic:

internal OpenCodeModelSelection? ResolveModelSelection(string? rawModel)
{
    var normalized = NormalizeOptionalValue(rawModel);
    if (normalized == null) return null;

    var slashIndex = normalized.IndexOf('/');
    if (slashIndex < 0)
    {
        // No slash: use the model ID directly
        return new OpenCodeModelSelection {
            ProviderId = string.Empty,
            ModelId = normalized,
        };
    }

    // Slash exists: parse the provider/model format
    var providerId = normalized[..slashIndex].Trim();
    var modelId = normalized[(slashIndex + 1)..].Trim();
    return new OpenCodeModelSelection {
        ProviderId = providerId,
        ModelId = modelId,
    };
}
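For example, zai/glm-5.1 resolves to provider zai with model glm-5.1, while a bare glm-5.1 leaves the provider empty and uses the model ID directly.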

Codebuddy CLI has dedicated legacy handling logic:

{
  "primaryProfessionId": "profession-codebuddy",
  "model": "glm-5.1",
  "reasoning": "high"
}

Note: Codebuddy retains special handling for GLM-5.0 and does not use legacy normalization:

// For CodebuddyCli, glm-5.0 is not normalized to glm-5-turbo
return !string.Equals(providerName, "CodebuddyCli", StringComparison.OrdinalIgnoreCase)
    && string.Equals(normalizedModel, LegacyGlm5TurboModelValue, StringComparison.OrdinalIgnoreCase)
        ? Glm5TurboModelValue
        : normalizedModel;
Zhipu AI (ZAI):

# Set the API key
export ANTHROPIC_AUTH_TOKEN="***"
# Optional: specify the API endpoint (ZAI uses this endpoint by default)
export ANTHROPIC_BASE_URL="https://open.bigmodel.cn/api/anthropic"

Alibaba Cloud DashScope:

# Set the API key
export ANTHROPIC_AUTH_TOKEN="your-a...-key"
# Specify the Alibaba Cloud endpoint
export ANTHROPIC_BASE_URL="https://coding.dashscope.aliyuncs.com/apps/anthropic"

Compared with GLM-5.0, GLM-5.1 brings the following significant improvements:

According to Zhipu’s official release information, improvements in GLM-5.1 include:

  • Stronger code understanding: More accurate analysis of complex code structures
  • Longer context comprehension: Supports longer conversational context
  • Enhanced tool calling: Higher success rate for MCP tool calls
  • Output stability: Reduces randomness and hallucinations

GLM-5.1 covers all mainstream CLIs supported by Hagicode:

compatiblePrimaryFamilies: [
  "claude",     // Claude Code CLI
  "codebuddy",  // Codebuddy CLI
  "hermes",     // Hermes CLI
  "qoder",      // Qoder CLI
  "kiro"        // Kiro CLI
]

Make sure the ANTHROPIC_AUTH_TOKEN environment variable is set correctly. It is the required credential for every CLI to connect to the model.

GLM-5.1 needs to be enabled by the corresponding model provider:

  • The Zhipu AI ZAI platform supports it by default
  • Alibaba Cloud DashScope may require a separate application

When using the provider/model format, make sure the provider ID is correct:

  • Zhipu AI: zai or zhipuai
  • Alibaba Cloud: aliyun or dashscope

For the reasoning parameter:

  • high is recommended for the best code generation results
  • Gemini CLI does not support the reasoning parameter and will ignore this configuration automatically

Through a unified abstraction layer, Hagicode enables flexible integration between GLM-5.1 and multiple CLIs. Developers can choose the CLI tool that best fits their preferences and usage scenarios, then use the latest GLM-5.1 model through simple configuration.

As Zhipu’s latest model version, GLM-5.1 offers clear improvements over GLM-5.0:

  • An independent version identifier with no legacy burden
  • Stronger reasoning and code understanding
  • Broad multi-CLI compatibility
  • Flexible reasoning level configuration

With the correct environment variables and Hero equipment configured, users can fully unlock the power of GLM-5.1 across different CLI environments.

Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it to show your support. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

Hagicode.Libs: Engineering Practice for Unified Integration of Multiple AI Coding Assistant CLIs

During the development of the HagiCode project, we needed to integrate multiple AI coding assistant CLIs at the same time, including Claude Code, Codex, and CodeBuddy. Each CLI has different interfaces, parameters, and output formats, and the repeated integration code made the project harder and harder to maintain. In this article, we share how we built a unified abstraction layer with HagiCode.Libs to solve this engineering pain point. You could also say it is simply some hard-earned experience gathered from the pitfalls we have already hit.

The market for AI coding assistants is quite lively now. Besides Claude Code, there are also OpenAI’s Codex, Zhipu’s CodeBuddy, and more. As an AI coding assistant project, HagiCode needs to integrate these different CLI tools across multiple subprojects, including desktop, backend, and web.

At first, the problem was manageable. Integrating one CLI was only a few hundred lines of code. But as the number of CLIs we needed to support kept growing, things started to get messy.

Each CLI has its own command-line argument format, different environment variable requirements, and a wide variety of output formats. Some output JSON, some output streaming JSON, and some output plain text. On top of that, there are cross-platform compatibility issues. Executable discovery and process management work very differently between Windows and Unix systems, so code duplication kept increasing. In truth, it was just a bit more Ctrl+C and Ctrl+V, but maintenance quickly became painful.

The most frustrating part was that every time we wanted to add support for a new CLI capability, we had to change the same code in several projects. That approach was clearly not sustainable in the long run. Code has a temper too; duplicate it too many times and it starts causing trouble.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project that needs to maintain multiple subprojects at the same time, including a frontend VSCode extension, backend AI services, and a cross-platform desktop client. In a way, it was exactly this complex, multi-language, multi-platform environment that led to the birth of HagiCode.Libs. You could say we were forced into it, and so be it.

Although these AI coding assistant CLIs each have their own characteristics, from a technical perspective they share several obvious traits:

Similar interaction patterns: they all start a CLI process, send a prompt, receive streaming responses, parse messages, and then either end or continue the session. At the end of the day, the whole flow follows the same basic mold.

Similar configuration needs: they all need API key authentication, working directory setup, model selection, tool permission control, and session management. After all, everyone is making a living from APIs; the differences are mostly a matter of flavor.

The same cross-platform challenges: they all need to solve executable path resolution (claude vs claude.exe vs /usr/local/bin/claude), process startup and environment variable handling, shell command escaping, and argument construction. Cross-platform work is painful no matter how you describe it. Only people who have stepped into the traps really understand the difference between Windows and Unix.

Based on this analysis, we needed a unified abstraction layer that could provide a consistent interface, encapsulate cross-platform CLI discovery logic, handle streaming output parsing, and support both dependency injection and non-DI scenarios. It is the kind of problem that makes your head hurt just thinking about it, but you still have to face it. After all, it is our own project, so we have to finish it even if we have to cry our way through it.

We created HagiCode.Libs, a lightweight .NET 10 library workspace released under the MIT license and now published on GitHub. It may not be some world-shaking masterpiece, but it is genuinely useful for solving real problems.

HagiCode.Libs/
├── src/
│   ├── HagiCode.Libs.Core/            # Core capabilities
│   │   ├── Discovery/                 # CLI executable discovery
│   │   ├── Process/                   # Cross-platform process management
│   │   ├── Transport/                 # Streaming message transport
│   │   └── Environment/               # Runtime environment resolution
│   ├── HagiCode.Libs.Providers/       # Provider implementations
│   │   ├── ClaudeCode/                # Claude Code provider
│   │   ├── Codex/                     # Codex provider
│   │   └── Codebuddy/                 # CodeBuddy provider
│   ├── HagiCode.Libs.ConsoleTesting/  # Testing framework
│   ├── HagiCode.Libs.ClaudeCode.Console/
│   ├── HagiCode.Libs.Codex.Console/
│   └── HagiCode.Libs.Codebuddy.Console/
└── tests/                             # xUnit tests

When designing HagiCode.Libs, we followed a few principles. They all came from lessons learned the hard way:

Zero heavy framework dependencies: it does not depend on ABP or any other large framework, which keeps it lightweight. These days, the fewer dependencies you have, the fewer headaches you get. Most people have already been beaten up by dependency hell at least once.

Cross-platform support: native support for Windows, macOS, and Linux, without writing separate code for different platforms. One codebase that runs everywhere is a pretty good thing.

Streaming processing: CLI output is handled with asynchronous streams, which fits modern .NET programming patterns much better. Times change, and async is king.

Flexible integration: it supports dependency injection scenarios while also allowing direct instantiation. Different people have different preferences, so we wanted it to be convenient either way.

If your project already uses dependency injection, such as ASP.NET Core or the generic host, you can integrate it directly. It is a small thing, but a well-behaved one:

using HagiCode.Libs.Providers;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddHagiCodeLibs();
await using var provider = services.BuildServiceProvider();

var claude = provider.GetRequiredService<ICliProvider<ClaudeCodeOptions>>();
var options = new ClaudeCodeOptions
{
    ApiKey = "your-api-key",
    Model = "claude-sonnet-4-20250514"
};

await foreach (var message in claude.ExecuteAsync(options, "Hello, Claude!"))
{
    Console.WriteLine($"{message.Type}: {message.Content}");
}

If you are writing a simple script or working in a non-DI scenario, creating an instance directly also works. Put simply, it depends on your personal preference:

var claude = new ClaudeCodeProvider();
var options = new ClaudeCodeOptions
{
    ApiKey = "sk-ant-xxx",
    Model = "claude-sonnet-4-20250514"
};

await foreach (var message in claude.ExecuteAsync(options, "Help me write a quicksort"))
{
    // Handle messages
}

Both approaches use the same underlying implementation, so you can choose the integration style that best fits your project. There is no universal right answer in this world. What suits you is the best option. It may sound cliché, but it is true.

Each provider has its own dedicated testing console project, making it easier to validate the integration independently. Testing is one of those things where if you are going to do it, you should do it properly:

# Claude Code tests
dotnet run --project src/HagiCode.Libs.ClaudeCode.Console -- --test-provider
dotnet run --project src/HagiCode.Libs.ClaudeCode.Console -- --test-all claude
# CodeBuddy tests
dotnet run --project src/HagiCode.Libs.Codebuddy.Console -- --test-provider codebuddy-cli
# Codex tests
dotnet run --project src/HagiCode.Libs.Codex.Console -- --test-provider codex-cli

The testing scenarios cover several key cases:

  • Ping: health check to confirm the CLI is available
  • Simple Prompt: basic prompt test
  • Complex Prompt: multi-turn conversation test
  • Session Restore/Resume: session recovery test
  • Repository Analysis: repository analysis test

This standalone testing console design is especially useful during debugging because it lets us quickly identify whether the issue is in the HagiCode.Libs layer or in the CLI itself. Debugging is really just about finding where the problem is. Once the direction is right, you are already halfway there.

Cross-platform compatibility is one of the core goals of HagiCode.Libs. We configured the GitHub Actions workflow .github/workflows/cli-discovery-cross-platform.yml to run real CLI discovery validation across ubuntu-latest, macos-latest, and windows-latest.

This ensures that every code change does not break cross-platform compatibility. During local development, you can also reproduce it with the following commands. After all, you cannot ask CI to take the blame for everything. Your local environment should be able to run it too:

npm install --global @anthropic-ai/claude-code@2.1.79
HAGICODE_REAL_CLI_TESTS=1 dotnet test --filter "Category=RealCli"

HagiCode.Libs uses asynchronous streams to process CLI output. Compared with traditional callback or event-based approaches, this fits the asynchronous programming style of modern .NET much better. In the end, this is simply how technology moves forward, whether anyone likes it or not:

public async IAsyncEnumerable<CliMessage> ExecuteAsync(
    TOptions options,
    string prompt,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Start the CLI process
    // Parse streaming JSON output
    // Yield the CliMessage sequence
}

The message types include:

  • user: user message
  • assistant: assistant response
  • tool_use: tool invocation
  • result: session end

This design lets callers handle streaming output flexibly, whether for real-time display, buffered post-processing, or forwarding to other services. Why worry whether the sky is sunny or cloudy? What matters is that once the idea opens up, you can use it however you like.

The HagiCode.Libs.Exploration module provides Git repository discovery and status checking, which is especially useful in repository analysis scenarios. This feature was also born out of necessity, because HagiCode needs to analyze repositories:

// Discover Git repositories
var repositories = await GitRepositoryDiscovery.DiscoverAsync("/path/to/search");

// Get repository information
var info = await GitRepository.GetInfoAsync(repoPath);
Console.WriteLine($"Branch: {info.Branch}, Remote: {info.RemoteUrl}");
Console.WriteLine($"Has uncommitted changes: {info.HasUncommittedChanges}");

HagiCode’s code analysis capabilities use this module to identify project structure and Git status. It is a good example of making full use of what we built.

Based on our practice in the HagiCode project, there are several points that deserve special attention. They are all real issues that need to be handled carefully:

API key security: do not hardcode API keys in your code. Use environment variables or configuration management instead. HagiCode.Libs supports passing configuration through Options objects, making it easier to integrate with different configuration sources. When it comes to security, there is no such thing as being too careful.

CLI version pinning: in CI/CD, we pin specific versions, such as @anthropic-ai/claude-code@2.1.79, to reduce uncertainty caused by version drift. It is also a good idea to use fixed versions in local development. Versioning can be painful. If you do not pin versions, the problem will teach you a lesson very quickly.

Test categorization: default tests use fake providers to keep them deterministic and fast, while real CLI tests must be enabled explicitly. This gives CI fast feedback while still allowing real-environment validation when needed. Striking that balance is never easy. Speed and stability always require trade-offs.

Session management: different CLIs have different session recovery mechanisms. Claude Code uses the .claude/ directory to store sessions, while Codex and CodeBuddy each have their own approaches. When using them, be sure to check their respective documentation and understand the details of their session persistence mechanisms. There is no harm in understanding it clearly.

HagiCode.Libs is the unified abstraction layer we built during the development of HagiCode to solve the repeated engineering work involved in multi-CLI integration. By providing a consistent interface, encapsulating cross-platform details, and supporting flexible integration patterns, it greatly reduces the engineering complexity of integrating multiple AI coding assistants. Much may fade away, but the experience remains.

If you also need to integrate multiple AI CLI tools in your project, or if you are interested in cross-platform process management and streaming message handling, feel free to check it out on GitHub. The project is released under the MIT license, and contributions and feedback are welcome. In the end, it is a happy coincidence that we met here, so since you are already here, we might as well become friends.

The approach shared in this article was shaped by real pitfalls and real optimization work inside HagiCode. What else could we do? Running into pitfalls is normal. If you think this solution is valuable, then perhaps our engineering work is doing all right. And HagiCode itself may also be worth your attention. You might even find a pleasant surprise.



Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

ImgBin CLI Tool Design: HagiCode’s Image Asset Management Approach

This article explains how to build an automatable image asset pipeline from scratch, covering CLI tool design, a Provider Adapter architecture, and metadata management strategies.

Honestly, I did not expect image asset management to keep us tangled up for this long.

During HagiCode development, we ran into a problem that looked simple on the surface but was surprisingly thorny in practice: generating and managing image assets. In a way, it was like the dramas of adolescence - calm on the outside, turbulent underneath.

As the project accumulated more documentation and marketing materials, we needed a large number of supporting images. Some had to be AI-generated, some had to be selected from an existing asset library, and others needed AI recognition plus automatic labeling. The problem was that all of this had long been handled through scattered scripts and manual steps. Every time we generated an image, we had to run a script by hand, organize metadata by hand, and create thumbnails by hand. That alone was annoying enough, but the bigger issue was that everything was scattered everywhere. When we wanted to find something, we could not. When we needed to reuse something, we could not.

The pain points were concrete:

  1. No unified entry point: the logic for image generation was spread across different scripts, so batch execution was basically impossible.
  2. Missing metadata: generated images had no unified metadata.json, which meant no reliable searchability or traceability.
  3. High manual organization cost: titles and tags had to be sorted out one by one by hand, which was inefficient.
  4. No automation: automatically generating visual assets in a CI/CD pipeline? Not a chance.

We did think about just leaving it alone. But projects still need to move forward. Since we could not avoid the problem, we figured we might as well solve it. So we decided to upgrade ImgBin from a set of scattered scripts into an image asset pipeline that can be executed automatically. Some problems, after all, do not disappear just because you look away.

The approach shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is an AI coding assistant project that simultaneously maintains multiple components, including a VSCode extension, backend AI services, and a cross-platform desktop client. In a complex, multilingual, cross-platform environment like this, standardized image asset management becomes a key part of improving development efficiency.

You could say this was one of those small growing pains in HagiCode’s journey. Every project has moments like that: a minor issue that looks insignificant, yet somehow manages to take up half the day.

HagiCode’s build system is based on the TypeScript + Node.js ecosystem, so ImgBin naturally adopted the same tech stack to keep the project technically consistent. Once you are used to one stack, switching to something else just feels like unnecessary trouble.


ImgBin uses a layered architecture that cleanly separates CLI commands, application services, third-party API adapters, and the infrastructure layer:

Component hierarchy
├── CLI Entry (cli.ts)      Global argument parsing, command routing
├── Commands (commands/*)   generate | batch | annotate | thumbnail
├── Application Services    job-runner | metadata | thumbnail | asset-writer
├── Provider Adapters       image-api-provider | vision-api-provider
└── Infrastructure Layer    config | logger | paths | schema

The benefit of this layered design is clear responsibility boundaries. It also makes testing easier because external dependencies can be mocked cleanly. In practice, it just means each layer does its own job without getting in the way of the others, so when something breaks, it is easier to figure out why.

ImgBin uses a model of “one asset, one directory.” Every time an image is generated, it creates a structure like this:

library/
└── 2026-03/
    └── orange-dashboard/
        ├── original.png    # Original image
        ├── thumbnail.webp  # 512x512 thumbnail
        └── metadata.json   # Structured metadata

The advantages of this model are:

  1. Self-contained: all files for a single asset live in the same directory, making migration and backup convenient.
  2. Traceable: metadata.json makes it possible to trace generation time, prompt, model, and other details.
  3. Extensible: if more variants are needed later, such as thumbnails in multiple sizes, we can simply add new files in the same directory.

Beautiful things do not always need to be possessed. Sometimes it is enough that they remain beautiful, and that you can quietly appreciate them. That may sound a little far afield, but the logic still holds here: once images are kept together, they are more pleasant to look at and much easier to find.

metadata.json is the core of the entire system. It uses a layered storage strategy that separates fields into three categories:

{
  "schemaVersion": 2,
  "assetId": "orange-dashboard",
  "slug": "orange-dashboard",
  "title": "Orange Dashboard",
  "tags": ["dashboard", "hero", "orange"],
  "source": { "type": "generated" },
  "paths": {
    "assetDir": "library/2026-03/orange-dashboard",
    "original": "original.png",
    "thumbnail": "thumbnail.webp"
  },
  "generated": {
    "prompt": "orange dashboard for docs hero",
    "provider": "azure-openai-image-api",
    "model": "gpt-image-1.5"
  },
  "recognized": {
    "title": "Orange Dashboard",
    "tags": ["dashboard", "ui", "orange"],
    "description": "A modern orange dashboard with charts and metrics"
  },
  "status": {
    "generation": "succeeded",
    "recognition": "succeeded",
    "thumbnail": "succeeded"
  },
  "timestamps": {
    "createdAt": "2026-03-11T04:01:19.570Z",
    "updatedAt": "2026-03-11T04:02:09.132Z"
  }
}

  • generated: records the original information from image generation, such as the prompt, provider, and model.
  • recognized: stores AI recognition results, such as auto-generated titles, tags, and descriptions.
  • manual: stores manually curated results. Data in this area has the highest priority and will not be overwritten by AI recognition.

This layered strategy resolves one of our earlier core conflicts: when AI recognition and manual curation disagree, which one should win? The answer is manual input. AI recognition is there to assist, not to decide. That question also became clearer over time - machines are still machines, and in the end, people still need to make the call.
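A minimal TypeScript sketch of that precedence rule, assuming the metadata sections shown above (the merge helper itself is illustrative, not ImgBin’s actual implementation):

interface MetadataSection {
  title?: string;
  tags?: string[];
  description?: string;
}

// manual > recognized: AI results fill the gaps, but never
// overwrite fields a human has already curated.
function effectiveMetadata(recognized: MetadataSection, manual: MetadataSection): MetadataSection {
  return {
    title: manual.title ?? recognized.title,
    tags: manual.tags ?? recognized.tags,
    description: manual.description ?? recognized.description,
  };
}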


Another core part of ImgBin is the Provider Adapter pattern. We abstract external APIs behind a unified interface so that even if we switch AI service providers, we do not need to change the business logic.

In a way, it is a bit like relationships - outward appearances can change, but what matters is that the inner structure stays the same. Once the interface is fixed, the internal implementation can vary freely.

interface ImageGenerationProvider {
  // Generate an image and return its Buffer
  generate(options: GenerateOptions): Promise<Buffer>;
  // Get the list of supported models
  getSupportedModels(): Promise<string[]>;
}

interface GenerateOptions {
  prompt: string;
  model?: string;
  size?: '1024x1024' | '1792x1024' | '1024x1792';
  quality?: 'standard' | 'hd';
  format?: 'png' | 'webp' | 'jpeg';
}

interface VisionRecognitionProvider {
  // Recognize image content and return structured metadata
  recognize(imageBuffer: Buffer): Promise<RecognitionResult>;
  // Get the list of supported models
  getSupportedModels(): Promise<string[]>;
}

interface RecognitionResult {
  title?: string;
  tags: string[];
  description?: string;
  confidence: number;
}

The advantages of this interface design are:

  1. Testable: in unit tests, we can pass in mock providers instead of making real external API calls.
  2. Extensible: adding a new provider only requires implementing the interface; caller code does not need to change.
  3. Replaceable: production can use Azure OpenAI while testing can use a local model, with configuration being the only thing that changes.

Sometimes project work feels like that too. On the surface it looks like we just swapped an API, but the internal logic remains exactly the same, and that makes the whole thing a lot less scary.
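For instance, a unit test can swap in a fake provider built against the interfaces above. This is a sketch, not ImgBin’s actual test code:

// A mock provider for tests: deterministic, no network calls.
class FakeImageProvider implements ImageGenerationProvider {
  async generate(options: GenerateOptions): Promise<Buffer> {
    // Return a tiny fake payload instead of calling a real image API.
    return Buffer.from(`fake-image-for:${options.prompt}`);
  }
  async getSupportedModels(): Promise<string[]> {
    return ['fake-model'];
  }
}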


ImgBin provides four core commands to cover different usage scenarios:

# Simplest usage
imgbin generate --prompt "orange dashboard for docs hero"
# Generate a thumbnail and AI annotations at the same time
imgbin generate --prompt "orange dashboard" --annotate --thumbnail
# Specify an output directory
imgbin generate --prompt "orange dashboard" --output ./library

Batch jobs are defined through YAML or JSON manifest files, which makes them suitable for CI/CD workflows:

assets/jobs/launch.yaml:

defaults:
  annotate: true
  thumbnail: true
  libraryRoot: ./library

jobs:
  - prompt: "orange dashboard hero"
    slug: orange-dashboard
    tags: [dashboard, hero, orange]
  - prompt: "pricing grid for docs"
    slug: pricing-grid
    tags: [pricing, grid, docs]

Run the command:

imgbin batch assets/jobs/launch.yaml

The batch job design supports failure isolation: items in the manifest are processed one by one, and a failure in one item does not affect the others. You can also preview the job with --dry-run without actually executing it.

And the best part is that it tells you exactly what succeeded and what failed. Unlike some things in life, where failure happens and you are left not even knowing how it happened.

Run AI recognition on existing images to automatically generate titles, tags, and descriptions:

# Annotate a single image
imgbin annotate ./library/2026-03/orange-dashboard
# Annotate an entire directory in batch
imgbin annotate ./library/2026-03/

Generate thumbnails for existing images:

# Generate a thumbnail
imgbin thumbnail ./library/2026-03/orange-dashboard

The manifest format for batch jobs supports flexible configuration. Defaults can be set globally, and individual jobs can override them:

# Global defaults
defaults:
  annotate: true        # Enable AI annotation by default
  thumbnail: true       # Generate thumbnails by default
  libraryRoot: ./library
  model: gpt-image-1.5

jobs:
  # Minimal configuration: only provide a prompt
  - prompt: "first image"
  # Full configuration
  - prompt: "second image"
    slug: custom-slug
    tags: [tag1, tag2]
    annotate: false     # Do not run AI annotation for this job
    model: dall-e-3     # Use a different model for this job

When executed, ImgBin processes jobs one by one. The result of each job is written to its corresponding metadata.json. Even if one job fails, the others are unaffected. After all jobs complete, the CLI outputs a summary report:

✓ orange-dashboard (succeeded)
✓ pricing-grid (succeeded)
✗ hero-banner (failed: API rate limit exceeded)
2/3 succeeded, 1 failed

Some things cannot be rushed. Taking them one at a time is often the steadier path. Maybe that is the philosophy behind batch jobs.


ImgBin supports flexible configuration through environment variables:

# ImgBin working directory
IMGBIN_WORKDIR=/path/to/imgbin
# Executable path (for invocation inside scripts)
IMGBIN_EXECUTABLE=/path/to/imgbin/dist/cli.js
# Asset library root
IMGBIN_LIBRARY_ROOT=./.imgbin-library
# Azure OpenAI configuration (if using the Azure provider)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=***
AZURE_OPENAI_IMAGE_DEPLOYMENT=gpt-image-1

Configuration is one of those things that can feel both important and not that important at the same time. In the end, whatever feels comfortable and fits your workflow best is usually the right choice.


During implementation, we summarized a few key points:

Interface definitions should be clear and complete, including input parameters, return values, and error handling. It is also a good idea to provide both synchronous and asynchronous invocation styles for different scenarios.

That is one small piece of hard-earned experience. Once an interface is set, nobody wants to keep changing it later.

When one item fails in a batch job, the CLI should:

  1. Write detailed error information to a separate log file.
  2. Continue executing other jobs instead of interrupting the whole process.
  3. Return a non-zero exit code at the end to indicate that some jobs failed.
  4. Clearly display the execution result of every job in the summary report.

Some failures are just failures. There is no point pretending otherwise. It is better to acknowledge them openly and then figure out how to solve them. The same logic applies to projects and to life.
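Put together, the per-job loop looks roughly like the sketch below. It follows the four rules above; Job, runJob, and appendErrorLog are hypothetical names, not ImgBin’s actual internals:

interface Job { slug: string; prompt: string; }

declare function runJob(job: Job): Promise<void>;                   // hypothetical per-job worker
declare function appendErrorLog(slug: string, err: unknown): void;  // hypothetical error log writer

// Process jobs one by one; a single failure never aborts the batch.
async function runBatch(jobs: Job[]): Promise<number> {
  let failed = 0;
  for (const job of jobs) {
    try {
      await runJob(job);
      console.log(`✓ ${job.slug} (succeeded)`);
    } catch (err) {
      failed += 1;
      appendErrorLog(job.slug, err); // detailed error goes to a separate log file
      console.log(`✗ ${job.slug} (failed: ${(err as Error).message})`);
    }
  }
  console.log(`${jobs.length - failed}/${jobs.length} succeeded, ${failed} failed`);
  return failed === 0 ? 0 : 1; // non-zero exit code signals partial failure
}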

Recognition results are written to the recognized section by default, while manually edited fields are marked in manual. Metadata updates follow an append-only strategy: unless --force is explicitly passed, existing manually curated results are not overwritten.

That point became clear too - some things, once overwritten, are just gone. It is often better to preserve them, because the record itself has value.
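A sketch of that guard, reusing the metadata shape from earlier (updateMetadata is an illustrative name; the force flag mirrors the --force option):

import { readFileSync, writeFileSync } from 'node:fs';

// Write fresh recognition results without clobbering manual curation.
function updateMetadata(path: string, recognized: object, force = false): void {
  const meta = JSON.parse(readFileSync(path, 'utf8'));
  meta.recognized = recognized;  // append-only: new results land in their own section
  if (force) {
    meta.manual = {};            // --force explicitly discards manual curation
  }
  meta.timestamps = { ...meta.timestamps, updatedAt: new Date().toISOString() };
  writeFileSync(path, JSON.stringify(meta, null, 2));
}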

Use fs.mkdir({ recursive: true }) to ensure directory creation remains atomic and to avoid race conditions in concurrent scenarios.

Maybe that is what security feels like - being stable when stability matters, moving fast when speed matters, and never getting stuck second-guessing.


As the core tool for image asset management in the HagiCode project, ImgBin solves our problems through the following design choices:

  1. Unified entry point: the CLI covers generation, annotation, thumbnails, and all other core operations.
  2. Metadata-driven: every asset has a complete metadata.json, enabling search and traceability.
  3. Provider Adapter: flexible abstraction for external APIs, making testing and extension easier.
  4. Batch job support: batch image generation can be automated within CI/CD workflows.

Everything else may have faded, but this approach really did end up proving useful.

This solution not only improves HagiCode’s own development efficiency, but also forms a reusable framework for image asset management. If you are building a similarly multi-component project, I believe ImgBin’s design ideas may give you some inspiration.

Youth is all about trying things and making a bit of a mess. If you never put yourself through that, how would you know what you are really capable of?



Thank you for reading. If you found this article helpful, please click the like button below so more people can discover it.

This content was produced with AI-assisted collaboration, reviewed by me, and reflects my own views and position.