
CLI

5 posts with the tag “CLI”

How to Install and Use Hermes: A Quick Start from the Local CLI to Feishu Integration

If you want to install Hermes and start using it, the shortest path is really just three steps:

  1. Run the official install command
  2. Launch the CLI by typing hermes in your terminal
  3. If you want to keep using it from Feishu, configure hermes gateway setup afterwards

This article does not try to cover everything Hermes can do at once. Instead, it helps you close the most important onboarding loop first: install it, get it running, start using it, and then connect the most common messaging platform scenario.

Hermes Agent is an AI agent that can be used both in a local terminal and through a messaging platform gateway.

For most developers, it has two main entry points:

  • CLI: type hermes in your terminal to enter the interactive interface
  • Messaging Gateway: run hermes gateway, then talk to it from platforms such as Feishu, Telegram, Discord, or Slack

If your goal right now is simply to get started quickly, do not reverse the order. Follow this route first:

  • Install Hermes
  • Verify it works from the CLI
  • Then decide whether to connect a messaging platform

This makes problems easier to locate and is a better fit for first-time Hermes users.

According to the Hermes README, the official quick-install path supports these environments:

  • Linux
  • macOS
  • WSL2
  • Android via Termux

Hermes does not currently run on native Windows. If you are on Windows, install WSL2 first and run the install command inside it.

This is worth stating up front, because many installation failures are caused not by the command itself but by an unsupported environment.

The quick-install command from the Hermes README is:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

This command runs the official install script, which handles platform-specific initialization.

After installation, reload your shell environment first. The most common way is:

source ~/.bashrc

If you use zsh, run this instead:

source ~/.zshrc

The most direct way to check the installation is to run:

hermes

If you want to further confirm that configuration and dependencies are in order, run:

hermes doctor

hermes doctor is useful in situations like these:

  • The command misbehaves after installation
  • Model configuration fails
  • The gateway fails to start
  • You are unsure whether the environment dependencies are complete

If you just want to confirm as quickly as possible that Hermes works, the simplest approach is:

hermes

This starts Hermes's interactive CLI. For first-time users, this is also the recommended starting point, because it lets you verify the most essential things first:

  • Whether the command actually works
  • Whether the current model configuration is correct
  • Whether the terminal toolchain works properly
  • Whether the interaction style is what you need

These commands are enough for your first round of configuration

The handful of high-frequency commands listed in the Hermes README basically make up the first-round onboarding path:

hermes model
hermes tools
hermes config set
hermes setup
hermes update
hermes doctor

If you do not yet know what each one does, remember them like this for now:

  • hermes model: pick or switch models
  • hermes tools: view and configure the currently available tools
  • hermes config set: change a specific configuration item
  • hermes setup: run the full initialization wizard once
  • hermes update: update Hermes
  • hermes doctor: troubleshoot problems

The most practical order for newcomers is usually:

  1. Run hermes model first
  2. If you want to configure all the common items in one pass, run hermes setup next

1. Use Hermes as a daily development assistant in the terminal

CLI mode fits scenarios like these:

  • Asking questions directly while writing code locally
  • Inspecting a project, editing files, and running commands
  • One-off debugging or reviews
  • Ongoing collaboration inside the current working directory

Its biggest advantage is the short path: no extra platform to integrate, no bot configuration to handle first, and it is the best way to build your first round of usage habits.

2. Talk to Hermes through a messaging platform gateway

If you want to talk to Hermes on platforms such as Feishu, Telegram, or Discord, you need the messaging gateway.

The most common entry commands are:

hermes gateway setup
hermes gateway

Where:

  • hermes gateway setup runs the interactive platform configuration
  • hermes gateway starts the gateway process

According to the official documentation, the gateway is a unified background process that connects your configured platforms, manages sessions, and handles features such as scheduled tasks.

Connecting Hermes to a messaging platform, using Feishu as the example

If your daily work happens mostly in Feishu, then Feishu/Lark is a natural way to connect Hermes.

The officially recommended entry point for Feishu/Lark is:

hermes gateway setup

After running it, select Feishu / Lark in the wizard.

The Feishu documentation describes two connection modes:

  • websocket: recommended
  • webhook: optional

If Hermes runs on your laptop, workstation, or a private server, websocket is the simpler choice because it does not require exposing a public callback URL.

If you configure things manually, at least know these variables

If you write the configuration by hand instead of using the wizard, the core variables listed in the Feishu documentation include:

FEISHU_APP_ID=cli_xxx
FEISHU_APP_SECRET=***
FEISHU_DOMAIN=feishu
FEISHU_CONNECTION_MODE=websocket
FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy
FEISHU_HOME_CHANNEL=oc_xxx

The two most noteworthy entries are:

  • FEISHU_ALLOWED_USERS: recommended, so that not everyone who can reach the bot can use it directly
  • FEISHU_HOME_CHANNEL: lets you designate a home chat in advance for receiving cron results or default notifications

This is easy to overlook: in Feishu group chats, Hermes does not respond to every message it sees by default.

The official documentation states explicitly:

  • In direct messages, Hermes responds to your messages
  • In group chats, you must explicitly @ the bot before it processes a message

If you want to set a particular Feishu conversation as the home channel, you can also run this in the chat:

/set-home

Or declare it in the configuration ahead of time:

FEISHU_HOME_CHANNEL=oc_xxx

Whether you are in the CLI or on a messaging platform, the commands below are enough to remember at first:

  • /new or /reset: start a new session
  • /model: view or switch models
  • /retry: retry the last turn
  • /undo: undo the last interaction
  • /compress: manually compress the context
  • /help: show help

If you mainly work from a messaging platform, remember one more:

  • /sethome or /set-home: set the current chat as the home channel

These commands cover the most common beginner operations: restarting, adjusting, rolling back, inspecting, and carrying on.

Can Hermes run on native Windows?

No. The official documentation currently states that native Windows is not supported; WSL2 is recommended instead.

What if typing hermes after installation does nothing?

Troubleshoot in this order:

  1. Reload your shell first, for example source ~/.bashrc
  2. Run hermes again
  3. If it still misbehaves, run hermes doctor

Why doesn't Hermes respond in a Feishu group chat?

Check these three things first:

  • Whether you actually @-mentioned Hermes in the group
  • Whether FEISHU_ALLOWED_USERS excludes the current user
  • Whether the current group chat policy allows processing group messages

According to the official Feishu documentation, an explicit @mention is a hard requirement in group chats.

If you just want to start using Hermes as quickly as possible, the recommended order is:

  1. Run the install command
  2. Start in the local CLI with hermes
  3. Fill in the basic configuration with hermes model and hermes setup
  4. If you want to keep using it in Feishu, configure hermes gateway setup

If this article is the first in a series, its most fitting role is not to cover every advanced feature at once, but to get users through the door.

Follow-ups are better split into these topics:

  • A complete guide to connecting Hermes to Feishu
  • A guide to common Hermes slash commands
  • A guide to configuring and troubleshooting the Hermes gateway

If you plan to keep producing Hermes content, this article can serve as the starting point for later posts, with the internal link structure built out gradually.

Why Use Skillsbase to Maintain Your Own Skills Collection Repository

It is kind of funny when you think about it: the era of AI programming has arrived, and the Agent Skills we keep on hand are becoming more and more numerous. But along with that comes more and more hassle. This article is about how we used skillsbase to solve those problems.

In the age of AI programming, developers need to maintain an increasing number of Agent Skills - reusable instruction sets that extend the capabilities of coding assistants such as Claude Code, OpenCode, and Cursor. However, as the number of skills grows, a practical problem gradually emerges:

It is not exactly a major problem, but once you have too many things, managing them becomes troublesome.

Skills are scattered across different locations, making management costly

  • Local skills are scattered in multiple places: ~/.agents/skills/, ~/.claude/skills/, ~/.codex/skills/.system/, and so on
  • Different locations may have naming conflicts, for example skill-creator existing in both the user directory and the system directory
  • There is no unified management entry point, which makes backup and migration difficult

This part is genuinely annoying. Sometimes you do not even know where a certain skill actually is. It feels like losing something and then struggling to find it.

Lack of a standardized maintenance workflow

  • Manually copying skills is error-prone and makes it difficult to trace their origins
  • Without a unified validation mechanism, there is no guarantee that the skill repository remains complete
  • During team collaboration, synchronizing and sharing a skill collection is difficult

Manual work is always prone to mistakes. Human memory is limited, after all. Who can remember where every single thing came from?

Failing to meet reproducibility requirements

  • When switching development machines, all skills need to be configured again
  • In CI/CD environments, the skill repository cannot be validated and synchronized automatically

Changing to a different computer means doing everything all over again. It feels, in a way, just like moving house - troublesome every single time. You have to adapt to the new environment and reconfigure everything again.

To address these pain points, we tried many different approaches: from manual copying to scripted automation, from directly managing directories to globally installing and then recovering files. Each approach had its own flaws. Some could not guarantee consistency, some polluted the environment, and some were hard to use in CI.

We definitely took quite a few detours.

In the end, we found a more elegant solution: skillsbase. The core idea behind this approach is to install and validate locally first, then convert the structure and write it into the repository, and finally uninstall the temporary files. This ensures that the repository contents match the actual installation result while avoiding pollution of the global environment.

It sounds simple when you put it that way, but we only figured it out after stepping into quite a few pitfalls.

The solution shared in this article comes from our hands-on experience in the HagiCode project.

HagiCode is an AI coding assistant project. During development, we need to maintain a large number of Agent Skills to extend various coding capabilities. These real-world needs are exactly what pushed us to build the skillsbase toolset for standardized management of skill repositories.

This was not invented out of thin air. We were pushed into it by real needs. Once the number of skills grows, management naturally becomes necessary. When problems appear during management, solutions become necessary too. Step by step, that is how we got here.

If you are interested in HagiCode, you can visit the official website to learn more or check the source code on GitHub.

To build a maintainable skills collection repository, the following core problems need to be solved:

  1. Unified namespace conflicts: when multiple sources contain skills with the same name, how do we avoid overwriting them?
  2. Source traceability: how do we record the source of each skill for future updates and audits?
  3. Synchronization and validation: how do we ensure that repository contents stay consistent with the actual installation results?
  4. Automation integration: how do we integrate with CI/CD workflows to enable automatic synchronization and validation?

These problems may look simple, but every single one of them is a headache. Then again, what worthwhile work is ever easy?

Option 1: Copy directories directly

Pros: simple to implement
Cons: cannot guarantee consistency with the actual installation result of the skills CLI

We did think about this approach. Later, however, we realized that the CLI may apply some preprocessing logic during installation. Direct copying skips that step. As a result, what you copy is not the same as what is actually installed, and that becomes a problem.

Option 2: Install globally and then recover

Pros: the installation process can be validated
Cons: pollutes the execution environment, and it is hard to keep CI and local results consistent

This approach is even worse. A global installation pollutes the environment. More importantly, it is difficult to keep the CI environment consistent with the local environment, which leads to the classic “works on my machine, fails in CI” problem. Anyone who has dealt with that knows how painful it is.

Option 3: Local install -> convert -> uninstall (final solution)

This is the approach adopted by skillsbase:

  • First install skills into a temporary location with npx skills
  • Convert the directory structure and add source metadata
  • Write the result into the target repository
  • Finally uninstall the temporary files

This approach ensures that repository contents are consistent with the actual installation results seen by consumers, avoids polluting the global environment, standardizes the conversion process, and supports idempotent operations.

This solution was not obvious from the beginning either. We simply learned through enough trial and error what works and what does not.
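To make the flow concrete, here is a minimal TypeScript sketch of the three phases. It is illustrative only: the exact npx skills invocation and the helper layout are assumptions, not the real skillsbase internals.

import { execSync } from 'node:child_process';
import { cpSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical sketch of the install -> convert -> uninstall flow.
function syncSkill(source: string, skillName: string, repoRoot: string): void {
  // 1. Install into a temporary location via the skills CLI, so we capture
  //    exactly what a consumer would get. (Invocation details are assumptions.)
  const tempDir = mkdtempSync(join(tmpdir(), 'skillsbase-'));
  execSync(`npx skills add ${source}`, { cwd: tempDir, stdio: 'inherit' });

  // 2. Convert the structure and record source metadata alongside the files.
  const targetDir = join(repoRoot, 'skills', skillName);
  cpSync(join(tempDir, skillName), targetDir, { recursive: true });
  writeFileSync(
    join(targetDir, '.skill-source.json'),
    JSON.stringify({ source, targetName: skillName, syncedAt: new Date().toISOString() }, null, 2),
  );

  // 3. Remove the temporary install so the global environment stays clean.
  rmSync(tempDir, { recursive: true, force: true });
}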

| Decision Item | Choice | Reason |
| --- | --- | --- |
| Runtime | Node.js ESM | No build step required; .mjs is enough to orchestrate the file system |
| Configuration format | YAML (sources.yaml) | Highly readable and suitable for manual maintenance |
| Naming strategy | Namespace prefix | User skills keep their original names, while system skills receive the system- prefix |
| Workflow | add updates the manifest -> sync executes synchronization | A single synchronization engine avoids implementing the same rules twice |
| File management | Managed file markers | Add a comment header to support safe overwrites |
These decisions all come down to one goal: making things simple. Simplicity wins in the end.

The skillsbase CLI provides four core commands:

skillsbase
├── init # Initialize repository structure
├── sync # Synchronize skill content
├── add # Add new skills
└── github_action # Generate GitHub Actions configuration

There are not many commands, but they are enough. A tool only needs to be useful.

┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ init │───▶│ add │───▶│ sync │───▶│github_action│
│ initialize │ │ add source │ │ sync content│ │ generate CI │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘

Take it one step at a time. No need to rush.

sources.yaml -> parse sources -> npx skills install -> convert structure -> write to skills/ -> uninstall temporary files
(each synced skill also receives a .skill-source.json file with its source metadata)

This workflow is fairly clear. At least when I look at it, I can understand what each step is doing.

repos/skillsbase/
├── sources.yaml               # Source manifest (single source of truth)
├── skills/                    # Skills directory
│   ├── frontend-design/       # User skill
│   ├── skill-creator/         # User skill
│   └── system-skill-creator/  # System skill (with prefix)
├── scripts/
│   ├── sync-skills.mjs        # Synchronization script
│   └── validate-skills.mjs    # Validation script
├── docs/
│   └── maintainer-workflow.md # Maintainer documentation
└── .github/
    ├── workflows/
    │   └── skills-sync.yml    # CI workflow
    └── actions/
        └── skillsbase-sync/
            └── action.yml     # Reusable Action

There are quite a few files, but that is fine. Once the structure is organized clearly, maintenance becomes much easier.

# 1. Create an empty repository
mkdir repos/myskills && cd repos/myskills
git init
# 2. Initialize it with skillsbase
npx skillsbase init
# Output:
# [1/4] create manifest ................. done
# [2/4] create scripts .................. done
# [3/4] create docs ..................... done
# [4/4] create github workflow .......... done
#
# next: skillsbase add <skill-name>

This step generates a lot of files, but there is no need to worry - they are all generated automatically. After that, you can start adding skills.

# Add a single skill (this automatically triggers synchronization)
npx skillsbase add frontend-design --source vercel-labs/agent-skills
# Add from a local source
npx skillsbase add documentation-writer --source /home/user/.agents/skills
# Output:
# source: first-party ......... updated
# target: skills/frontend-design ... synced
# status: 1 skill added, 0 removed

Adding a skill is very simple. One command is enough. Sometimes, though, you may hit unexpected issues such as poor network conditions or permission problems. Those are manageable - just take them one at a time.

# Perform synchronization (reconcile all sources)
npx skillsbase sync
# Only check for drift (do not modify files)
npx skillsbase sync --check
# Allow missing sources (CI scenario)
npx skillsbase sync --allow-missing-sources

During synchronization, the system checks every source defined in sources.yaml and reconciles them with the contents under the skills/ directory. If differences exist, it updates them; if there are no differences, it skips them. This prevents the “configuration changed but files did not” problem.

# Generate workflow
npx skillsbase github_action --kind workflow
# Generate action
npx skillsbase github_action --kind action
# Generate everything
npx skillsbase github_action --kind all

The CI configuration is generated automatically as well. You still need to adjust some details yourself, such as trigger conditions and runtime environments, but that is not difficult.

# Skills root directory configuration
skillsRoot: skills/
metadataFile: .skill-source.json

# Source definitions
sources:
  # First-party: local user skills
  first-party:
    type: local
    path: /home/user/.agents/skills
    naming: original # Keep original name
    includes:
      - documentation-writer
      - frontend-design
      - skill-creator

  # System: skills provided by the system
  system:
    type: local
    path: /home/user/.codex/skills/.system
    naming: prefix-system # Add system- prefix
    includes:
      - imagegen
      - openai-docs
      - skill-creator # Becomes system-skill-creator

  # Remote: third-party repository
  vercel:
    type: remote
    url: vercel-labs/agent-skills
    naming: original
    includes:
      - web-design-guidelines

This configuration file is the core of the entire system. All sources are defined here. Change this file, and the next synchronization will apply the new state. In that sense, it is truly a “single source of truth.”

{
  "source": "first-party",
  "originalPath": "/home/user/.agents/skills/documentation-writer",
  "originalName": "documentation-writer",
  "targetName": "documentation-writer",
  "syncedAt": "2026-04-07T00:00:00.000Z",
  "version": "unknown"
}

Every skill directory contains this file, recording its source information. That way, when something goes wrong later, you can quickly locate where it came from and when it was synchronized.
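As a small illustration of how this metadata pays off, the following TypeScript sketch (assuming Node and the repository layout above) lists every skill together with its source and sync time:

import { existsSync, readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Walk skills/ and print where each skill came from.
const skillsRoot = 'skills';
for (const entry of readdirSync(skillsRoot, { withFileTypes: true })) {
  if (!entry.isDirectory()) continue;
  const metaPath = join(skillsRoot, entry.name, '.skill-source.json');
  if (!existsSync(metaPath)) {
    console.warn(`${entry.name}: missing .skill-source.json`);
    continue;
  }
  const meta = JSON.parse(readFileSync(metaPath, 'utf8'));
  console.log(`${entry.name} <- ${meta.source} (synced ${meta.syncedAt})`);
}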

# Validate repository structure
node scripts/validate-skills.mjs
# Validate with the skills CLI
npx skills add . --list
# Check for updates
npx skills check

Validation is one of those things that can feel both important and optional. Still, for the sake of safety, it never hurts to run it from time to time. After all, you never know when something unexpected might happen.

.github/workflows/skills-sync.yml:

name: Skills Sync
on:
  push:
    paths:
      - 'sources.yaml'
      - 'skills/**'
  workflow_dispatch:
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Validate repository
        run: |
          npx skills add . --list
          node scripts/validate-skills.mjs
      - name: Sync check
        run: npx skillsbase sync --check

Once CI integration is in place, every change to sources.yaml or the skills/ directory automatically triggers validation. That prevents the situation where changes were made locally but synchronization was forgotten.

  1. Handle naming conflicts: add the system- prefix to system skills consistently. This keeps every skill available while avoiding naming conflicts.
  2. Idempotent operations: all commands support repeated execution, and running sync multiple times does not produce side effects. This is especially important in CI.
  3. Managed files: generated files include the # Managed by skillsbase CLI comment, making them easy to identify and manage. These files can be safely overwritten, and manual modifications are not preserved (see the sketch after this list).
  4. Non-interactive mode: CI environments use deterministic behavior by default, so interactive prompts do not interrupt execution. All configuration is declared through sources.yaml.
  5. Source traceability: every skill has a .skill-source.json file recording its source information, making troubleshooting much faster.
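The marker check in point 3 can be sketched in a few lines of TypeScript. The marker string comes from the article; the guard function itself is a hypothetical illustration of how a tool can decide whether overwriting is safe:

import { existsSync, readFileSync, writeFileSync } from 'node:fs';

// Marker that skillsbase writes into generated files.
const MANAGED_MARKER = '# Managed by skillsbase CLI';

// Hypothetical guard: only overwrite files the tool generated itself.
function safeWrite(path: string, content: string): void {
  if (existsSync(path) && !readFileSync(path, 'utf8').startsWith(MANAGED_MARKER)) {
    throw new Error(`${path} is not managed by skillsbase; refusing to overwrite`);
  }
  writeFileSync(path, `${MANAGED_MARKER}\n${content}`);
}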
For team collaboration, members can install and validate the shared repository like this:
# Team members install the shared skills repository
npx skills add your-org/myskills -g --all
# Clone locally and validate
git clone https://github.com/your-org/myskills.git
cd myskills
npx skills add . --list

By managing the skills repository with Git, team members can easily synchronize their skill collection and ensure that everyone uses the same versions of tools and configuration.

This is especially useful in team collaboration. You no longer run into situations where “it works for me but not for you.” Once the environment is unified, half the problems disappear.

The core value of using skillsbase to maintain a skills collection repository lies in the following:

  • Security: source validation, conflict detection, and managed file protection
  • Maintainability: a unified entry point, idempotent operations, and configuration-as-documentation
  • Standardization: a unified directory structure, naming conventions, and metadata format
  • Automation: CI/CD integration, automatic synchronization, and automatic validation

With this approach, developers can manage their own Agent Skills the same way they manage npm packages, building a reproducible, shareable, and maintainable skills repository system.

The tools and workflow shared in this article are exactly what we refined through real mistakes and real optimization while building HagiCode. If you find this approach valuable, that is a good sign that our engineering direction is the right one - and that HagiCode itself is worth your attention as well.

After all, good tools deserve to be used by more people.

This article was first published on the HagiCode Blog.

Thank you for reading. If you found this article useful, you are welcome to like it, save it, and share it in support. This content was created with AI-assisted collaboration, and the final version was reviewed and confirmed by the author.

Hagicode and GLM-5.1 Multi-CLI Integration Guide

In the Hagicode project, users can choose from multiple CLI tools to drive AI programming assistants, including Claude Code CLI, GitHub Copilot, OpenCode CLI, Codebuddy CLI, Hermes CLI, and more. These CLI tools are general-purpose AI programming tools on their own, but through Hagicode’s abstraction layer, they can flexibly connect to different AI model providers.

Zhipu AI (ZAI) provides an interface compatible with the Anthropic Claude API, allowing these CLI tools to directly use domestic GLM series models. Among them, GLM-5.1 is Zhipu’s latest large language model release, with significant improvements over GLM-5.0.

Hagicode defines 11 CLI provider types through the AIProviderType enum, covering mainstream AI programming CLI tools:

public enum AIProviderType
{
    ClaudeCodeCli = 0,  // Claude Code CLI
    CodexCli = 1,       // GitHub Copilot Codex
    GitHubCopilot = 2,  // GitHub Copilot
    CodebuddyCli = 3,   // Codebuddy CLI
    OpenCodeCli = 4,    // OpenCode CLI
    IFlowCli = 5,       // IFlow CLI
    HermesCli = 6,      // Hermes CLI
    QoderCli = 7,       // Qoder CLI
    KiroCli = 8,        // Kiro CLI
    KimiCli = 9,        // Kimi CLI
    GeminiCli = 10,     // Gemini CLI
}

Each CLI has corresponding model parameter configuration and supports the model and reasoning parameters:

private static readonly IReadOnlyDictionary<AIProviderType, IReadOnlyList<string>> ManagedModelParameterKeysByProvider =
    new Dictionary<AIProviderType, IReadOnlyList<string>>
    {
        [AIProviderType.ClaudeCodeCli] = ["model", "reasoning"],
        [AIProviderType.CodexCli] = ["model", "reasoning"],
        [AIProviderType.OpenCodeCli] = ["model", "reasoning"],
        [AIProviderType.HermesCli] = ["model", "reasoning"],
        [AIProviderType.CodebuddyCli] = ["model", "reasoning"],
        [AIProviderType.QoderCli] = ["model", "reasoning"],
        [AIProviderType.KiroCli] = ["model", "reasoning"],
        [AIProviderType.GeminiCli] = ["model"], // Gemini does not support the reasoning parameter
        // ...
    };

Hagicode’s Secondary Professions Catalog defines complete support for the GLM model series:

| Model ID | Name | Default Reasoning | Compatible CLI Families |
| --- | --- | --- | --- |
| glm-4.7 | GLM 4.7 | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5 | GLM 5 | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5-turbo | GLM 5 Turbo | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5.0 | GLM 5.0 (Legacy) | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5.1 | GLM 5.1 | high | claude, codebuddy, hermes, qoder, kiro |

Key differences between GLM-5.1 and GLM-5.0


From the implementation in AcpSessionModelBootstrapper.cs, we can clearly see the differences between GLM-5.1 and GLM-5.0:

GLM-5.1 is a standalone new model identifier with no legacy handling logic:

private const string Glm51ModelValue = "glm-5.1";

Definition in the Secondary Professions Catalog:

{
  "id": "secondary-glm-5-1",
  "name": "GLM 5.1",
  "family": "anthropic",
  "summary": "hero.professionCopy.secondary.glm51.summary",
  "sourceLabel": "hero.professionCopy.sources.aiSharedAnthropicModel",
  "sortOrder": 64,
  "supportsImage": true,
  "compatiblePrimaryFamilies": [
    "claude",
    "codebuddy",
    "hermes",
    "qoder",
    "kiro"
  ],
  "defaultParameters": {
    "model": "glm-5.1",
    "reasoning": "high"
  }
}

Zhipu AI provides the most complete GLM model support:

{
  "providerId": "zai",
  "name": "智谱 AI",
  "description": "智谱 AI 提供的 Claude API 兼容服务",
  "category": "china-providers",
  "apiUrl": {
    "codingPlanForAnthropic": "https://open.bigmodel.cn/api/anthropic"
  },
  "recommended": true,
  "region": "cn",
  "defaultModels": {
    "sonnet": "glm-4.7",
    "opus": "glm-5",
    "haiku": "glm-4.5-air"
  },
  "supportedModels": [
    "glm-4.7",
    "glm-5",
    "glm-4.5-air",
    "qwen3-coder-next",
    "qwen3-coder-plus"
  ],
  "features": ["experimental-agent-teams"],
  "authTokenEnv": "ANTHROPIC_AUTH_TOKEN",
  "referralUrl": "https://www.bigmodel.cn/claude-code?ic=14BY54APZA",
  "documentationUrl": "https://open.bigmodel.cn/dev/api"
}

Features:

  • Supports the widest variety of GLM model variants
  • Provides default mapping across the Sonnet/Opus/Haiku tiers
  • Supports the experimental-agent-teams feature

Claude Code CLI is one of Hagicode’s core CLIs and is configured through the Hero configuration system:

{
  "primaryProfessionId": "profession-claude-code",
  "secondaryProfessionId": "secondary-glm-5-1",
  "model": "glm-5.1",
  "reasoning": "high"
}

Corresponding HeroEquipmentCatalogItem configuration:

{
  id: 'secondary-glm-5-1',
  name: 'GLM 5.1',
  family: 'anthropic',
  kind: 'model',
  primaryFamily: 'claude',
  compatiblePrimaryFamilies: ['claude', 'codebuddy', 'hermes', 'qoder', 'kiro'],
  defaultParameters: {
    model: 'glm-5.1',
    reasoning: 'high'
  }
}

OpenCode CLI is the most flexible CLI and supports specifying any model in the provider/model format:

Method 1: Use the ZAI provider prefix

{
  "primaryProfessionId": "profession-opencode",
  "model": "zai/glm-5.1",
  "reasoning": "high"
}

Method 2: Use the model ID directly

{
  "model": "glm-5.1"
}

Method 3: Frontend configuration UI

In HeroModelEquipmentForm.tsx, OpenCode CLI has a dedicated placeholder hint:

const OPEN_CODE_MODEL_PLACEHOLDER = 'myprovider/glm-4.7';
const modelPlaceholder = primaryProviderType === PCode_Models_AIProviderType.OPEN_CODE_CLI
  ? OPEN_CODE_MODEL_PLACEHOLDER
  : 'gpt-5.4';

Users can enter:

zai/glm-5.1
glm-5.1

OpenCode CLI model parsing logic:

internal OpenCodeModelSelection? ResolveModelSelection(string? rawModel)
{
    var normalized = NormalizeOptionalValue(rawModel);
    if (normalized == null) return null;

    var slashIndex = normalized.IndexOf('/');
    if (slashIndex < 0)
    {
        // No slash: use the model ID directly
        return new OpenCodeModelSelection {
            ProviderId = string.Empty,
            ModelId = normalized,
        };
    }

    // Slash exists: parse the provider/model format
    var providerId = normalized[..slashIndex].Trim();
    var modelId = normalized[(slashIndex + 1)..].Trim();
    return new OpenCodeModelSelection {
        ProviderId = providerId,
        ModelId = modelId,
    };
}
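For example, zai/glm-5.1 resolves to provider zai with model glm-5.1, while a bare glm-5.1 leaves the provider empty and uses the model ID directly.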

Codebuddy CLI has dedicated legacy handling logic:

{
  "primaryProfessionId": "profession-codebuddy",
  "model": "glm-5.1",
  "reasoning": "high"
}

Note: Codebuddy retains special handling for GLM-5.0 and does not use legacy normalization:

// For CodebuddyCli, glm-5.0 is not normalized to glm-5-turbo
return !string.Equals(providerName, "CodebuddyCli", StringComparison.OrdinalIgnoreCase)
    && string.Equals(normalizedModel, LegacyGlm5TurboModelValue, StringComparison.OrdinalIgnoreCase)
        ? Glm5TurboModelValue
        : normalizedModel;
Zhipu AI (ZAI):

# Set the API key
export ANTHROPIC_AUTH_TOKEN="***"
# Optional: specify the API endpoint (ZAI uses this endpoint by default)
export ANTHROPIC_BASE_URL="https://open.bigmodel.cn/api/anthropic"

Alibaba Cloud DashScope:

# Set the API key
export ANTHROPIC_AUTH_TOKEN="your-a...-key"
# Specify the Alibaba Cloud endpoint
export ANTHROPIC_BASE_URL="https://coding.dashscope.aliyuncs.com/apps/anthropic"

Compared with GLM-5.0, GLM-5.1 brings the following significant improvements:

According to Zhipu’s official release information, improvements in GLM-5.1 include:

  • Stronger code understanding: More accurate analysis of complex code structures
  • Longer context comprehension: Supports longer conversational context
  • Enhanced tool calling: Higher success rate for MCP tool calls
  • Output stability: Reduces randomness and hallucinations

GLM-5.1 covers all mainstream CLIs supported by Hagicode:

compatiblePrimaryFamilies: [
  "claude",     // Claude Code CLI
  "codebuddy",  // Codebuddy CLI
  "hermes",     // Hermes CLI
  "qoder",      // Qoder CLI
  "kiro"        // Kiro CLI
]

Make sure the ANTHROPIC_AUTH_TOKEN environment variable is set correctly. It is the required credential for every CLI to connect to the model.

GLM-5.1 needs to be enabled by the corresponding model provider:

  • The Zhipu AI ZAI platform supports it by default
  • Alibaba Cloud DashScope may require a separate application

When using the provider/model format, make sure the provider ID is correct:

  • Zhipu AI: zai or zhipuai
  • Alibaba Cloud: aliyun or dashscope

For the reasoning parameter:

  • high is recommended for the best code generation results
  • Gemini CLI does not support the reasoning parameter and will ignore this configuration automatically

Through a unified abstraction layer, Hagicode enables flexible integration between GLM-5.1 and multiple CLIs. Developers can choose the CLI tool that best fits their preferences and usage scenarios, then use the latest GLM-5.1 model through simple configuration.

As Zhipu’s latest model version, GLM-5.1 offers clear improvements over GLM-5.0:

  • An independent version identifier with no legacy burden
  • Stronger reasoning and code understanding
  • Broad multi-CLI compatibility
  • Flexible reasoning level configuration

With the correct environment variables and Hero equipment configured, users can fully unlock the power of GLM-5.1 across different CLI environments.

Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it to show your support. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

Hagicode.Libs: Engineering Practice for Unified Integration of Multiple AI Coding Assistant CLIs

During the development of the HagiCode project, we needed to integrate multiple AI coding assistant CLIs at the same time, including Claude Code, Codex, and CodeBuddy. Each CLI has different interfaces, parameters, and output formats, and the repeated integration code made the project harder and harder to maintain. In this article, we share how we built a unified abstraction layer with HagiCode.Libs to solve this engineering pain point. You could also say it is simply some hard-earned experience gathered from the pitfalls we have already hit.

The market for AI coding assistants is quite lively now. Besides Claude Code, there are also OpenAI’s Codex, Zhipu’s CodeBuddy, and more. As an AI coding assistant project, HagiCode needs to integrate these different CLI tools across multiple subprojects, including desktop, backend, and web.

At first, the problem was manageable. Integrating one CLI was only a few hundred lines of code. But as the number of CLIs we needed to support kept growing, things started to get messy.

Each CLI has its own command-line argument format, different environment variable requirements, and a wide variety of output formats. Some output JSON, some output streaming JSON, and some output plain text. On top of that, there are cross-platform compatibility issues. Executable discovery and process management work very differently between Windows and Unix systems, so code duplication kept increasing. In truth, it was just a bit more Ctrl+C and Ctrl+V, but maintenance quickly became painful.

The most frustrating part was that every time we wanted to add support for a new CLI capability, we had to change the same code in several projects. That approach was clearly not sustainable in the long run. Code has a temper too; duplicate it too many times and it starts causing trouble.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project that needs to maintain multiple subprojects at the same time, including a frontend VSCode extension, backend AI services, and a cross-platform desktop client. In a way, it was exactly this complex, multi-language, multi-platform environment that led to the birth of HagiCode.Libs. You could say we were forced into it, and so be it.

Although these AI coding assistant CLIs each have their own characteristics, from a technical perspective they share several obvious traits:

Similar interaction patterns: they all start a CLI process, send a prompt, receive streaming responses, parse messages, and then either end or continue the session. At the end of the day, the whole flow follows the same basic mold.

Similar configuration needs: they all need API key authentication, working directory setup, model selection, tool permission control, and session management. After all, everyone is making a living from APIs; the differences are mostly a matter of flavor.

The same cross-platform challenges: they all need to solve executable path resolution (claude vs claude.exe vs /usr/local/bin/claude), process startup and environment variable handling, shell command escaping, and argument construction. Cross-platform work is painful no matter how you describe it. Only people who have stepped into the traps really understand the difference between Windows and Unix.

Based on this analysis, we needed a unified abstraction layer that could provide a consistent interface, encapsulate cross-platform CLI discovery logic, handle streaming output parsing, and support both dependency injection and non-DI scenarios. It is the kind of problem that makes your head hurt just thinking about it, but you still have to face it. After all, it is our own project, so we have to finish it even if we have to cry our way through it.

We created HagiCode.Libs, a lightweight .NET 10 library workspace released under the MIT license and now published on GitHub. It may not be some world-shaking masterpiece, but it is genuinely useful for solving real problems.

HagiCode.Libs/
├── src/
│   ├── HagiCode.Libs.Core/            # Core capabilities
│   │   ├── Discovery/                 # CLI executable discovery
│   │   ├── Process/                   # Cross-platform process management
│   │   ├── Transport/                 # Streaming message transport
│   │   └── Environment/               # Runtime environment resolution
│   ├── HagiCode.Libs.Providers/       # Provider implementations
│   │   ├── ClaudeCode/                # Claude Code provider
│   │   ├── Codex/                     # Codex provider
│   │   └── Codebuddy/                 # CodeBuddy provider
│   ├── HagiCode.Libs.ConsoleTesting/  # Testing framework
│   ├── HagiCode.Libs.ClaudeCode.Console/
│   ├── HagiCode.Libs.Codex.Console/
│   └── HagiCode.Libs.Codebuddy.Console/
└── tests/                             # xUnit tests

When designing HagiCode.Libs, we followed a few principles. They all came from lessons learned the hard way:

Zero heavy framework dependencies: it does not depend on ABP or any other large framework, which keeps it lightweight. These days, the fewer dependencies you have, the fewer headaches you get. Most people have already been beaten up by dependency hell at least once.

Cross-platform support: native support for Windows, macOS, and Linux, without writing separate code for different platforms. One codebase that runs everywhere is a pretty good thing.

Streaming processing: CLI output is handled with asynchronous streams, which fits modern .NET programming patterns much better. Times change, and async is king.

Flexible integration: it supports dependency injection scenarios while also allowing direct instantiation. Different people have different preferences, so we wanted it to be convenient either way.

If your project already uses dependency injection, such as ASP.NET Core or the generic host, you can integrate it directly. It is a small thing, but a well-behaved one:

using HagiCode.Libs.Providers;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddHagiCodeLibs();
await using var provider = services.BuildServiceProvider();

var claude = provider.GetRequiredService<ICliProvider<ClaudeCodeOptions>>();
var options = new ClaudeCodeOptions
{
    ApiKey = "your-api-key",
    Model = "claude-sonnet-4-20250514"
};

await foreach (var message in claude.ExecuteAsync(options, "Hello, Claude!"))
{
    Console.WriteLine($"{message.Type}: {message.Content}");
}

If you are writing a simple script or working in a non-DI scenario, creating an instance directly also works. Put simply, it depends on your personal preference:

var claude = new ClaudeCodeProvider();
var options = new ClaudeCodeOptions
{
    ApiKey = "sk-ant-xxx",
    Model = "claude-sonnet-4-20250514"
};

await foreach (var message in claude.ExecuteAsync(options, "Help me write a quicksort"))
{
    // Handle messages
}

Both approaches use the same underlying implementation, so you can choose the integration style that best fits your project. There is no universal right answer in this world. What suits you is the best option. It may sound cliché, but it is true.

Each provider has its own dedicated testing console project, making it easier to validate the integration independently. Testing is one of those things where if you are going to do it, you should do it properly:

# Claude Code tests
dotnet run --project src/HagiCode.Libs.ClaudeCode.Console -- --test-provider
dotnet run --project src/HagiCode.Libs.ClaudeCode.Console -- --test-all claude
# CodeBuddy tests
dotnet run --project src/HagiCode.Libs.Codebuddy.Console -- --test-provider codebuddy-cli
# Codex tests
dotnet run --project src/HagiCode.Libs.Codex.Console -- --test-provider codex-cli

The testing scenarios cover several key cases:

  • Ping: health check to confirm the CLI is available
  • Simple Prompt: basic prompt test
  • Complex Prompt: multi-turn conversation test
  • Session Restore/Resume: session recovery test
  • Repository Analysis: repository analysis test

This standalone testing console design is especially useful during debugging because it lets us quickly identify whether the issue is in the HagiCode.Libs layer or in the CLI itself. Debugging is really just about finding where the problem is. Once the direction is right, you are already halfway there.

Cross-platform compatibility is one of the core goals of HagiCode.Libs. We configured the GitHub Actions workflow .github/workflows/cli-discovery-cross-platform.yml to run real CLI discovery validation across ubuntu-latest, macos-latest, and windows-latest.

This ensures that every code change does not break cross-platform compatibility. During local development, you can also reproduce it with the following commands. After all, you cannot ask CI to take the blame for everything. Your local environment should be able to run it too:

npm install --global @anthropic-ai/claude-code@2.1.79
HAGICODE_REAL_CLI_TESTS=1 dotnet test --filter "Category=RealCli"

HagiCode.Libs uses asynchronous streams to process CLI output. Compared with traditional callback or event-based approaches, this fits the asynchronous programming style of modern .NET much better. In the end, this is simply how technology moves forward, whether anyone likes it or not:

public async IAsyncEnumerable<CliMessage> ExecuteAsync(
    TOptions options,
    string prompt,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Start the CLI process
    // Parse streaming JSON output
    // Yield the CliMessage sequence
}

The message types include:

  • user: user message
  • assistant: assistant response
  • tool_use: tool invocation
  • result: session end

This design lets callers handle streaming output flexibly, whether for real-time display, buffered post-processing, or forwarding to other services. Why worry whether the sky is sunny or cloudy? What matters is that once the idea opens up, you can use it however you like.

The HagiCode.Libs.Exploration module provides Git repository discovery and status checking, which is especially useful in repository analysis scenarios. This feature was also born out of necessity, because HagiCode needs to analyze repositories:

// Discover Git repositories
var repositories = await GitRepositoryDiscovery.DiscoverAsync("/path/to/search");

// Get repository information
var info = await GitRepository.GetInfoAsync(repoPath);
Console.WriteLine($"Branch: {info.Branch}, Remote: {info.RemoteUrl}");
Console.WriteLine($"Has uncommitted changes: {info.HasUncommittedChanges}");

HagiCode’s code analysis capabilities use this module to identify project structure and Git status. It is a good example of making full use of what we built.

Based on our practice in the HagiCode project, there are several points that deserve special attention. They are all real issues that need to be handled carefully:

API key security: do not hardcode API keys in your code. Use environment variables or configuration management instead. HagiCode.Libs supports passing configuration through Options objects, making it easier to integrate with different configuration sources. When it comes to security, there is no such thing as being too careful.

CLI version pinning: in CI/CD, we pin specific versions, such as @anthropic-ai/claude-code@2.1.79, to reduce uncertainty caused by version drift. It is also a good idea to use fixed versions in local development. Versioning can be painful. If you do not pin versions, the problem will teach you a lesson very quickly.

Test categorization: default tests use fake providers to keep them deterministic and fast, while real CLI tests must be enabled explicitly. This gives CI fast feedback while still allowing real-environment validation when needed. Striking that balance is never easy. Speed and stability always require trade-offs.

Session management: different CLIs have different session recovery mechanisms. Claude Code uses the .claude/ directory to store sessions, while Codex and CodeBuddy each have their own approaches. When using them, be sure to check their respective documentation and understand the details of their session persistence mechanisms. There is no harm in understanding it clearly.

HagiCode.Libs is the unified abstraction layer we built during the development of HagiCode to solve the repeated engineering work involved in multi-CLI integration. By providing a consistent interface, encapsulating cross-platform details, and supporting flexible integration patterns, it greatly reduces the engineering complexity of integrating multiple AI coding assistants. Much may fade away, but the experience remains.

If you also need to integrate multiple AI CLI tools in your project, or if you are interested in cross-platform process management and streaming message handling, feel free to check it out on GitHub. The project is released under the MIT license, and contributions and feedback are welcome. In the end, it is a happy coincidence that we met here, so since you are already here, we might as well become friends.

The approach shared in this article was shaped by real pitfalls and real optimization work inside HagiCode. What else could we do? Running into pitfalls is normal. If you think this solution is valuable, then perhaps our engineering work is doing all right. And HagiCode itself may also be worth your attention. You might even find a pleasant surprise.



Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

ImgBin CLI Tool Design: HagiCode’s Image Asset Management Approach

This article explains how to build an automatable image asset pipeline from scratch, covering CLI tool design, a Provider Adapter architecture, and metadata management strategies.

Honestly, I did not expect image asset management to keep us tangled up for this long.

During HagiCode development, we ran into a problem that looked simple on the surface but was surprisingly thorny in practice: generating and managing image assets. In a way, it was like the dramas of adolescence - calm on the outside, turbulent underneath.

As the project accumulated more documentation and marketing materials, we needed a large number of supporting images. Some had to be AI-generated, some had to be selected from an existing asset library, and others needed AI recognition plus automatic labeling. The problem was that all of this had long been handled through scattered scripts and manual steps. Every time we generated an image, we had to run a script by hand, organize metadata by hand, and create thumbnails by hand. That alone was annoying enough, but the bigger issue was that everything was scattered everywhere. When we wanted to find something, we could not. When we needed to reuse something, we could not.

The pain points were concrete:

  1. No unified entry point: the logic for image generation was spread across different scripts, so batch execution was basically impossible.
  2. Missing metadata: generated images had no unified metadata.json, which meant no reliable searchability or traceability.
  3. High manual organization cost: titles and tags had to be sorted out one by one by hand, which was inefficient.
  4. No automation: automatically generating visual assets in a CI/CD pipeline? Not a chance.

We did think about just leaving it alone. But projects still need to move forward. Since we could not avoid the problem, we figured we might as well solve it. So we decided to upgrade ImgBin from a set of scattered scripts into an image asset pipeline that can be executed automatically. Some problems, after all, do not disappear just because you look away.

The approach shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is an AI coding assistant project that simultaneously maintains multiple components, including a VSCode extension, backend AI services, and a cross-platform desktop client. In a complex, multilingual, cross-platform environment like this, standardized image asset management becomes a key part of improving development efficiency.

You could say this was one of those small growing pains in HagiCode’s journey. Every project has moments like that: a minor issue that looks insignificant, yet somehow manages to take up half the day.

HagiCode’s build system is based on the TypeScript + Node.js ecosystem, so ImgBin naturally adopted the same tech stack to keep the project technically consistent. Once you are used to one stack, switching to something else just feels like unnecessary trouble.


ImgBin uses a layered architecture that cleanly separates CLI commands, application services, third-party API adapters, and the infrastructure layer:

Component hierarchy
├── CLI Entry (cli.ts)      Global argument parsing, command routing
├── Commands (commands/*)   generate | batch | annotate | thumbnail
├── Application Services    job-runner | metadata | thumbnail | asset-writer
├── Provider Adapters       image-api-provider | vision-api-provider
└── Infrastructure Layer    config | logger | paths | schema

The benefit of this layered design is clear responsibility boundaries. It also makes testing easier because external dependencies can be mocked cleanly. In practice, it just means each layer does its own job without getting in the way of the others, so when something breaks, it is easier to figure out why.

ImgBin uses a model of “one asset, one directory.” Every time an image is generated, it creates a structure like this:

library/
└── 2026-03/
    └── orange-dashboard/
        ├── original.png    # Original image
        ├── thumbnail.webp  # 512x512 thumbnail
        └── metadata.json   # Structured metadata

The advantages of this model are:

  1. Self-contained: all files for a single asset live in the same directory, making migration and backup convenient.
  2. Traceable: metadata.json makes it possible to trace generation time, prompt, model, and other details.
  3. Extensible: if more variants are needed later, such as thumbnails in multiple sizes, we can simply add new files in the same directory.

Beautiful things do not always need to be possessed. Sometimes it is enough that they remain beautiful, and that you can quietly appreciate them. That may sound a little far afield, but the logic still holds here: once images are kept together, they are more pleasant to look at and much easier to find.

metadata.json is the core of the entire system. It uses a layered storage strategy that separates fields into three categories:

{
  "schemaVersion": 2,
  "assetId": "orange-dashboard",
  "slug": "orange-dashboard",
  "title": "Orange Dashboard",
  "tags": ["dashboard", "hero", "orange"],
  "source": { "type": "generated" },
  "paths": {
    "assetDir": "library/2026-03/orange-dashboard",
    "original": "original.png",
    "thumbnail": "thumbnail.webp"
  },
  "generated": {
    "prompt": "orange dashboard for docs hero",
    "provider": "azure-openai-image-api",
    "model": "gpt-image-1.5"
  },
  "recognized": {
    "title": "Orange Dashboard",
    "tags": ["dashboard", "ui", "orange"],
    "description": "A modern orange dashboard with charts and metrics"
  },
  "status": {
    "generation": "succeeded",
    "recognition": "succeeded",
    "thumbnail": "succeeded"
  },
  "timestamps": {
    "createdAt": "2026-03-11T04:01:19.570Z",
    "updatedAt": "2026-03-11T04:02:09.132Z"
  }
}

  • generated: records the original information from image generation, such as the prompt, provider, and model.
  • recognized: stores AI recognition results, such as auto-generated titles, tags, and descriptions.
  • manual: stores manually curated results. Data in this area has the highest priority and will not be overwritten by AI recognition.

This layered strategy resolves one of our earlier core conflicts: when AI recognition and manual curation disagree, which one should win? The answer is manual input. AI recognition is there to assist, not to decide. That question also became clearer over time - machines are still machines, and in the end, people still need to make the call.
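A minimal TypeScript sketch of that precedence rule, assuming the metadata sections shown above (the merge helper itself is illustrative, not ImgBin’s actual implementation):

interface MetadataSection {
  title?: string;
  tags?: string[];
  description?: string;
}

// manual > recognized: AI results fill the gaps, but never
// overwrite fields a human has already curated.
function effectiveMetadata(recognized: MetadataSection, manual: MetadataSection): MetadataSection {
  return {
    title: manual.title ?? recognized.title,
    tags: manual.tags ?? recognized.tags,
    description: manual.description ?? recognized.description,
  };
}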


Another core part of ImgBin is the Provider Adapter pattern. We abstract external APIs behind a unified interface so that even if we switch AI service providers, we do not need to change the business logic.

In a way, it is a bit like relationships - outward appearances can change, but what matters is that the inner structure stays the same. Once the interface is fixed, the internal implementation can vary freely.

interface ImageGenerationProvider {
  // Generate an image and return its Buffer
  generate(options: GenerateOptions): Promise<Buffer>;
  // Get the list of supported models
  getSupportedModels(): Promise<string[]>;
}

interface GenerateOptions {
  prompt: string;
  model?: string;
  size?: '1024x1024' | '1792x1024' | '1024x1792';
  quality?: 'standard' | 'hd';
  format?: 'png' | 'webp' | 'jpeg';
}

interface VisionRecognitionProvider {
  // Recognize image content and return structured metadata
  recognize(imageBuffer: Buffer): Promise<RecognitionResult>;
  // Get the list of supported models
  getSupportedModels(): Promise<string[]>;
}

interface RecognitionResult {
  title?: string;
  tags: string[];
  description?: string;
  confidence: number;
}

The advantages of this interface design are:

  1. Testable: in unit tests, we can pass in mock providers instead of making real external API calls.
  2. Extensible: adding a new provider only requires implementing the interface; caller code does not need to change.
  3. Replaceable: production can use Azure OpenAI while testing can use a local model, with configuration being the only thing that changes.

Sometimes project work feels like that too. On the surface it looks like we just swapped an API, but the internal logic remains exactly the same, and that makes the whole thing a lot less scary.
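For instance, a unit test can swap in a fake provider built against the interfaces above. This is a sketch, not ImgBin’s actual test code:

// A mock provider for tests: deterministic, no network calls.
class FakeImageProvider implements ImageGenerationProvider {
  async generate(options: GenerateOptions): Promise<Buffer> {
    // Return a tiny fake payload instead of calling a real image API.
    return Buffer.from(`fake-image-for:${options.prompt}`);
  }
  async getSupportedModels(): Promise<string[]> {
    return ['fake-model'];
  }
}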


ImgBin provides four core commands to cover different usage scenarios:

# Simplest usage
imgbin generate --prompt "orange dashboard for docs hero"
# Generate a thumbnail and AI annotations at the same time
imgbin generate --prompt "orange dashboard" --annotate --thumbnail
# Specify an output directory
imgbin generate --prompt "orange dashboard" --output ./library

Batch jobs are defined through YAML or JSON manifest files, which makes them suitable for CI/CD workflows:

assets/jobs/launch.yaml:

defaults:
  annotate: true
  thumbnail: true
  libraryRoot: ./library

jobs:
  - prompt: "orange dashboard hero"
    slug: orange-dashboard
    tags: [dashboard, hero, orange]
  - prompt: "pricing grid for docs"
    slug: pricing-grid
    tags: [pricing, grid, docs]

Run the command:

imgbin batch assets/jobs/launch.yaml

The batch job design supports failure isolation: items in the manifest are processed one by one, and a failure in one item does not affect the others. You can also preview the job with --dry-run without actually executing it.

And the best part is that it tells you exactly what succeeded and what failed. Unlike some things in life, where failure happens and you are left not even knowing how it happened.

Run AI recognition on existing images to automatically generate titles, tags, and descriptions:

# Annotate a single image
imgbin annotate ./library/2026-03/orange-dashboard
# Annotate an entire directory in batch
imgbin annotate ./library/2026-03/

Generate thumbnails for existing images:

# Generate a thumbnail
imgbin thumbnail ./library/2026-03/orange-dashboard

The manifest format for batch jobs supports flexible configuration. Defaults can be set globally, and individual jobs can override them:

# Global defaults
defaults:
  annotate: true        # Enable AI annotation by default
  thumbnail: true       # Generate thumbnails by default
  libraryRoot: ./library
  model: gpt-image-1.5

jobs:
  # Minimal configuration: only provide a prompt
  - prompt: "first image"
  # Full configuration
  - prompt: "second image"
    slug: custom-slug
    tags: [tag1, tag2]
    annotate: false     # Do not run AI annotation for this job
    model: dall-e-3     # Use a different model for this job

When executed, ImgBin processes jobs one by one. The result of each job is written to its corresponding metadata.json. Even if one job fails, the others are unaffected. After all jobs complete, the CLI outputs a summary report:

✓ orange-dashboard (succeeded)
✓ pricing-grid (succeeded)
✗ hero-banner (failed: API rate limit exceeded)
2/3 succeeded, 1 failed

Some things cannot be rushed. Taking them one at a time is often the steadier path. Maybe that is the philosophy behind batch jobs.


ImgBin supports flexible configuration through environment variables:

# ImgBin working directory
IMGBIN_WORKDIR=/path/to/imgbin
# Executable path (for invocation inside scripts)
IMGBIN_EXECUTABLE=/path/to/imgbin/dist/cli.js
# Asset library root
IMGBIN_LIBRARY_ROOT=./.imgbin-library
# Azure OpenAI configuration (if using the Azure provider)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=***
AZURE_OPENAI_IMAGE_DEPLOYMENT=gpt-image-1

Configuration is one of those things that can feel both important and not that important at the same time. In the end, whatever feels comfortable and fits your workflow best is usually the right choice.


During implementation, we summarized a few key points:

Interface definitions should be clear and complete, including input parameters, return values, and error handling. It is also a good idea to provide both synchronous and asynchronous invocation styles for different scenarios.

That is one small piece of hard-earned experience. Once an interface is set, nobody wants to keep changing it later.

When one item fails in a batch job, the CLI should:

  1. Write detailed error information to a separate log file.
  2. Continue executing other jobs instead of interrupting the whole process.
  3. Return a non-zero exit code at the end to indicate that some jobs failed.
  4. Clearly display the execution result of every job in the summary report.

Some failures are just failures. There is no point pretending otherwise. It is better to acknowledge them openly and then figure out how to solve them. The same logic applies to projects and to life.
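Put together, the per-job loop looks roughly like the sketch below. It follows the four rules above; Job, runJob, and appendErrorLog are hypothetical names, not ImgBin’s actual internals:

interface Job { slug: string; prompt: string; }

declare function runJob(job: Job): Promise<void>;                   // hypothetical per-job worker
declare function appendErrorLog(slug: string, err: unknown): void;  // hypothetical error log writer

// Process jobs one by one; a single failure never aborts the batch.
async function runBatch(jobs: Job[]): Promise<number> {
  let failed = 0;
  for (const job of jobs) {
    try {
      await runJob(job);
      console.log(`✓ ${job.slug} (succeeded)`);
    } catch (err) {
      failed += 1;
      appendErrorLog(job.slug, err); // detailed error goes to a separate log file
      console.log(`✗ ${job.slug} (failed: ${(err as Error).message})`);
    }
  }
  console.log(`${jobs.length - failed}/${jobs.length} succeeded, ${failed} failed`);
  return failed === 0 ? 0 : 1; // non-zero exit code signals partial failure
}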

Recognition results are written to the recognized section by default, while manually edited fields are marked in manual. Metadata updates follow an append-only strategy: unless --force is explicitly passed, existing manually curated results are not overwritten.

That point became clear too - some things, once overwritten, are just gone. It is often better to preserve them, because the record itself has value.
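A sketch of that guard, reusing the metadata shape from earlier (updateMetadata is an illustrative name; the force flag mirrors the --force option):

import { readFileSync, writeFileSync } from 'node:fs';

// Write fresh recognition results without clobbering manual curation.
function updateMetadata(path: string, recognized: object, force = false): void {
  const meta = JSON.parse(readFileSync(path, 'utf8'));
  meta.recognized = recognized;  // append-only: new results land in their own section
  if (force) {
    meta.manual = {};            // --force explicitly discards manual curation
  }
  meta.timestamps = { ...meta.timestamps, updatedAt: new Date().toISOString() };
  writeFileSync(path, JSON.stringify(meta, null, 2));
}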

Use fs.mkdir({ recursive: true }) to ensure directory creation remains atomic and to avoid race conditions in concurrent scenarios.

Maybe that is what security feels like - being stable when stability matters, moving fast when speed matters, and never getting stuck second-guessing.


As the core tool for image asset management in the HagiCode project, ImgBin solves our problems through the following design choices:

  1. Unified entry point: the CLI covers generation, annotation, thumbnails, and all other core operations.
  2. Metadata-driven: every asset has a complete metadata.json, enabling search and traceability.
  3. Provider Adapter: flexible abstraction for external APIs, making testing and extension easier.
  4. Batch job support: batch image generation can be automated within CI/CD workflows.

Everything else may have faded, but this approach really did end up proving useful.

This solution not only improves HagiCode’s own development efficiency, but also forms a reusable framework for image asset management. If you are building a similarly multi-component project, I believe ImgBin’s design ideas may give you some inspiration.

Youth is all about trying things and making a bit of a mess. If you never put yourself through that, how would you know what you are really capable of?



Thank you for reading. If you found this article helpful, please click the like button below so more people can discover it.

This content was produced with AI-assisted collaboration, reviewed by me, and reflects my own views and position.