
CLI

5 posts with the tag “CLI”

How to Install and Use Hermes: A Quick Start from the Local CLI to Feishu Integration

If you want to install Hermes and start using it, the shortest path is really just three steps:

  1. Run the official installation command
  2. Start the CLI in your terminal with hermes
  3. If you want to keep using it in Feishu, run hermes gateway setup to configure the gateway

This article does not try to explain every Hermes capability all at once. Instead, it helps you complete the most important beginner loop first: install it, get it running, start using it, and then connect it to one of the most common messaging-platform scenarios.

Hermes Agent is an AI agent that you can use either from a local terminal or through a messaging-platform gateway.

For most developers, it has two common entry points:

  • CLI: Type hermes in your terminal to enter the interactive interface directly.
  • Messaging Gateway: Run hermes gateway, then chat with it from platforms such as Feishu, Telegram, Discord, and Slack.

If your goal right now is simply to get started quickly, do not reverse the order. Start with this path instead:

  • Install Hermes first
  • Verify it works from the CLI first
  • Then decide whether you want to connect a messaging platform

This makes problems easier to diagnose and is more suitable for people using Hermes for the first time.

According to the Hermes README, the official quick-install path supports these environments:

  • Linux
  • macOS
  • WSL2
  • Android via Termux

Hermes does not currently support running directly on native Windows. If you are using Windows, the recommended approach is to install WSL2 first and then run the installation command inside WSL2.

It is best to make this clear at the beginning, because many installation failures are not caused by the command itself, but by using an unsupported runtime environment.

The quick installation command provided in the Hermes README is:

```shell
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```

This command runs the official installation script and handles platform-specific initialization steps.

Once installation finishes, reload your shell environment first. The most common command is:

```shell
source ~/.bashrc
```

If you use zsh, you can use:

```shell
source ~/.zshrc
```

How to confirm Hermes is installed correctly


The most direct way to check is to run:

```shell
hermes
```

If you want additional confirmation that your configuration and dependencies are working, you can also run:

```shell
hermes doctor
```

hermes doctor is especially useful in these situations:

  • The command behaves abnormally after installation
  • Model configuration fails
  • The gateway fails to start
  • You are not sure whether your environment dependencies are complete

How to start using Hermes for the first time


If you just want to confirm as quickly as possible that Hermes works, the simplest method is:

```shell
hermes
```

This launches the interactive Hermes CLI. For first-time Hermes users, it is the recommended starting point, because you can verify the most essential things first:

  • Whether the command is actually available
  • Whether the current model configuration works properly
  • Whether the terminal toolchain is working correctly
  • Whether the interaction style matches what you need

These commands are enough for your first round of setup


The Hermes README lists several high-frequency commands, and together they form a practical first-use path:

```shell
hermes model
hermes tools
hermes config set
hermes setup
hermes update
hermes doctor
```

If you are not sure what each one does, remember them like this:

  • hermes model: choose or switch models
  • hermes tools: view and configure currently available tools
  • hermes config set: change specific configuration items
  • hermes setup: run the full initialization wizard once
  • hermes update: update Hermes
  • hermes doctor: troubleshoot problems

For beginners, the most practical order is usually:

  1. Run hermes model first
  2. If you want to configure all common options at once, then run hermes setup

1. Use Hermes in the terminal as a daily development assistant


CLI mode is a good fit for these scenarios:

  • Ask questions directly while writing code locally
  • Inspect projects, edit files, and run commands
  • Do one-off debugging or review work
  • Collaborate continuously in the current working directory

Its biggest advantage is that it is the shortest path: no extra platform integration, no bot configuration to handle up front, and it is the best way to build your first set of usage habits.

2. Use Hermes through a messaging platform


If you want to chat with Hermes on platforms such as Feishu, Telegram, or Discord, you need to use the messaging gateway.

The most common entry commands are:

```shell
hermes gateway setup
hermes gateway
```

Specifically:

  • hermes gateway setup is used for interactive platform configuration
  • hermes gateway is used to start the gateway process

According to the official documentation, the gateway is a unified background process that connects your configured platforms, manages sessions, and handles features such as cron jobs.

Using Feishu as an example: how to connect Hermes to a messaging platform


If most of your daily work happens in Feishu, then Feishu/Lark is a very natural way to use Hermes.

The official documentation recommends this entry command for Feishu/Lark:

```shell
hermes gateway setup
```

After you run it, simply choose Feishu / Lark in the wizard.

The Feishu documentation describes two connection modes:

  • websocket: recommended
  • webhook: optional

If Hermes runs on your laptop, workstation, or private server, using websocket first is usually simpler because you do not need to expose a public callback URL.

If you configure it manually, at least know these variables


If you are not using the wizard and are writing the configuration manually, the Feishu documentation lists these core variables:

```shell
FEISHU_APP_ID=cli_xxx
FEISHU_APP_SECRET=***
FEISHU_DOMAIN=feishu
FEISHU_CONNECTION_MODE=websocket
FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy
FEISHU_HOME_CHANNEL=oc_xxx
```

Two of them deserve special attention:

  • FEISHU_ALLOWED_USERS: recommended, so not everyone who can reach the bot can use it directly
  • FEISHU_HOME_CHANNEL: lets you predefine a home chat to receive cron results or default notifications
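To make the allowlist behavior concrete, here is a minimal sketch of how a comma-separated list like FEISHU_ALLOWED_USERS could be enforced. The helper names are invented for illustration; this is not Hermes code.

```python
# Illustrative sketch (not Hermes internals): enforce a comma-separated
# allowlist like FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy.

def parse_allowed_users(raw: str) -> set[str]:
    """Split 'ou_xxx,ou_yyy' into a set of user IDs, ignoring blanks."""
    return {part.strip() for part in raw.split(",") if part.strip()}

def is_allowed(user_id: str, raw_allowlist: str) -> bool:
    """An empty allowlist means no restriction; otherwise require membership."""
    allowed = parse_allowed_users(raw_allowlist)
    return not allowed or user_id in allowed
```

The key design point is the empty-list case: leaving the variable unset opens the bot to everyone who can reach it, which is exactly why the documentation recommends setting it.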

Why Hermes sometimes does not reply in Feishu group chats


This detail is easy to miss: in Feishu group chats, Hermes does not respond to every message by default.

The official documentation clearly states:

  • In direct messages, Hermes responds to messages
  • In group chats, you must explicitly @ the bot before it will process the message

If you want to set a Feishu conversation as the home channel, you can also use this in the chat:

/set-home

Or define it in the configuration ahead of time:

```shell
FEISHU_HOME_CHANNEL=oc_xxx
```

The Hermes commands beginners should remember first


Whether you use Hermes in the CLI or on a messaging platform, remembering the following commands is already enough to get started:

  • /new or /reset: start a new session
  • /model: view or switch the model
  • /retry: retry the previous turn
  • /undo: undo the previous interaction
  • /compress: manually compress the context
  • /help: view help

If you mainly use Hermes on a messaging platform, remember one more:

  • /sethome or /set-home: set the current chat as the home channel

These commands cover the most common beginner-stage operations: restarting, adjusting, rolling back, checking, and continuing.
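Slash commands are just text prefixes, so their alias behavior (for example /new and /reset, or /sethome and /set-home, doing the same thing) can be pictured with a toy dispatcher. Everything here is illustrative, not Hermes internals.

```python
# Toy dispatcher for the slash commands listed above. The alias table
# mirrors the article; the canonical action names are invented.
ALIASES = {
    "/new": "reset", "/reset": "reset",
    "/model": "model",
    "/retry": "retry",
    "/undo": "undo",
    "/compress": "compress",
    "/help": "help",
    "/sethome": "sethome", "/set-home": "sethome",
}

def dispatch(line: str) -> str:
    """Map a typed line to a canonical action name, or 'unknown'."""
    cmd = line.strip().split()[0] if line.strip() else ""
    return ALIASES.get(cmd, "unknown")
```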

Does Hermes support running directly on native Windows?

No. The current official documentation clearly states that native Windows is not supported, and WSL2 is recommended.

What should I do if typing hermes does nothing after installation?


It is best to troubleshoot in this order:

  1. Reload your shell first, for example with source ~/.bashrc
  2. Run hermes again
  3. If it is still abnormal, run hermes doctor

Why does the bot not reply in a Feishu group?


Check these three things first:

  • Whether you @ mentioned Hermes in the group
  • Whether FEISHU_ALLOWED_USERS restricts the current user
  • Whether the current group-chat policy allows handling group messages

According to the official Feishu documentation, explicitly using an @mention is required in group-chat scenarios.

If you simply want to start using Hermes as quickly as possible, this is the most recommended order:

  1. Run the installation command first
  2. Start with hermes in the local CLI first
  3. Use hermes model and hermes setup to complete the basic configuration
  4. If you want to keep using it in Feishu, run hermes gateway setup to configure the gateway

If this article is the first part of a series, its best role is not to explain every advanced feature all at once, but to get users in the door first.

The following topics are better split into follow-up articles:

  • A complete Hermes Feishu integration guide
  • A guide to common Hermes slash commands
  • A guide to Hermes gateway configuration and troubleshooting

If you plan to keep creating Hermes content, this article can serve as the starting point for later posts, while you gradually build out the internal link structure.

Why Use Skillsbase to Maintain Your Own Skills Collection Repository


It is kind of funny when you think about it: the era of AI programming has arrived, and the Agent Skills we keep on hand are becoming more and more numerous. But along with that comes more and more hassle. This article is about how we used skillsbase to solve those problems.

In the age of AI programming, developers need to maintain an increasing number of Agent Skills - reusable instruction sets that extend the capabilities of coding assistants such as Claude Code, OpenCode, and Cursor. However, as the number of skills grows, a practical problem gradually emerges:

It is not exactly a major problem, but once you have too many things, managing them becomes troublesome.

Skills are scattered across different locations, making management costly

  • Local skills are scattered in multiple places: ~/.agents/skills/, ~/.claude/skills/, ~/.codex/skills/.system/, and so on
  • Different locations may have naming conflicts, for example skill-creator existing in both the user directory and the system directory
  • There is no unified management entry point, which makes backup and migration difficult

This part is genuinely annoying. Sometimes you do not even know where a certain skill actually is. It feels like losing something and then struggling to find it.

Lack of a standardized maintenance workflow

  • Manually copying skills is error-prone and makes it difficult to trace their origins
  • Without a unified validation mechanism, there is no guarantee that the skill repository remains complete
  • During team collaboration, synchronizing and sharing a skill collection is difficult

Manual work is always prone to mistakes. Human memory is limited, after all. Who can remember where every single thing came from?

Failing to meet reproducibility requirements

  • When switching development machines, all skills need to be configured again
  • In CI/CD environments, the skill repository cannot be validated and synchronized automatically

Changing to a different computer means doing everything all over again. It feels, in a way, just like moving house - troublesome every single time. You have to adapt to the new environment and reconfigure everything again.

To address these pain points, we tried many different approaches: from manual copying to scripted automation, from directly managing directories to globally installing and then recovering files. Each approach had its own flaws. Some could not guarantee consistency, some polluted the environment, and some were hard to use in CI.

We definitely took quite a few detours.

In the end, we found a more elegant solution: skillsbase. The core idea behind this approach is to install and validate locally first, then convert the structure and write it into the repository, and finally uninstall the temporary files. This ensures that the repository contents match the actual installation result while avoiding pollution of the global environment.

It sounds simple when you put it that way, but we only figured it out after stepping into quite a few pitfalls.

The solution shared in this article comes from our hands-on experience in the HagiCode project.

HagiCode is an AI coding assistant project. During development, we need to maintain a large number of Agent Skills to extend various coding capabilities. These real-world needs are exactly what pushed us to build the skillsbase toolset for standardized management of skill repositories.

This was not invented out of thin air. We were pushed into it by real needs. Once the number of skills grows, management naturally becomes necessary. When problems appear during management, solutions become necessary too. Step by step, that is how we got here.

If you are interested in HagiCode, you can visit the official website to learn more or check the source code on GitHub.

To build a maintainable skills collection repository, the following core problems need to be solved:

  1. Unified namespace conflicts: when multiple sources contain skills with the same name, how do we avoid overwriting them?
  2. Source traceability: how do we record the source of each skill for future updates and audits?
  3. Synchronization and validation: how do we ensure that repository contents stay consistent with the actual installation results?
  4. Automation integration: how do we integrate with CI/CD workflows to enable automatic synchronization and validation?

These problems may look simple, but every single one of them is a headache. Then again, what worthwhile work is ever easy?

Option 1: Copy directories directly

Pros: simple to implement
Cons: cannot guarantee consistency with the actual installation result of the skills CLI

We did think about this approach. Later, however, we realized that the CLI may apply some preprocessing logic during installation. Direct copying skips that step. As a result, what you copy is not the same as what is actually installed, and that becomes a problem.

Option 2: Install globally and then recover

Pros: the installation process can be validated
Cons: pollutes the execution environment, and it is hard to keep CI and local results consistent

This approach is even worse. A global installation pollutes the environment. More importantly, it is difficult to keep the CI environment consistent with the local environment, which leads to the classic “works on my machine, fails in CI” problem. Anyone who has dealt with that knows how painful it is.

Option 3: Local install -> convert -> uninstall (final solution)

This is the approach adopted by skillsbase:

  • First install skills into a temporary location with npx skills
  • Convert the directory structure and add source metadata
  • Write the result into the target repository
  • Finally uninstall the temporary files

This approach ensures that repository contents are consistent with the actual installation results seen by consumers, avoids polluting the global environment, standardizes the conversion process, and supports idempotent operations.

This solution was not obvious from the beginning either. We simply learned through enough trial and error what works and what does not.

| Decision Item | Choice | Reason |
| --- | --- | --- |
| Runtime | Node.js ESM | No build step required; .mjs is enough to orchestrate the file system |
| Configuration format | YAML (sources.yaml) | Highly readable and suitable for manual maintenance |
| Naming strategy | Namespace prefix | User skills keep their original names, while system skills receive the system- prefix |
| Workflow | add updates the manifest -> sync executes synchronization | A single synchronization engine avoids implementing the same rules twice |
| File management | Managed file markers | Add a comment header to support safe overwrites |
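The namespace-prefix decision is small enough to express directly. A sketch of the two naming strategies, using the strategy names that appear in the sources.yaml example later in the article (the function itself is invented for illustration):

```python
# Illustrative naming strategies: 'original' keeps the name,
# 'prefix-system' adds a 'system-' prefix (idempotently).
def apply_naming(name: str, strategy: str) -> str:
    if strategy == "original":
        return name
    if strategy == "prefix-system":
        return name if name.startswith("system-") else f"system-{name}"
    raise ValueError(f"unknown naming strategy: {strategy}")
```

This is how skill-creator from the user directory and skill-creator from the system directory can coexist as skill-creator and system-skill-creator.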

These decisions all come down to one goal: making things simple. Simplicity wins in the end.

The skillsbase CLI provides four core commands:

```
skillsbase
├── init           # Initialize repository structure
├── sync           # Synchronize skill content
├── add            # Add new skills
└── github_action  # Generate GitHub Actions configuration
```

There are not many commands, but they are enough. A tool only needs to be useful.

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    init     │───▶│     add     │───▶│    sync     │───▶│github_action│
│ initialize  │    │ add source  │    │ sync content│    │ generate CI │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
```

Take it one step at a time. No need to rush.

```
sources.yaml -> parse sources -> npx skills install -> convert structure -> write to skills/ -> uninstall temporary files
.skill-source.json (source metadata)
```

This workflow is fairly clear. At least when I look at it, I can understand what each step is doing.

```
repos/skillsbase/
├── sources.yaml                 # Source manifest (single source of truth)
├── skills/                      # Skills directory
│   ├── frontend-design/         # User skill
│   ├── skill-creator/           # User skill
│   └── system-skill-creator/    # System skill (with prefix)
├── scripts/
│   ├── sync-skills.mjs          # Synchronization script
│   └── validate-skills.mjs      # Validation script
├── docs/
│   └── maintainer-workflow.md   # Maintainer documentation
└── .github/
    ├── workflows/
    │   └── skills-sync.yml      # CI workflow
    └── actions/
        └── skillsbase-sync/
            └── action.yml       # Reusable Action
```

There are quite a few files, but that is fine. Once the structure is organized clearly, maintenance becomes much easier.

```shell
# 1. Create an empty repository
mkdir repos/myskills && cd repos/myskills
git init

# 2. Initialize it with skillsbase
npx skillsbase init
# Output:
# [1/4] create manifest ................. done
# [2/4] create scripts .................. done
# [3/4] create docs ..................... done
# [4/4] create github workflow .......... done
#
# next: skillsbase add <skill-name>
```

This step generates a lot of files, but there is no need to worry - they are all generated automatically. After that, you can start adding skills.

```shell
# Add a single skill (this automatically triggers synchronization)
npx skillsbase add frontend-design --source vercel-labs/agent-skills

# Add from a local source
npx skillsbase add documentation-writer --source /home/user/.agents/skills
# Output:
# source: first-party ......... updated
# target: skills/frontend-design ... synced
# status: 1 skill added, 0 removed
```

Adding a skill is very simple. One command is enough. Sometimes, though, you may hit unexpected issues such as poor network conditions or permission problems. Those are manageable - just take them one at a time.

```shell
# Perform synchronization (reconcile all sources)
npx skillsbase sync

# Only check for drift (do not modify files)
npx skillsbase sync --check

# Allow missing sources (CI scenario)
npx skillsbase sync --allow-missing-sources
```

During synchronization, the system checks every source defined in sources.yaml and reconciles them with the contents under the skills/ directory. If differences exist, it updates them; if there are no differences, it skips them. This prevents the “configuration changed but files did not” problem.
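The reconcile step can be pictured as plain set arithmetic over skill names. This is only a sketch of the idea (the real tool also diffs file contents, not just names):

```python
# Sketch of a `sync --check` style drift report: compare the skills the
# manifest declares with the directories actually under skills/.
def drift(declared: set[str], on_disk: set[str]) -> dict[str, set[str]]:
    return {
        "missing": declared - on_disk,  # declared but not yet synced
        "extra": on_disk - declared,    # present but no longer declared
    }
```

A full sync would then add the "missing" entries and remove the "extra" ones, while --check would only report both sets without touching files.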

```shell
# Generate workflow
npx skillsbase github_action --kind workflow

# Generate action
npx skillsbase github_action --kind action

# Generate everything
npx skillsbase github_action --kind all
```

The CI configuration is generated automatically as well. You still need to adjust some details yourself, such as trigger conditions and runtime environments, but that is not difficult.

```yaml
# Skills root directory configuration
skillsRoot: skills/
metadataFile: .skill-source.json

# Source definitions
sources:
  # First-party: local user skills
  first-party:
    type: local
    path: /home/user/.agents/skills
    naming: original # Keep original name
    includes:
      - documentation-writer
      - frontend-design
      - skill-creator

  # System: skills provided by the system
  system:
    type: local
    path: /home/user/.codex/skills/.system
    naming: prefix-system # Add system- prefix
    includes:
      - imagegen
      - openai-docs
      - skill-creator # Becomes system-skill-creator

  # Remote: third-party repository
  vercel:
    type: remote
    url: vercel-labs/agent-skills
    naming: original
    includes:
      - web-design-guidelines
```
This configuration file is the core of the entire system. All sources are defined here. Change this file, and the next synchronization will apply the new state. In that sense, it is truly a “single source of truth.”

```json
{
  "source": "first-party",
  "originalPath": "/home/user/.agents/skills/documentation-writer",
  "originalName": "documentation-writer",
  "targetName": "documentation-writer",
  "syncedAt": "2026-04-07T00:00:00.000Z",
  "version": "unknown"
}
```

Every skill directory contains this file, recording its source information. That way, when something goes wrong later, you can quickly locate where it came from and when it was synchronized.

```shell
# Validate repository structure
node scripts/validate-skills.mjs

# Validate with the skills CLI
npx skills add . --list

# Check for updates
npx skills check
```

Validation is one of those things that can feel both important and optional. Still, for the sake of safety, it never hurts to run it from time to time. After all, you never know when something unexpected might happen.

```yaml
# .github/workflows/skills-sync.yml
name: Skills Sync
on:
  push:
    paths:
      - 'sources.yaml'
      - 'skills/**'
  workflow_dispatch:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Validate repository
        run: |
          npx skills add . --list
          node scripts/validate-skills.mjs
      - name: Sync check
        run: npx skillsbase sync --check
```

Once CI integration is in place, every change to sources.yaml or the skills/ directory automatically triggers validation. That prevents the situation where changes were made locally but synchronization was forgotten.

  1. Handle naming conflicts: add the system- prefix to system skills consistently. This keeps every skill available while avoiding naming conflicts.
  2. Idempotent operations: all commands support repeated execution, and running sync multiple times does not produce side effects. This is especially important in CI.
  3. Managed files: generated files include the # Managed by skillsbase CLI comment, making them easy to identify and manage. These files can be safely overwritten, and manual modifications are not preserved.
  4. Non-interactive mode: CI environments use deterministic behavior by default, so interactive prompts do not interrupt execution. All configuration is declared through sources.yaml.
  5. Source traceability: every skill has a .skill-source.json file recording its source information, making troubleshooting much faster.
```shell
# Team members install the shared skills repository
npx skills add your-org/myskills -g --all

# Clone locally and validate
git clone https://github.com/your-org/myskills.git
cd myskills
npx skills add . --list
```

By managing the skills repository with Git, team members can easily synchronize their skill collection and ensure that everyone uses the same versions of tools and configuration.

This is especially useful in team collaboration. You no longer run into situations where “it works for me but not for you.” Once the environment is unified, half the problems disappear.

The core value of using skillsbase to maintain a skills collection repository lies in the following:

  • Security: source validation, conflict detection, and managed file protection
  • Maintainability: a unified entry point, idempotent operations, and configuration-as-documentation
  • Standardization: a unified directory structure, naming conventions, and metadata format
  • Automation: CI/CD integration, automatic synchronization, and automatic validation

With this approach, developers can manage their own Agent Skills the same way they manage npm packages, building a reproducible, shareable, and maintainable skills repository system.

The tools and workflow shared in this article are exactly what we refined through real mistakes and real optimization while building HagiCode. If you find this approach valuable, that is a good sign that our engineering direction is the right one - and that HagiCode itself is worth your attention as well.

After all, good tools deserve to be used by more people.



This article was first published on the HagiCode Blog.

Thank you for reading. If you found this article useful, you are welcome to like it, save it, and share it in support. This content was created with AI-assisted collaboration, and the final version was reviewed and confirmed by the author.

Hagicode and GLM-5.1 Multi-CLI Integration Guide


In the Hagicode project, users can choose from multiple CLI tools to drive AI programming assistants, including Claude Code CLI, GitHub Copilot, OpenCode CLI, Codebuddy CLI, Hermes CLI, and more. These CLI tools are general-purpose AI programming tools on their own, but through Hagicode’s abstraction layer, they can flexibly connect to different AI model providers.

Zhipu AI (ZAI) provides an interface compatible with the Anthropic Claude API, allowing these CLI tools to directly use domestic GLM series models. Among them, GLM-5.1 is Zhipu’s latest large language model release, with significant improvements over GLM-5.0.

Hagicode defines 11 CLI provider types through the AIProviderType enum, covering mainstream AI programming CLI tools:

```csharp
public enum AIProviderType
{
    ClaudeCodeCli = 0,   // Claude Code CLI
    CodexCli = 1,        // GitHub Copilot Codex
    GitHubCopilot = 2,   // GitHub Copilot
    CodebuddyCli = 3,    // Codebuddy CLI
    OpenCodeCli = 4,     // OpenCode CLI
    IFlowCli = 5,        // IFlow CLI
    HermesCli = 6,       // Hermes CLI
    QoderCli = 7,        // Qoder CLI
    KiroCli = 8,         // Kiro CLI
    KimiCli = 9,         // Kimi CLI
    GeminiCli = 10,      // Gemini CLI
}
```

Each CLI has corresponding model parameter configuration and supports the model and reasoning parameters:

```csharp
private static readonly IReadOnlyDictionary<AIProviderType, IReadOnlyList<string>> ManagedModelParameterKeysByProvider =
    new Dictionary<AIProviderType, IReadOnlyList<string>>
    {
        [AIProviderType.ClaudeCodeCli] = ["model", "reasoning"],
        [AIProviderType.CodexCli] = ["model", "reasoning"],
        [AIProviderType.OpenCodeCli] = ["model", "reasoning"],
        [AIProviderType.HermesCli] = ["model", "reasoning"],
        [AIProviderType.CodebuddyCli] = ["model", "reasoning"],
        [AIProviderType.QoderCli] = ["model", "reasoning"],
        [AIProviderType.KiroCli] = ["model", "reasoning"],
        [AIProviderType.GeminiCli] = ["model"], // Gemini does not support the reasoning parameter
        // ...
    };
```
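The effect of that lookup is to filter a hero's parameters down to the keys the selected CLI actually manages. A hedged Python rendering of the same idea, using just two providers from the table above:

```python
# Sketch (not Hagicode code): keep only the parameter keys a given CLI
# manages, e.g. Gemini drops 'reasoning' while Claude Code keeps it.
MANAGED_KEYS = {
    "ClaudeCodeCli": ["model", "reasoning"],
    "GeminiCli": ["model"],  # no reasoning parameter
}

def managed_params(provider: str, params: dict) -> dict:
    keys = MANAGED_KEYS.get(provider, [])
    return {k: v for k, v in params.items() if k in keys}
```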

Hagicode’s Secondary Professions Catalog defines complete support for the GLM model series:

| Model ID | Name | Default Reasoning | Compatible CLI Families |
| --- | --- | --- | --- |
| glm-4.7 | GLM 4.7 | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5 | GLM 5 | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5-turbo | GLM 5 Turbo | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5.0 | GLM 5.0 (Legacy) | high | claude, codebuddy, hermes, qoder, kiro |
| glm-5.1 | GLM 5.1 | high | claude, codebuddy, hermes, qoder, kiro |

Key differences between GLM-5.1 and GLM-5.0


From the implementation in AcpSessionModelBootstrapper.cs, we can clearly see the differences between GLM-5.1 and GLM-5.0:

GLM-5.1 is a standalone new model identifier with no legacy handling logic:

```csharp
private const string Glm51ModelValue = "glm-5.1";
```

Definition in the Secondary Professions Catalog:

```json
{
  "id": "secondary-glm-5-1",
  "name": "GLM 5.1",
  "family": "anthropic",
  "summary": "hero.professionCopy.secondary.glm51.summary",
  "sourceLabel": "hero.professionCopy.sources.aiSharedAnthropicModel",
  "sortOrder": 64,
  "supportsImage": true,
  "compatiblePrimaryFamilies": [
    "claude",
    "codebuddy",
    "hermes",
    "qoder",
    "kiro"
  ],
  "defaultParameters": {
    "model": "glm-5.1",
    "reasoning": "high"
  }
}
```

Zhipu AI provides the most complete GLM model support:

```json
{
  "providerId": "zai",
  "name": "智谱 AI",
  "description": "智谱 AI 提供的 Claude API 兼容服务",
  "category": "china-providers",
  "apiUrl": {
    "codingPlanForAnthropic": "https://open.bigmodel.cn/api/anthropic"
  },
  "recommended": true,
  "region": "cn",
  "defaultModels": {
    "sonnet": "glm-4.7",
    "opus": "glm-5",
    "haiku": "glm-4.5-air"
  },
  "supportedModels": [
    "glm-4.7",
    "glm-5",
    "glm-4.5-air",
    "qwen3-coder-next",
    "qwen3-coder-plus"
  ],
  "features": ["experimental-agent-teams"],
  "authTokenEnv": "ANTHROPIC_AUTH_TOKEN",
  "referralUrl": "https://www.bigmodel.cn/claude-code?ic=14BY54APZA",
  "documentationUrl": "https://open.bigmodel.cn/dev/api"
}
```

Features:

  • Supports the widest variety of GLM model variants
  • Provides default mapping across the Sonnet/Opus/Haiku tiers
  • Supports the experimental-agent-teams feature
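The defaultModels block in the provider config above maps Claude's model tiers to concrete GLM models. A hedged sketch of how such a mapping resolves (the function is invented; only the tier-to-model table comes from the config):

```python
# Sketch: resolve a Sonnet/Opus/Haiku tier name to the GLM model from
# the zai provider's defaultModels block; unknown names pass through.
DEFAULT_MODELS = {"sonnet": "glm-4.7", "opus": "glm-5", "haiku": "glm-4.5-air"}

def resolve_tier(name: str) -> str:
    return DEFAULT_MODELS.get(name.lower(), name)
```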

Claude Code CLI is one of Hagicode’s core CLIs and is configured through the Hero configuration system:

```json
{
  "primaryProfessionId": "profession-claude-code",
  "secondaryProfessionId": "secondary-glm-5-1",
  "model": "glm-5.1",
  "reasoning": "high"
}
```

Corresponding HeroEquipmentCatalogItem configuration:

```typescript
{
  id: 'secondary-glm-5-1',
  name: 'GLM 5.1',
  family: 'anthropic',
  kind: 'model',
  primaryFamily: 'claude',
  compatiblePrimaryFamilies: ['claude', 'codebuddy', 'hermes', 'qoder', 'kiro'],
  defaultParameters: {
    model: 'glm-5.1',
    reasoning: 'high'
  }
}
```

OpenCode CLI is the most flexible CLI and supports specifying any model in the provider/model format:

Method 1: Use the ZAI provider prefix

```json
{
  "primaryProfessionId": "profession-opencode",
  "model": "zai/glm-5.1",
  "reasoning": "high"
}
```

Method 2: Use the model ID directly

```json
{
  "model": "glm-5.1"
}
```

Method 3: Frontend configuration UI

In HeroModelEquipmentForm.tsx, OpenCode CLI has a dedicated placeholder hint:

```typescript
const OPEN_CODE_MODEL_PLACEHOLDER = 'myprovider/glm-4.7';
const modelPlaceholder = primaryProviderType === PCode_Models_AIProviderType.OPEN_CODE_CLI
  ? OPEN_CODE_MODEL_PLACEHOLDER
  : 'gpt-5.4';
```

Users can enter:

```
zai/glm-5.1
glm-5.1
```

OpenCode CLI model parsing logic:

```csharp
internal OpenCodeModelSelection? ResolveModelSelection(string? rawModel)
{
    var normalized = NormalizeOptionalValue(rawModel);
    if (normalized == null) return null;

    var slashIndex = normalized.IndexOf('/');
    if (slashIndex < 0)
    {
        // No slash: use the model ID directly
        return new OpenCodeModelSelection {
            ProviderId = string.Empty,
            ModelId = normalized,
        };
    }

    // Slash exists: parse the provider/model format
    var providerId = normalized[..slashIndex].Trim();
    var modelId = normalized[(slashIndex + 1)..].Trim();
    return new OpenCodeModelSelection {
        ProviderId = providerId,
        ModelId = modelId,
    };
}
```

Codebuddy CLI has dedicated legacy handling logic:

{
  "primaryProfessionId": "profession-codebuddy",
  "model": "glm-5.1",
  "reasoning": "high"
}

Note: Codebuddy retains special handling for GLM-5.0 and does not use legacy normalization:

// For CodebuddyCli, glm-5.0 is not normalized to glm-5-turbo
return !string.Equals(providerName, "CodebuddyCli", StringComparison.OrdinalIgnoreCase)
    && string.Equals(normalizedModel, LegacyGlm5TurboModelValue, StringComparison.OrdinalIgnoreCase)
    ? Glm5TurboModelValue
    : normalizedModel;
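The rule in that ternary is easy to misread, so here is the same logic restated as a small TypeScript sketch. The concrete constant values are assumptions inferred from the surrounding comments, not the actual Hagicode constants:

```typescript
// Assumed values, based on the comment "glm-5.0 is not normalized to glm-5-turbo".
const LEGACY_GLM5_TURBO_MODEL = "glm-5.0";
const GLM5_TURBO_MODEL = "glm-5-turbo";

function normalizeLegacyModel(providerName: string, normalizedModel: string): string {
  // Every provider except CodebuddyCli maps the legacy glm-5.0 value to
  // glm-5-turbo; CodebuddyCli keeps glm-5.0 as-is.
  const isCodebuddy = providerName.toLowerCase() === "codebuddycli";
  const isLegacy = normalizedModel.toLowerCase() === LEGACY_GLM5_TURBO_MODEL;
  return !isCodebuddy && isLegacy ? GLM5_TURBO_MODEL : normalizedModel;
}
```

In other words, the exception is carved out per provider name, not per model family, which is why the provider comparison comes first.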
To use GLM-5.1 through the Zhipu ZAI endpoint, set the following environment variables:
# Set the API key
export ANTHROPIC_AUTH_TOKEN="***"
# Optional: specify the API endpoint (ZAI uses this endpoint by default)
export ANTHROPIC_BASE_URL="https://open.bigmodel.cn/api/anthropic"
To use the Alibaba Cloud DashScope endpoint instead:
# Set the API key
export ANTHROPIC_AUTH_TOKEN="your-a...-key"
# Specify the Alibaba Cloud endpoint
export ANTHROPIC_BASE_URL="https://coding.dashscope.aliyuncs.com/apps/anthropic"

According to Zhipu’s official release information, GLM-5.1 brings the following significant improvements over GLM-5.0:

  • Stronger code understanding: More accurate analysis of complex code structures
  • Longer context comprehension: Supports longer conversational context
  • Enhanced tool calling: Higher success rate for MCP tool calls
  • Output stability: Reduces randomness and hallucinations

GLM-5.1 covers all mainstream CLIs supported by Hagicode:

compatiblePrimaryFamilies: [
  "claude",    // Claude Code CLI
  "codebuddy", // Codebuddy CLI
  "hermes",    // Hermes CLI
  "qoder",     // Qoder CLI
  "kiro"       // Kiro CLI
]

Make sure the ANTHROPIC_AUTH_TOKEN environment variable is set correctly. It is the required credential for every CLI to connect to the model.

GLM-5.1 needs to be enabled by the corresponding model provider:

  • The Zhipu AI ZAI platform supports it by default
  • Alibaba Cloud DashScope may require a separate application

When using the provider/model format, make sure the provider ID is correct:

  • Zhipu AI: zai or zhipuai
  • Alibaba Cloud: aliyun or dashscope

For the reasoning parameter:

  • high is recommended for the best code generation results
  • Gemini CLI does not support the reasoning parameter and will ignore this configuration automatically

Through a unified abstraction layer, Hagicode enables flexible integration between GLM-5.1 and multiple CLIs. Developers can choose the CLI tool that best fits their preferences and usage scenarios, then use the latest GLM-5.1 model through simple configuration.

As Zhipu’s latest model version, GLM-5.1 offers clear improvements over GLM-5.0:

  • An independent version identifier with no legacy burden
  • Stronger reasoning and code understanding
  • Broad multi-CLI compatibility
  • Flexible reasoning level configuration

With the correct environment variables and Hero equipment configured, users can fully unlock the power of GLM-5.1 across different CLI environments.

If you want to put GLM-5.1, multi-CLI orchestration, and HagiCode’s configuration model into real use, these are the fastest entry points:

Once you compare Kimi, Claude Code, OpenCode, and other CLIs inside the same abstraction layer, questions about model switching, parameter mapping, and engineering boundaries tend to become much easier to reason about.

Hagicode.Libs: Engineering Practice for Unified Integration of Multiple AI Coding Assistant CLIs


During the development of the HagiCode project, we needed to integrate multiple AI coding assistant CLIs at the same time, including Claude Code, Codex, and CodeBuddy. Each CLI has different interfaces, parameters, and output formats, and the repeated integration code made the project harder and harder to maintain. In this article, we share how we built a unified abstraction layer with HagiCode.Libs to solve this engineering pain point. You could also say it is simply some hard-earned experience gathered from the pitfalls we have already hit.

The market for AI coding assistants is quite lively now. Besides Claude Code, there are also OpenAI’s Codex, Zhipu’s CodeBuddy, and more. As an AI coding assistant project, HagiCode needs to integrate these different CLI tools across multiple subprojects, including desktop, backend, and web.

At first, the problem was manageable. Integrating one CLI was only a few hundred lines of code. But as the number of CLIs we needed to support kept growing, things started to get messy.

Each CLI has its own command-line argument format, different environment variable requirements, and a wide variety of output formats. Some output JSON, some output streaming JSON, and some output plain text. On top of that, there are cross-platform compatibility issues. Executable discovery and process management work very differently between Windows and Unix systems, so code duplication kept increasing. In truth, it was just a bit more Ctrl+C and Ctrl+V, but maintenance quickly became painful.

The most frustrating part was that every time we wanted to add support for a new CLI capability, we had to change the same code in several projects. That approach was clearly not sustainable in the long run. Code has a temper too; duplicate it too many times and it starts causing trouble.

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project that needs to maintain multiple subprojects at the same time, including a frontend VSCode extension, backend AI services, and a cross-platform desktop client. In a way, it was exactly this complex, multi-language, multi-platform environment that led to the birth of HagiCode.Libs. You could say we were forced into it, and so be it.

Although these AI coding assistant CLIs each have their own characteristics, from a technical perspective they share several obvious traits:

Similar interaction patterns: they all start a CLI process, send a prompt, receive streaming responses, parse messages, and then either end or continue the session. At the end of the day, the whole flow follows the same basic mold.

Similar configuration needs: they all need API key authentication, working directory setup, model selection, tool permission control, and session management. After all, everyone is making a living from APIs; the differences are mostly a matter of flavor.

The same cross-platform challenges: they all need to solve executable path resolution (claude vs claude.exe vs /usr/local/bin/claude), process startup and environment variable handling, shell command escaping, and argument construction. Cross-platform work is painful no matter how you describe it. Only people who have stepped into the traps really understand the difference between Windows and Unix.

Based on this analysis, we needed a unified abstraction layer that could provide a consistent interface, encapsulate cross-platform CLI discovery logic, handle streaming output parsing, and support both dependency injection and non-DI scenarios. It is the kind of problem that makes your head hurt just thinking about it, but you still have to face it. After all, it is our own project, so we have to finish it even if we have to cry our way through it.

We created HagiCode.Libs, a lightweight .NET 10 library workspace released under the MIT license and now published on GitHub. It may not be some world-shaking masterpiece, but it is genuinely useful for solving real problems.

HagiCode.Libs/
├── src/
│   ├── HagiCode.Libs.Core/              # Core capabilities
│   │   ├── Discovery/                   # CLI executable discovery
│   │   ├── Process/                     # Cross-platform process management
│   │   ├── Transport/                   # Streaming message transport
│   │   └── Environment/                 # Runtime environment resolution
│   ├── HagiCode.Libs.Providers/         # Provider implementations
│   │   ├── ClaudeCode/                  # Claude Code provider
│   │   ├── Codex/                       # Codex provider
│   │   └── Codebuddy/                   # CodeBuddy provider
│   ├── HagiCode.Libs.ConsoleTesting/    # Testing framework
│   ├── HagiCode.Libs.ClaudeCode.Console/
│   ├── HagiCode.Libs.Codex.Console/
│   └── HagiCode.Libs.Codebuddy.Console/
└── tests/                               # xUnit tests

When designing HagiCode.Libs, we followed a few principles. They all came from lessons learned the hard way:

Zero heavy framework dependencies: it does not depend on ABP or any other large framework, which keeps it lightweight. These days, the fewer dependencies you have, the fewer headaches you get. Most people have already been beaten up by dependency hell at least once.

Cross-platform support: native support for Windows, macOS, and Linux, without writing separate code for different platforms. One codebase that runs everywhere is a pretty good thing.

Streaming processing: CLI output is handled with asynchronous streams, which fits modern .NET programming patterns much better. Times change, and async is king.

Flexible integration: it supports dependency injection scenarios while also allowing direct instantiation. Different people have different preferences, so we wanted it to be convenient either way.

If your project already uses dependency injection, such as ASP.NET Core or the generic host, you can integrate it directly. It is a small thing, but a well-behaved one:

using HagiCode.Libs.Providers;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddHagiCodeLibs();
await using var provider = services.BuildServiceProvider();

var claude = provider.GetRequiredService<ICliProvider<ClaudeCodeOptions>>();
var options = new ClaudeCodeOptions
{
    ApiKey = "your-api-key",
    Model = "claude-sonnet-4-20250514"
};

await foreach (var message in claude.ExecuteAsync(options, "Hello, Claude!"))
{
    Console.WriteLine($"{message.Type}: {message.Content}");
}

If you are writing a simple script or working in a non-DI scenario, creating an instance directly also works. Put simply, it depends on your personal preference:

var claude = new ClaudeCodeProvider();
var options = new ClaudeCodeOptions
{
    ApiKey = "sk-ant-xxx",
    Model = "claude-sonnet-4-20250514"
};

await foreach (var message in claude.ExecuteAsync(options, "Help me write a quicksort"))
{
    // Handle messages
}

Both approaches use the same underlying implementation, so you can choose the integration style that best fits your project. There is no universal right answer in this world. What suits you is the best option. It may sound cliché, but it is true.

Each provider has its own dedicated testing console project, making it easier to validate the integration independently. Testing is one of those things where if you are going to do it, you should do it properly:

# Claude Code tests
dotnet run --project src/HagiCode.Libs.ClaudeCode.Console -- --test-provider
dotnet run --project src/HagiCode.Libs.ClaudeCode.Console -- --test-all claude
# CodeBuddy tests
dotnet run --project src/HagiCode.Libs.Codebuddy.Console -- --test-provider codebuddy-cli
# Codex tests
dotnet run --project src/HagiCode.Libs.Codex.Console -- --test-provider codex-cli

The testing scenarios cover several key cases:

  • Ping: health check to confirm the CLI is available
  • Simple Prompt: basic prompt test
  • Complex Prompt: multi-turn conversation test
  • Session Restore/Resume: session recovery test
  • Repository Analysis: repository analysis test

This standalone testing console design is especially useful during debugging because it lets us quickly identify whether the issue is in the HagiCode.Libs layer or in the CLI itself. Debugging is really just about finding where the problem is. Once the direction is right, you are already halfway there.

Cross-platform compatibility is one of the core goals of HagiCode.Libs. We configured the GitHub Actions workflow .github/workflows/cli-discovery-cross-platform.yml to run real CLI discovery validation across ubuntu-latest, macos-latest, and windows-latest.

This ensures that every code change does not break cross-platform compatibility. During local development, you can also reproduce it with the following commands. After all, you cannot ask CI to take the blame for everything. Your local environment should be able to run it too:

npm install --global @anthropic-ai/claude-code@2.1.79
HAGICODE_REAL_CLI_TESTS=1 dotnet test --filter "Category=RealCli"

HagiCode.Libs uses asynchronous streams to process CLI output. Compared with traditional callback or event-based approaches, this fits the asynchronous programming style of modern .NET much better. In the end, this is simply how technology moves forward, whether anyone likes it or not:

public async IAsyncEnumerable<CliMessage> ExecuteAsync(
    TOptions options,
    string prompt,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Start the CLI process
    // Parse streaming JSON output
    // Yield the CliMessage sequence
}

The message types include:

  • user: user message
  • assistant: assistant response
  • tool_use: tool invocation
  • result: session end

This design lets callers handle streaming output flexibly, whether for real-time display, buffered post-processing, or forwarding to other services. Why worry whether the sky is sunny or cloudy? What matters is that once the idea opens up, you can use it however you like.
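The consumption pattern is language-agnostic. As a sketch, here is the same idea expressed with a TypeScript async generator; the `CliMessage` shape mirrors the message types listed above, and the fake stream is purely illustrative:

```typescript
interface CliMessage {
  type: "user" | "assistant" | "tool_use" | "result";
  content: string;
}

// Stand-in for a real CLI process emitting streaming JSON messages.
async function* fakeCliStream(): AsyncGenerator<CliMessage> {
  yield { type: "assistant", content: "Here is a draft..." };
  yield { type: "tool_use", content: "read_file(src/main.ts)" };
  yield { type: "result", content: "session finished" };
}

// Real-time handling: act on each message as it arrives, instead of
// waiting for the whole session to finish.
async function collectMessages(stream: AsyncGenerator<CliMessage>): Promise<string[]> {
  const lines: string[] = [];
  for await (const msg of stream) {
    lines.push(`${msg.type}: ${msg.content}`);
  }
  return lines;
}
```

Buffering or forwarding to another service is just a different body inside the same `for await` loop.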

The HagiCode.Libs.Exploration module provides Git repository discovery and status checking, which is especially useful in repository analysis scenarios. This feature was also born out of necessity, because HagiCode needs to analyze repositories:

// Discover Git repositories
var repositories = await GitRepositoryDiscovery.DiscoverAsync("/path/to/search");
// Get repository information
var info = await GitRepository.GetInfoAsync(repoPath);
Console.WriteLine($"Branch: {info.Branch}, Remote: {info.RemoteUrl}");
Console.WriteLine($"Has uncommitted changes: {info.HasUncommittedChanges}");

HagiCode’s code analysis capabilities use this module to identify project structure and Git status. It is a good example of making full use of what we built.

Based on our practice in the HagiCode project, there are several points that deserve special attention. They are all real issues that need to be handled carefully:

API key security: do not hardcode API keys in your code. Use environment variables or configuration management instead. HagiCode.Libs supports passing configuration through Options objects, making it easier to integrate with different configuration sources. When it comes to security, there is no such thing as being too careful.

CLI version pinning: in CI/CD, we pin specific versions, such as @anthropic-ai/claude-code@2.1.79, to reduce uncertainty caused by version drift. It is also a good idea to use fixed versions in local development. Versioning can be painful. If you do not pin versions, the problem will teach you a lesson very quickly.

Test categorization: default tests use fake providers to keep them deterministic and fast, while real CLI tests must be enabled explicitly. This gives CI fast feedback while still allowing real-environment validation when needed. Striking that balance is never easy. Speed and stability always require trade-offs.

Session management: different CLIs have different session recovery mechanisms. Claude Code uses the .claude/ directory to store sessions, while Codex and CodeBuddy each have their own approaches. When using them, be sure to check their respective documentation and understand the details of their session persistence mechanisms. There is no harm in understanding it clearly.

HagiCode.Libs is the unified abstraction layer we built during the development of HagiCode to solve the repeated engineering work involved in multi-CLI integration. By providing a consistent interface, encapsulating cross-platform details, and supporting flexible integration patterns, it greatly reduces the engineering complexity of integrating multiple AI coding assistants. Much may fade away, but the experience remains.

If you also need to integrate multiple AI CLI tools in your project, or if you are interested in cross-platform process management and streaming message handling, feel free to check it out on GitHub. The project is released under the MIT license, and contributions and feedback are welcome. In the end, it is a happy coincidence that we met here, so since you are already here, we might as well become friends.

The approach shared in this article was shaped by real pitfalls and real optimization work inside HagiCode. What else could we do? Running into pitfalls is normal. If you think this solution is valuable, then perhaps our engineering work is doing all right. And HagiCode itself may also be worth your attention. You might even find a pleasant surprise.


If this article helped you:

Thank you for reading. If you found this article useful, you are welcome to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.

ImgBin CLI Tool Design: HagiCode's Image Asset Management Approach


This article explains how to build an automatable image asset pipeline from scratch, covering CLI tool design, a Provider Adapter architecture, and metadata management strategies.

Honestly, I did not expect image asset management to keep us tangled up for this long.

During HagiCode development, we ran into a problem that looked simple on the surface but was surprisingly thorny in practice: generating and managing image assets. In a way, it was like the dramas of adolescence - calm on the outside, turbulent underneath.

As the project accumulated more documentation and marketing materials, we needed a large number of supporting images. Some had to be AI-generated, some had to be selected from an existing asset library, and others needed AI recognition plus automatic labeling. The problem was that all of this had long been handled through scattered scripts and manual steps. Every time we generated an image, we had to run a script by hand, organize metadata by hand, and create thumbnails by hand. That alone was annoying enough, but the bigger issue was that everything was scattered everywhere. When we wanted to find something, we could not. When we needed to reuse something, we could not.

The pain points were concrete:

  1. No unified entry point: the logic for image generation was spread across different scripts, so batch execution was basically impossible.
  2. Missing metadata: generated images had no unified metadata.json, which meant no reliable searchability or traceability.
  3. High manual organization cost: titles and tags had to be sorted out one by one by hand, which was inefficient.
  4. No automation: automatically generating visual assets in a CI/CD pipeline? Not a chance.

We did think about just leaving it alone. But projects still need to move forward. Since we could not avoid the problem, we figured we might as well solve it. So we decided to upgrade ImgBin from a set of scattered scripts into an image asset pipeline that can be executed automatically. Some problems, after all, do not disappear just because you look away.

The approach shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is an AI coding assistant project that simultaneously maintains multiple components, including a VSCode extension, backend AI services, and a cross-platform desktop client. In a complex, multilingual, cross-platform environment like this, standardized image asset management becomes a key part of improving development efficiency.

You could say this was one of those small growing pains in HagiCode’s journey. Every project has moments like that: a minor issue that looks insignificant, yet somehow manages to take up half the day.

HagiCode’s build system is based on the TypeScript + Node.js ecosystem, so ImgBin naturally adopted the same tech stack to keep the project technically consistent. Once you are used to one stack, switching to something else just feels like unnecessary trouble.


ImgBin uses a layered architecture that cleanly separates CLI commands, application services, third-party API adapters, and the infrastructure layer:

Component hierarchy
├── CLI Entry (cli.ts)       Global argument parsing, command routing
├── Commands (commands/*)    generate | batch | annotate | thumbnail
├── Application Services     job-runner | metadata | thumbnail | asset-writer
├── Provider Adapters        image-api-provider | vision-api-provider
└── Infrastructure Layer     config | logger | paths | schema

The benefit of this layered design is clear responsibility boundaries. It also makes testing easier because external dependencies can be mocked cleanly. In practice, it just means each layer does its own job without getting in the way of the others, so when something breaks, it is easier to figure out why.

ImgBin uses a model of “one asset, one directory.” Every time an image is generated, it creates a structure like this:

library/
└── 2026-03/
    └── orange-dashboard/
        ├── original.png      # Original image
        ├── thumbnail.webp    # 512x512 thumbnail
        └── metadata.json     # Structured metadata

The advantages of this model are:

  1. Self-contained: all files for a single asset live in the same directory, making migration and backup convenient.
  2. Traceable: metadata.json makes it possible to trace generation time, prompt, model, and other details.
  3. Extensible: if more variants are needed later, such as thumbnails in multiple sizes, we can simply add new files in the same directory.

Beautiful things do not always need to be possessed. Sometimes it is enough that they remain beautiful, and that you can quietly appreciate them. That may sound a little far afield, but the logic still holds here: once images are kept together, they are more pleasant to look at and much easier to find.
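The layout above can be sketched as a small path helper. This is illustrative only; the helper name and signature are assumptions, not ImgBin's actual API:

```typescript
import * as path from "node:path";

// Derive the "one asset, one directory" location: library/<YYYY-MM>/<slug>/
function assetDir(libraryRoot: string, slug: string, createdAt: Date): string {
  const year = createdAt.getUTCFullYear();
  const month = String(createdAt.getUTCMonth() + 1).padStart(2, "0");
  return path.join(libraryRoot, `${year}-${month}`, slug);
}
```

Bucketing by month keeps any single directory from growing without bound, while the slug keeps each asset directory self-contained and easy to locate.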

metadata.json is the core of the entire system. It uses a layered storage strategy that separates fields into three categories:

{
  "schemaVersion": 2,
  "assetId": "orange-dashboard",
  "slug": "orange-dashboard",
  "title": "Orange Dashboard",
  "tags": ["dashboard", "hero", "orange"],
  "source": { "type": "generated" },
  "paths": {
    "assetDir": "library/2026-03/orange-dashboard",
    "original": "original.png",
    "thumbnail": "thumbnail.webp"
  },
  "generated": {
    "prompt": "orange dashboard for docs hero",
    "provider": "azure-openai-image-api",
    "model": "gpt-image-1.5"
  },
  "recognized": {
    "title": "Orange Dashboard",
    "tags": ["dashboard", "ui", "orange"],
    "description": "A modern orange dashboard with charts and metrics"
  },
  "status": {
    "generation": "succeeded",
    "recognition": "succeeded",
    "thumbnail": "succeeded"
  },
  "timestamps": {
    "createdAt": "2026-03-11T04:01:19.570Z",
    "updatedAt": "2026-03-11T04:02:09.132Z"
  }
}

  • generated: records the original information from image generation, such as the prompt, provider, and model.
  • recognized: stores AI recognition results, such as auto-generated titles, tags, and descriptions.
  • manual: stores manually curated results. Data in this area has the highest priority and will not be overwritten by AI recognition.

This layered strategy resolves one of our earlier core conflicts: when AI recognition and manual curation disagree, which one should win? The answer is manual input. AI recognition is there to assist, not to decide. That question also became clearer over time - machines are still machines, and in the end, people still need to make the call.
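The precedence rule can be sketched as a tiny merge function. The field and type names here are illustrative, not ImgBin's actual schema types:

```typescript
interface AssetFields {
  title?: string;
  tags?: string[];
  description?: string;
}

// Manual curation wins; AI recognition only fills the gaps.
function effectiveFields(recognized: AssetFields, manual: AssetFields): AssetFields {
  return {
    title: manual.title ?? recognized.title,
    tags: manual.tags ?? recognized.tags,
    description: manual.description ?? recognized.description,
  };
}
```

Because the merge is per field, a curator can override just the title while still keeping the AI-generated tags.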


Another core part of ImgBin is the Provider Adapter pattern. We abstract external APIs behind a unified interface so that even if we switch AI service providers, we do not need to change the business logic.

In a way, it is a bit like relationships - outward appearances can change, but what matters is that the inner structure stays the same. Once the interface is fixed, the internal implementation can vary freely.

interface ImageGenerationProvider {
  // Generate an image and return its Buffer
  generate(options: GenerateOptions): Promise<Buffer>;
  // Get the list of supported models
  getSupportedModels(): Promise<string[]>;
}

interface GenerateOptions {
  prompt: string;
  model?: string;
  size?: '1024x1024' | '1792x1024' | '1024x1792';
  quality?: 'standard' | 'hd';
  format?: 'png' | 'webp' | 'jpeg';
}

interface VisionRecognitionProvider {
  // Recognize image content and return structured metadata
  recognize(imageBuffer: Buffer): Promise<RecognitionResult>;
  // Get the list of supported models
  getSupportedModels(): Promise<string[]>;
}

interface RecognitionResult {
  title?: string;
  tags: string[];
  description?: string;
  confidence: number;
}

The advantages of this interface design are:

  1. Testable: in unit tests, we can pass in mock providers instead of making real external API calls.
  2. Extensible: adding a new provider only requires implementing the interface; caller code does not need to change.
  3. Replaceable: production can use Azure OpenAI while testing can use a local model, with configuration being the only thing that changes.

Sometimes project work feels like that too. On the surface it looks like we just swapped an API, but the internal logic remains exactly the same, and that makes the whole thing a lot less scary.
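Point 1 above is worth making concrete: a mock provider is just another implementation of the same interface. The sketch below reuses the `ImageGenerationProvider` interface from the article; the mock class itself and its placeholder bytes are illustrative, not part of ImgBin:

```typescript
interface GenerateOptions {
  prompt: string;
  model?: string;
}

interface ImageGenerationProvider {
  generate(options: GenerateOptions): Promise<Buffer>;
  getSupportedModels(): Promise<string[]>;
}

// Test double: returns placeholder bytes instead of calling a real image API.
class MockImageProvider implements ImageGenerationProvider {
  async generate(options: GenerateOptions): Promise<Buffer> {
    return Buffer.from(`mock-image:${options.prompt}`);
  }
  async getSupportedModels(): Promise<string[]> {
    return ["mock-model"];
  }
}
```

Swapping this for the Azure OpenAI provider in production is then purely a configuration change, since callers only ever see the interface.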


ImgBin provides four core commands to cover different usage scenarios:

# Simplest usage
imgbin generate --prompt "orange dashboard for docs hero"
# Generate a thumbnail and AI annotations at the same time
imgbin generate --prompt "orange dashboard" --annotate --thumbnail
# Specify an output directory
imgbin generate --prompt "orange dashboard" --output ./library

Batch jobs are defined through YAML or JSON manifest files, which makes them suitable for CI/CD workflows:

# assets/jobs/launch.yaml
defaults:
  annotate: true
  thumbnail: true
  libraryRoot: ./library
jobs:
  - prompt: "orange dashboard hero"
    slug: orange-dashboard
    tags: [dashboard, hero, orange]
  - prompt: "pricing grid for docs"
    slug: pricing-grid
    tags: [pricing, grid, docs]

Run the command:

imgbin batch assets/jobs/launch.yaml

The batch job design supports failure isolation: items in the manifest are processed one by one, and a failure in one item does not affect the others. You can also preview the job with --dry-run without actually executing it.

And the best part is that it tells you exactly what succeeded and what failed. Unlike some things in life, where failure happens and you are left not even knowing how it happened.
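The failure-isolation loop itself is simple. Here is a minimal sketch under assumed types; the `Job` and `JobResult` shapes are illustrative, not ImgBin's actual internals:

```typescript
interface Job {
  slug: string;
  run: () => Promise<void>;
}

interface JobResult {
  slug: string;
  ok: boolean;
  error?: string;
}

// Process manifest items one by one; a failure is recorded but never
// stops the remaining jobs.
async function runBatch(jobs: Job[]): Promise<JobResult[]> {
  const results: JobResult[] = [];
  for (const job of jobs) {
    try {
      await job.run();
      results.push({ slug: job.slug, ok: true });
    } catch (err) {
      results.push({ slug: job.slug, ok: false, error: String(err) });
    }
  }
  return results;
}
```

A `--dry-run` mode would walk the same loop but substitute a no-op for `job.run()`.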

Run AI recognition on existing images to automatically generate titles, tags, and descriptions:

# Annotate a single image
imgbin annotate ./library/2026-03/orange-dashboard
# Annotate an entire directory in batch
imgbin annotate ./library/2026-03/

Generate thumbnails for existing images:

# Generate a thumbnail
imgbin thumbnail ./library/2026-03/orange-dashboard

The manifest format for batch jobs supports flexible configuration. Defaults can be set globally, and individual jobs can override them:

# Global defaults
defaults:
  annotate: true          # Enable AI annotation by default
  thumbnail: true         # Generate thumbnails by default
  libraryRoot: ./library
  model: gpt-image-1.5
jobs:
  # Minimal configuration: only provide a prompt
  - prompt: "first image"
  # Full configuration
  - prompt: "second image"
    slug: custom-slug
    tags: [tag1, tag2]
    annotate: false       # Do not run AI annotation for this job
    model: dall-e-3       # Use a different model for this job

When executed, ImgBin processes jobs one by one. The result of each job is written to its corresponding metadata.json. Even if one job fails, the others are unaffected. After all jobs complete, the CLI outputs a summary report:

✓ orange-dashboard (succeeded)
✓ pricing-grid (succeeded)
✗ hero-banner (failed: API rate limit exceeded)
2/3 succeeded, 1 failed

Some things cannot be rushed. Taking them one at a time is often the steadier path. Maybe that is the philosophy behind batch jobs.


ImgBin supports flexible configuration through environment variables:

# ImgBin working directory
IMGBIN_WORKDIR=/path/to/imgbin
# Executable path (for invocation inside scripts)
IMGBIN_EXECUTABLE=/path/to/imgbin/dist/cli.js
# Asset library root
IMGBIN_LIBRARY_ROOT=./.imgbin-library
# Azure OpenAI configuration (if using the Azure provider)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=***
AZURE_OPENAI_IMAGE_DEPLOYMENT=gpt-image-1

Configuration is one of those things that can feel both important and not that important at the same time. In the end, whatever feels comfortable and fits your workflow best is usually the right choice.


During implementation, we summarized a few key points:

Interface definitions should be clear and complete, including input parameters, return values, and error handling. It is also a good idea to provide both synchronous and asynchronous invocation styles for different scenarios.

That is one small piece of hard-earned experience. Once an interface is set, nobody wants to keep changing it later.

When one item fails in a batch job, the CLI should:

  1. Write detailed error information to a separate log file.
  2. Continue executing other jobs instead of interrupting the whole process.
  3. Return a non-zero exit code at the end to indicate that some jobs failed.
  4. Clearly display the execution result of every job in the summary report.

Some failures are just failures. There is no point pretending otherwise. It is better to acknowledge them openly and then figure out how to solve them. The same logic applies to projects and to life.
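The summary report and exit-code behavior from the list above can be sketched like this. The `JobResult` shape and the exact output format are illustrative assumptions:

```typescript
interface JobResult {
  slug: string;
  ok: boolean;
  error?: string;
}

function summarize(results: JobResult[]): { report: string; exitCode: number } {
  const lines = results.map((r) =>
    r.ok ? `✓ ${r.slug} (succeeded)` : `✗ ${r.slug} (failed: ${r.error})`
  );
  const okCount = results.filter((r) => r.ok).length;
  lines.push(`${okCount}/${results.length} succeeded, ${results.length - okCount} failed`);
  // A non-zero exit code tells CI that at least one job failed,
  // even though the run itself completed.
  return { report: lines.join("\n"), exitCode: okCount === results.length ? 0 : 1 };
}
```

This is what lets a CI pipeline both archive the full per-job report and still mark the step as failed when anything went wrong.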

Recognition results are written to the recognized section by default, while manually edited fields are marked in manual. Metadata updates follow an append-only strategy: unless --force is explicitly passed, existing manually curated results are not overwritten.

That point became clear too - some things, once overwritten, are just gone. It is often better to preserve them, because the record itself has value.

Use fs.mkdir(dir, { recursive: true }) so that directory creation is idempotent: it succeeds even when the directory already exists, which avoids race conditions in concurrent scenarios.

Maybe that is what security feels like - being stable when stability matters, moving fast when speed matters, and never getting stuck second-guessing.


As the core tool for image asset management in the HagiCode project, ImgBin solves our problems through the following design choices:

  1. Unified entry point: the CLI covers generation, annotation, thumbnails, and all other core operations.
  2. Metadata-driven: every asset has a complete metadata.json, enabling search and traceability.
  3. Provider Adapter: flexible abstraction for external APIs, making testing and extension easier.
  4. Batch job support: batch image generation can be automated within CI/CD workflows.

Everything else may have faded, but this approach really did end up proving useful.

This solution not only improves HagiCode’s own development efficiency, but also forms a reusable framework for image asset management. If you are building a similarly multi-component project, I believe ImgBin’s design ideas may give you some inspiration.

Youth is all about trying things and making a bit of a mess. If you never put yourself through that, how would you know what you are really capable of?



Thank you for reading. If you found this article helpful, please click the like button below so more people can discover it.

This content was produced with AI-assisted collaboration, reviewed by me, and reflects my own views and position.