
TypeScript

4 posts with the tag “TypeScript”

ImgBin CLI Tool Design: HagiCode’s Image Asset Management Approach

This article explains how to build an automatable image asset pipeline from scratch, covering CLI tool design, a Provider Adapter architecture, and metadata management strategies.

Honestly, I did not expect image asset management to keep us tangled up for this long.

During HagiCode development, we ran into a problem that looked simple on the surface but was surprisingly thorny in practice: generating and managing image assets. In a way, it was like the dramas of adolescence - calm on the outside, turbulent underneath.

As the project accumulated more documentation and marketing materials, we needed a large number of supporting images. Some had to be AI-generated, some had to be selected from an existing asset library, and others needed AI recognition plus automatic labeling. The problem was that all of this had long been handled through scattered scripts and manual steps. Every time we generated an image, we had to run a script by hand, organize metadata by hand, and create thumbnails by hand. That alone was annoying enough, but the bigger issue was that everything was scattered everywhere. When we wanted to find something, we could not. When we needed to reuse something, we could not.

The pain points were concrete:

  1. No unified entry point: the logic for image generation was spread across different scripts, so batch execution was basically impossible.
  2. Missing metadata: generated images had no unified metadata.json, which meant no reliable searchability or traceability.
  3. High manual organization cost: titles and tags had to be sorted out one by one by hand, which was inefficient.
  4. No automation: automatically generating visual assets in a CI/CD pipeline? Not a chance.

We did think about just leaving it alone. But projects still need to move forward. Since we could not avoid the problem, we figured we might as well solve it. So we decided to upgrade ImgBin from a set of scattered scripts into an image asset pipeline that can be executed automatically. Some problems, after all, do not disappear just because you look away.

The approach shared in this article comes from our hands-on experience in the HagiCode project. HagiCode is an AI coding assistant project that simultaneously maintains multiple components, including a VSCode extension, backend AI services, and a cross-platform desktop client. In a complex, multilingual, cross-platform environment like this, standardized image asset management becomes a key part of improving development efficiency.

You could say this was one of those small growing pains in HagiCode’s journey. Every project has moments like that: a minor issue that looks insignificant, yet somehow manages to take up half the day.

HagiCode’s build system is based on the TypeScript + Node.js ecosystem, so ImgBin naturally adopted the same tech stack to keep the project technically consistent. Once you are used to one stack, switching to something else just feels like unnecessary trouble.


ImgBin uses a layered architecture that cleanly separates CLI commands, application services, third-party API adapters, and the infrastructure layer:

Component hierarchy
├── CLI Entry (cli.ts)        global argument parsing, command routing
├── Commands (commands/*)     generate | batch | annotate | thumbnail
├── Application Services      job-runner | metadata | thumbnail | asset-writer
├── Provider Adapters         image-api-provider | vision-api-provider
└── Infrastructure Layer      config | logger | paths | schema

The benefit of this layered design is clear responsibility boundaries. It also makes testing easier because external dependencies can be mocked cleanly. In practice, it just means each layer does its own job without getting in the way of the others, so when something breaks, it is easier to figure out why.

ImgBin uses a model of “one asset, one directory.” Every time an image is generated, it creates a structure like this:

library/
└── 2026-03/
    └── orange-dashboard/
        ├── original.png     # Original image
        ├── thumbnail.webp   # 512x512 thumbnail
        └── metadata.json    # Structured metadata

The advantages of this model are:

  1. Self-contained: all files for a single asset live in the same directory, making migration and backup convenient.
  2. Traceable: metadata.json makes it possible to trace generation time, prompt, model, and other details.
  3. Extensible: if more variants are needed later, such as thumbnails in multiple sizes, we can simply add new files in the same directory.
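
As a rough illustration of this layout, an asset writer could create the directory and its three files along the lines below. The writeAsset helper and its argument shape are assumptions made for this sketch, not ImgBin’s actual API.

import { promises as fs } from 'node:fs';
import path from 'node:path';

// Hypothetical shape of the files produced for one asset.
interface AssetFiles {
  original: Buffer;                     // generated image bytes
  thumbnail: Buffer;                    // 512x512 webp thumbnail
  metadata: Record<string, unknown>;    // contents of metadata.json
}

// Writes one asset into its own directory: library/<YYYY-MM>/<slug>/
async function writeAsset(libraryRoot: string, slug: string, files: AssetFiles): Promise<string> {
  const month = new Date().toISOString().slice(0, 7); // e.g. "2026-03"
  const assetDir = path.join(libraryRoot, month, slug);

  // { recursive: true } is idempotent, so concurrent jobs do not race on mkdir.
  await fs.mkdir(assetDir, { recursive: true });

  await fs.writeFile(path.join(assetDir, 'original.png'), files.original);
  await fs.writeFile(path.join(assetDir, 'thumbnail.webp'), files.thumbnail);
  await fs.writeFile(
    path.join(assetDir, 'metadata.json'),
    JSON.stringify(files.metadata, null, 2)
  );
  return assetDir;
}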

Beautiful things do not always need to be possessed. Sometimes it is enough that they remain beautiful, and that you can quietly appreciate them. That may sound a little far afield, but the logic still holds here: once images are kept together, they are more pleasant to look at and much easier to find.

metadata.json is the core of the entire system. It uses a layered storage strategy that separates fields into three categories:

{
  "schemaVersion": 2,
  "assetId": "orange-dashboard",
  "slug": "orange-dashboard",
  "title": "Orange Dashboard",
  "tags": ["dashboard", "hero", "orange"],
  "source": { "type": "generated" },
  "paths": {
    "assetDir": "library/2026-03/orange-dashboard",
    "original": "original.png",
    "thumbnail": "thumbnail.webp"
  },
  "generated": {
    "prompt": "orange dashboard for docs hero",
    "provider": "azure-openai-image-api",
    "model": "gpt-image-1.5"
  },
  "recognized": {
    "title": "Orange Dashboard",
    "tags": ["dashboard", "ui", "orange"],
    "description": "A modern orange dashboard with charts and metrics"
  },
  "status": {
    "generation": "succeeded",
    "recognition": "succeeded",
    "thumbnail": "succeeded"
  },
  "timestamps": {
    "createdAt": "2026-03-11T04:01:19.570Z",
    "updatedAt": "2026-03-11T04:02:09.132Z"
  }
}
  • generated: records the original information from image generation, such as the prompt, provider, and model.
  • recognized: stores AI recognition results, such as auto-generated titles, tags, and descriptions.
  • manual: stores manually curated results (not present in the example above, since no manual edits have been made yet). Data in this area has the highest priority and will not be overwritten by AI recognition.

This layered strategy resolves one of our earlier core conflicts: when AI recognition and manual curation disagree, which one should win? The answer is manual input. AI recognition is there to assist, not to decide. That question also became clearer over time - machines are still machines, and in the end, people still need to make the call.
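
That rule can be written down as a small merge helper. The sketch below is illustrative rather than ImgBin’s actual code; the field names follow the metadata.json example above.

interface CuratedFields {
  title?: string;
  tags?: string[];
  description?: string;
}

interface AssetMetadata {
  recognized?: CuratedFields;  // AI recognition results
  manual?: CuratedFields;      // human-curated results, highest priority
}

// Resolve the effective title/tags/description: manual always wins over recognized.
function resolveCuratedFields(meta: AssetMetadata): CuratedFields {
  return {
    title: meta.manual?.title ?? meta.recognized?.title,
    tags: meta.manual?.tags ?? meta.recognized?.tags ?? [],
    description: meta.manual?.description ?? meta.recognized?.description,
  };
}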


Another core part of ImgBin is the Provider Adapter pattern. We abstract external APIs behind a unified interface so that even if we switch AI service providers, we do not need to change the business logic.

In a way, it is a bit like relationships - outward appearances can change, but what matters is that the inner structure stays the same. Once the interface is fixed, the internal implementation can vary freely.

interface ImageGenerationProvider {
  // Generate an image and return its Buffer
  generate(options: GenerateOptions): Promise<Buffer>;
  // Get the list of supported models
  getSupportedModels(): Promise<string[]>;
}

interface GenerateOptions {
  prompt: string;
  model?: string;
  size?: '1024x1024' | '1792x1024' | '1024x1792';
  quality?: 'standard' | 'hd';
  format?: 'png' | 'webp' | 'jpeg';
}

interface VisionRecognitionProvider {
  // Recognize image content and return structured metadata
  recognize(imageBuffer: Buffer): Promise<RecognitionResult>;
  // Get the list of supported models
  getSupportedModels(): Promise<string[]>;
}

interface RecognitionResult {
  title?: string;
  tags: string[];
  description?: string;
  confidence: number;
}

The advantages of this interface design are:

  1. Testable: in unit tests, we can pass in mock providers instead of making real external API calls.
  2. Extensible: adding a new provider only requires implementing the interface; caller code does not need to change.
  3. Replaceable: production can use Azure OpenAI while testing can use a local model, with configuration being the only thing that changes.

Sometimes project work feels like that too. On the surface it looks like we just swapped an API, but the internal logic remains exactly the same, and that makes the whole thing a lot less scary.
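
To make the testability point concrete, a unit test can satisfy ImageGenerationProvider with an in-memory stub. This is a minimal sketch; generateAsset stands in for whatever application service consumes the provider and is not an actual ImgBin function.

// A stub provider that never touches the network.
const fakeImageProvider: ImageGenerationProvider = {
  async generate(_options: GenerateOptions): Promise<Buffer> {
    // Return placeholder bytes instead of calling a real image API.
    return Buffer.from('fake-png-bytes');
  },
  async getSupportedModels(): Promise<string[]> {
    return ['test-model'];
  },
};

// The application service depends only on the interface, so the stub drops in directly.
// (generateAsset is a hypothetical service function used for illustration.)
const asset = await generateAsset(fakeImageProvider, { prompt: 'orange dashboard' });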


ImgBin provides four core commands to cover different usage scenarios:

# Simplest usage
imgbin generate --prompt "orange dashboard for docs hero"
# Generate a thumbnail and AI annotations at the same time
imgbin generate --prompt "orange dashboard" --annotate --thumbnail
# Specify an output directory
imgbin generate --prompt "orange dashboard" --output ./library

Batch jobs are defined through YAML or JSON manifest files, which makes them suitable for CI/CD workflows:

# assets/jobs/launch.yaml
defaults:
  annotate: true
  thumbnail: true
  libraryRoot: ./library
jobs:
  - prompt: "orange dashboard hero"
    slug: orange-dashboard
    tags: [dashboard, hero, orange]
  - prompt: "pricing grid for docs"
    slug: pricing-grid
    tags: [pricing, grid, docs]

Run the command:

imgbin batch assets/jobs/launch.yaml

The batch job design supports failure isolation: items in the manifest are processed one by one, and a failure in one item does not affect the others. You can also preview the job with --dry-run without actually executing it.

And the best part is that it tells you exactly what succeeded and what failed. Unlike some things in life, where failure happens and you are left not even knowing how it happened.

Run AI recognition on existing images to automatically generate titles, tags, and descriptions:

# Annotate a single image
imgbin annotate ./library/2026-03/orange-dashboard
# Annotate an entire directory in batch
imgbin annotate ./library/2026-03/

Generate thumbnails for existing images:

# Generate a thumbnail
imgbin thumbnail ./library/2026-03/orange-dashboard

The manifest format for batch jobs supports flexible configuration. Defaults can be set globally, and individual jobs can override them:

# Global defaults
defaults:
  annotate: true            # Enable AI annotation by default
  thumbnail: true           # Generate thumbnails by default
  libraryRoot: ./library
  model: gpt-image-1.5
jobs:
  # Minimal configuration: only provide a prompt
  - prompt: "first image"
  # Full configuration
  - prompt: "second image"
    slug: custom-slug
    tags: [tag1, tag2]
    annotate: false         # Do not run AI annotation for this job
    model: dall-e-3         # Use a different model for this job

When executed, ImgBin processes jobs one by one. The result of each job is written to its corresponding metadata.json. Even if one job fails, the others are unaffected. After all jobs complete, the CLI outputs a summary report:

✓ orange-dashboard (succeeded)
✓ pricing-grid (succeeded)
✗ hero-banner (failed: API rate limit exceeded)
2/3 succeeded, 1 failed

Some things cannot be rushed. Taking them one at a time is often the steadier path. Maybe that is the philosophy behind batch jobs.


ImgBin supports flexible configuration through environment variables:

# ImgBin working directory
IMGBIN_WORKDIR=/path/to/imgbin
# Executable path (for invocation inside scripts)
IMGBIN_EXECUTABLE=/path/to/imgbin/dist/cli.js
# Asset library root
IMGBIN_LIBRARY_ROOT=./.imgbin-library
# Azure OpenAI configuration (if using the Azure provider)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=***
AZURE_OPENAI_IMAGE_DEPLOYMENT=gpt-image-1
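
As an illustration, an infrastructure-layer config module could read these variables with fallbacks along these lines. The loadConfig helper and the fallback values are assumptions for the sketch, not ImgBin’s actual implementation.

interface ImgBinConfig {
  workdir: string;
  libraryRoot: string;
  azureEndpoint?: string;
  azureApiKey?: string;
}

// Read configuration from environment variables, falling back to sensible defaults.
function loadConfig(env: NodeJS.ProcessEnv = process.env): ImgBinConfig {
  return {
    workdir: env.IMGBIN_WORKDIR ?? process.cwd(),
    libraryRoot: env.IMGBIN_LIBRARY_ROOT ?? './.imgbin-library',
    azureEndpoint: env.AZURE_OPENAI_ENDPOINT,
    azureApiKey: env.AZURE_OPENAI_API_KEY,
  };
}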

Configuration is one of those things that can feel both important and not that important at the same time. In the end, whatever feels comfortable and fits your workflow best is usually the right choice.


During implementation, we summarized a few key points:

Interface definitions should be clear and complete, including input parameters, return values, and error handling. It is also a good idea to provide both synchronous and asynchronous invocation styles for different scenarios.

That is one small piece of hard-earned experience. Once an interface is set, nobody wants to keep changing it later.

When one item fails in a batch job, the CLI should do the following (a minimal sketch of this flow appears after the list):

  1. Write detailed error information to a separate log file.
  2. Continue executing other jobs instead of interrupting the whole process.
  3. Return a non-zero exit code at the end to indicate that some jobs failed.
  4. Clearly display the execution result of every job in the summary report.
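
A minimal sketch of that flow, assuming hypothetical BatchJob and runJob types; only the control flow (log, continue, summarize, non-zero exit code) mirrors the points above.

interface BatchJob { slug: string; prompt: string; }
interface JobResult { slug: string; ok: boolean; error?: string; }

// Hypothetical runner: each job is isolated, so a failure never aborts the batch.
async function runBatch(jobs: BatchJob[], runJob: (job: BatchJob) => Promise<void>): Promise<void> {
  const results: JobResult[] = [];
  for (const job of jobs) {
    try {
      await runJob(job);
      results.push({ slug: job.slug, ok: true });
    } catch (err) {
      // Record the failure (detailed logging would go here) and keep going.
      results.push({ slug: job.slug, ok: false, error: String(err) });
    }
  }

  const failed = results.filter((r) => !r.ok);
  for (const r of results) {
    console.log(`${r.ok ? '✓' : '✗'} ${r.slug}${r.ok ? '' : ` (failed: ${r.error})`}`);
  }
  console.log(`${results.length - failed.length}/${results.length} succeeded, ${failed.length} failed`);

  // A non-zero exit code signals partial failure to CI.
  process.exitCode = failed.length > 0 ? 1 : 0;
}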

Some failures are just failures. There is no point pretending otherwise. It is better to acknowledge them openly and then figure out how to solve them. The same logic applies to projects and to life.

Recognition results are written to the recognized section by default, while manually edited fields are marked in manual. Metadata updates follow an append-only strategy: unless --force is explicitly passed, existing manually curated results are not overwritten.

That point became clear too - some things, once overwritten, are just gone. It is often better to preserve them, because the record itself has value.

Use fs.mkdir(dir, { recursive: true }) when creating asset directories: the call is idempotent, so it does not fail if the directory already exists, which avoids race conditions when concurrent jobs target the same path.

Maybe that is what security feels like - being stable when stability matters, moving fast when speed matters, and never getting stuck second-guessing.


As the core tool for image asset management in the HagiCode project, ImgBin solves our problems through the following design choices:

  1. Unified entry point: the CLI covers generation, annotation, thumbnails, and all other core operations.
  2. Metadata-driven: every asset has a complete metadata.json, enabling search and traceability.
  3. Provider Adapter: flexible abstraction for external APIs, making testing and extension easier.
  4. Batch job support: batch image generation can be automated within CI/CD workflows.

Everything else may have faded, but this approach really did end up proving useful.

This solution not only improves HagiCode’s own development efficiency, but also forms a reusable framework for image asset management. If you are building a similarly multi-component project, I believe ImgBin’s design ideas may give you some inspiration.

Youth is all about trying things and making a bit of a mess. If you never put yourself through that, how would you know what you are really capable of?



Thank you for reading. If you found this article helpful, please click the like button below so more people can discover it.

This content was produced with AI-assisted collaboration, reviewed by me, and reflects my own views and position.

Complete Guide to Codex SDK Console Message Parsing

This article explains the Codex SDK event stream mechanism, message type parsing, and best practices in real projects, helping developers quickly master the core skills behind AI execution services.

When building an AI execution service based on the Codex SDK, we inevitably run into a practical question: how should we handle the streamed event messages returned by Codex? These messages contain important information such as execution status, output content, and error details, so they deserve careful handling.

As part of the HagiCode project, we needed a reliable executor for AI coding assistant scenarios. That is exactly why we decided to study the Codex SDK event stream mechanism in depth. After all, only by understanding how the underlying messages work can we build a truly enterprise-grade AI execution platform.

The Codex SDK is a programming-assistance SDK released by OpenAI. It returns execution results through an Event Stream. Unlike the traditional request-response model, Codex uses streamed events so that we can:

  • Get execution progress in real time
  • Handle errors promptly
  • Obtain detailed token usage statistics
  • Support long-running complex tasks

Understanding these event types and parsing them correctly is essential for implementing a fully capable AI executor. In the end, nobody wants to work with a black box.

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI coding assistant project dedicated to providing developers with intelligent coding support. During development, we needed to build a reliable AI execution service to handle user code execution requests, which is the direct reason we introduced the Codex SDK.

As an AI coding assistant, HagiCode needs to deal with a variety of complex code execution scenarios: getting execution progress in real time, handling errors promptly, and collecting detailed token usage statistics. By deeply understanding the Codex SDK event stream mechanism, we can build an executor that meets production environment requirements. Ultimately, whether it is software or real life, everything benefits from steady accumulation and refinement.

The Codex SDK uses the thread.runStreamed() method to return an asynchronous event iterator:

import { Codex } from '@openai/codex-sdk';

const client = new Codex({
  apiKey: process.env.CODEX_API_KEY,
  baseUrl: process.env.CODEX_BASE_URL,
});

const thread = client.startThread({
  workingDirectory: '/path/to/project',
  skipGitRepoCheck: false,
});

const { events } = await thread.runStreamed('your prompt here', {
  outputSchema: {
    type: 'object',
    properties: {
      output: { type: 'string' },
      status: { type: 'string', enum: ['ok', 'action_required'] },
    },
    required: ['output', 'status'],
  },
});

for await (const event of events) {
  // Handle each event
}
Event Type        Description                    Key Data
thread.started    Thread started successfully    thread_id
item.updated      Message content updated        item.text
item.completed    Message completed              item.text
turn.completed    Execution completed            usage (token usage)
turn.failed       Execution failed               error.message
error             Error event                    message

In real projects, HagiCode’s executor component is built on top of these event types. We need to handle each kind of event carefully to ensure a smooth user experience. Good systems are built by taking details seriously.

Message content is extracted through an event handler:

private handleThreadEvent(event: ThreadEvent, onMessage: (content: string) => void): void {
  // Only handle message update and completion events
  if (event.type !== 'item.updated' && event.type !== 'item.completed') {
    return;
  }
  // Only handle agent message content
  if (event.item.type !== 'agent_message') {
    return;
  }
  // Extract text content
  onMessage(event.item.text);
}

Key points:

  • Only handle item.updated and item.completed events
  • Only handle content of type agent_message
  • The message content is in the event.item.text field

Codex supports JSON structured output. You can specify the return format through the outputSchema parameter:

const DEFAULT_OUTPUT_SCHEMA = {
  type: 'object',
  properties: {
    output: { type: 'string' },
    status: { type: 'string', enum: ['ok', 'action_required'] },
  },
  required: ['output', 'status'],
  additionalProperties: false,
} as const;

The parsing function attempts to parse JSON, and if that fails it falls back to the raw text.

function toStructuredOutput(raw: string): StructuredOutput {
  try {
    const parsed = JSON.parse(raw) as Partial<StructuredOutput>;
    if (typeof parsed.output === 'string') {
      return {
        output: parsed.output,
        status: parsed.status === 'action_required' ? 'action_required' : 'ok',
      };
    }
  } catch {
    // JSON parsing failed, fall back to the raw text
  }
  return {
    output: raw,
    status: 'ok',
  };
}

private async runWithStreaming(
  thread: Thread,
  input: CodexStageExecutionInput
): Promise<{ output: string; usage: Usage | null }> {
  const abortController = new AbortController();
  const timeoutHandle = setTimeout(() => {
    abortController.abort();
  }, Math.max(1000, input.timeoutMs));

  let latestMessage = '';
  let usage: Usage | null = null;
  let emittedLength = 0;

  try {
    const { events } = await thread.runStreamed(input.prompt, {
      outputSchema: DEFAULT_OUTPUT_SCHEMA,
      signal: abortController.signal,
    });

    for await (const event of events) {
      // Handle message content
      this.handleThreadEvent(event, (nextContent) => {
        const delta = nextContent.slice(emittedLength);
        if (delta.length > 0) {
          emittedLength = nextContent.length;
          input.callbacks?.onChunk?.(delta); // Streaming callback
        }
        latestMessage = nextContent;
      });

      // Process different data based on the event type
      if (event.type === 'thread.started') {
        this.threadId = event.thread_id;
      } else if (event.type === 'turn.completed') {
        usage = event.usage;
      } else if (event.type === 'turn.failed') {
        throw new CodexExecutorError('gateway_unavailable', event.error.message, true);
      } else if (event.type === 'error') {
        throw new CodexExecutorError('gateway_unavailable', event.message, true);
      }
    }
  } catch (error) {
    if (abortController.signal.aborted) {
      throw new CodexExecutorError(
        'upstream_timeout',
        `Codex stage timed out after ${input.timeoutMs}ms`,
        true
      );
    }
    throw error;
  } finally {
    clearTimeout(timeoutHandle);
  }

  const structured = toStructuredOutput(latestMessage);
  return { output: structured.output, usage };
}

Map specific error patterns to concrete error codes so the upper layers can handle them more easily:

function mapError(error: unknown): CodexExecutorError {
  if (error instanceof CodexExecutorError) {
    return error;
  }
  const message = error instanceof Error ? error.message : String(error);
  const normalized = message.toLowerCase();

  // Authentication errors - not retryable
  if (normalized.includes('401') ||
      normalized.includes('403') ||
      normalized.includes('api key') ||
      normalized.includes('auth')) {
    return new CodexExecutorError('auth_invalid', message, false);
  }
  // Rate limit errors - retryable
  if (normalized.includes('429') || normalized.includes('rate limit')) {
    return new CodexExecutorError('rate_limited', message, true);
  }
  // Timeout errors - retryable
  if (normalized.includes('timeout') || normalized.includes('aborted')) {
    return new CodexExecutorError('upstream_timeout', message, true);
  }
  // Default error
  return new CodexExecutorError('gateway_unavailable', message, true);
}

export type CodexErrorCode =
  | 'auth_invalid'          // Authentication failure
  | 'upstream_timeout'      // Upstream timeout
  | 'rate_limited'          // Rate limited
  | 'gateway_unavailable';  // Gateway unavailable

export class CodexExecutorError extends Error {
  readonly code: CodexErrorCode;
  readonly retryable: boolean;

  constructor(code: CodexErrorCode, message: string, retryable: boolean) {
    super(message);
    this.name = 'CodexExecutorError';
    this.code = code;
    this.retryable = retryable;
  }
}

Working Directory and Environment Configuration


The Codex SDK requires the working directory to be a valid Git repository.

export function validateWorkingDirectory(
  workingDirectory: string,
  skipGitRepoCheck: boolean
): void {
  const resolvedWorkingDirectory = path.resolve(workingDirectory);

  if (!existsSync(resolvedWorkingDirectory)) {
    throw new CodexExecutorError(
      'gateway_unavailable',
      'Working directory does not exist.',
      false
    );
  }
  if (!statSync(resolvedWorkingDirectory).isDirectory()) {
    throw new CodexExecutorError(
      'gateway_unavailable',
      'Working directory is not a directory.',
      false
    );
  }
  if (skipGitRepoCheck) {
    return;
  }

  const gitDir = path.join(resolvedWorkingDirectory, '.git');
  if (!existsSync(gitDir)) {
    throw new CodexExecutorError(
      'gateway_unavailable',
      'Working directory is not a git repository.',
      false
    );
  }
}

The Codex SDK needs to load environment variables from the login shell so the AI Agent can access system commands:

function parseEnvironmentOutput(output: Buffer): Record<string, string> {
  const parsed: Record<string, string> = {};
  for (const entry of output.toString('utf8').split('\0')) {
    if (!entry) continue;
    const separatorIndex = entry.indexOf('=');
    if (separatorIndex <= 0) continue;
    const key = entry.slice(0, separatorIndex);
    const value = entry.slice(separatorIndex + 1);
    if (key.length > 0) {
      parsed[key] = value;
    }
  }
  return parsed;
}

function tryLoadEnvironmentFromShell(shellPath: string): Record<string, string> | null {
  // Run the login shell interactively and dump its environment as NUL-separated pairs
  const result = spawnSync(shellPath, ['-ilc', 'env -0'], {
    env: process.env,
    stdio: ['ignore', 'pipe', 'pipe'],
    timeout: 5000,
  });
  if (result.error || result.status !== 0) {
    return null;
  }
  return parseEnvironmentOutput(result.stdout);
}

export function createExecutorEnvironment(
  envOverrides: Record<string, string> = {}
): Record<string, string> {
  // Load environment variables from the login shell
  // (loadConsoleEnvironmentFromShell, not shown here, resolves the user's shell
  // and delegates to tryLoadEnvironmentFromShell)
  const consoleEnv = loadConsoleEnvironmentFromShell();
  return {
    ...process.env,
    ...consoleEnv,
    ...envOverrides,
  };
}

In the HagiCode project, we use the following approach to initialize the Codex client and execute tasks:

import { Codex } from '@openai/codex-sdk';

async function executeWithCodex(prompt: string, workingDir: string) {
  const client = new Codex({
    apiKey: process.env.CODEX_API_KEY,
    env: { PATH: process.env.PATH },
  });
  const thread = client.startThread({
    workingDirectory: workingDir,
  });

  const { events } = await thread.runStreamed(prompt);
  let result = '';
  for await (const event of events) {
    if (event.type === 'item.updated' && event.item.type === 'agent_message') {
      result = event.item.text;
    }
    if (event.type === 'turn.completed') {
      console.log('Token usage:', event.usage);
    }
  }

  // Try to parse JSON output
  try {
    const parsed = JSON.parse(result);
    return parsed.output;
  } catch {
    return result;
  }
}

export class CodexSdkExecutor {
  private readonly config: CodexRuntimeConfig;
  private readonly client: Codex;
  private threadId: string | null = null;

  async executeStage(input: CodexStageExecutionInput): Promise<CodexStageExecutionResult> {
    const startedAt = Date.now(); // start time used for latency reporting below
    const maxAttempts = Math.max(1, this.config.retryCount + 1);
    let attempt = 0;
    let lastError: CodexExecutorError | null = null;

    while (attempt < maxAttempts) {
      attempt += 1;
      try {
        const thread = this.getThread(input.workingDirectory);
        const { output, usage } = await this.runWithStreaming(thread, input);
        return {
          output,
          usage,
          threadId: this.threadId!,
          attempts: attempt,
          latencyMs: Date.now() - startedAt,
        };
      } catch (error) {
        const mappedError = mapError(error);
        lastError = mappedError;
        // Non-retryable error or max retry attempts reached
        if (!mappedError.retryable || attempt >= maxAttempts) {
          throw mappedError;
        }
        // Wait before retrying
        await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
      }
    }
    throw lastError!;
  }
}
  • Make sure the working directory is a valid Git repository
  • Use the PROJECT_ROOT environment variable to specify it explicitly
  • During debugging, you can set CODEX_SKIP_GIT_REPO_CHECK=true to skip the check
  • Pass only the required environment variables through a whitelist mechanism
  • Use the login shell to load the full environment
  • Avoid passing sensitive information
  • Set reasonable timeouts based on task complexity
  • Implement exponential backoff for retryable errors (see the sketch after this list)
  • Record retry counts and reasons
  • Distinguish between retryable and non-retryable errors
  • Provide clear error messages and suggestions
  • Use unified error codes so upper layers can handle them consistently
  • Implement incremental output callbacks to improve user experience
  • Correctly handle incremental message updates
  • Record token usage for cost analysis
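
As one way to apply the retry guidance above, a small backoff helper could wrap any retryable call. It reuses the CodexExecutorError.retryable flag from the earlier code; the helper itself is a sketch, not part of the Codex SDK or the HagiCode executor.

// Retry an operation with exponential backoff, honoring the retryable flag.
async function withBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const retryable = error instanceof CodexExecutorError ? error.retryable : false;
      if (!retryable || attempt === maxAttempts) throw error;
      // 1s, 2s, 4s, ... capped at 30s; attempt count and reason should be logged here.
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), 30_000);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}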

In the actual production environment of the HagiCode project, we have already verified the effectiveness of the best practices above. This approach has helped us build a stable and reliable AI execution service. In the end, practical validation matters more than theory alone.

The Codex SDK event stream mechanism provides strong capabilities for building AI execution services. By correctly parsing different kinds of events, we can:

  • Get execution status and output in real time
  • Implement reliable error handling and retry mechanisms
  • Obtain detailed execution statistics
  • Build a full-featured AI execution platform

The core concepts and code samples introduced in this article can be applied directly in real projects, helping developers get started quickly with Codex SDK integration. If you find this approach valuable, it also reflects the strength of HagiCode’s engineering practice and makes HagiCode itself worth following.


Thank you for reading. If you found this article helpful, please click the like button below so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by the author, and reflects the author’s own views and positions.

From TypeScript to C#: Cross-Language Porting Practice for the Codex SDK

Put simply, this article is also a bit of a baby of ours: it records the full process of porting the official TypeScript Codex SDK to C#. Calling it a “port” almost makes it sound too easy - it was more like a long adventure, because these two languages have very different personalities, and we had to find a way to make them cooperate.

Codex is the AI Agent CLI tool released by OpenAI, and it is genuinely powerful. The official team provides a TypeScript SDK in the @openai/codex-sdk package. It interacts with the Codex CLI by calling the codex exec --experimental-json command and parsing a JSONL event stream.

The problem is that in the HagiCode project, we need to use it in a pure .NET environment - specifically in C# backend services and desktop applications. We could not reasonably introduce a Node.js runtime into a .NET project just to call a CLI tool. That would be far too cumbersome.

So we were left with two choices: maintain a complex Node.js bridge layer, or build a native C# SDK ourselves.

We chose the latter.

This article also comes directly from our hands-on experience in the HagiCode project. HagiCode is an open-source AI coding assistant project. In plain terms, it means maintaining multiple components at once: a VSCode extension on the frontend, AI services on the backend, and a cross-platform desktop client. That multi-language, multi-platform complexity is exactly why we needed a native C# SDK - we really did not want to run Node.js inside a .NET project.

If you find this article helpful, feel free to give us a star on GitHub: github.com/HagiCode-org/site. You can also visit the official website to learn more: hagicode.com. It is always encouraging when an open-source project receives support.

Before translating code one-to-one, we first had to understand the architectural design of both SDKs. You have to understand both sides before you can port them well.

The core architecture of the TypeScript SDK looks like this:

Codex (entry class)
└── CodexExec (executor, manages child processes)
    └── Thread (conversation thread)
        ├── run() / runStreamed() (synchronous/asynchronous execution)
        └── event stream parsing

The C# SDK keeps the same architectural layering, but adapts the implementation details. The overall idea is straightforward: preserve API consistency while fully leveraging C# language features in the implementation.

This is the most fundamental and also the most important part of the work. If the foundation is weak, everything that follows becomes harder.

TypeScript’s type system is more flexible than C#’s, and that is simply a fact. We needed to find an appropriate mapping strategy:

TypeScript              C#                   Notes
interface / type        record               C# uses record for immutable data structures
string | null           string?              Nullable reference type
boolean | undefined     bool?                Nullable Boolean
AsyncGenerator          IAsyncEnumerable     Async iterator

The event type system is a typical example. TypeScript uses union types to define events:

export type ThreadEvent =
| ThreadStartedEvent
| TurnStartedEvent
| TurnCompletedEvent
| ...

In C#, we use an inheritance hierarchy and pattern matching to achieve a similar effect:

public abstract record ThreadEvent(string Type);
public sealed record ThreadStartedEvent(string ThreadId) : ThreadEvent("thread.started");
public sealed record TurnStartedEvent() : ThreadEvent("turn.started");
public sealed record TurnCompletedEvent(Usage Usage) : ThreadEvent("turn.completed");
// ...

We chose record instead of class because event objects should be immutable, which matches the intent behind using plain objects in TypeScript. The sealed keyword also prevents additional inheritance and gives the compiler room to optimize.

Event parsing is the core of the entire SDK, because it determines whether we can correctly understand every message returned by the Codex CLI. If parsing is wrong, everything after that is wasted effort.

The TypeScript version uses JSON.parse() to parse each line of JSON:

export function parseEvent(line: string): ThreadEvent {
  const data = JSON.parse(line);
  // Handle different event types...
}

The C# version uses System.Text.Json.JsonDocument instead:

public static ThreadEvent Parse(string line)
{
    using var document = JsonDocument.Parse(line);
    var root = document.RootElement;
    var type = GetRequiredString(root, "type", "event.type");

    return type switch
    {
        "thread.started" => new ThreadStartedEvent(GetRequiredString(root, "thread_id", ...)),
        "turn.started" => new TurnStartedEvent(),
        "turn.completed" => new TurnCompletedEvent(ParseUsage(...)),
        // ...
        _ => new UnknownThreadEvent(type, root.Clone()),
    };
}

There is one small but important trick here: root.Clone() is required, because elements from JsonDocument become invalid after the document is disposed. We need to retain a copy for unknown event types. That is simply one of the differences between C# JSON handling and JavaScript.

This is where the two SDKs differ the most. Node.js and .NET have different runtime conventions, so the implementation has to adapt.

TypeScript uses Node.js’s spawn() function:

const child = spawn(this.executablePath, commandArgs, { env, signal });

C# uses .NET’s System.Diagnostics.Process:

using var process = new Process { StartInfo = startInfo };
process.Start();
// stdin/stdout/stderr must be managed manually

More specifically, the C# version needs to configure the process like this:

var startInfo = new ProcessStartInfo
{
    FileName = _executablePath,
    RedirectStandardInput = true,
    RedirectStandardOutput = true,
    RedirectStandardError = true,
    UseShellExecute = false,
    CreateNoWindow = true,
};

The biggest difference is the cancellation mechanism. TypeScript uses AbortSignal, which is part of the Web API and very convenient to work with:

const child = spawn(cmd, args, { signal: cancellationSignal });

C# uses CancellationToken instead:

public async IAsyncEnumerable<string> RunAsync(
    CodexExecArgs args,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Check cancellation status inside the loop
    while (!cancellationToken.IsCancellationRequested)
    {
        // Process output...
    }
    // Terminate the process when cancellation is requested
    if (cancellationToken.IsCancellationRequested)
    {
        try { process.Kill(entireProcessTree: true); } catch { }
    }
}

At a high level, this is just another example of the difference between the Web API ecosystem and the .NET ecosystem.

Both SDKs implement the logic that converts JSON configuration into TOML configuration, because the Codex CLI accepts configuration overrides in TOML format. This part must remain completely consistent, otherwise the same configuration will behave differently in the two SDKs.

That is the kind of detail you cannot compromise on. Success or failure often comes down to details like this.
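
To illustrate the general idea only (not the actual conversion rules of either SDK), flattening a nested JSON override object into TOML-style dotted key/value lines could look like this:

// Illustrative only: flatten a nested JSON config object into TOML-style
// "a.b.c = value" override lines. Both real SDKs have their own conversion rules.
function toTomlOverrides(config: Record<string, unknown>, prefix = ''): string[] {
  const lines: string[] = [];
  for (const [key, value] of Object.entries(config)) {
    const fullKey = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      // Recurse into nested objects, extending the dotted key path.
      lines.push(...toTomlOverrides(value as Record<string, unknown>, fullKey));
    } else {
      // JSON string/number/boolean/array literals are also valid TOML values here.
      lines.push(`${fullKey} = ${JSON.stringify(value)}`);
    }
  }
  return lines;
}

// toTomlOverrides({ model: 'gpt-5', sandbox: { mode: 'workspace-write' } })
// => ['model = "gpt-5"', 'sandbox.mode = "workspace-write"']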

We created the following project structure:

CodexSdk/
├── CodexSdk.csproj
├── Codex.cs # Entry class
├── CodexThread.cs # Conversation thread
├── CodexExec.cs # Executor
├── Events.cs # Event type definitions
├── Items.cs # Item type definitions
├── EventParser.cs # Event parser
├── OutputSchemaTempFile.cs # Temporary file management
└── ...

It is a fairly clean structure, and that helped a lot during the port.

The basic usage remains consistent with the TypeScript SDK:

using CodexSdk;
// Create a Codex instance
var codex = new Codex();
var thread = codex.StartThread();
// Execute a query
var result = await thread.RunAsync("Summarize this repository.");
Console.WriteLine(result.FinalResponse);

Streaming event handling takes advantage of C# pattern matching:

await foreach (var @event in thread.RunStreamedAsync("Analyze the code."))
{
switch (@event)
{
case ItemCompletedEvent itemCompleted
when itemCompleted.Item is AgentMessageItem msg:
Console.WriteLine($"Assistant: {msg.Text}");
break;
case TurnCompletedEvent completed:
Console.WriteLine($"Tokens: in={completed.Usage.InputTokens}");
break;
case CommandExecutionItem command:
Console.WriteLine($"Command: {command.Command}");
break;
}
}

During implementation, we collected several practical lessons:

  1. Process management: The C# version must manage the full process lifecycle manually, including process termination during cancellation. Use Kill(entireProcessTree: true) to make sure child processes are also cleaned up.

  2. Error handling: We use InvalidOperationException to throw parsing errors, keeping the error handling style similar to the TypeScript SDK.

  3. Resource cleanup: OutputSchemaTempFile implements IAsyncDisposable to ensure temporary files are cleaned up correctly.

  4. Environment variables: The C# version supports fully overriding process environment variables through CodexOptions.Env. It is a small feature, but a very practical one.

  5. Platform differences: The C# version does not include the TypeScript version’s logic for automatically locating binaries inside npm packages. Since .NET projects typically do not depend on npm, the path to the codex executable must be specified via the CODEX_EXECUTABLE environment variable or CodexPathOverride.

Porting a mature TypeScript SDK to C# is not just a matter of syntax conversion - it also requires understanding the design philosophies of both languages. TypeScript’s flexibility and JavaScript ecosystem features such as AbortSignal need appropriate counterparts in C#.

The key takeaway is this: maintaining API consistency matters more than maintaining implementation-level consistency. Users care about whether the interface is easy to use, not whether the internal implementation is identical. That sounds simple, but making those trade-offs takes judgment.

If you are working on a similar cross-language port, our experience is to fully understand the original SDK architecture first, then translate it module by module, and finally use a complete test suite to ensure behavioral consistency. This kind of work cannot be rushed.

Everything will work out in the end.





Thank you for reading. If you found this article useful, please click the like button below so more people can discover it.

This content was created with AI-assisted collaboration, reviewed by the author, and reflects the author’s own views and position.

Deep Integration and Practical Use of StreamJsonRpc in HagiCode

This article explains in detail how the HagiCode (formerly PCode) project successfully integrated Microsoft’s StreamJsonRpc communication library to replace its original custom JSON-RPC implementation, while also resolving the technical pain points and architectural challenges encountered during the integration process.

StreamJsonRpc is Microsoft’s officially maintained JSON-RPC communication library for .NET, known for its strong type safety, automatic proxy generation, and mature exception handling. In the HagiCode project, the team decided to integrate StreamJsonRpc in order to communicate with external AI tools such as the iflow CLI and OpenCode CLI through ACP (Agent Communication Protocol), while also eliminating the maintenance cost and potential bugs introduced by the earlier custom JSON-RPC implementation. However, the integration process ran into challenges specific to streaming JSON-RPC, especially around proxy target binding and generic parameter recognition.

To address these pain points, we made a bold decision: rebuild the communication layer from the ground up. The impact of that decision may be bigger than you expect, and I will explain it in detail shortly.

Let me first introduce the “main project” featured in this article.

If you have run into these frustrations during development:

  • Multiple projects and multiple tech stacks make build scripts expensive to maintain
  • CI/CD pipeline configuration is cumbersome, and every change sends you back to the docs
  • Cross-platform compatibility issues keep surfacing
  • You want AI to help you write code, but existing tools are not intelligent enough

Then HagiCode, which we are building, may interest you.

What is HagiCode?

  • An AI-driven intelligent coding assistant
  • Supports multilingual, cross-platform code generation and optimization
  • Includes built-in gamification mechanisms so coding feels less dull

Why mention it here? The StreamJsonRpc integration approach shared in this article is distilled from our practical experience while developing HagiCode. If you find this engineering solution valuable, it probably means our technical taste is pretty solid, and HagiCode itself is worth checking out as well.

Want to learn more?

The current project is in a critical stage of ACP protocol integration and is facing the following technical pain points and architectural challenges:

1. Limitations of the Custom Implementation


The original JSON-RPC implementation is located in src/HagiCode.ClaudeHelper/AcpImp/ and includes components such as JsonRpcEndpoint and ClientSideConnection. Maintaining this custom codebase is costly, and it lacks the advanced capabilities of a mature library, such as progress reporting and cancellation support.

When attempting to migrate the existing CallbackProxyTarget pattern to StreamJsonRpc, we found that the _rpc.AddLocalRpcTarget(target) method could not recognize targets created through the proxy pattern. Specifically, StreamJsonRpc could not automatically split properties of the generic type T into RPC method parameters, causing the server side to fail when processing method calls initiated by the client.

The existing ClientSideConnection mixes the transport layer (WebSocket/Stdio), protocol layer (JSON-RPC), and business layer (ACP Agent interface), leading to unclear responsibilities. It also suffers from missing method bindings in AcpAgentCallbackRpcAdapter.

The WebSocket transport layer lacks raw JSON content logging, making it difficult to determine during RPC debugging whether a problem originates from serialization or from the network.

To address the problems above, we adopted the following systematic solution, optimizing from three dimensions: architectural refactoring, library integration, and enhanced debugging.

Delete JsonRpcEndpoint.cs, AgentSideConnection.cs, and related custom serialization converters such as JsonRpcMessageJsonConverter.

Introduce the StreamJsonRpc NuGet package and use its JsonRpc class to handle the core communication logic.

Define the IAcpTransport interface to handle both WebSocket and Stdio transport modes in a unified way, ensuring the protocol layer is decoupled from the transport layer.

// Definition of the IAcpTransport interface
public interface IAcpTransport
{
    Task SendAsync(string message, CancellationToken cancellationToken = default);
    Task<string> ReceiveAsync(CancellationToken cancellationToken = default);
    Task CloseAsync(CancellationToken cancellationToken = default);
}

// WebSocket transport implementation
public class WebSocketTransport : IAcpTransport
{
    private readonly WebSocket _webSocket;

    public WebSocketTransport(WebSocket webSocket)
    {
        _webSocket = webSocket;
    }

    // Implement send and receive methods
    // ...
}

// Stdio transport implementation
public class StdioTransport : IAcpTransport
{
    private readonly StreamReader _reader;
    private readonly StreamWriter _writer;

    public StdioTransport(StreamReader reader, StreamWriter writer)
    {
        _reader = reader;
        _writer = writer;
    }

    // Implement send and receive methods
    // ...
}

Inspect the existing dynamic proxy generation logic and identify the root cause of why StreamJsonRpc cannot recognize it. In most cases, this happens because the proxy object does not publicly expose the actual method signatures, or it uses parameter types unsupported by StreamJsonRpc.

Split generic properties into explicit RPC method parameters. Instead of relying on dynamic properties, define concrete Request/Response DTOs (data transfer objects) so StreamJsonRpc can correctly recognize method signatures through reflection.

// Original generic property approach
public class CallbackProxyTarget<T>
{
    public Func<T, Task> Callback { get; set; }
}

// Refactored concrete method approach
public class ReadTextFileRequest
{
    public string FilePath { get; set; }
}

public class ReadTextFileResponse
{
    public string Content { get; set; }
}

public interface IAcpAgentCallback
{
    Task<ReadTextFileResponse> ReadTextFileAsync(ReadTextFileRequest request);
    // Other methods...
}

In some complex scenarios, manually proxying the JsonRpc object and handling RpcConnection can be more flexible than directly adding a local target.

3. Implement Method Binding and Stronger Logging


Ensure that this component explicitly implements the StreamJsonRpc proxy interface and maps methods defined by the ACP protocol, such as ReadTextFileAsync, to StreamJsonRpc callback handlers.

Intercept and record the raw text of JSON-RPC requests and responses in the WebSocket or Stdio message processing pipeline. Use ILogger to output the raw payload before parsing and after serialization so formatting issues can be diagnosed more easily.

// Transport wrapper with enhanced logging
public class LoggingAcpTransport : IAcpTransport
{
    private readonly IAcpTransport _innerTransport;
    private readonly ILogger<LoggingAcpTransport> _logger;

    public LoggingAcpTransport(IAcpTransport innerTransport, ILogger<LoggingAcpTransport> logger)
    {
        _innerTransport = innerTransport;
        _logger = logger;
    }

    public async Task SendAsync(string message, CancellationToken cancellationToken = default)
    {
        _logger.LogTrace("Sending message: {Message}", message);
        await _innerTransport.SendAsync(message, cancellationToken);
    }

    public async Task<string> ReceiveAsync(CancellationToken cancellationToken = default)
    {
        var message = await _innerTransport.ReceiveAsync(cancellationToken);
        _logger.LogTrace("Received message: {Message}", message);
        return message;
    }

    public async Task CloseAsync(CancellationToken cancellationToken = default)
    {
        _logger.LogDebug("Closing connection");
        await _innerTransport.CloseAsync(cancellationToken);
    }
}

Wrap the StreamJsonRpc connection and make it responsible for InvokeAsync calls and connection lifecycle management.

public class AcpRpcClient : IDisposable
{
    private readonly JsonRpc _rpc;
    private readonly IAcpTransport _transport;

    public AcpRpcClient(IAcpTransport transport)
    {
        _transport = transport;
        _rpc = new JsonRpc(new StreamRpcTransport(transport));
        _rpc.StartListening();
    }

    public async Task<TResponse> InvokeAsync<TResponse>(string methodName, object parameters)
    {
        return await _rpc.InvokeAsync<TResponse>(methodName, parameters);
    }

    public void Dispose()
    {
        _rpc.Dispose();
        // IAcpTransport itself does not declare Dispose, so only dispose when supported
        (_transport as IDisposable)?.Dispose();
    }

    // StreamRpcTransport is the StreamJsonRpc adapter for IAcpTransport
    private class StreamRpcTransport : IDuplexPipe
    {
        // Implement the IDuplexPipe interface
        // ...
    }
}

Protocol Layer (IAcpAgentClient / IAcpAgentCallback)


Define clear client-to-agent and agent-to-client interfaces. Remove the cyclic factory pattern Func<IAcpAgent, IAcpClient> and replace it with dependency injection or direct callback registration.

Based on StreamJsonRpc best practices and project experience, the following are the key recommendations for implementation:

1. Strongly Typed DTOs Are Better Than Dynamic Objects


The core advantage of StreamJsonRpc lies in strong typing. Do not use dynamic or JObject to pass parameters. Instead, define explicit C# POCO classes as parameters for each RPC method. This not only solves the proxy target recognition issue, but also catches type errors at compile time.

Example: replace the generic properties in CallbackProxyTarget with concrete classes such as ReadTextFileRequest and WriteTextFileRequest.

Use the [JsonRpcMethod] attribute to explicitly specify RPC method names instead of relying on default method-name mapping. This prevents invocation failures caused by naming-style differences such as PascalCase versus camelCase.

public interface IAcpAgentCallback
{
    [JsonRpcMethod("readTextFile")]
    Task<ReadTextFileResponse> ReadTextFileAsync(ReadTextFileRequest request);

    [JsonRpcMethod("writeTextFile")]
    Task WriteTextFileAsync(WriteTextFileRequest request);
}

3. Take Advantage of Connection State Callbacks


StreamJsonRpc raises the JsonRpc.Disconnected event when the connection drops. You should absolutely listen for this event to handle unexpected process exits or network disconnections, which is more timely than relying only on Orleans Grain failure detection.

_rpc.Disconnected += (sender, e) =>
{
    _logger.LogError("RPC connection lost: {Reason}", e.Reason);
    // Handle reconnection logic or notify the user
};
For logging, the recommended levels are:

  • Trace level: record the full raw JSON request/response payload.
  • Debug level: record method call stacks and parameter summaries.
  • Note: make sure the logs do not include sensitive Authorization tokens or Base64-encoded large-file content.

5. Handle the Special Nature of Streaming Transport


StreamJsonRpc natively supports IAsyncEnumerable. When implementing streaming prompt responses for ACP, use IAsyncEnumerable directly instead of custom pagination logic. This can greatly simplify the amount of code needed for streaming processing.

public interface IAcpAgentCallback
{
    [JsonRpcMethod("streamText")]
    IAsyncEnumerable<string> StreamTextAsync(StreamTextRequest request);
}

Keep ACPSession and ClientSideConnection separate. ACPSession should focus on Orleans state management and business logic, such as message enqueueing, and should use the StreamJsonRpc connection object through composition rather than inheritance.

By comprehensively integrating StreamJsonRpc, the HagiCode project successfully addressed the high maintenance cost, functional limitations, and architectural layering confusion of the original custom implementation. The key improvements include:

  1. Replacing dynamic properties with strongly typed DTOs, improving maintainability and reliability
  2. Implementing transport-layer abstraction and protocol-layer separation, improving architectural clarity
  3. Strengthening logging capabilities to make communication problems easier to diagnose
  4. Introducing streaming support to simplify streaming implementation

These improvements provide HagiCode with a more stable and more efficient communication foundation, allowing it to interact better with external AI tools and laying a solid foundation for future feature expansion.




Thank you for reading. If you found this article useful, please click the like button below so more people can discover it.

This content was created with AI-assisted collaboration and reviewed by me, and it reflects my own views and position.