How to Reproduce Projects in the AI Era: Vault, a Cross-Project Persistent Storage System
In the era of AI-assisted development, how can we help AI assistants better understand our learning resources? The HagiCode project built the Vault system as a unified knowledge storage abstraction layer that AI can understand, greatly improving the efficiency of learning through project reproduction.
Background
In the AI era, the way developers learn new technologies and architectures is changing profoundly. “Reproducing projects” (that is, deeply studying and learning from the code, architecture, and design patterns of excellent open source projects) has become an efficient way to learn. Compared with traditional methods like reading books or watching videos, directly reading and running high-quality open source projects helps you understand real-world engineering practices much faster.
Still, this learning method comes with quite a few challenges.
Learning materials are too scattered. Your notes may live in Obsidian, code repositories may be scattered across different folders, and your AI assistant’s conversation history becomes yet another isolated data island. When you want AI to help analyze a project, you have to manually copy code snippets and organize context, which is rather tedious.
What is even more troublesome is the broken context. AI assistants cannot directly access your local learning resources, so you have to provide background information again in every conversation. On top of that, reproduced code repositories update quickly, manual syncing is error-prone, and knowledge is hard to share across multiple learning projects.
At the root, all of these problems come from “data islands.” If there were a unified storage abstraction layer that allowed AI assistants to understand and access all your learning resources, the problem would be solved neatly.
About HagiCode
The Vault system shared in this article is exactly the solution we developed while building HagiCode. HagiCode is an AI coding assistant project, and in our daily development work we often need to study and refer to many different open source projects. To help AI assistants better understand these learning resources, we designed Vault, a cross-project persistent storage system.
This solution has already been validated in HagiCode in real use. If you are facing similar knowledge management challenges, I hope these experiences can offer some inspiration. After all, once you’ve fallen into a few pits yourself, you should leave something behind for the next person.
Vault system design philosophy
The core idea of the Vault system is simple: create a unified knowledge storage abstraction layer that AI can understand. From an implementation perspective, the system has several key characteristics.
Multi-type support
The system supports four vault types, each corresponding to a different usage scenario:

```typescript
// folder: general-purpose folder type
export const DEFAULT_VAULT_TYPE = 'folder';

// coderef: a type specifically for reproduced code projects
export const CODEREF_VAULT_TYPE = 'coderef';

// obsidian: integrated with Obsidian note-taking software
export const OBSIDIAN_VAULT_TYPE = 'obsidian';

// system-managed: vault automatically managed by the system
export const SYSTEM_MANAGED_VAULT_TYPE = 'system-managed';
```

Among them, the coderef type is the most commonly used in HagiCode. It is specifically designed for reproduced code projects, providing a standardized directory structure and AI-readable metadata descriptions.
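To make the four types harder to misuse, the constants can be folded into a union type with a runtime guard. The following is a minimal sketch, not HagiCode's actual code: `VAULT_TYPES`, `VaultType`, `isVaultType`, and `normalizeVaultType` are hypothetical names, and falling back to `folder` for unknown input is an assumption.

```typescript
// Constants mirror the definitions shown in the article.
const DEFAULT_VAULT_TYPE = 'folder';
const CODEREF_VAULT_TYPE = 'coderef';
const OBSIDIAN_VAULT_TYPE = 'obsidian';
const SYSTEM_MANAGED_VAULT_TYPE = 'system-managed';

// Hypothetical: the list of every known vault type.
const VAULT_TYPES = [
  DEFAULT_VAULT_TYPE,
  CODEREF_VAULT_TYPE,
  OBSIDIAN_VAULT_TYPE,
  SYSTEM_MANAGED_VAULT_TYPE,
] as const;

// Union of the four literal strings above.
type VaultType = (typeof VAULT_TYPES)[number];

// Runtime check that narrows an arbitrary string to VaultType.
function isVaultType(value: string): value is VaultType {
  return VAULT_TYPES.some((knownType) => knownType === value);
}

// Assumption: unknown or missing input falls back to the default type.
function normalizeVaultType(value: string | undefined): VaultType {
  const candidate = (value ?? '').trim().toLowerCase();
  return isVaultType(candidate) ? candidate : DEFAULT_VAULT_TYPE;
}
```

With this in place, `normalizeVaultType(' CodeRef ')` yields `'coderef'`, while an unrecognized string yields `'folder'`. A stricter variant could throw instead of falling back.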
Persistent storage mechanism
The Vault registry is stored persistently in JSON format, ensuring that the configuration remains available after the application restarts:

```csharp
public class VaultRegistryStore : IVaultRegistryStore
{
    private readonly string _registryFilePath;

    public VaultRegistryStore(IConfiguration configuration, ILogger<VaultRegistryStore> logger)
    {
        var dataDir = configuration["DataDir"] ?? "./data";
        var absoluteDataDir = Path.IsPathRooted(dataDir)
            ? dataDir
            : Path.GetFullPath(Path.Combine(Directory.GetCurrentDirectory(), dataDir));

        _registryFilePath = Path.Combine(absoluteDataDir, "personal-data", "vaults", "registry.json");
    }
}
```

The advantage of this design is that it is simple and reliable. JSON is human-readable, which makes debugging and manual editing easier; filesystem storage avoids the complexity of a database and reduces system dependencies. After all, sometimes the simplest option really is the best one.
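Because the registry can be edited by hand, it helps to validate it when reading it back. The sketch below illustrates that round trip in TypeScript. The real schema of `registry.json` is not shown in this article, so the `Registry` and `RegistryEntry` shapes (including the `version` and `vaults` fields) are assumptions made for illustration only.

```typescript
// Hypothetical entry shape; the real registry schema is not shown in the article.
interface RegistryEntry {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
}

// Hypothetical top-level shape of registry.json.
interface Registry {
  version: number;
  vaults: RegistryEntry[];
}

// Pretty-print so the file stays easy to read and edit by hand.
function serializeRegistry(registry: Registry): string {
  return JSON.stringify(registry, null, 2) + '\n';
}

// Parse and validate, so a bad manual edit fails loudly instead of silently.
function parseRegistry(text: string): Registry {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('registry must be a JSON object');
  }
  const { version, vaults } = raw as Record<string, unknown>;
  if (typeof version !== 'number' || !Array.isArray(vaults)) {
    throw new Error('registry is missing "version" or "vaults"');
  }
  const entries = vaults.map((item: unknown, index: number): RegistryEntry => {
    if (typeof item !== 'object' || item === null) {
      throw new Error(`vault entry ${index} is not an object`);
    }
    const { id, name, type, physicalPath } = item as Record<string, unknown>;
    if (
      typeof id !== 'string' ||
      typeof name !== 'string' ||
      typeof type !== 'string' ||
      typeof physicalPath !== 'string'
    ) {
      throw new Error(`vault entry ${index} is malformed`);
    }
    return { id, name, type, physicalPath };
  });
  return { version, vaults: entries };
}
```

Writing the serialized text to a temporary file and renaming it over the old one would be a natural next step for crash safety, though whether HagiCode does so is not stated here.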
AI context integration
Most importantly, the system can automatically inject vault information into the context of AI proposals:

```typescript
export function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');

  if (readOnlyVaults.length === 0 && editableVaults.length === 0) {
    return '';
  }

  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);

  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}
```

This enables an important capability: AI assistants can automatically understand the available learning resources without users manually providing context. You could say that counts as a kind of tacit understanding.
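The snippet above calls `buildVaultSection` and uses a prompt template without showing either. To see the whole thing run end to end, here is a self-contained sketch. The template contents, the `SKETCH_TEMPLATE` name, and the body of `buildVaultSection` are all assumptions; only the structure of `buildTargetVaultsText` follows the article.

```typescript
interface VaultForText {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
  accessType: 'read' | 'write';
}

// Assumed shape: the article only shows that these three fields exist.
interface VaultPromptTemplate {
  heading: string;
  reference: string;
  editable: string;
}

// Hypothetical template text, for illustration only.
const SKETCH_TEMPLATE: VaultPromptTemplate = {
  heading: 'Target Vaults',
  reference: 'Reference vaults (read-only):',
  editable: 'Editable vaults:',
};

// Hypothetical implementation: returns '' for an empty list so that
// `.filter(Boolean)` in the caller drops the section entirely.
function buildVaultSection(vaults: VaultForText[], label: string): string {
  if (vaults.length === 0) {
    return '';
  }
  const lines = vaults.map(
    (vault) => `- ${vault.name} [${vault.type}] at ${vault.physicalPath}`,
  );
  return [label].concat(lines).join('\n');
}

// Same structure as the article's function, wired to the sketch template.
function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = SKETCH_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');

  if (readOnlyVaults.length === 0 && editableVaults.length === 0) {
    return '';
  }

  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);

  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}
```

Returning an empty string when no vaults are attached means the result can be appended to a prompt unconditionally, without leaving a dangling heading.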
The standardized structure of CodeRef Vault
For the coderef type of vault, HagiCode provides a standardized directory structure:

```
my-coderef-vault/
├── index.yaml    # vault metadata description
├── AGENTS.md     # operating guide for AI assistants
├── docs/         # stores study notes and documents
└── repos/        # manages reproduced code repositories through Git submodules
```

When creating a vault, the system automatically initializes this structure:

```csharp
private async Task EnsureCodeRefStructureAsync(
    string vaultName,
    string physicalPath,
    ICollection<VaultBootstrapDiagnosticDto> diagnostics,
    CancellationToken cancellationToken)
{
    Directory.CreateDirectory(physicalPath);

    var indexPath = Path.Combine(physicalPath, CodeRefIndexFileName);
    var docsPath = Path.Combine(physicalPath, CodeRefDocsDirectoryName);
    var reposPath = Path.Combine(physicalPath, CodeRefReposDirectoryName);

    // Create the standard directory structure
    if (!Directory.Exists(docsPath))
    {
        Directory.CreateDirectory(docsPath);
    }

    if (!Directory.Exists(reposPath))
    {
        Directory.CreateDirectory(reposPath);
    }

    // Create the AGENTS.md guide
    await EnsureCodeRefAgentsDocumentAsync(physicalPath, cancellationToken);

    // Create the index.yaml metadata
    // (mergedDocument is built in code that this excerpt omits)
    await WriteCodeRefIndexDocumentAsync(indexPath, mergedDocument, cancellationToken);
}
```

This structure is carefully designed as well:
- docs/ stores your study notes, where you can record your understanding of the code, architecture analysis, lessons learned, and so on in Markdown
- repos/ manages reproduced repositories through Git submodules instead of copying code directly, which keeps the code in sync and saves space
- index.yaml contains the vault metadata so AI assistants can quickly understand the purpose and contents of the vault
- AGENTS.md is a guide written specifically for AI assistants, explaining how to handle the contents of the vault
Organized this way, perhaps AI can understand what you have in mind a little more easily.
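Since the layout is fixed, a tool can cheaply check whether a directory still looks like a CodeRef vault. The following is a small sketch of such a check. `findMissingCodeRefEntries` is a hypothetical helper, not part of HagiCode, and it takes the list of names found at the vault root as input so that it stays independent of any filesystem API.

```typescript
// Expected top-level entries of a CodeRef vault, as listed in the article.
const CODEREF_REQUIRED_ENTRIES: string[] = ['index.yaml', 'AGENTS.md', 'docs', 'repos'];

// Hypothetical helper: report which required entries are absent from the
// names found at the vault root. Extra entries are ignored.
function findMissingCodeRefEntries(existingNames: string[]): string[] {
  return CODEREF_REQUIRED_ENTRIES.filter(
    (required) => existingNames.indexOf(required) === -1,
  );
}
```

An empty result means the standard structure is intact; a non-empty one lists exactly what would need to be recreated.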
Automatic initialization for system-managed vaults
In addition to manually created vaults, HagiCode also supports system-managed vaults:

```csharp
public async Task<IReadOnlyList<VaultRegistryEntry>> EnsureAllSystemManagedVaultsAsync(
    CancellationToken cancellationToken = default)
{
    var definitions = GetAllResolvedDefinitions();
    var entries = new List<VaultRegistryEntry>(definitions.Count);

    foreach (var definition in definitions)
    {
        entries.Add(await EnsureResolvedSystemManagedVaultAsync(definition, cancellationToken));
    }

    return entries;
}
```

The system automatically creates and manages the following vaults:
- hagiprojectdata: project data storage used to save project configuration and state
- personaldata: personal data storage used to save user preferences
- hbsprompt: a prompt template library used to manage commonly used AI prompts
These vaults are initialized automatically when the system starts, so users do not need to configure them manually. Some things are simply better left to the system instead of humans worrying about them.
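Running this at every startup only works if the "ensure" step is idempotent: it creates what is missing and leaves existing entries alone. The sketch below shows that pattern with an in-memory registry. The vault ids come from the list above; the entry shape, the `purpose` field, and the function name `ensureSystemManagedVaults` are assumptions for illustration.

```typescript
// Hypothetical definition and entry shapes.
interface SystemVaultDefinition {
  id: string;
  purpose: string;
}

interface SystemVaultEntry {
  id: string;
  type: 'system-managed';
  purpose: string;
}

// Ids taken from the article; purposes paraphrase its descriptions.
const SYSTEM_VAULT_DEFINITIONS: SystemVaultDefinition[] = [
  { id: 'hagiprojectdata', purpose: 'project configuration and state' },
  { id: 'personaldata', purpose: 'user preferences' },
  { id: 'hbsprompt', purpose: 'prompt templates' },
];

// Idempotent: an existing entry is returned untouched, a missing one is created.
function ensureSystemManagedVaults(
  registry: Record<string, SystemVaultEntry | undefined>,
): SystemVaultEntry[] {
  return SYSTEM_VAULT_DEFINITIONS.map((definition) => {
    const existing = registry[definition.id];
    if (existing) {
      return existing;
    }
    const created: SystemVaultEntry = {
      id: definition.id,
      type: 'system-managed',
      purpose: definition.purpose,
    };
    registry[definition.id] = created;
    return created;
  });
}
```

Calling the function twice against the same registry returns the same three entries both times, which is what makes it safe to run on every start.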
Access control mechanism
An important part of the design is access control. The system divides vaults into two access types:

```typescript
export interface VaultForText {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
  accessType: 'read' | 'write'; // Key: distinguish read-only from editable
}
```

- reference (read-only, `accessType: 'read'`): AI uses the content only for analysis and understanding and cannot modify it. Suitable for referenced open source projects, documents, and similar materials
- editable (`accessType: 'write'`): AI can modify content as needed for the task. Suitable for your notes, drafts, and similar materials
This distinction matters. It tells AI which content is “read-only reference” and which content is “safe to edit,” reducing the risk of accidental changes. After all, nobody wants their hard work to disappear because of an unintended edit.
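One way to put the distinction to work is to filter proposed edits before anything is written. The sketch below is illustrative only: `EditRequest` and `partitionEdits` are hypothetical names, and treating edits to unknown vaults as rejected is an assumption.

```typescript
type AccessType = 'read' | 'write';

interface VaultAccess {
  id: string;
  accessType: AccessType;
}

// Hypothetical description of an edit the AI would like to make.
interface EditRequest {
  vaultId: string;
  relativePath: string;
}

// Split requested edits into those targeting writable vaults and the rest.
// Assumption: edits that reference an unknown vault id are rejected.
function partitionEdits(
  vaults: VaultAccess[],
  edits: EditRequest[],
): { allowed: EditRequest[]; rejected: EditRequest[] } {
  const writableById: Record<string, boolean> = {};
  for (const vault of vaults) {
    writableById[vault.id] = vault.accessType === 'write';
  }

  const allowed: EditRequest[] = [];
  const rejected: EditRequest[] = [];
  for (const edit of edits) {
    if (writableById[edit.vaultId] === true) {
      allowed.push(edit);
    } else {
      rejected.push(edit);
    }
  }
  return { allowed, rejected };
}
```

Defaulting to rejection keeps the failure mode safe: a typo in a vault id can never turn a read-only vault into a writable one.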
In practice: creating and using Vault
Now that we’ve covered the ideas, let’s look at how it works in practice.
Create a CodeRef Vault
Here is a complete frontend call example:

```typescript
const createCodeRefVault = async () => {
  const response = await VaultService.postApiVaults({
    requestBody: {
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      gitUrl: "https://github.com/facebook/react.git"
    }
  });

  // The system will automatically:
  // 1. Add the React repository under repos/react as a Git submodule
  // 2. Create the docs/ directory for notes
  // 3. Generate the index.yaml metadata
  // 4. Create the AGENTS.md guide file

  return response;
};
```

This API call completes a series of actions: creating the directory structure, initializing Git submodules, generating metadata files, and more. You only need to provide the basic information and let the system handle the rest. It is honestly a fairly worry-free approach.
Use Vault in an AI proposal
After creating the vault, you can reference it in an AI proposal:

```typescript
const proposal = composeProposalChiefComplaint({
  chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
  repositories: [
    { id: "react", gitUrl: "https://github.com/facebook/react.git" }
  ],
  vaults: [
    {
      id: "react-learning",
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/vaults/react-learning",
      accessType: "read" // AI can only read, not modify
    }
  ],
  quickRequestText: "Focus on the Fiber architecture and scheduler implementation"
});
```

The system automatically injects vault information into the AI context, letting AI know which learning resources are available. When AI can understand what you have in mind, that kind of tacit understanding is hard to come by.
Best practices and things to watch for
While using the Vault system, we have summarized a few lessons learned.
Path safety
The system strictly validates paths to prevent path traversal attacks:

```csharp
private static string ResolveFilePath(string vaultRoot, string relativePath)
{
    var rootPath = EnsureTrailingSeparator(Path.GetFullPath(vaultRoot));
    var combinedPath = Path.GetFullPath(Path.Combine(rootPath, relativePath));
    if (!combinedPath.StartsWith(rootPath, StringComparison.OrdinalIgnoreCase))
    {
        throw new BusinessException(
            VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root.");
    }
    return combinedPath;
}
```

This is important. If you customize a vault path, make sure it stays within the allowed range, otherwise the system will reject the operation. You really cannot overemphasize security.
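The same containment check translates directly to a frontend or tooling script. Below is a rough TypeScript equivalent using Node's `path` module. `resolveInsideVault` is a hypothetical name, and unlike the C# version this sketch compares paths case-sensitively and does not resolve symlinks, so treat it as an illustration of the idea rather than a drop-in replacement.

```typescript
import * as path from 'node:path';

// Resolve `relativePath` against `vaultRoot` and refuse anything that
// escapes the root. Case-sensitive comparison; symlinks are not resolved.
function resolveInsideVault(vaultRoot: string, relativePath: string): string {
  const root = path.resolve(vaultRoot);
  // The trailing separator matters: without it, "/vaults/react" would be
  // treated as a prefix of "/vaults/react-evil".
  const rootWithSeparator = root.endsWith(path.sep) ? root : root + path.sep;
  // path.resolve normalizes ".." segments and lets an absolute second
  // argument override the root, which the check below then catches.
  const combined = path.resolve(root, relativePath);

  if (combined !== root && !combined.startsWith(rootWithSeparator)) {
    throw new Error('Vault file paths must stay inside the registered vault root.');
  }
  return combined;
}
```

The key detail in both versions is normalizing first and comparing second. Checking the raw input for `..` is not enough, because an absolute path or a sibling directory with a shared name prefix would slip through.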
Git submodule management
CodeRef Vault recommends Git submodules instead of directly copying code:

```csharp
private static string BuildCodeRefAgentsContent()
{
    return """
        # CodeRef Vault Guide

        Repositories under `repos/` should be maintained through Git submodules rather than copied directly into the vault root.

        Keep this structure stable so assistants and tools can understand the vault quickly.
        """ + Environment.NewLine;
}
```

This brings several advantages: keeping code synchronized with upstream, saving disk space, and making it easier to manage multiple versions of the code. After all, who wants to download the same thing again and again?
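If you add repositories to a vault by hand, the underlying command is `git submodule add <repository> <path>`. The sketch below builds the argument list for that command and derives the `repos/<name>` target from the URL. Both function names are hypothetical, and the naming rule (last URL segment without `.git`) is an assumption that happens to match the `repos/react` example earlier in this article.

```typescript
// Derive a directory name from a Git URL: drop trailing slashes and a
// ".git" suffix, then take the last segment. Handles https and scp-style URLs.
function repoNameFromGitUrl(gitUrl: string): string {
  const trimmed = gitUrl.replace(/\/+$/, '').replace(/\.git$/, '');
  const lastSegment = trimmed.split(/[\/:]/).pop() ?? '';
  if (lastSegment === '') {
    throw new Error(`Cannot derive a repository name from "${gitUrl}"`);
  }
  return lastSegment;
}

// Arguments for `git submodule add <repository> <path>`, to be run from
// the vault root. The argument array avoids shell quoting problems.
function buildSubmoduleAddArgs(gitUrl: string): string[] {
  return ['submodule', 'add', gitUrl, `repos/${repoNameFromGitUrl(gitUrl)}`];
}
```

For the React example this produces `git submodule add https://github.com/facebook/react.git repos/react`. Pulling upstream changes later is then a matter of `git submodule update --remote`.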
File preview limits
To prevent performance problems, the system caps how many files it enumerates and how many bytes it reads for a preview:

```csharp
private const int FileEnumerationLimit = 500;
private const int PreviewByteLimit = 256 * 1024; // 256 KB
```

If your vault contains a large number of files or very large files, preview performance may be affected. In that case, you can consider processing files in batches or using specialized search tools. Sometimes when something gets too large, it becomes harder to handle, not easier.
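How the two limits are applied is not shown in this article. A plausible reading is that listings and previews are cut off at the limit and flagged as truncated, and the sketch below illustrates that reading. The `truncated` flag and both function names are assumptions.

```typescript
// Values mirror the constants shown in the article.
const FILE_ENUMERATION_LIMIT = 500;
const PREVIEW_BYTE_LIMIT = 256 * 1024; // 256 KB

interface LimitedListing {
  files: string[];
  truncated: boolean;
}

interface LimitedPreview {
  bytes: Uint8Array;
  truncated: boolean;
}

// Keep at most `limit` paths and report whether anything was dropped.
function limitEnumeration(
  paths: string[],
  limit: number = FILE_ENUMERATION_LIMIT,
): LimitedListing {
  return { files: paths.slice(0, limit), truncated: paths.length > limit };
}

// Keep at most `limit` bytes. `subarray` returns a view, so nothing is copied.
function limitPreview(
  content: Uint8Array,
  limit: number = PREVIEW_BYTE_LIMIT,
): LimitedPreview {
  return { bytes: content.subarray(0, limit), truncated: content.length > limit };
}
```

Surfacing the `truncated` flag lets a UI or an AI assistant know that it is looking at a partial view. Note that cutting at a byte boundary can split a multi-byte UTF-8 character, so a decoder should tolerate an incomplete final sequence.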
Diagnostic information
When creating a vault, the system returns diagnostic information to help with debugging:

```csharp
List<VaultBootstrapDiagnosticDto> bootstrapDiagnostics = [];

if (IsCodeRefVaultType(normalizedType))
{
    bootstrapDiagnostics = await EnsureCodeRefBootstrapAsync(
        normalizedName,
        normalizedPhysicalPath,
        normalizedGitUrl,
        cancellationToken);
}
```

If creation fails, you can inspect the diagnostic information to understand the specific cause. When something goes wrong, checking the diagnostics is often the most direct way forward.
Summary
Through a unified storage abstraction layer, the Vault system solves several core pain points of reproducing projects in the AI era:
- Centralized knowledge management: all learning resources are gathered in one place instead of scattered everywhere
- Automatic AI context injection: AI assistants can automatically understand the available learning resources without manual context setup
- Cross-project knowledge reuse: knowledge can be shared and reused across multiple learning projects
- Standardized directory structure: a consistent directory layout lowers the learning curve
This solution has already been validated in the HagiCode project. If you are also building tools related to AI-assisted development, or facing similar knowledge management problems, I hope these experiences can serve as a useful reference.
In truth, the value of a technical solution does not lie in how complicated it is, but in whether it solves real problems. The core idea of the Vault system is very simple: build a unified knowledge storage layer that AI can understand. Yet it is precisely this simple abstraction that improved our development efficiency quite a bit.
Sometimes the simple approach really is the best one. After all, complicated things often hide even more pitfalls…
References
- HagiCode project: github.com/HagiCode-org/site
- HagiCode official website: hagicode.com
- HagiCode installation docs: docs.hagicode.com/installation/docker-compose
- Obsidian official website: obsidian.md
- Git submodule documentation: git-scm.com/docs/gitsubmodules
If this article helped you, feel free to give the project a Star on GitHub, or visit the official website to learn more about HagiCode. The public beta has already started, and you can experience the full AI coding assistant features as soon as you install it.
Maybe you should give it a try as well…
Copyright notice
Thank you for reading. If you found this article useful, feel free to like, bookmark, and share it. This content was created with AI-assisted collaboration, and the final content was reviewed and confirmed by the author.
- Author: newbe36524
- Original link: https://docs.hagicode.com/blog/2026-04-06-vault-persistent-storage-for-ai-era/
- Copyright notice: Unless otherwise stated, all blog posts on this site are licensed under BY-NC-SA. Please include attribution when reprinting.