Skip to content

Copy-on-Write Workspaces

NoKV is a CoW workspace runtime — "git for live, large, derived state." Where git versions small text at dev time, NoKV gives snapshot / clone / diff / rollback over live filesystem namespaces (code + data + model weights) at runtime speed (O(metadata), not O(data)) and at scale.

This is the differentiating capability: spin up 100 agent workspaces from one base in O(1), each writing its own delta over shared object blocks; see exactly what an agent changed; revert a workspace to an earlier point. A pluggable metadata engine (Redis, TiKV) cannot offer this — it requires owning the copy-on-write metadata engine and the object layout together, which NoKV does.

The operations

All operate on a directory subtree (a "workspace") and are atomic + GC-safe.

OpService APISemantics
snapshotsnapshot_subtree(root) -> SnapshotPin{root, read_version}Read-only, zero-copy MVCC version pin. Pins a stable view and protects its object blocks from GC.
cloneclone_subtree(src) -> CloneHandle{root, snapshot_id}Writable fork. New root sees all of the source; shares object blocks zero-copy (same generation → same object keys); diverges on write (CoW). O(metadata).
diffdiff_subtrees(a, b) -> Vec<SubtreeDelta>Reports Added / Removed / Modified paths. An unchanged shared file (same generation) is skipped; a rewrite surfaces as Modified.
rollbackrollback_subtree(target, snapshot_id)Revert a workspace to a prior snapshot. Clone-from-snapshot + atomic graft onto the target (keeps its inode identity). The discarded delta's blocks become GC-reclaimable; restored blocks survive.

Path variants (clone_subtree_path, clone_subtree_path_into, diff_subtrees_path, rollback_subtree_path) take string paths.

The agent workflow

text
base workspace  ──snapshot──▶  base@v1   (immutable, shareable)

   clone(base) ──▶  /forks/agent-1   (writable, shares base blocks)

   agent runs: stat / read / write / rename …  (local hot path, no quorum)

   diff(base, /forks/agent-1)  ──▶  what the agent changed

   ┌── success: keep / promote the fork
   └── failure: rollback(/forks/agent-1, base@v1)   or   discard the fork

100 agents off one base share the base's files and object blocks; each fork's writes get a fresh generation → new object keys, so forks never clobber each other or the base.

CLI

sh
# Clone a base subtree into a new, navigable workspace
nokv clone /base /forks/agent-1
# → cloned /base -> /forks/agent-1 root=N snapshot=M

# See what diverged between two subtrees
nokv diff /base /forks/agent-1
# A    /b          (added in the fork)
# M    /a          (modified)
# D    /old        (removed)

(snapshot and rollback are exposed at the service/RPC layer; CLI wiring lands with the surrounding workspace-management commands.)

Guarantees

  • Zero-copy sharing. A clone references the source's object blocks by key (blocks/{mount}/{inode}/{generation}/…); only small metadata records are copied. Cloning a 1 TB dataset workspace copies kilobytes of metadata, not the data.
  • CoW isolation. A write in any workspace mints a fresh generation under its own inode → new object keys. Forks and the base are fully isolated; neither sees the other's post-clone writes.
  • GC safety (the owns_block_object_key invariant). An inode's GC reclaims only blocks it minted (its {inode}/{generation} prefix). A workspace never enqueues another workspace's shared blocks for deletion, and a retained snapshot pin keeps still-shared base blocks protected (blocked_by_snapshots).
  • Atomicity. Clone-link, divergent publish, and the rollback graft each commit in a single predicate-guarded metadata transaction. Crash-in-between leaves orphan objects (GC-able), never a corrupt namespace.

Why single-node (and how it scales later)

A live workspace has a single writer-owner; read-only snapshots are shared by many readers. This relaxes POSIX cross-node coherence (a tax AI workloads don't need) and avoids consensus-replicated metadata entirely. Cloud scale comes from sharding + lease (compute disaggregated over durable S3 + a tiny control plane that owns only the shard map and owner leases), not from a distributed metadata quorum. See architecture.md.

Performance

  • Metadata hot path is local (no quorum, no Redis/TiKV round-trip): low-latency lookup/stat/readdir.
  • Writes: ~213 MiB/s sequential (release), durable — close drains to S3.
  • clone / snapshot / rollback are O(metadata) pointer/record operations, not data copies.