Atomic Publish & Checkpoints

This page covers the atomic publish primitive behind checkpoints, the semantics that ship today, and the explicit failure-handling contract.

Checkpointing needs atomic metadata publication even when bytes are stored in an external object store.

Current Primitive

PublishArtifact uploads the object body first, then publishes metadata with a single metadata command:

body bytes -> object store
metadata command:
  - inode attr
  - dentry projection
  - body descriptor

The namespace entry appears only after the metadata command commits.
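
The ordering above can be illustrated with a minimal in-memory sketch. Everything here except the name PublishArtifact is an assumption made for illustration: ObjectStore, MetadataStore, BodyDescriptor, commit_publish, and lookup are hypothetical stand-ins, not the service's real API.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class BodyDescriptor:
    """Hypothetical pointer from metadata to an object-store body."""
    object_key: str
    size: int


class ObjectStore:
    """In-memory stand-in for the external object store."""

    def __init__(self):
        self.objects = {}

    def put(self, body: bytes) -> BodyDescriptor:
        key = uuid.uuid4().hex  # hypothetical staging key
        self.objects[key] = body
        return BodyDescriptor(object_key=key, size=len(body))


class MetadataStore:
    """In-memory stand-in for the metadata service."""

    def __init__(self):
        self.inodes = {}    # inode id -> (attrs, body descriptor)
        self.dentries = {}  # path -> inode id

    def commit_publish(self, path: str, attrs: dict, desc: BodyDescriptor) -> int:
        # One command carries the inode attr, the dentry projection, and the
        # body descriptor, so all three become visible together.
        inode = len(self.inodes) + 1
        self.inodes[inode] = (attrs, desc)
        self.dentries[path] = inode
        return inode

    def lookup(self, path: str):
        inode = self.dentries.get(path)
        return None if inode is None else self.inodes[inode]


def publish_artifact(objects: ObjectStore, meta: MetadataStore, path: str, body: bytes):
    # Step 1: upload the body. The namespace is untouched at this point.
    desc = objects.put(body)
    # Step 2: a single metadata command makes the entry appear.
    meta.commit_publish(path, {"size": desc.size}, desc)
    return desc
```

In this sketch, a body that was uploaded but never committed stays invisible to lookup, which is the property a checkpoint reader depends on.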

Current And Remaining Semantics

Shipping today:

  • atomic replace returns the old body descriptor;
  • a durable object GC queue stores old body refs from remove and replace;
  • snapshot pins protect old body refs until retired;
  • chunk manifests describe large checkpoint files;
  • read-only snapshot FUSE mounts can expose a pinned subtree as /;
  • service-level typed watch replay lets checkpoint consumers observe publish and replace events;
  • live FUSE mounts consume typed watch replay to invalidate kernel entry/inode caches.

Remaining work:

  • SDK watch consumer integration and broader node-local cache invalidation.
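
The interaction between atomic replace, the GC queue, and snapshot pins described in this section can be modeled with a small toy. This is a sketch under stated assumptions: MetadataModel and its methods are hypothetical names, body refs are plain strings, and a snapshot is assumed to pin every body ref that is live when it is taken.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataModel:
    """Toy model of replace, the GC queue, and snapshot pins (hypothetical)."""
    live: dict = field(default_factory=dict)      # path -> current body ref
    gc_queue: list = field(default_factory=list)  # old body refs awaiting cleanup
    pins: dict = field(default_factory=dict)      # snapshot id -> pinned body refs

    def replace(self, path: str, new_ref: str):
        """Swap the body ref and return the old one (None on first publish)."""
        old_ref = self.live.get(path)
        self.live[path] = new_ref
        if old_ref is not None:
            # The old ref is queued for cleanup, not deleted immediately.
            self.gc_queue.append(old_ref)
        return old_ref

    def snapshot(self, snap_id: str) -> None:
        """Pin every body ref that is live right now (assumed behavior)."""
        self.pins[snap_id] = set(self.live.values())

    def retire(self, snap_id: str) -> None:
        """Drop a snapshot's pins so its refs become eligible for cleanup."""
        self.pins.pop(snap_id, None)

    def collectable(self) -> list:
        """Queued refs that no unretired snapshot still pins."""
        pinned = set().union(*self.pins.values())
        return [ref for ref in self.gc_queue if ref not in pinned]
```

A short usage example: after publishing body-v1, taking snapshot snap-1, and replacing with body-v2, the old ref sits in the queue but collectable() is empty; once snap-1 is retired, body-v1 becomes collectable.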

Failure Handling

The product contract should be explicit:

  • object upload failure means no metadata publish;
  • metadata publish failure returns staged object refs for explicit cleanup;
  • metadata remove/replace success persists old body refs in the metadata GC queue and returns the old body descriptor to the caller;
  • snapshot pins protect old body refs from object cleanup until retired;
  • snapshot pins protect metadata history from history GC until retired;
  • typed watch replay uses the durable watch log rather than MVCC history.
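
The first two rules of the contract above can be sketched as a wrapper that separates the two failure modes. The exception types and the upload and commit callables are hypothetical, introduced only to show the shape of the contract; the real error surface may differ.

```python
class UploadError(Exception):
    """Object upload failed; no metadata publish was attempted."""


class PublishError(Exception):
    """Metadata publish failed; staged_refs need explicit cleanup."""

    def __init__(self, message, staged_refs):
        super().__init__(message)
        self.staged_refs = staged_refs


def publish_with_contract(upload, commit, path, body):
    """Publish body at path using two hypothetical callables.

    upload(body) returns a staged object ref.
    commit(path, ref) publishes metadata or raises.
    """
    try:
        ref = upload(body)
    except Exception as exc:
        # Rule 1: upload failure means the metadata store is never touched.
        raise UploadError(str(exc)) from exc
    try:
        commit(path, ref)
    except Exception as exc:
        # Rule 2: hand the staged ref back so the caller can clean it up.
        raise PublishError(str(exc), staged_refs=[ref]) from exc
    return ref
```

A caller that catches PublishError can iterate over staged_refs and delete each object explicitly, instead of relying on a background sweep to find orphaned bodies.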