
Memory

UGENT has a hybrid memory system that persists facts across conversations. Memory is scoped per workspace and optionally per user.

Enabling Memory

```toml
[memory]
enabled = true
root_dir = "~/.ugent/workspace"
recall_cadence = "per_prompt"
```

  • recall_cadence = "per_prompt": recall once per user message and reuse the results across the agent's iterations for that message (recommended)
  • recall_cadence = "every_call": recall before every LLM call (legacy; higher token cost)
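The difference between the two cadences is how many recall lookups a single user message triggers. The sketch below is a hypothetical model, not UGENT's code: run_turn and the recall callback are illustrative names. It counts the recalls made during one turn that issues several LLM calls.

```python
from typing import Callable, List, Optional


def run_turn(user_message: str, llm_calls: int, cadence: str,
             recall: Callable[[str], List[str]]) -> int:
    """Simulate one user turn and return how many recall lookups ran.

    Hypothetical model: 'per_prompt' recalls once and reuses the result for
    every LLM call in the turn; 'every_call' recalls before each LLM call.
    """
    if cadence not in ("per_prompt", "every_call"):
        raise ValueError(f"unknown recall_cadence: {cadence!r}")
    recalls = 0
    cached: Optional[List[str]] = None
    for _ in range(llm_calls):
        if cadence == "every_call" or cached is None:
            cached = recall(user_message)
            recalls += 1
        # ...the LLM call would receive `cached` as its [MEMORY] block here...
    return recalls
```

With five LLM calls in a turn, "per_prompt" performs one lookup and "every_call" performs five, which is where the extra token cost of the legacy mode comes from.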

Backends

Full-Text Search (Default, No Embedding Model)

```toml
[memory]
enabled = true
mode = "markdown_fts5"
backend = "legacy_markdown"
```

Stores memories as Markdown files and uses SQLite FTS5 for lexical (keyword-based) recall. No embedding model or API key is needed. This is the default backend and works out of the box.

SQLite FTS Engine

```toml
[memory]
enabled = true
mode = "workspace"
backend = "sqlite_fts"
```

A newer engine that ranks recall results with reciprocal rank fusion (RRF). Like the default backend, it needs no embedding model: recall is purely lexical.
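Reciprocal rank fusion merges several ranked result lists by giving each entry a score of 1 / (k + rank) in every list it appears in and summing those scores. The sketch below shows the textbook form with k = 60, the constant from the original RRF paper. Which lists UGENT fuses and which k it uses are not documented here, so treat both as assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of memory IDs into a single ranking.

    Each ID scores sum(1 / (k + rank)) over the lists containing it, with
    rank starting at 1. k = 60 is the common default (assumed, not UGENT's).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing ["a", "b", "c"] with ["b", "c", "d"] ranks "b" first because it places high in both lists, even though "a" tops one of them. RRF only uses ranks, not raw scores, so lists from differently scaled scorers can be combined without normalization.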

LanceDB Hybrid Engine (Vector + FTS)

```toml
[memory]
enabled = true
mode = "workspace"
backend = "lance_embedded"
embedding_dim = 1024
```

This backend requires building UGENT with the memory-lance Cargo feature. When an embedding model is configured, it fuses vector search with FTS recall:

```toml
[memory.semantic]
endpoint = "https://dashscope.aliyuncs.com/compatible-mode/v1"
model = "text-embedding-v4"
api_key = "$BAILIAN_API_KEY"
```

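embedding_dim must match the output dimension of the configured embedding model. If you switch embedding models, delete the existing LanceDB store, because vectors produced by the old model are not comparable with those from the new one.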
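Vector similarity is only defined between vectors of the same width, which is why embedding_dim has to agree with the model. The sketch below is illustrative only and does not reflect UGENT's or LanceDB's actual code; check_dim and cosine are hypothetical helpers showing the check a store needs before comparing a query vector with stored ones.

```python
import math
from typing import Sequence


def check_dim(vector: Sequence[float], embedding_dim: int) -> None:
    """Reject vectors whose width differs from the configured embedding_dim."""
    if len(vector) != embedding_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions but embedding_dim = {embedding_dim}; "
            "update embedding_dim and delete the LanceDB store"
        )


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; only meaningful for vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A model that returns 1536-dimensional vectors would fail check_dim against embedding_dim = 1024, and even two models with the same width produce vectors in different embedding spaces, so old and new vectors should not be mixed in one store.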

Memory Tiers

Memories are organized into quality tiers:

| Tier | Description | Token Budget |
|------|-------------|--------------|
| Core | Always injected into the prompt | Fixed |
| Contextual | Injected when relevant to the current query | Tunable |
| Archival | Rarely injected; compacted over time | Minimal |
| Ephemeral | Session-scoped; not persisted | |

Recall Tuning

```toml
[memory.search]
max_hits = 12
max_injected_tokens_est = 1200
```

  • max_hits: maximum number of memory entries recalled per turn
  • max_injected_tokens_est: approximate token budget for the injected [MEMORY] block
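The two settings bound the injected block in different ways: max_hits caps the entry count and max_injected_tokens_est caps its estimated size. The sketch below is a hypothetical illustration, not UGENT's implementation. The 4-characters-per-token estimate and the choice to stop at the first hit that would overflow the budget are both assumptions.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, (len(text) + 3) // 4)


def build_memory_block(hits: List[str], max_hits: int = 12,
                       max_injected_tokens_est: int = 1200) -> str:
    """Keep at most max_hits ranked hits whose estimated size fits the budget."""
    lines: List[str] = []
    used = 0
    for hit in hits[:max_hits]:
        cost = estimate_tokens(hit)
        if used + cost > max_injected_tokens_est:
            break  # stop at the first hit that would overflow the budget
        lines.append(f"- {hit}")
        used += cost
    # Emit nothing when no hit fits, rather than an empty [MEMORY] header.
    return ("[MEMORY]\n" + "\n".join(lines)) if lines else ""
```

Because hits arrive in ranked order, truncating from the end keeps the most relevant memories when the budget runs out.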

Compaction

Scheduled compaction summarizes old conversations into concise facts:

```toml
[memory.pruning]
enabled = true
per_actor_cap = 200
```

With pruning enabled, per_actor_cap limits the number of memories kept for each actor (user). Workspace and global scopes are never pruned.
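A per-actor cap touches only actor-scoped memories and leaves workspace and global memories alone. The sketch below assumes a hypothetical Memory record and an oldest-first eviction policy; the documentation does not say which entries UGENT drops when an actor exceeds the cap.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Memory:
    scope: str       # "actor", "workspace", or "global" (hypothetical field)
    actor: str       # owning user for actor-scoped memories, "" otherwise
    created_at: int  # monotonically increasing timestamp
    text: str


def prune(memories: List[Memory], per_actor_cap: int = 200) -> List[Memory]:
    """Keep the newest per_actor_cap actor-scoped memories for each actor.

    Workspace and global memories pass through untouched. Evicting the
    oldest entries first is an assumed policy.
    """
    by_actor: Dict[str, List[Memory]] = {}
    kept: List[Memory] = []
    for m in memories:
        if m.scope == "actor":
            by_actor.setdefault(m.actor, []).append(m)
        else:
            kept.append(m)  # shared scopes are never pruned
    for items in by_actor.values():
        items.sort(key=lambda m: m.created_at, reverse=True)
        kept.extend(items[:per_actor_cap])
    return kept
```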

Per-User Isolation

In multi-user deployments (web channel, federated identity), memory is isolated per user:

  • Each user has their own memory scope
  • Workspace and global scopes are shared read-only context
  • Background extraction pulls durable facts after each turn without blocking the response

Sub-Agent Inheritance

```toml
[memory]
subagent_inheritance = "none"
```

  • none: sub-agents do not inherit parent memory (default)
  • full: sub-agents inherit parent memory context

When subagent_inheritance is "none", an individual delegation can still opt in by passing inherit_memory = true to delegate_task.
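The global setting and the per-delegation flag combine into one decision per sub-agent. The function below is a hypothetical sketch of that rule; inherits_memory is an illustrative name, and how an explicit inherit_memory = false interacts with "full" is not described here, so the sketch simply lets "full" win.

```python
from typing import Optional


def inherits_memory(subagent_inheritance: str,
                    inherit_memory: Optional[bool] = None) -> bool:
    """Decide whether a delegated sub-agent receives the parent's memory.

    subagent_inheritance mirrors [memory].subagent_inheritance;
    inherit_memory mirrors the delegate_task argument (None = not passed).
    """
    if subagent_inheritance == "full":
        return True  # assumption: a per-call False does not override "full"
    if subagent_inheritance == "none":
        return inherit_memory is True  # per-delegation opt-in
    raise ValueError(f"unknown subagent_inheritance: {subagent_inheritance!r}")
```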

Write-Back

After each turn, UGENT extracts durable facts from the conversation in the background and stores them. Extraction runs asynchronously, so it never blocks the response.
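Non-blocking write-back means the reply is returned before extraction finishes. The asyncio sketch below illustrates the pattern; extract_facts and handle_turn are hypothetical stand-ins, and the "FACT:" line marker replaces what would really be an LLM-based extractor.

```python
import asyncio
from typing import List, Set


async def extract_facts(transcript: str, store: List[str]) -> None:
    """Hypothetical extractor: keeps lines marked 'FACT:' (a real one would call an LLM)."""
    await asyncio.sleep(0.01)  # stand-in for slow extraction work
    for line in transcript.splitlines():
        if line.startswith("FACT:"):
            store.append(line[len("FACT:"):].strip())


async def handle_turn(transcript: str, store: List[str],
                      background: Set[asyncio.Task]) -> str:
    reply = "response"  # stand-in for the model's answer, produced first
    task = asyncio.create_task(extract_facts(transcript, store))
    background.add(task)  # hold a reference so the task is not garbage-collected early
    task.add_done_callback(background.discard)
    return reply  # returned immediately; extraction finishes later
```

The caller gets its reply while the store is still unchanged; the extracted fact appears only once the background task completes.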

Import / Export

Existing Markdown memory can be imported into the workspace scope explicitly. UGENT no longer imports Markdown memory automatically at startup; run the /memory import command when needed.

Released under the Private Beta License.