Memory
UGENT has a hybrid memory system that persists facts across conversations. Memory is scoped per workspace and optionally per user.
Enabling Memory
[memory]
enabled = true
root_dir = "~/.ugent/workspace"
recall_cadence = "per_prompt"

- recall_cadence = "per_prompt": recall once per user message, reuse across iterations (recommended)
- recall_cadence = "every_call": recall every LLM call (legacy, higher token cost)
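The difference between the two cadences can be pictured with a small sketch. The `RecallCache` class, its `recall_for` method, and the `turn_id` argument are hypothetical names chosen for illustration, not part of UGENT; the sketch only assumes that "per_prompt" means one recall per user message and "every_call" means one recall per LLM call.

```python
from typing import Callable, List, Optional


class RecallCache:
    """Hypothetical illustration of recall cadence; not UGENT's actual code."""

    def __init__(self, recall_fn: Callable[[str], List[str]], cadence: str = "per_prompt"):
        if cadence not in ("per_prompt", "every_call"):
            raise ValueError(f"unknown recall_cadence: {cadence!r}")
        self.recall_fn = recall_fn
        self.cadence = cadence
        self._cached_turn: Optional[int] = None
        self._cached_hits: List[str] = []

    def recall_for(self, turn_id: int, user_prompt: str) -> List[str]:
        """Return memory hits for one LLM call made while handling a user message."""
        if self.cadence == "every_call":
            # Legacy behaviour: query memory again on every LLM call.
            return self.recall_fn(user_prompt)
        # per_prompt: query memory once per user message and reuse the result
        # across later iterations (tool calls, retries) of the same turn.
        if turn_id != self._cached_turn:
            self._cached_hits = self.recall_fn(user_prompt)
            self._cached_turn = turn_id
        return self._cached_hits
```

With three LLM calls in one turn, the "per_prompt" path in this sketch queries memory once, while "every_call" queries it three times, which is where the extra token cost comes from.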
Backends
Full-Text Search (Default, No Embedding Model)
[memory]
enabled = true
mode = "markdown_fts5"
backend = "legacy_markdown"

Uses Markdown files + SQLite FTS5 for lexical recall. No embedding model or API key needed. This is the default and works out of the box.
SQLite FTS Engine
[memory]
enabled = true
mode = "workspace"
backend = "sqlite_fts"

New engine with reciprocal rank fusion. Still no embedding model required; recall is purely lexical.
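Reciprocal rank fusion (RRF) merges several ranked result lists by giving each item a score of 1 / (k + rank) per list and summing. The sketch below shows the general technique only; which rankings UGENT fuses and which constant it uses are not stated here, so the two input lists and `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an item's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Two hypothetical lexical rankings of the same memory entries.
ranking_a = ["m2", "m1", "m3"]
ranking_b = ["m1", "m3", "m2"]
fused = reciprocal_rank_fusion([ranking_a, ranking_b])
print(fused)  # ['m1', 'm2', 'm3']
```

Because RRF uses only rank positions, it can combine lists whose raw scores are not comparable with each other.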
LanceDB Vector (Semantic Search)
[memory]
enabled = true
mode = "workspace"
backend = "lance_embedded"
embedding_dim = 1024

Requires building UGENT with the memory-lance cargo feature. Fuses vector search with FTS recall when an embedding model is configured:
[memory.semantic]
endpoint = "https://dashscope.aliyuncs.com/compatible-mode/v1"
model = "text-embedding-v4"
api_key = "$BAILIAN_API_KEY"

The embedding dimension must match the model's output width. If you change models, delete the LanceDB store.
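Because embedding_dim must equal the model's output width, a mismatch is worth catching before any vectors are written. A minimal sketch, assuming a hypothetical helper `check_embedding_dim` (not a UGENT function) that compares one returned embedding against the configured value:

```python
from typing import Sequence


def check_embedding_dim(vector: Sequence[float], embedding_dim: int) -> None:
    """Raise if an embedding's length differs from the configured embedding_dim."""
    if len(vector) != embedding_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions "
            f"but embedding_dim is {embedding_dim}"
        )


# A vector of the configured width passes silently.
check_embedding_dim([0.0] * 1024, embedding_dim=1024)
```

Running such a check on a single test embedding after switching models would show early whether embedding_dim still matches.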
Memory Tiers
Memories are organized into quality tiers:
| Tier | Description | Token Budget |
|---|---|---|
| Core | Always injected into the prompt | Fixed |
| Contextual | Injected when relevant to the current query | Tunable |
| Archival | Rarely injected, compacted over time | Minimal |
| Ephemeral | Session-scoped, not persisted | — |
Recall Tuning
[memory.search]
max_hits = 12
max_injected_tokens_est = 1200

- max_hits: maximum number of memory entries recalled per turn
- max_injected_tokens_est: approximate token budget for the injected [MEMORY] block
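One way to picture how the two limits interact is a selection loop that first caps the number of hits and then stops once the estimated token budget is used up. This is a sketch under assumptions: the function names are hypothetical, and the four-characters-per-token estimate is a common rule of thumb, not necessarily how UGENT estimates tokens.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate of about 4 characters per token (an assumption)."""
    return max(1, (len(text) + 3) // 4)


def select_for_injection(hits: List[str], max_hits: int = 12,
                         max_injected_tokens_est: int = 1200) -> List[str]:
    """Keep the top-ranked hits that fit both the hit cap and the token budget."""
    selected: List[str] = []
    used = 0
    for hit in hits[:max_hits]:
        cost = estimate_tokens(hit)
        if used + cost > max_injected_tokens_est:
            break  # stop at the first hit that would exceed the budget
        selected.append(hit)
        used += cost
    return selected
```

In this sketch, hits are assumed to arrive already ranked, so lowering either limit drops the lowest-ranked entries first.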
Compaction
Scheduled compaction summarizes old conversations into concise facts:
[memory.pruning]
enabled = true
per_actor_cap = 200

Per-actor pruning caps each user's memory count. Workspace and global scopes are never pruned.
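The pruning rule can be sketched as follows. The `MemoryEntry` type and its scope labels are hypothetical, and the choice to drop the oldest entries first is an assumption; the text above states only that each user's count is capped and that workspace and global scopes are left alone.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryEntry:
    scope: str         # "actor", "workspace", or "global" (hypothetical labels)
    actor_id: str      # empty for shared scopes
    created_at: float  # Unix timestamp
    text: str


def prune(entries: List[MemoryEntry], per_actor_cap: int = 200) -> List[MemoryEntry]:
    """Keep at most per_actor_cap newest entries per actor; keep all shared entries."""
    # Workspace and global entries are never pruned.
    kept = [e for e in entries if e.scope != "actor"]
    by_actor: Dict[str, List[MemoryEntry]] = {}
    for e in entries:
        if e.scope == "actor":
            by_actor.setdefault(e.actor_id, []).append(e)
    for actor_entries in by_actor.values():
        # Assumption: the newest entries win when an actor is over the cap.
        actor_entries.sort(key=lambda e: e.created_at, reverse=True)
        kept.extend(actor_entries[:per_actor_cap])
    return kept
```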
Per-User Isolation
In multi-user deployments (web channel, federated identity), memory is isolated per user:
- Each user has their own memory scope
- Workspace and global scopes are shared read-only context
- Background extraction pulls durable facts after each turn without blocking the response
Sub-Agent Inheritance
[memory]
subagent_inheritance = "none"

- none: sub-agents do not inherit parent memory (default)
- full: sub-agents inherit parent memory context
Even when subagent_inheritance is set to none, an individual delegation can still opt in by setting inherit_memory = true on delegate_task.
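The interaction between the global setting and the per-delegation flag can be sketched as a small resolver. The function name is hypothetical, and the sketch covers only the documented case (inherit_memory = true opting in under "none"); how inherit_memory = false behaves under "full" is not stated, so here it simply falls back to the global setting.

```python
from typing import Optional


def subagent_inherits_memory(subagent_inheritance: str = "none",
                             inherit_memory: Optional[bool] = None) -> bool:
    """Resolve whether one delegation receives the parent's memory context."""
    if subagent_inheritance not in ("none", "full"):
        raise ValueError(f"unknown subagent_inheritance: {subagent_inheritance!r}")
    if inherit_memory is True:
        # Per-delegation opt-in wins even when the global setting is "none".
        return True
    # Otherwise fall back to the global setting (assumed for unset or false).
    return subagent_inheritance == "full"
```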
Write-Back
After each turn, UGENT extracts durable facts from the conversation and stores them. Extraction runs asynchronously in the background, so it never blocks the response.
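The non-blocking pattern described above can be illustrated with asyncio. This is a generic sketch, not UGENT's implementation: `extract_facts`, `handle_turn`, and the in-memory `stored_facts` list are hypothetical stand-ins.

```python
import asyncio
from typing import List, Tuple

stored_facts: List[str] = []


async def extract_facts(conversation: str) -> None:
    """Stand-in for background extraction; a real extractor would do slow work."""
    await asyncio.sleep(0.01)  # simulate slow work
    stored_facts.append(f"fact from: {conversation}")


async def handle_turn(user_message: str) -> Tuple[str, asyncio.Task]:
    """Build the reply and schedule extraction without waiting for it."""
    reply = f"echo: {user_message}"
    # create_task schedules the coroutine; not awaiting it keeps the reply fast.
    task = asyncio.create_task(extract_facts(user_message))
    # Return the task so the caller holds a reference until it completes.
    return reply, task


async def main() -> None:
    reply, task = await handle_turn("hi")
    print(reply)  # the reply is available before extraction has finished
    await task    # a long-running server would let this complete on its own


asyncio.run(main())
```

Holding a reference to the task matters in this pattern, because an unreferenced asyncio task can be garbage-collected before it finishes.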
Import / Export
Existing Markdown memory can be imported explicitly into the workspace scope. Startup no longer imports Markdown automatically; run the /memory import command when needed.