Memory

UGENT has a hybrid memory system that persists facts across conversations. Memory is scoped per workspace and optionally per user.

Enabling Memory

toml

[memory]
enabled = true
root_dir = "~/.ugent/workspace"
recall_cadence = "per_prompt"

recall_cadence = "per_prompt" — recall once per user message, reuse across iterations (recommended)
recall_cadence = "every_call" — recall every LLM call (legacy, higher token cost)
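The difference between the two cadences can be sketched as a small cache keyed on the user message. This is an illustration only, not UGENT's implementation: the `RecallCache` class, the `prompt_id` argument, and the `recall` callable are hypothetical names introduced for the example.

```python
from typing import Callable, List, Optional


class RecallCache:
    """Hypothetical helper showing when recall is re-run under each cadence."""

    def __init__(self, recall: Callable[[str], List[str]], cadence: str = "per_prompt"):
        if cadence not in ("per_prompt", "every_call"):
            raise ValueError(f"unknown recall_cadence: {cadence}")
        self.recall = recall
        self.cadence = cadence
        self._prompt_id: Optional[int] = None
        self._hits: List[str] = []

    def hits_for(self, prompt_id: int, query: str) -> List[str]:
        """Return memory hits for one LLM call belonging to the given user message."""
        # every_call: always recall. per_prompt: recall only when the message changes.
        if self.cadence == "every_call" or prompt_id != self._prompt_id:
            self._hits = self.recall(query)
            self._prompt_id = prompt_id
        return self._hits
```

With `per_prompt`, three tool-use iterations on the same user message trigger one recall; with `every_call`, they trigger three, which is where the extra token cost comes from.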

Backends

Full-Text Search (Default, No Embedding Model)

toml

[memory]
enabled = true
mode = "markdown_fts5"
backend = "legacy_markdown"

Uses Markdown files + SQLite FTS5 for lexical recall. No embedding model or API key needed. This is the default and works out of the box.

SQLite FTS Engine

toml

[memory]
enabled = true
mode = "workspace"
backend = "sqlite_fts"

New engine with reciprocal rank fusion. Still no embedding model required — pure lexical recall.
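For readers unfamiliar with reciprocal rank fusion, here is a minimal sketch of the general technique. It assumes the engine merges several ranked lists of memory IDs; which lists UGENT actually fuses is not stated here, and the function name and the constant `k = 60` (the value commonly used in the literature) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID scores 1 / (k + rank) in every list it appears in (rank starts at 1),
    and the per-list scores are summed.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
```

In this example `b` ranks first because it sits near the top of both lists, even though it leads only one of them.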

LanceDB Vector (Semantic Search)

toml

[memory]
enabled = true
mode = "workspace"
backend = "lance_embedded"
embedding_dim = 1024

Requires building UGENT with the memory-lance cargo feature. Fuses vector search with FTS recall when an embedding model is configured:

toml

[memory.semantic]
endpoint = "https://dashscope.aliyuncs.com/compatible-mode/v1"
model = "text-embedding-v4"
api_key = "$BAILIAN_API_KEY"

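The embedding dimension (embedding_dim) must match the model's output width. If you change models, delete the LanceDB store, because vectors written by the old model are not comparable with vectors from the new one.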
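A dimension mismatch is easy to catch before anything is written to the store. The check below is a sketch under assumptions: `check_embedding_dim` is a hypothetical helper, not part of UGENT, and the vector is assumed to be a plain list of floats returned by the embedding endpoint.

```python
from typing import List


def check_embedding_dim(vector: List[float], embedding_dim: int) -> List[float]:
    """Raise if the model's output width differs from the configured embedding_dim."""
    if len(vector) != embedding_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, "
            f"but embedding_dim is {embedding_dim}"
        )
    return vector
```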

Memory Tiers

Memories are organized into quality tiers:

Tier	Description	Token Budget
Core	Always injected into the prompt	Fixed
Contextual	Injected when relevant to the current query	Tunable
Archival	Rarely injected, compacted over time	Minimal
Ephemeral	Session-scoped, not persisted	—

Recall Tuning

toml

[memory.search]
max_hits = 12
max_injected_tokens_est = 1200

max_hits — maximum number of memory entries recalled per turn
max_injected_tokens_est — approximate token budget for the injected [MEMORY] block
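How the two limits interact can be sketched as follows. This is a simplified model, not UGENT's code: the four-characters-per-token estimate and both function names are assumptions, and the real estimator may differ.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumed heuristic: about four characters per token)."""
    return max(1, len(text) // 4)


def select_for_injection(
    hits: List[str], max_hits: int = 12, max_injected_tokens_est: int = 1200
) -> List[str]:
    """Keep the top-ranked hits that fit within both the hit cap and the token budget."""
    selected: List[str] = []
    used = 0
    for text in hits[:max_hits]:  # hits are assumed to be sorted best first
        cost = estimate_tokens(text)
        if used + cost > max_injected_tokens_est:
            break
        selected.append(text)
        used += cost
    return selected
```

Whichever limit is reached first wins: many short entries stop at max_hits, while a few long entries stop at the token budget.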

Compaction

Scheduled compaction summarizes old conversations into concise facts:

toml

[memory.pruning]
enabled = true
per_actor_cap = 200

Per-actor pruning caps the number of memories kept for each user at per_actor_cap (200 in this example). Workspace and global scopes are never pruned.
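The pruning rule can be illustrated with a short sketch. The `Memory` record, the scope names, and the choice to evict the oldest entries first are all assumptions made for the example; the actual eviction order is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Memory:
    scope: str              # assumed values: "actor", "workspace", "global"
    actor: Optional[str]    # user ID for actor-scoped memories, else None
    created_at: float       # timestamp used here to decide what to evict
    text: str


def prune(memories: List[Memory], per_actor_cap: int = 200) -> List[Memory]:
    """Keep at most per_actor_cap memories per actor; never drop shared scopes."""
    kept = [m for m in memories if m.scope != "actor"]
    by_actor: Dict[Optional[str], List[Memory]] = defaultdict(list)
    for m in memories:
        if m.scope == "actor":
            by_actor[m.actor].append(m)
    for items in by_actor.values():
        # Assumed policy: keep the newest entries, evict the oldest.
        items.sort(key=lambda m: m.created_at, reverse=True)
        kept.extend(items[:per_actor_cap])
    return kept
```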

Per-User Isolation

In multi-user deployments (web channel, federated identity), memory is isolated per user:

Each user has their own memory scope
Workspace and global scopes are shared read-only context
Background extraction pulls durable facts after each turn without blocking the response

Sub-Agent Inheritance

toml

[memory]
subagent_inheritance = "none"

none — sub-agents do not inherit parent memory (default)
full — sub-agents inherit parent memory context

Even when subagent_inheritance is set to none, an individual delegation can still opt in by passing inherit_memory = true to delegate_task.
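The combined effect of the global setting and the per-delegation flag can be expressed as one small function. This is a sketch of the rule as described above; the function name is hypothetical, and whether inherit_memory = false can opt out under full is not covered here, so the sketch models opt-in only.

```python
def inherits_memory(subagent_inheritance: str, inherit_memory: bool = False) -> bool:
    """Return True if a delegated sub-agent should receive parent memory context."""
    if subagent_inheritance not in ("none", "full"):
        raise ValueError(f"unknown subagent_inheritance: {subagent_inheritance}")
    # "full" always inherits; under "none" a delegation may opt in explicitly.
    return subagent_inheritance == "full" or inherit_memory
```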

Write-Back

After each turn, UGENT extracts durable facts from the conversation in the background and stores them. This never blocks the response — extraction happens asynchronously.
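The non-blocking pattern looks roughly like this in asyncio terms. It is a conceptual sketch only: UGENT's own runtime and function names are not shown in this page, and `extract_facts`, `handle_turn`, and the in-memory `store` list are hypothetical stand-ins.

```python
import asyncio
from typing import List, Set, Tuple


async def extract_facts(turn: str, store: List[str]) -> None:
    """Stand-in for the background extraction step (for example, an LLM call)."""
    await asyncio.sleep(0.01)
    store.append(f"fact from: {turn}")


async def handle_turn(turn: str, store: List[str], background: Set[asyncio.Task]) -> str:
    """Build the reply, then schedule extraction without waiting for it."""
    reply = f"reply to: {turn}"
    task = asyncio.create_task(extract_facts(turn, store))
    background.add(task)                        # hold a reference so the task is not lost
    task.add_done_callback(background.discard)  # drop the reference once finished
    return reply


async def main() -> Tuple[str, int, List[str]]:
    store: List[str] = []
    background: Set[asyncio.Task] = set()
    reply = await handle_turn("hello", store, background)
    stored_at_reply = len(store)       # extraction has not run yet
    await asyncio.gather(*background)  # wait here only so the demo can inspect the store
    return reply, stored_at_reply, store


reply, stored_at_reply, store = asyncio.run(main())
```

The reply is available before any fact has been stored; the store is filled in once the background task completes.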

Import / Export

Existing Markdown memory can be imported explicitly into the workspace scope. Startup no longer runs automatic Markdown import — use the /memory import command when needed.
