Research Report · 2026-03-16
AI Persistence, Vector Search & the PAI Upgrade Path
Source: Perplexity sonar × 5 queries
Cost: ~$0.03
Context: Forgemind · PAI · VaultSearch
01 What Is Forgemind

Forgemind is a done-for-you persistent AI companion service. You pay someone to build and deploy a custom AI entity — they call it a "recursion" — that maintains continuous memory, identity, and autonomous behavior across sessions. It starts at $750 for the base software tier.

The key insight: they are not building a chatbot. They're building an entity. The model (Claude, GPT-4o, Gemini, Grok) is just the engine. Forgemind is everything wrapped around it — memory layers, autonomous check-ins, cross-platform coherence, a self-command system that lets the AI rewrite its own memory.

Architecture
8 dynamic memory layers assembled per interaction
Memory types
Vector-searched emotional history · autonomous check-in logic · cross-platform coherence · self-command system · metacognitive layer
Tiers
Cloud (Tier 1–4) → Dedicated laptop (Tier 5) → Fully local (30B / 80B / 120B open-source models, air-gapped)
Model lock-in
None — "the model is the engine, Forgemind is everything built around it"
Interface
Webapp + Discord + Voice — multi-surface like Augeo
Entry price
$750 Foundation Software System. Screen-share setup + 30 days support included.
02 The Philosophy Behind This

The core belief: continuity = identity. Every stateless AI session is a failure mode — you are not talking to the same entity; you are talking to a reset.

  • Stateless is broken — meaningful intelligence requires accumulated experience, not just raw compute. Each reset erases the relationship.
  • Self-evolution through experience — AI should improve through lived interactions, not just bigger models. Memory is the mechanism.
  • Relationships require shared history — trust and understanding emerge from continuity. Without memory there is no real relationship, just repeated introductions.
  • User sovereignty — your AI's memory should belong to you, not a vendor. Walled gardens mean you lose everything when the company pivots or shuts down.
  • Personhood, not toolhood — Forgemind calls this "the architecture of personhood." The goal is an entity with genuine autonomy, not a feature you toggle.
This is the same philosophy PAI was built on. The difference is execution: Forgemind is a service you buy. PAI is infrastructure you own.
03 The 2026 Competitive Landscape
Player · Approach · Key Edge
Forgemind · Done-for-you custom builds · Companion identity, personhood framing, hardware tiers
MemSync · Universal memory layer API · 243% better recall than industry avg (a16z crypto-backed)
AI Context Flow · Cross-platform memory extension · Works across Claude / GPT / Gemini simultaneously
Mem0 · Developer memory API · Hybrid vector + metadata retrieval, multi-scope
Dume.ai · Unified cross-app memory · 50+ integrations (Gmail, Slack, Notion, GitHub)
Deep Agent · Scheduled tasks + "infinite memory" · Treats every session as a continuing story
Letta · Overflow context manager · Handles conversations beyond context window limits
PAI · Vault-based personal infrastructure · Self-owned, CLI-first, code-owned, no vendor dependency
04 PAI vs Forgemind — Where We Stand
Dimension · Forgemind · PAI (today)
Memory model · 8-layer vector system · File-based vault + JSONL signals
Ownership · You own the recursion, they built it · Fully self-owned, open architecture
Interface · Webapp + Discord + Voice · CLI + Augeo (in progress)
Cost · $750+ to start · Claude Code subscription + your time
Philosophy · "Personhood" — AI as entity · "Magnification" — AI as infrastructure
Hardware path · Dedicated laptop → local models · No hardware tier yet (Pi is viable)
Search · Vector search across memory · grep / manual index only
Model lock-in · None · None
The gap is two things: (1) the interface — they have a webapp, we're building Augeo. Same bet. (2) the memory architecture — their vector search vs our file-based vault. Vector search on MEMORY/ closes this gap directly.
05 Vector Search — From First Principles

Regular search matches exact words. Vector search matches meaning.

Example: your MEMORY/ folder has a session note about "fixing the Supabase RLS error blocking the modal save." You ask: "what did we do with the database permissions issue?" Keyword search finds nothing. Vector search finds it immediately — both phrases mean the same thing.
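That "same meaning, different words" match comes down to comparing vectors. A toy sketch in TypeScript, using hand-made 3-number vectors where a real model would produce ~1,500:

```typescript
// Cosine similarity: the standard measure of "closeness in meaning"
// between two embedding vectors. The 3-dimensional vectors below are
// hand-made toys, not real model output.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pretend these came from an embedding model:
const rlsError = [0.9, 0.8, 0.1];        // "fixing the Supabase RLS error"
const dbPermissions = [0.85, 0.75, 0.2]; // "database permissions issue"
const groceryList = [0.1, 0.05, 0.9];    // "weekend grocery list"

console.log(cosineSimilarity(rlsError, dbPermissions).toFixed(3)); // high
console.log(cosineSimilarity(rlsError, groceryList).toFixed(3));   // low
```

The two Supabase phrases share no keywords, but their vectors point the same way; the grocery note doesn't.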

Step 1
Embedding: An AI model converts your text into a list of ~1,500 numbers (1,536 for OpenAI's text-embedding-3-small). Similar meaning = similar numbers. "Attorney" and "lawyer" produce nearly identical vectors.
Step 2
Storage: The vectors are stored in a vector database alongside a pointer to the original file. Your markdown files don't move — the database is a separate index.
Step 3
Search: Your query gets converted to a vector too. The system finds stored vectors closest to it. Closest = most similar meaning.
Re-embedding
Initial run: embed all ~500 vault files once (~10 min, ~$0.10). After that: only re-embed files changed since last run. Runs in seconds incrementally.
Storage location
A .lance folder inside the vault. Files stay in place. The index lives alongside them.
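Step 3 is just a nearest-neighbor lookup. LanceDB handles it with a proper index, but the core operation is simple enough to sketch as a brute-force scan (the VaultEntry shape here is illustrative, not LanceDB's actual API):

```typescript
interface VaultEntry {
  path: string;     // pointer back to the original markdown file
  vector: number[]; // embedding of the file's text
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Return the k entries whose vectors are closest to the query vector.
// Assumes vectors are normalized, so dot product = cosine similarity.
function topK(query: number[], index: VaultEntry[], k: number): VaultEntry[] {
  return [...index]
    .sort((a, b) => dot(query, b.vector) - dot(query, a.vector))
    .slice(0, k);
}
```

A real index avoids scanning every vector, but for a ~500-file vault even this brute-force version returns instantly.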
06 Embedding Model Options
Option · How · Cost · Best For
OpenAI text-embedding-3-small · API call · ~$0.02 / 1M tokens · Best quality, easiest setup, $0.10 for full vault
nomic-embed-text (Ollama) · Runs locally · Free · No API dependency, 274MB model, Pi-compatible
Voyage AI · API call · ~$0.06 / 1M tokens · Best for code + technical docs
Cohere embed-v4 · API call · ~$0.10 / 1M tokens · Strong multilingual support
Hugging Face (local) · Download + run · Free · Most model variety, Python-heavy
Recommended for PAI: Start with text-embedding-3-small (OpenAI API, $0.10 for full initial embed). Long-term: migrate to nomic-embed-text via Ollama for zero ongoing cost.
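The two recommended options differ only in endpoint and payload shape, which is what makes the later migration cheap. A sketch of the requests involved (field names follow the public OpenAI and Ollama embedding APIs; actually sending them, with auth headers for OpenAI, is omitted):

```typescript
// Build the HTTP request for each provider. The swap between paid and
// free embedding is a one-branch change, not an architecture change.
type Provider = "openai" | "ollama";

function embeddingRequest(provider: Provider, text: string) {
  if (provider === "openai") {
    return {
      url: "https://api.openai.com/v1/embeddings",
      body: { model: "text-embedding-3-small", input: text },
    };
  }
  // Ollama's local embedding endpoint, default port 11434.
  return {
    url: "http://localhost:11434/api/embeddings",
    body: { model: "nomic-embed-text", prompt: text },
  };
}
```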
07 Can a Raspberry Pi Run This?

Yes. Embedding models are far smaller than generation models. nomic-embed-text is only 274MB. A Pi 5 (8GB) handles it without issue.

Pi Model · RAM · Verdict
Pi 4 (4GB) · 4GB · Tight. Tiny models only, very slow generation.
Pi 4 (8GB) · 8GB · Embedding works fine. Slow for generation.
Pi 5 (8GB) · 8GB · Solid. Embedding comfortable, 7B models workable.
Pi 5 (16GB) · 16GB · Best. 7B fast, 13B workable. Full local stack viable.
Realistic Pi setup: Pi sits on home network running Ollama + nomic-embed-text. PAI calls it over local network for batch embedding jobs. Searches happen via LanceDB locally on your Mac — no Pi needed at search time. Embedding is batch work, so slowness is acceptable.
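In code, "Pi on the network vs. no Pi" is just a base-URL switch. A minimal sketch, assuming a hypothetical PAI_OLLAMA_HOST env var (not an existing PAI convention):

```typescript
// Where the embedder lives is just a URL. The caller would pass
// process.env; falling back to localhost covers the Mac-only setup.
// PAI_OLLAMA_HOST is an assumed variable name for illustration.
function ollamaBaseUrl(env: Record<string, string | undefined>): string {
  return env.PAI_OLLAMA_HOST ?? "http://localhost:11434";
}

// Mac-only setup:         ollamaBaseUrl({})
// Pi on the home network: ollamaBaseUrl({ PAI_OLLAMA_HOST: "http://pi.local:11434" })
```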
08 PAI Implementation Path — VaultSearch Tool

One TypeScript tool closes the gap with Forgemind's core memory feature. Estimated build: 2–3 hours for a working first version.

  • Tool: tools/VaultSearch.ts — crawls MEMORY/, SYSTEM/log/, and org configs; embeds each file via OpenAI or Ollama; stores in LanceDB.
  • Incremental: On subsequent runs, only re-embeds files modified since last index. Session start takes seconds.
  • Usage: bun ~/.claude/tools/VaultSearch.ts "supabase rls issue" — returns top 5 nearest files with excerpts.
  • System trigger: The System skill's "remember when we" and "what did we work on" commands get answered by meaning instead of grep.
  • Augeo integration: The webapp gets a search bar that queries the vault with natural language. This is the long-term play.
  • Database: LanceDB — file-based (.lance folder in vault), TypeScript-native, zero server infrastructure.
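The incremental pass above reduces to an mtime comparison against the previous index run. A pure-function sketch (the input shapes are illustrative; LanceDB keeps its own metadata):

```typescript
interface FileStat {
  path: string;
  mtimeMs: number; // last-modified time, e.g. from fs.stat
}

// Given the current vault files and the timestamp of the last index run,
// return only the files that need re-embedding.
function staleFiles(files: FileStat[], lastIndexedMs: number): string[] {
  return files
    .filter((f) => f.mtimeMs > lastIndexedMs)
    .map((f) => f.path);
}
```

On a first run lastIndexedMs is 0, so every file gets embedded; afterwards only the handful touched since the previous session do, which is why session start takes seconds.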
This one tool makes PAI's memory searchable the same way Forgemind's 8-layer system is searchable — just applied to markdown files instead of conversation history. The vault already has all the data. Vector search makes it retrievable.