RAG Memory
Python 3.13 · MCP · Ollama · sqlite-vec · SQLite FTS5 · pytest · uv
Claude Code users end up with dozens of skills, subagents, and curated references scattered across `~/.claude/`. Finding the right one at the right moment becomes the bottleneck — so this MCP server exposes a single `find_resource` tool that Claude can call before solving a task from scratch. It indexes three source trees (skills, superpowers agents, curated resources), returns the best-fit matches with ready-to-use invocation hints like `Skill("name")` or `Agent(subagent_type="name")`, and runs entirely offline against a local SQLite database. No network calls, no API keys, no external services — just a discovery layer that turns a growing personal catalog into something actually searchable.
Tech Details
Python 3.13 with `uv` for dependency management. The server talks MCP over stdio (`mcp>=1.2.0`) so it drops into any MCP-aware client with one config entry. Storage is SQLite with two extensions: `sqlite-vec` for 768-dim vector search and FTS5 for full-text search, kept in a single `~/.claude/rag/memory.db` file that's trivially portable.
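As a rough sketch of what that single-file storage layer could look like (table and column names here are illustrative, not the project's actual schema): the FTS5 side works with nothing but Python's bundled SQLite, while the `vec0` virtual table additionally requires the sqlite-vec extension to be loaded first, so its `CREATE` statement is shown but not executed below.

```python
import sqlite3

# Sketch of the vector side: requires the sqlite-vec extension to be
# loaded into the connection first (e.g. via the sqlite_vec package).
VEC_SCHEMA = """
CREATE VIRTUAL TABLE IF NOT EXISTS resources_vec USING vec0(
    embedding float[768]
);
"""

db = sqlite3.connect(":memory:")

# FTS5 ships with CPython's bundled SQLite on standard builds.
db.execute("""
CREATE VIRTUAL TABLE IF NOT EXISTS resources_fts USING fts5(
    name, description, when_to_use
)""")
db.execute(
    "INSERT INTO resources_fts VALUES (?, ?, ?)",
    ("commit-helper", "Writes conventional commits", "when committing"),
)

# Prefix query ranked by BM25 (lower score = better match).
rows = db.execute(
    "SELECT name FROM resources_fts WHERE resources_fts MATCH ?"
    " ORDER BY bm25(resources_fts)",
    ("commit*",),
).fetchall()
```

Keeping both indexes in one database file is what makes the `~/.claude/rag/memory.db` approach portable: copy one file and the whole index moves with it.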
Retrieval is hybrid. A query is embedded locally via Ollama (`nomic-embed-text`, 768-dim) and run as a cosine-distance search; the same query is tokenised and run through FTS5's BM25 ranker. The two ranked lists are merged via Reciprocal Rank Fusion (score = Σ 1 / (k + rank)), which avoids the cold-start failure mode of pure vector search — if a query happens to use the exact terms a skill's description uses, lexical match still wins, and vice versa for synonyms.
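The fusion step itself is small enough to sketch in full. This is a generic Reciprocal Rank Fusion over two best-first id lists, not the project's actual implementation; `k = 60` is the conventional default from the RRF literature, and the function names are mine.

```python
def rrf_merge(vector_hits: list[str], fts_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of ids via Reciprocal Rank Fusion.

    Each input is ordered best-first; an id's fused score is the sum of
    1 / (k + rank) over every list it appears in, so items ranked well
    by either retriever float to the top.
    """
    scores: dict[str, float] = {}
    for hits in (vector_hits, fts_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Note that RRF only consumes ranks, never raw scores, which is why it can fuse a cosine distance and a BM25 score without any calibration between the two scales.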
A deliberate choice: the embedding target is just the `name`, `description`, and `when_to_use` fields from each skill's `SKILL.md` frontmatter, never the full body. This keeps embeddings cheap, the index tiny, and the signal aligned with the discovery task. Full bodies are there if you want to read them; they don't need to be in the vector space.
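Concretely, the text handed to the embedder might be assembled like this (a minimal sketch; the helper name and exact concatenation format are assumptions, not the project's code):

```python
def embedding_text(frontmatter: dict) -> str:
    """Build the embedding input from discovery-relevant fields only.

    Joins name, description, and when_to_use; skips fields that are
    missing or empty. The skill body is deliberately excluded.
    """
    keys = ("name", "description", "when_to_use")
    parts = [frontmatter.get(key, "") for key in keys]
    return "\n".join(p for p in parts if p)
```

Because only these few fields feed the vector space, re-embedding after an edit touches kilobytes, not whole documents, and the nearest-neighbour signal stays about *what the skill is for* rather than how it is written.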
Module boundaries are strict and one-way: `ollama.py` (HTTP only) → `store.py` (SQL only) → `sources.py` (filesystem parsing) → `index.py` (sync + change detection) → `search.py` (hybrid retrieval) → `server.py` (MCP plumbing). Lazy indexing on first query; a separate `rag-memory-reindex` CLI entry point does full rebuilds. Tests inject a fake embedding function so the suite runs with no network or Ollama dependency.
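The injected fake embedder could be as simple as the following sketch (my own stand-in, not the project's fixture): it hashes the input text into a deterministic 768-dim vector, so tests get stable, distinct "embeddings" with no Ollama process running.

```python
import hashlib

def fake_embed(text: str, dim: int = 768) -> list[float]:
    """Deterministic stand-in for the Ollama embedding call.

    Expands a SHA-256 hash of the text into `dim` floats in [0, 1]:
    reproducible across runs, and different texts map to different
    vectors, which is all a retrieval test needs.
    """
    seed = text.encode()
    out: list[float] = []
    counter = 0
    while len(out) < dim:
        digest = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        out.extend(b / 255.0 for b in digest)
        counter += 1
    return out[:dim]
```

Injecting this in place of the HTTP client keeps the pytest suite hermetic: the same fixture exercises `store.py`, `index.py`, and `search.py` end to end without network access.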