Visual RAG Memory Map · LocalRAGVisualMapSystem

01 / PROBLEM

Today's AI chat is opaque by default.

Single-turn Q&A works great. But using AI as a long-term thinking partner breaks on three opacities — none of which are about model capability.

→ 01 / MEMORY

Can't see what it knows.

Memory lists surface after the fact, flat, uncategorized. No situational toggles. "Use this tone for work, not personal" has no affordance.

→ 02 / INFLUENCE

Can't see what mattered.

Users don't know which memories or documents shaped a reply. Output feels like luck. Users over-trust or under-trust — neither is good UX.

→ 03 / SHAPE

Can't see what's in there.

Uploaded files become a name in a sidebar. A 300-page PDF and a one-line note look the same. The knowledge base is a black bag.

02 / ITERATION

The four moves weren't a plan. They were what survived.

I didn't map this app out and then build it. I built the simplest version I could, used it myself, noticed what sucked, fixed that one thing, used it again. Five rounds later, the four design moves on the next page are what was left standing.

ITER 01WHERE I STARTED

Just a chat window. Nothing else.

WHAT IT WAS

A plain chat box. Type a question, get a reply. No files, no memory, nothing saved between sessions.

WHY THAT WASN'T ENOUGH

Every time I opened it, it had forgotten everything. If I told it Monday "I'm a designer working on a music app," on Tuesday it had no idea who I was. Great for one-off questions. Useless as a thinking partner.

ITER 02+ FILE READING

Now it can read my documents.

WHAT I ADDED

Drag in a PDF, a résumé, meeting notes — the AI would actually read them and use them when answering. I also added a little "Sources" list under each reply so you could see where an answer came from.

WHY THAT WASN'T ENOUGH

The sources were broken. I'd ask "what's my identity?" and it would cite a random SEO doc. The same file got listed three or four times for the same answer — like filler. The "Sources" label looked like evidence but was really just noise. I couldn't trust my own app.

ITER 03+ MEMORY CARDS

Now it remembers things about me.

WHAT I ADDED

Little notes the AI always sees — organized by type. Who I am. Projects I'm working on. Preferences. Things I don't want it to do. Flip a card off when it doesn't apply to the current conversation.

WHY THAT WASN'T ENOUGH

The cards lived inside a separate pop-up window. Out of sight, out of mind. I'd write "keep replies under 3 sentences" and then have no way of knowing whether the AI was actually using that card on any given reply, or just ignoring it. The memory existed — but I couldn't see it working.

ITER 04+ VISUAL FEEDBACK

Now I can see which memory got used — kind of.

WHAT I ADDED

When the AI replied, the cards it had actually used would flash amber for about a second, like a little heartbeat. I also built a visual map of everything in the knowledge base — dots grouped into topic-clusters you could zoom and pan around.

WHY THAT WASN'T ENOUGH

Both of these lived inside pop-up windows. To see which card flashed, I had to click the Memory button, leave the chat, watch the flash, then come back to read the reply. The information was there — the cost of looking at it was too high. So I just stopped looking.

ITER 05WHERE I AM NOW

Now everything sits right in the conversation.

WHAT I ADDED

Small coloured tags directly under every AI reply, showing which memory cards it just used. Hover one to preview the content. Click it to edit the memory on the spot. For sources: only show a file if it's actually relevant to the question, and never the same file twice.

WHAT CHANGED

I can now glance at a reply and know why the AI said what it said — without ever leaving the chat. Sources became real evidence again instead of noise. The four design moves on the next page (Structure, Surface, Navigate, Retain) are the ones that earned their place after surviving all of this.

The same pattern kept showing up: the feature would work fine in the code, but I'd stop noticing it existed. The fix was almost never "build something new" → it was "move the thing I already built closer to where I'm actually looking."

03 / DESIGN FRAMEWORK

Structure → Surface → Navigate → Retain.

→ MOVE 01 · STRUCTURE

Memory Cards — a deck, not a textarea.

Turn hidden memory into a deck of typed, toggleable cards. User curates, model respects.

→ MOVE 02 · SURFACE

The firing pulse.

Animate the invisible. A 900ms amber pulse shows which memories fired — without stealing attention from the answer.

→ MOVE 03 · NAVIGATE

The 2D semantic map.

Project embeddings to 2D. Make latent space a place users can pan, zoom, and query — not a lookup table.

→ MOVE 04 · RETAIN

Local-first architecture.

Store everything locally in one deletable folder. Trust through architecture, not policy copy.

→ MOVE 05 · LOCALIZE

Inference moved on-device.

Move inference on-device via Ollama. The cloud becomes optional, not required.

MOVE 01 · STRUCTURE — Memory Cards. A deck, not a textarea.

Six typed categories. identity · project · preference · taboo · style · general. Chosen after testing free-form tags (sprawled to 40+ in a week) and three broad buckets (too coarse — preferences and taboos behave differently at prompt-injection time). Six was the smallest set the model actually respected.

A parse-paste flow. User pastes a messy brain-dump. One LLM call turns it into cards. User reviews, edits, toggles. 30-minute setup → 30 seconds.

MOVE 02 · SURFACE — The firing pulse, explaining AI influence.

After every reply, the app embeds the response and cosine-scores it against every enabled card. Top cards pulse amber for ~900ms. No numbers, no citation popup — just a quiet witnessable cue: these memories shaped this answer.

Killed the citation list. An earlier version listed "memories used" under each reply. Accurate but ugly — and it trained users to read the footnote instead of the answer. Ambient pulse > explicit citation when the user's primary task is reading.

Firing pulse — 900ms amber flash on memory cards that shaped a reply

MOVE 03 · NAVIGATE — The 2D semantic map. Latent space, walkable.

Every chunk: embedded (1,536-D), projected via UMAP to 2D, k-means clustered, LLM-labeled. Every dot is a chunk. Every color is a topic. RAG becomes navigation, not retrieval.

Interactions: cursor-anchored wheel zoom, click-drag pan with 5px click/drag threshold, rich hover tooltip with match-% bar, query mode that highlights top-K with connecting lines, fullscreen, keyboard shortcuts (+/−/0/F). Map first. Text second.

2D semantic map — UMAP projection of 294 chunks across 6 clusters

MOVE 04 · RETAIN — Local-first architecture. Trust via locality.

All state — every conversation, memory card, KB vector, Gmail credential, API key — lives in one 2.4 MB JSON file on the user's machine. No backend. No account. No sync. No telemetry. The only network traffic is the explicit API call the user configured.

Delete the file = complete reset. That clarity is the feature. The best trust signal is a file the user can delete.

C:\Users\…\AppData\Roaming\
local-rag-visual-map-system
└── config.json    // 2.4 MB — everything you own
    {
      conversations: [...],
      memoryCards: [...],
      kb: { docs, embeddings },
      kbMap: { points, clusters },
      anthropicKey: "***",
      gmailAppPwd: "***"
    }

→ Delete the file to fully reset.

Local config.json shown in Windows Explorer — one deletable file holds all state

MOVE 05 · LOCALIZE — Inference moved on-device.

v1 was local-first data but cloud-first inference. v1.2 closed the loop. Chat, embeddings, memory parsing, and cluster labeling all run via Ollama on the user's machine. The cloud became optional, not required.

Anthropic and OpenAI remain selectable in Settings — for users who want top-tier reasoning. But the default install path is now zero keys, zero accounts, zero outbound traffic. Privacy users no longer have a "but."

→ CHAT

qwen2.5:7b

via Ollama, on-device. ~10–15 tok/s on CPU · streaming.

→ EMBEDDINGS

nomic-embed-text

via Ollama, on-device. 768-dim · ~270MB · powers RAG + Map + firing pulse.

→ MEMORY PARSING

Ollama JSON mode

on-device. format:'json' · strict schema, no drift.

I didn't make this local-first because it's faster — it isn't. I made it local-first because the people I want to use this care more about privacy than latency.

04 / TRADEOFFS

What I ruled out, and why.

Every design choice is a killed alternative. The ones worth naming:

Decision	Chose	Ruled out
Projection	UMAP — stable under re-embedding, preserves global structure.	t-SNE rebuilds layout each run (breaks mental map). PCA smears semantics.
Memory schema	Six fixed categories — forces a productive decision: preference or taboo?	Free-form tags sprawl to 40+ labels. Three buckets are too coarse.
Explainability UX	Ambient firing pulse — available, not intrusive.	Citation list trained users to read the footnote instead of the answer.
Data architecture	Local-first — delete = reset. Zero breach surface.	Cloud sync brings accounts, TOS, vendor lock-in.

05 / WHAT I LEARNED

Three things, after five iterations.

→ 01

Animation is an explainability primitive.

The firing pulse explains memory influence in a way no citation list ever did. Motion can describe state-change without stealing attention.

→ 02

Latent space rewards spatial UI.

Once you can pan and zoom through your own knowledge, you stop treating it as "files" and start treating it as territory. RAG becomes navigation.

→ 03

Trust is a folder the user can delete.

No policy copy outperforms a single 2.4 MB file the user can put in the trash. Architecture as trust signal.

An AI chat app you can see through.

30 seconds of the thing actually working.

Today's AI chat is opaque by default.

Can't see what it knows.

Can't see what mattered.

Can't see what's in there.

The four moves weren't a plan. They were what survived.

Structure → Surface → Navigate → Retain.

Memory Cards — a deck, not a textarea.

The firing pulse.

The 2D semantic map.

Local-first architecture.

Inference moved on-device.

MOVE 01 · STRUCTURE — Memory Cards. A deck, not a textarea.

MOVE 02 · SURFACE — The firing pulse, explaining AI influence.

MOVE 03 · NAVIGATE — The 2D semantic map. Latent space, walkable.

MOVE 04 · RETAIN — Local-first architecture. Trust via locality.

MOVE 05 · LOCALIZE — Inference moved on-device.

qwen2.5:7b

nomic-embed-text

Ollama JSON mode

What I ruled out, and why.

Three things, after five iterations.

Animation is an explainability primitive.

Latent space rewards spatial UI.

Trust is a folder the user can delete.