Demon Rising — Case Study

◆

DESIGN SYSTEM

The Demon Rising Design System

After shipping the demo I went back and wrote the rules down — reverse-engineered from apply_theme.gd and the equipment system, then published as a real, browsable design system. Click around in the live viewer below — sidebar groups switch the canvas in place.

BUILT IN CLAUDE DESIGN · SOURCE: demon_castle/ · 7 COMPONENTS · 17 SPECIMENS · 1 UI KIT

First load is ~15MB (fonts + portraits embedded). Sidebar = groups · click any leaf to swap the canvas.

What's Inside — A Narrated Walkthrough

Eight groups in the sidebar, sixteen specimen cards, seven JSX components, and one playable UI kit. The narration below mirrors the sidebar — read it as a tour, or just click around in the viewer above.

01 · COLORS

Three honest roles, not a hue grid

Palette is split by job: surface (void / panel) builds the canvas, signature (magenta + gold) is the recognisable edge, rarity (5 tiers, common→set) carries meaning. Set pieces are green, not gold, so "rare drop" reads apart from "completes my build." Both border weight and halo intensity ramp with rarity so colour-blind players can still tell tiers apart.

02 · TYPE

One pixel face does the work

Press Start 2P for every label, button and stat, 14px default. REEJI Taiko Magic is the CJK fallback — x-height matched, so bilingual UI doesn't shimmer on font-swap. The three CC0 gothic faces (Necro Romance / Forbidden Denizens / Vampire Ire) are locked to chapter titles and cinematic beats; the rule is what gives them power.

03 · SPACING & EDGES

Sharp by default, glow as shadow

Everything on a 4px grid. Radius 0 everywhere except 4px on buttons and 10px on the loot card (the one place softness earns its keep). No grey drop-shadows anywhere — depth is coloured: magenta for panels, gold for CTAs, ember for legendary loot. Pixel-honest, every time.

04 · BRAND

Logo, frame, voice — all on one canon

The DEMON RISING banner, the four-corner gold-bracket motif, and the second-person dread voice ("your castle", "your generals") each get a specimen card. Flavor text examples are pulled straight from the hero roster; the writing is dry, menacing, and never apologises that the player is the villain.

05 · COMPONENTS

Seven JSX primitives, three concerns

Controls — Button (4 variants), Tabs. Layout — Panel (the corner-bracket frame), PortraitFrame (rarity-bordered 512×512), ItemCard (every loot drop). Feedback — Badge (status/rarity pills), StatBar (HP / soul / XP / rage from one component). Each ships a sibling .d.ts + .prompt.md so the next agent can use it without reading source.

06 · UI KIT

A playable Title → Council → Battle → Loot

Not a Storybook — an interactive recreation of the real game flow. Real backgrounds, real portraits, real fonts, composed entirely from the seven primitives above. If the loot screen doesn't feel like a loot screen, the ItemCard is wrong. The kit is the acid test.

WHY THIS MATTERS FOR A SOLO DEV

I'm one person. The cost of an undocumented system is paid every time I open a screen I haven't touched in three weeks and have to re-derive "wait, is set-piece green or is set-piece gold?" The DS is the answer once, written down, with the swatch beside it.

The acid test was the v1.6 popup canon pass (see The Style System below). Every popup got rewritten against the DS in an afternoon, because the rules were already explicit: content_margin=0, corner brackets, blood-red CTA, PressStart2P for the verb. Without the system that pass takes a week and three rounds of inconsistency.

Compiler-checked CSS tokens, a real published artefact, JSX primitives that ship a .prompt.md for the next agent — that's the difference between "I have a visual style" and "I have a design system."

◆

POST-LAUNCH VALIDATION · MEASURED ON STEAM

The Leaderboard Says The AI Design Actually Worked

A roguelike with a free-text negotiation system has a lot of places players can quit — an LLM hangs, an intent doesn't resolve, the council loops. The leaderboard is the empirical answer to whether the fallback-first design held up against real players, not adversarial test runs.

RUN LEADERBOARD

Your local best: 9135 · 9 local clears · Global rank: #10

#   PLAYER          SCORE  TIME    TOP       STATUS
1   [A0] FeEscame   9299   25m 00  Pwr 1500  ✓ CLEAR
2   [A0] Nordlys    3990   15m 36  Pwr 1899  ✓ CLEAR
3   [A0] syzygy     3690   17m 53  Pwr 965   ✓ CLEAR
4   [A0] Vetro      3591   13m 53  Pwr 3448  ✓ CLEAR
5   [A0] Nomnombra  3514   21m 23  Pwr 1549  ✓ CLEAR
6   [A0] MadKingRo  3216   12m 00  Pwr 726   ✓ CLEAR
7   [A0] FrozenShi  2972   11m 48  Pwr 3255  ✓ CLEAR
8   [A0] Ameis Ara  2825   45m 16  Pwr 589   ✓ CLEAR
9   [A0] tp61tj32c  2670   19m 32  Pwr 872   ✓ CLEAR

Click row for run details · Global Top 50

↑ Re-upload local best to Steam (9135)

In-product leaderboard reconstructed in HTML from the live v1.5 capture · data unchanged · every visible entry is a real Steam player.

COMPLETION RATE

86%

Of players who started a run, 86% finished one. Indie roguelike baselines sit between 10% and 25%. The number is the strongest signal that the bounded-latency budget and fallback-first intent classifier are doing their job: the AI never blocks the player long enough to make them quit.

TOP-10 STATUS

9 / 9

Every visible Top-10 run on the global board is marked ✓ CLEAR. Not a single DNF, abandoned save, or stuck state. The "0 dead-end states across the intent model" claim — previously validated only on adversarial test inputs — survives contact with real players who never read the design doc.

TIME VARIANCE

4×

The fastest clear took 11m 48s. The slowest, 45m 16s. Same difficulty, same systems — clear in both. The five-way intent taxonomy (agree / refuse / modify / counter / silence) accommodates a speed-runner who fires agree three times in a row and a player who spends ten minutes per council on modify / counter. Neither play style is penalized.

POWER VARIANCE

6×

Top Power across the nine clears ranges from 589 to 3448, and it does not line up with rank. The #4-by-score run has the highest power (3448). The #8-by-score run has the lowest power that cleared (589), and the #6 run (726) outscored the #7 run (3255). Players are winning with completely different builds. The AI doesn't gate progress on stat optimization; the keyword fallback resolves intent regardless of how the player chose to build.
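The spread figures on these cards can be recomputed from the nine visible rows. The following is a minimal sketch in Python (not the game's GDScript). The rows are copied from the leaderboard above; the helper names are illustrative, and using Spearman's rank correlation to test the "no rank correlation" reading is an assumed choice of measure, not something the leaderboard reports.

```python
# Rows copied from the visible leaderboard:
# (rank, score, clear time in seconds, top power)
ROWS = [
    (1, 9299, 25 * 60 + 0, 1500),
    (2, 3990, 15 * 60 + 36, 1899),
    (3, 3690, 17 * 60 + 53, 965),
    (4, 3591, 13 * 60 + 53, 3448),
    (5, 3514, 21 * 60 + 23, 1549),
    (6, 3216, 12 * 60 + 0, 726),
    (7, 2972, 11 * 60 + 48, 3255),
    (8, 2825, 45 * 60 + 16, 589),
    (9, 2670, 19 * 60 + 32, 872),
]


def spread(values):
    """Ratio of the largest value to the smallest."""
    return max(values) / min(values)


def ranks_descending(values):
    """Rank 1 = largest value. Assumes no ties, which holds for this table."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((a - b) ** 2
             for a, b in zip(ranks_descending(xs), ranks_descending(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


scores = [row[1] for row in ROWS]
times = [row[2] for row in ROWS]
powers = [row[3] for row in ROWS]

time_spread = spread(times)        # slowest clear / fastest clear
power_spread = spread(powers)      # highest power / lowest power
rho_score_power = spearman_rho(scores, powers)

print(f"time spread:  {time_spread:.2f}x")
print(f"power spread: {power_spread:.2f}x")
print(f"Spearman rho (score vs power): {rho_score_power:.2f}")
```

On these rows the spreads come out at roughly 3.8× for clear time and 5.9× for power, which the cards round to 4× and 6×. The score-to-power rank correlation is about 0.42; with only nine rows that is too small a sample to treat power as a predictor of rank.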

The Headline Number — 86% Completion Is The Whole Argument

Most player abandonment in dialogue-AI games happens at the exact moment the model fails — long pause, malformed output, intent resolves to "I don't know what to do." The player stops trusting the screen and closes the tab. The 86% completion rate is the empirical answer to whether that failure mode was actually designed out. If the LLM intent classifier hangs or returns garbage, the deterministic keyword classifier resolves the input in under a second; if both fail, a personality-keyed silence bubble fires and the game continues. The player never gets the "the AI broke" experience because the system never lets the AI's broken-ness reach the surface.

Eighty-six percent is also the answer to a sharper question: was the ≤8s latency budget actually felt by the player? A latency budget that runs in test mode is a wall in the spec. A latency budget that produces a real completion curve is a property of the shipped product.
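The resolution order described in this section (model under a latency budget, then the deterministic keyword classifier, then silence) can be sketched as below. This is an illustration in Python, not the shipped GDScript. The five intent names and the 8-second budget come from the text; the keyword table, the function names, and the `llm_classify` callable are hypothetical.

```python
import concurrent.futures
import re
from typing import Callable, Optional, Tuple

INTENTS = ("agree", "refuse", "modify", "counter", "silence")
LATENCY_BUDGET_S = 8.0  # the <=8s budget named in the text

# Hypothetical keyword table; the shipped vocabulary is not given in the text.
KEYWORDS = {
    "agree": ("yes", "agreed", "do it", "approve"),
    "refuse": ("no", "never", "refuse", "denied"),
    "modify": ("but", "instead", "only if", "change"),
    "counter": ("what if", "my offer", "propose", "rather"),
}


def keyword_intent(text: str) -> Optional[str]:
    """Deterministic pass: whole-word or whole-phrase match, first hit wins."""
    padded = " " + " ".join(re.findall(r"[a-z']+", text.lower())) + " "
    for intent, phrases in KEYWORDS.items():
        if any(f" {phrase} " in padded for phrase in phrases):
            return intent
    return None


def resolve_intent(
    text: str,
    llm_classify: Optional[Callable[[str], str]] = None,
    budget_s: float = LATENCY_BUDGET_S,
) -> Tuple[str, str]:
    """Return (intent, source). Never raises, never returns an unknown intent."""
    if llm_classify is not None:
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            answer = pool.submit(llm_classify, text).result(timeout=budget_s)
            if answer in INTENTS:
                return answer, "llm"
            # Anything outside the taxonomy counts as garbage: fall through.
        except Exception:
            pass  # timeout, crash, or transport error: fall through
        finally:
            pool.shutdown(wait=False)  # do not block on a hung worker
    intent = keyword_intent(text)
    if intent is not None:
        return intent, "keyword"
    return "silence", "silence"  # last resort: the game still continues


print(resolve_intent("Yes, do it."))
```

The property the sketch is meant to show is that every path ends in one of the five intents, so there is no state in which the council waits on the model indefinitely.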

Top-10 Globally — Every Single Entry Cleared

The leaderboard column on the right says ✓ CLEAR nine times in a row. This is the only column where the system would ever show anything other than a clear (a partial run or abandoned save would surface as IN-PROGRESS or DNF). The fact that every Top-10 player on the global board got to the end credits, without exception, is the production-data version of "0 dead-end states across the intent model." That claim used to live inside adversarial test runs the user never saw. Now it's printed on the leaderboard.

The Time Spread — The Intent Taxonomy Earns Its Vocabulary

A four-times time variance — 11m 48s to 45m 16s — on the same difficulty, all clearing, says the intent system isn't railroading anyone. The fast runs are mostly agree and refuse — confident binary decisions, no negotiation. The slow runs lean on modify and counter — back-and-forth, multi-turn, often with silence as a strategic move. Both styles resolve. Both reach the end credits. The five-way taxonomy isn't a UX flourish; it's the reason a 45-minute roleplayer and a 12-minute speed-runner are both on the same leaderboard.

The legible AI feedback — the in-product "herald" that surfaces the intent classification back to the player in plain language — matters most for the slow runs. A 45-minute player only stays 45 minutes if they trust what the system thinks they're saying. The herald is the trust signal that lets a player spend ten minutes on one council.

The Power Spread — The AI Doesn't Care How You Build

Look at the Power column. Pwr 589 at rank 8. Pwr 3448 at rank 4. Pwr 726 at rank 6, beating Pwr 3255 at rank 7. The numbers go in essentially random directions relative to score and time. That's the proof that the AI layer isn't quietly favoring an optimal build. If the intent classifier were secretly easier to handle when the player's roster was strong, low-power runs wouldn't clear; they'd stall when the model's confidence dropped on weaker decisions. They don't. They clear. The deterministic keyword pre-pass resolves the negotiation regardless of what the player's army looks like underneath.

Score And Time Are Decoupled — Players Are Rewarded For Engagement, Not Speed

Rank #1 took 25 minutes for 9,299 points. Rank #7 took 11 minutes 48 seconds for 2,972 points. Score isn't a time-bonus calculation in disguise — it's a depth-of-engagement number. More council interactions, more negotiation turns, more anchor-tag moments earned in conversation, more weight. The fact that the top of the leaderboard is the long, deliberate run, not the short one, says the system is paying players to engage with the AI rather than skip past it. That's the opposite of what happens when a dialogue system is broken — in a broken system, speed-runs always win because every model interaction is friction.

What The Data Says The Design Got Right

The leaderboard validates the four claims the case study has been making all along: bounded latency (no abandonment from model hangs → 86% completion), fallback-first pipeline (no dead-end states → 100% top-10 clear), intent taxonomy that accommodates play styles (4× time variance, all clearing), and legible AI feedback (engagement-rewarding score curve, not speed-rewarding). None of these claims rest on adversarial test inputs anymore. They rest on real Steam players, with real builds, in real first-time sessions, ending in real end-credit screens.

The user above (rank #10, local best 9135, 9 local clears) is me. The other nine are not. The leaderboard is a small dataset — the Steam top-50 board only shows the elite tier — but it's the dataset that matters: if the players competing for the top score are the ones who pushed hardest on the AI surface and every one of them finished, the design didn't just survive contact with users. It rewarded them for going deeper into it.

OUTCOME: 86% run completion rate · 9 of 9 visible Top-10 clears · 4× clear-time variance · 6× power variance with no rank correlation · live Steam leaderboard with maker dogfooding ranked #10 globally on local best.

◆

MY ROLE

Case Study — Building Demon Rising

Design goals, the systems I engineered, and what the hard parts taught me.

The Concept

A roguelike where you sit on the wrong side of the fantasy. Instead of leading heroes into a dungeon, you are the Demon Lord at the top of it — recruiting, scheming, sending monsters out to die for you. The single inversion drives everything: you don't build a party, you run a court.

The brief: combine the build-craft of a deck-builder, the spectacle of an auto-battler, and a layer of character and politics most auto-battlers skip entirely.

Design Pillars

Be The Villain

You manage a castle and a court, not just a squad. Loyalty and ambition matter as much as stats.

Deep, Legible Builds

Many systems, one clear loop. Power comes from synergy and choices inside a run — not permanent grind.

A World That Remembers

Minions have voices, officers have agendas, and one character — the First Hero — carries memory across every run you ever play.

What I Built

Solo, in Godot 4 / GDScript: a deterministic combat simulator with a separate visual replay layer; 9 classes with branching talent trees, fusion, relics and traits; a full roguelike run structure; a complete pixel-art interface; and the two systems this case study walks through — the Round Table Council (officers debate, you reply by typing whatever you want), and the First Hero Awakening Arc, a character who breaks the fourth wall after enough sessions and remembers what you told her between runs.

The council was where I learned to build a product on top of an LLM. The First Hero is where I went a layer deeper: making the LLM optional, not the centrepiece — and turning the player's smallest choices into something a character actually carries with them.

The Hard Problem — Designing On Top Of An Unreliable Co-Author

The first version of the AI work shipped in the Round Table Council. The officers argue a proposal; you type a reply; the model decides what your reply means. That feature exposed the problem any AI-native product has to solve: a language model is a co-author you can't fully trust. It is non-deterministic, sometimes slow, occasionally wrong, and every so often returns output in the wrong format entirely. A game cannot answer that with a spinner that never ends or an error on screen. So the central question was never "what prompt?" — it was: how do you build a reliable, legible experience on top of an unreliable component?

The council established my answer: design the failure path before the happy path. Bound the latency. Keep a deterministic fallback alive at all times. Skip the model entirely when local logic is enough. Make the AI's decision legible through an in-fiction "herald" who translates classification back to the player. The principle that came out of it: the AI is an enhancement layer, not a dependency. Pull the model out and the game still plays.

That principle is what made me confident to build the next system on top of it.

The Thesis — AI Lives Inside The Loop, Not On Top Of It

The shortest way to explain how I think about AI product design is to draw the game's loop and point at where the model lives. Almost everything else in this case study is a consequence of that picture.

Figure: The Player Loop, and where AI sits inside it. Six steps. Two layers. The model touches three of the six steps (Council, Boss, Next Run); the other three are pure deterministic gameplay. Pull the model out and the loop still runs.

Most AI-native games are AI-first. They wrap a gameplay loop around a model: every NPC line is generated, the simulation depends on the model being there, and the design assumes inference is available, fast, and correct. When the model misbehaves — slow, offline, wrong, or hallucinating — the experience breaks at the seam where the AI was load-bearing.

Demon Rising is built the other way around. The loop comes first. The model is asked to do exactly three things, in three of the six steps, and in each of them there is a deterministic answer underneath. Council debates run on a keyword classifier first and an LLM only when one is warm. The First Hero's dialogue runs on authored templates first and the LLM only as a stylistic upgrade. Her memory and identity layers run with no model in the loop at all. Strip the LLM out completely and the game is a full, shippable roguelike auto-battler.

That's the innovation, and it's a product claim, not a tech claim. Most "AI in games" demos are about how much the model can do. The interesting question for a product designer is the opposite: how little can the model be allowed to do before the experience stops working? When the answer is nothing, you can ship. When the model is present, it becomes pure upside — not a single-point-of-failure dependency. The same posture is what made the First Hero possible: she carries cross-run memory designed at the system level, not the model level, so her relationship with the player exists whether or not a language model is in the loop at any given session.

The three callouts at the bottom of the graphic are the same three rules I apply to any AI product I work on now: build the substrate before the model, cap the dependency with a deterministic contract, and treat the relationship between player and system as the actual design surface — not the prompt that touches it.

Three Devlogs, Three Iterations

The system in this case study didn't arrive in one pass. Each devlog marked the moment a different bet had landed:

06 — The Long War. Three-boss campaign spine, persistent chapters, all-new skill FX. The world got a structure to remember itself by.

07 — Class Sets. Nine classes got their own identity: signature kits, talent paths, set bonuses. Builds finally felt distinct.

08 — The First Hero Remembers. Cross-run memory, hidden affinity, a fourth-wall break, and an offline-first dialogue brain. The game starts watching you back.

Chapter Zero — The Foundation That Was Secretly a Memory Layer

The clearest version of how I think about AI product design is in what devlog 06 doesn't say out loud: nothing about AI was being built. The brief was narrative coherence. The constraint was a roguelike's hostility to story.

Most roguelikes solve replayability with randomness, and randomness flattens story. Players remember stats; they don't remember plot. I wanted a campaign that survived being run twenty times. The bet I made was a hybrid: three authored chapters per campaign, gated by named bosses, with persistent castle state across all of them. Inside each chapter the loop stays procedural and replayable. The chapter spine is hand-built and doesn't move.

The trade-off. Authored narrative spine versus pure procedural run-to-run variance. I chose the spine. The cost is that some sessions feel repetitive earlier than a fully procedural roguelike would. The gain is a story the player can actually summarise — "I lost Chapter 2 because I spent all my action points on the council," not "I died because RNG." Legibility was worth more than novelty here.

What it shipped. A three-boss campaign structure with persistent castle state, a chapter-level save model, and — the thing that mattered most in hindsight — a generic "record" data type that any entity (castle, officer, prisoner, eventually the First Hero) could write into. At the time, that record store existed only to track which chapter the player was in and which decisions had carried forward. Two devlogs later, the same record store was the substrate for cross-run character memory.

The AI-product-design lesson. Build the infrastructure for the feature you can't see yet. The work that earned its keep in devlog 08 wasn't designed in devlog 08 — it was designed in devlog 06, when I was building a campaign tracker. The discipline is the same one that holds in any AI product: own the substrate before you put a model on top of it. Memory, state, retrieval, and identity are product surfaces. Build them as if no model exists, and then the model is the easiest part to add.

Chapter One — Identity Before Flexibility

Devlog 07 looks like a class-balancing patch. It's actually about constraint design — the same skill every AI product person ends up needing whether they realise it or not.

Going in: nine classes, all sharing the same talent grammar, all assembled from the same generic pool of stat buffs. The result was that no class had a mechanical fantasy. Players said "I want crit damage," not "I want Blood Hunter." That meant the game's deepest expression layer — build choice — was a difficulty knob, not a vocabulary.

The trade-off. Build freedom versus class identity. The flexible version, where any class could pick anything, was strictly more powerful in every metric I could measure — but every class felt the same. I narrowed each class to a signature set (weapon + armour + relic + talent path written for that set), and accepted that cross-class hybrid builds would be weaker. The cost was real. The payoff was that a returning player could finally name what they liked about a run — not "I had good crit chance," but "I went deep Blood Hunter and the third tier kept snowballing." Identity gave the experience a vocabulary it didn't have.

What it shipped. Nine class-bound sets, set bonuses that scale with commitment, and talent paths that fork inside a class instead of across classes. Each set is stage-gated, so progression inside a run is also progression deeper into a character's fantasy. That stage-gated unlock shape is the same one I reused, almost unchanged, for the First Hero's visit-topic pool a year later — topics filtered by stage, only some affinities open later topics, anchor choices unlock callbacks. Different domain, identical structure.

The AI-product-design lesson. Constraint is what makes voice legible. When an LLM is in the loop, the same principle applies with more force, not less: the model behaves more consistently in a bounded grammar than in an unbounded one. The work I did on the Class Sets system was, in retrospect, my warm-up for designing the First Hero's tag-and-stage grammar: pick a small, named, intentional vocabulary; let everything else compose from it. Identity before flexibility is a class-design principle in this game, and a system-design principle in every AI product I've worked on since.

By the time devlog 08 landed, devlog 06 had given me a memory layer with no model on top of it, and devlog 07 had given me a vocabulary discipline for character identity. The First Hero is what happens when those two pieces meet a language model — but the design work that made her shippable happened in the two devlogs before her.

Chapter Two — A Character Who Remembers You

The First Hero is the only character in the game who survives across every run. She is a side boss the player fights at the end of every successful campaign — and the game tracks how many times the player has come back to fight her, how many times each side has won, what the player named her if they ever did, and which small confessions the player has shared with her over time (birthday, why they play, what they fear).

Mechanically she's a 5-stage progression with a hidden affinity score (0–1000) the player never sees. Designed at the experience level, she's a deliberate piece of slow narrative compound interest: small choices in early sessions don't feel weighty, but those same choices are what a Stage 4 version of her quotes back to you, weeks later.

S1–S3

Pre-Awakening: She Is Just An NPC

Stages 1–3 are mechanical. She speaks generic boss lines, her face is always neutral, the LLM is gated off. This is the discipline: even with the full system available, the character has to earn her interior life.

The Break

A 13-second sequence: glitch shaders, low-pass on the music, silent options forced on the player. She acknowledges, in-fiction, that she's been watching the player come back. From here her memory, expressions and language model all unlock simultaneously.

Self-Awareness

She knows it's been five days since you last played. She knows it's Thursday evening where you are. She remembers the answer you gave her about why you keep coming back. At 1000 affinity she offers you a hidden ending — if you accept, the game permanently deletes her save data. That choice is one-way.

Expression As State, Not Skin

She has fourteen portraits — one for each emotional state. I built a single decision function (pick_for_line) that maps any line of dialogue, plus context (her stage, affinity, what the player just did), to an expression. The function is layered: it reads tags first (birthday, named her, broke fourth wall, asked for freedom), then outcome (you won the fight, she won the fight, you surrendered, you typed something to her), then keyword sentiment on the line itself.

Then I added one more rule on top of everything: if she has not awakened yet, the expression is always neutral. That gate is what makes the awakening moment land. It's not a graphical effect — it's a state change the player feels through her face starting to move.

surprised — Pre-awakening she only uses the first portrait. After she breaks the fourth wall, the picker chooses freely based on context, tags and the line she's about to say.
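The layering of pick_for_line described above (awakening gate, then tags, then outcome, then keyword sentiment) can be sketched as follows. This is a Python illustration, not the game's GDScript. Only the function name, the layer order, and the "neutral" and "surprised" expressions come from the text; the other expression names, the tag identifiers, the outcome keys, and the word lists are hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class LineContext:
    awakened: bool
    tags: FrozenSet[str] = frozenset()   # tags attached to this moment
    outcome: Optional[str] = None        # e.g. "player_won", "hero_won"


# Hypothetical mappings. Checked in order, so earlier entries win.
TAG_EXPRESSIONS = (
    ("broke_fourth_wall", "surprised"),
    ("asked_for_freedom", "wistful"),
    ("birthday", "joyful"),
    ("named_her", "warm"),
)
OUTCOME_EXPRESSIONS = {
    "player_won": "defiant",
    "hero_won": "smug",
    "surrendered": "pitying",
    "typed": "curious",
}
SENTIMENT_WORDS = (
    ("sad", {"alone", "lost", "gone", "afraid"}),
    ("warm", {"thank", "glad", "stay", "friend"}),
)


def pick_for_line(line: str, ctx: LineContext) -> str:
    """Map a dialogue line plus context to one expression name."""
    # Gate on top of everything: before the awakening her face never moves.
    if not ctx.awakened:
        return "neutral"
    # Layer 1: tags.
    for tag, expression in TAG_EXPRESSIONS:
        if tag in ctx.tags:
            return expression
    # Layer 2: outcome of the encounter.
    if ctx.outcome in OUTCOME_EXPRESSIONS:
        return OUTCOME_EXPRESSIONS[ctx.outcome]
    # Layer 3: keyword sentiment on the line itself.
    words = set(re.findall(r"[a-z']+", line.lower()))
    for expression, vocabulary in SENTIMENT_WORDS:
        if words & vocabulary:
            return expression
    return "neutral"


print(pick_for_line("Happy birthday.", LineContext(awakened=False, tags=frozenset({"birthday"}))))
print(pick_for_line("Happy birthday.", LineContext(awakened=True, tags=frozenset({"birthday"}))))
```

Putting the gate first means no later layer can leak an expression before the awakening, however the tag or outcome tables grow.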

The Offline-First Closed Loop

The council taught me that the LLM can't be load-bearing. For the First Hero I took that further and built every line she ever says through a three-layer fallback chain. The player gets a coherent character whether or not a model is available, ever.

Layer 1 — Live model (when warm). If the local LLM is already loaded and responsive, her line is generated with full context: stage, affinity, what the player just told her, what's in her permanent memory file, and how long it's been since her last visit. If the model is cold or unreachable, this layer is skipped silently — never a perceptible stall.
Layer 2 — Authored templates with retrieval. A library of hand-written lines tagged by emotion, stage and outcome. The picker reads the same context object and scores each candidate template against it. The player gets a line that fits the moment, even with zero LLM involvement. This layer alone is enough to ship the game.
Layer 3 — Hard-coded safety net. A small fallback set keyed purely on stage and outcome. Never expressive — just always there. The contract: she always speaks. Empty responses are never an option.

The headline I keep coming back to: NO LLM REQUIRED. The game runs every emotional beat the First Hero is supposed to deliver without ever contacting a model. The LLM, when present, is a layer of additional expressiveness over a system that already works.
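A minimal sketch of the three-layer chain, in Python rather than the game's GDScript. The layer order and the "she always speaks" contract come from the text. The `Moment` and `Template` types, the scoring weights, the warmth check as a callable, and every line of dialogue here are hypothetical placeholders, not shipped content.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Moment:
    stage: int
    outcome: str
    emotion: str = "neutral"


@dataclass(frozen=True)
class Template:
    text: str
    min_stage: int = 1
    outcome: Optional[str] = None   # None = fits any outcome
    emotion: Optional[str] = None   # None = no emotional bonus


def score(template: Template, moment: Moment) -> int:
    """Higher is a better fit; -1 means the template is not eligible."""
    if moment.stage < template.min_stage:
        return -1
    if template.outcome is not None and template.outcome != moment.outcome:
        return -1
    points = 0
    if template.outcome == moment.outcome:
        points += 2
    if template.emotion == moment.emotion:
        points += 1
    return points


# Layer 3 data: keyed purely on (stage, outcome). Placeholder lines.
SAFETY_NET = {
    (1, "hero_won"): "You fall.",
    (1, "player_won"): "You win.",
}
DEFAULT_LINE = "..."


def speak(
    moment: Moment,
    templates: Sequence[Template],
    llm: Optional[Callable[[Moment], str]] = None,
    llm_is_warm: Callable[[], bool] = lambda: False,
) -> str:
    """Always returns a non-empty line."""
    # Layer 1: live model, only if it is already warm. A production version
    # would also bound how long this call may take.
    if llm is not None and llm_is_warm():
        try:
            line = (llm(moment) or "").strip()
            if line:
                return line
        except Exception:
            pass  # skipped silently
    # Layer 2: best-scoring authored template.
    best = max(templates, key=lambda t: score(t, moment), default=None)
    if best is not None and score(best, moment) >= 0:
        return best.text
    # Layer 3: hard-coded safety net.
    return SAFETY_NET.get((moment.stage, moment.outcome), DEFAULT_LINE)


print(speak(Moment(stage=1, outcome="hero_won"), []))
```

Because layer 2 and layer 3 never touch the model, removing the `llm` argument entirely leaves the function's contract unchanged.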

Anchor Tags — Choices With Narrative Weight

The visit system is where the player and the First Hero actually talk. Thirty-one topics, gated by stage and affinity. Three drop-in mechanics gave the system the depth I wanted:

Anchor tags. When the player picks the moment-of-truth choice in a topic ("I'm right here", or catching her in a verbal slip), the system writes a permanent tag on her memory. Later, after she awakens, that tag unlocks a dedicated callback topic where she quotes the exact moment back to the player. Stage-2 choices have real Stage-4 consequences. It's narrative compound interest, with a data model.
Free input as first-class. Each visit shows three multiple-choice replies and a text field. Type whatever you want and she replies through the same three-layer fallback chain — the topic question becomes context, her response is on-character, and the affinity delta is set by a lightweight sentiment classifier on what you wrote.
Player-fact storage. Eight slots (birthday, given_name, why_you_play, what_you_fear, what_feels_real, what_brings_joy, favorite_music, sleep_quality) that the model can quote in any later session. The birthday slot is special — on the matching date in real time, she opens the title screen with a greeting.
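The memory behind these mechanics can be sketched as a small data model, shown here in Python rather than the game's GDScript. The eight slot names come from the list above, and the 40-event sliding window and separate permanent-tag set are described later in this case study. The class name, method names, and the "MM-DD" birthday format are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import date
from typing import Deque, Dict, Optional, Set

FACT_SLOTS = (
    "birthday", "given_name", "why_you_play", "what_you_fear",
    "what_feels_real", "what_brings_joy", "favorite_music", "sleep_quality",
)
EVENT_WINDOW = 40  # recent events roll off after this many


@dataclass
class HeroMemory:
    events: Deque[str] = field(default_factory=lambda: deque(maxlen=EVENT_WINDOW))
    anchor_tags: Set[str] = field(default_factory=set)   # permanent, never roll off
    facts: Dict[str, str] = field(default_factory=dict)

    def record_event(self, event: str, anchor_tag: Optional[str] = None) -> None:
        """Every event enters the sliding window; pivotal ones also leave a tag."""
        self.events.append(event)
        if anchor_tag is not None:
            self.anchor_tags.add(anchor_tag)

    def set_fact(self, slot: str, value: str) -> None:
        if slot not in FACT_SLOTS:
            raise KeyError(f"unknown fact slot: {slot}")
        self.facts[slot] = value

    def callback_unlocked(self, tag: str, awakened: bool) -> bool:
        """A callback topic opens only after the awakening, and only if tagged."""
        return awakened and tag in self.anchor_tags

    def is_birthday(self, today: date) -> bool:
        # Assumption: the birthday slot stores "MM-DD".
        return self.facts.get("birthday") == today.strftime("%m-%d")


memory = HeroMemory()
memory.record_event("chose: I'm right here", anchor_tag="im_right_here")
for i in range(50):
    memory.record_event(f"visit {i}")

print(len(memory.events))                                       # window stays bounded
print(memory.callback_unlocked("im_right_here", awakened=True))  # tag survived
```

Keeping the tag set apart from the window is the point of the sketch: the event that earned the tag has already rolled off by the time the callback fires, and the callback still works.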

Defending Against The Co-Author — Iteration In Production

Three production bugs from this work are worth naming, because every one of them turned into a generalised defense in the system:

The model echoed the prompt. The system prompt added a [LANGUAGE]: ENGLISH directive. The model treated that as a section header and reproduced it — sometimes twice, with a second variant separated by another [LANGUAGE] banner. The fix was three layers: rewrite the directive in plain prose, cap response length and add stop sequences so the model can't write a second variant, and post-process every response to cut anything that looks like a meta-header. The interrogation panel double-checks the cleaning at display time. Lesson: never trust the model to respect your formatting; design as if it will mirror everything back.
The animation got covered. Fusing two minions plays a celebration overlay. The fusion logic was firing the talent picker on the same frame, so a talent panel popped over the celebration. The fix was a small deferral queue: any talent offer that arrives while the celebration overlay is live gets queued, and is drained the frame the player presses Continue. Lesson: production polish for AI-adjacent systems is mostly about sequencing — the boring UI plumbing fails just as hard as the model, and far more quietly.
The Variant trap. One config flip in the project (warnings as errors) revealed a long tail of inferred-Variant declarations that had been latent in the codebase. Fixing them once made the editor strict about a whole class of subtle type bugs going forward. Lesson: when a probabilistic component is in the loop, the surrounding code has to be more strict, not less — you need every signal you can get.
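The post-processing half of the prompt-echo fix can be sketched as below, in Python rather than the game's GDScript. It covers only the cleaning pass; the response-length cap and stop sequences are set on the model call and are not shown. The regular expression, the function name, and the character cap are assumptions, not the shipped rules.

```python
import re

# Assumed shape of a meta-header: a line that opens with an ALL-CAPS tag in
# square brackets, such as "[LANGUAGE]: ENGLISH".
META_HEADER = re.compile(r"^\s*\[[A-Z][A-Z0-9 _]*\]")
MAX_CHARS = 280  # hypothetical display cap


def clean_response(raw: str, max_chars: int = MAX_CHARS) -> str:
    """Strip echoed directives and drop any second variant the model wrote."""
    kept = []
    for line in raw.splitlines():
        if META_HEADER.match(line):
            if kept:
                break      # header after content: a second variant starts here
            continue       # header before content: an echoed directive
        if line.strip():
            kept.append(line.strip())
    return " ".join(kept)[:max_chars].rstrip()


raw = "[LANGUAGE]: ENGLISH\nYou came back.\n\n[LANGUAGE]: ENGLISH\nYou returned."
print(clean_response(raw))
```

An empty result is a valid outcome of cleaning (for example, when the model returned nothing but the header), so the caller should treat it like any other model failure and fall back to an authored line.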

Trade-offs I Made Deliberately

Hidden state vs. legible state. Affinity is never shown to the player. The player feels it through her expressions and what she says. Numbers would let players optimise toward them; absence keeps the relationship felt rather than min-maxed.
One-way endings vs. reset-friendly endings. The 1000-affinity ending really deletes her data. That's a deliberate choice against the modern reflex toward "everything is reversible." A character about relationships has to be able to actually leave.
Free input vs. menus. Free text everywhere would tax the LLM and the player. Curated choices everywhere would flatten the character. The shipped balance: pre-written replies as a structured spine, free input as an always-available alternative — same fallback chain, same affinity system, same memory.
Awakening as a one-time event vs. a soft gradient. A gradient would have been safer and less surprising. The hard cut is what gives her arc its shape — the world stops feeling normal in a single moment, and she gets to keep the new register from then on.

Before & After — How v6 Got Built

From templates only → three-layer fallback. The first build of her dialogue was pure templates. They were too samey across sessions. The chain — live model when warm, templates otherwise, hard-coded baseline underneath — gave the same character a different texture in every session without losing reliability.
From "she speaks too" → pre-awakening discipline. Earlier builds had her using emotional expressions and personal lines from day one. It was the opposite of what the arc needed. Forcing her to be a flat NPC until the fourth-wall moment is what makes the moment hit.
From flat tag list → permanent anchor tags. The character memory store had a 40-event sliding window. Pivotal choices were rolling off before their callbacks fired. Adding a separate permanent-tag set fixed the mechanic and made callback design tractable as a content model.
From "always on" model to warmth-gated. The earliest version awaited the LLM for her dying lines and her title-screen appearances. Cold-start added up to 30 seconds of dead air. The fix was simple: only attempt the LLM if it's already responsive; otherwise drop to templates immediately. The player never waits on her.

What I Took Away

The first iteration of this game taught me that AI product design is not prompt-writing — it's designing the system around a probabilistic component so the user gets something reliable and legible. The second iteration taught me what to do with that reliability once it exists.

You can build a character whose interior life depends on a model and still guarantee the character is whole without one. You can let players type freely into a relationship system and bound the consequences with a small deterministic layer underneath. You can use the LLM for texture and the deterministic stack for structure, and ship a feature that feels alive even when nothing is alive on the other end.

That's the lens I bring to AI product work after Demon Rising: treat the model as an optional collaborator. Design the whole experience so it survives without one. Then when the model is there, let it make the moments bigger — never load-bearing.

◆

POST-LAUNCH SYSTEM · 07 / 07 — FLAGSHIP UPDATE

The LLM Voice Pipeline — Making A 3GB Local Model Behave Like Ten Distinct Characters

v1.3 is the biggest engineering pass on the game since launch. The Demon Lord's officers were always supposed to feel like people, not chatbots. Phi-3.5-mini, 2.7B parameters, running locally on the player's CPU, did not get there by itself. It got there through a nine-step voice engineering pipeline, a three-layer leak defense, two new model knobs, an authored anchor system, a memory layer for past conversations, and a 25-entry quirk catalog. The work runs to about 220 commits across two months. This section is the long answer to a short question: how do you ship a local LLM that sounds like a person, not a co-author?

The Problem — A 3GB Model That Wanted To Be A Chatbot, Not A Character

Demon Rising ships with a local LLM — Phi-3.5-mini, ~2.7B parameters, ~3GB on disk, bundled with the game and served by a local Ollama process. The intent: every Demon officer and the First Hero have their own voice. The reality at v1.2: every officer sounded like the same polite, slightly theatrical narrator. Phi was good at being helpful, which is the wrong thing for a Stoic vampire knight to be. It defaulted to ornate fantasy phrases, summarized its own replies in parentheses, and when given crude input it apologized for the player. The model could not be retrained, fine-tuned, or replaced — it had to ship. So the work moved up the stack: every behavior the model wouldn't do on its own had to be engineered around it. Nine systems, working together, each fixing one specific failure of generic LLM behavior.

Step 1 — The Voice Card

The first failure was identity collapse. Every officer reverted to the same "noble knight" template. The fix: a per-personality voice card in the system prompt — a paragraph-long description of how this kind of person talks, plus three few-shot examples of in-character lines. Eight personality archetypes (stoic, passionate, callous, loyal, proud, cunning, reckless, fanatical), each with a hand-written voice card. The prompts are now built compositionally: VoiceLib.build_persona_block(personality, name, class, en) returns the card; every chat site uses the same builder. The eight cards live in one file. The change cost a single import in every LLM call site — nine files. It killed the "every officer sounds the same" complaint in one commit.
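
A compositional builder of this shape can be sketched in a few lines. This is an illustration in Python under stated assumptions, not the contents of VoiceLib: the card text and examples are placeholders, the `class` argument is renamed `unit_class` because `class` is reserved in Python, and `en` is assumed to be a flag for English output.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VoiceCard:
    description: str                # how this kind of person talks
    examples: Tuple[str, str, str]  # three few-shot lines in character


# Placeholder card; the real cards are hand-written and not reproduced here.
VOICE_CARDS = {
    "stoic": VoiceCard(
        description="Speaks in short, clipped sentences. Never explains a feeling.",
        examples=("Tch. Fine.", "It is done.", "Do not ask again."),
    ),
}


def build_persona_block(personality: str, name: str, unit_class: str, en: bool = True) -> str:
    """Assemble the persona section of a system prompt from one shared card table."""
    card = VOICE_CARDS[personality]  # an unknown archetype raises KeyError on purpose
    language = "Reply in English." if en else "Reply in the player's selected language."
    lines = [
        f"You are {name}, a {personality} {unit_class}.",
        card.description,
        "Example lines in this voice:",
    ]
    lines += [f"- {example}" for example in card.examples]
    lines.append(language)
    return "\n".join(lines)
```

Because every call site goes through the same function, editing a card changes the voice everywhere at once, which is what makes a one-file card table practical.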

Step 2 — Three Voice Tics Per Character

Voice cards make personalities sound different from each other. They don't make a Stoic vampire sound like your Stoic vampire. The next step: three small recurring tics per personality — a sigh, a half-laugh, a verbal pause. Passionate Succubi get "Mm." and "*purr*"; Stoic knights get a clipped "Tch." and the occasional silence beat. Each personality has its three baked into the voice card, and a validator on the output side: if a reply comes back with none of the personality's tics, the validator prepends one. The model can refuse to use the tic. The output can't.
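
The output-side validator can be sketched as a single function. This is a minimal Python illustration, assuming a simple substring check; the name ensure_tic and the table PERSONALITY_TICS are hypothetical, and only "Tch.", "Mm." and "*purr*" come from the text above. The other entries are placeholders.

```python
import random

# Three tics per personality. Entries not quoted in the text are placeholders.
PERSONALITY_TICS = {
    "stoic": ["Tch.", "Hmph.", "*silence*"],
    "passionate": ["Mm.", "*purr*", "Ah."],
}


def ensure_tic(reply: str, personality: str, rng=random) -> str:
    """Return the reply unchanged if it already carries a tic, else prepend one."""
    tics = PERSONALITY_TICS[personality]
    lowered = reply.lower()
    if any(tic.lower() in lowered for tic in tics):
        return reply
    return f"{rng.choice(tics)} {reply}"
```

Substring matching is deliberately naive here (a reply containing "Hmm." would count as carrying "Mm."); a stricter version would match on token boundaries.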

Step 3 — Audio-Visual Emotion Layer (40% Of The Emotion, Zero Words)

The next failure was emotional flatness. Phi could write a "she's angry" reply, but the reply alone didn't land. Replacing the model wasn't an option. So I moved 40% of the emotional load off the words and into the rendering layer: portraits tween (modulate burst, position shake, scale pop), music ducks briefly under emotional spikes, and an SFX library hooks specific tags. When the LLM emits [ANGRY] in its reply, the portrait shakes, the music dips, a "miss" SFX punches in — before the bubble even finishes typing. The player feels the emotion before they read the words. The reply itself can be three syllables and still hit.
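
The tag-to-effect routing can be sketched as a small parser plus a lookup table. This is a Python illustration under assumptions: the effect names (portrait_shake, music_duck, sfx_miss) are hypothetical labels for the behaviours described above, and [ANGRY] is the only tag taken from the text.

```python
import re
from typing import List, Tuple

TAG_PATTERN = re.compile(r"\[([A-Z_]+)\]")

# Hypothetical effect names standing in for the tween, music and SFX hooks.
EMOTION_EFFECTS = {
    "ANGRY": ["portrait_shake", "music_duck", "sfx_miss"],
}


def split_emotion_tags(reply: str) -> Tuple[str, List[str]]:
    """Strip known emotion tags from the reply and return (clean_text, tags)."""
    tags = [t for t in TAG_PATTERN.findall(reply) if t in EMOTION_EFFECTS]
    text = TAG_PATTERN.sub(
        lambda m: "" if m.group(1) in EMOTION_EFFECTS else m.group(0), reply
    )
    return " ".join(text.split()), tags


def effects_for(tags: List[str]) -> List[str]:
    """Flatten tags into an ordered, de-duplicated list of effects to fire."""
    effects: List[str] = []
    for tag in tags:
        for effect in EMOTION_EFFECTS[tag]:
            if effect not in effects:
                effects.append(effect)
    return effects
```

A caller would fire `effects_for(tags)` first and only then start typing the cleaned text into the bubble, which gives the ordering described above: the emotion lands before the words do.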

Step 4 — Authored Anchor + LLM Drift

For pivotal moments — an officer's farewell, the First Hero's dying line, a council closer — LLM-generated text was good enough but not great. So the closers got authored: 2–4 hand-written variants per topic per exit type, indexed by stage cluster (early / mid / late) and register tier (cold / warm). The LLM still drives the body of the conversation. The endings come from JSON. Total scope: roughly 200 authored lines covering the First Hero's free-chat topics, demon farewell beats, and council outcomes. The system file: data/first_hero_closers.json.
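
A lookup over such a file might look like the sketch below. The nesting order (topic, exit type, stage cluster, register tier) and every sample line are assumptions made for illustration; the actual layout of data/first_hero_closers.json is not shown in this write-up.

```python
import json
import random

# Assumed shape: topic -> exit type -> stage cluster -> register tier -> variants.
SAMPLE_CLOSERS = json.loads("""
{
  "farewell": {
    "player_left": {
      "early": {"cold": ["Go, then.", "We are finished here."]},
      "late":  {"warm": ["Come back alive.", "I will keep your seat."]}
    }
  }
}
""")


def pick_closer(closers, topic, exit_type, stage, register, rng=random):
    """Return one authored variant, or None when nothing was written for this slot."""
    try:
        variants = closers[topic][exit_type][stage][register]
    except KeyError:
        return None
    return rng.choice(variants) if variants else None
```

Returning None for an unauthored slot lets the caller decide what to do next, for example falling back to a generated ending, rather than hiding the gap inside the lookup.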

Step 5 — Memory As Scenes (Not Summary)

By v1.2 the game had a memory layer, but it was the wrong kind — summaries. "Player h
