Bitez — Jestaz Yao

01 / THE CORE NEED

"Just tell me where to eat" is a decision request — not a discovery request.

Existing apps treat hunger like a research project. Open Yelp, Google Maps, OpenTable — every one shows a 50-item list sorted by something opaque, and transfers the work of deciding from app to user. The actual job-to-be-done is "hand me a single confident pick, with the reasoning I can audit, in the time it takes me to put my shoes on." The product builds outward from that one act.

→ DECISION FATIGUE

50 options ≠ help.

Showing more transfers cognitive load onto the user. A confident product picks one. The only way to do that without being wrong a lot is to have a reasoning system the user can interrogate when they disagree.

→ TRUST DEFICIT

Recommendations need receipts.

"We picked this because of X, Y, Z" beats "trending now." Every reason is sourced (rating · weather · your taste · distance · time · budget) and shown explicitly. Showing the work is what makes one-pick viable.

→ MOMENT-AWARENESS

Mood + weather + budget + history all matter.

The same person at noon on Wednesday and 9pm on Friday wants different things. The recommender reads time, weather, learned price tier, recently-presented set, and a single user-chosen mood. The AI's job is to make that read feel human, not algorithmic.

02 / AI ARCHITECTURE

The model translates. Code decides. Both layers are visible to the user.

This is the central Product AI question on every product I design: what is the model allowed to decide? In Bitez the answer is narrow on purpose. The LLM (Apple Foundation Models, on-device) gets two jobs: one turns messy human language into something deterministic code can work with, the other renders deterministic code's output in human language. Everything that affects the actual recommendation runs through plain Swift.

→ LLM SIDE · Apple Foundation Models

Reads intent. Writes warmth.

Parse free-text dish requests. "Curry-something hearty" / "ramyon" / "fish that doesn't smell." Returns a typed DishIntent the recommender can use.
Narrate the pick in friend-voice. One warm sentence + 3 short reasons, grounded only in the deterministic reasoning facts. Tone the user actually wants to read.
Nothing the user can't override. Both outputs are advisory; the underlying signals are always visible if AI fails or hallucinates.

HANDOFF · JSON → SCORING

→ CODE SIDE · deterministic recommender

Picks the place. Justifies the pick.

Weighted scoring across rating, distance, walkability, cuisine match, mood bias, budget tier, learned history, and a recently-presented penalty for variety.
The reasoning engine mines facts from sourced signals — Ratings, Weather, Your taste, Distance, Time, Your budget — each carrying a strength score the UI ranks.
Honesty fact for budget mismatch: if a $$ place wins for a $$$$ user, the engine surfaces a dedicated fact explaining why. This turned a perceived bug into the most-loved feature in testing.
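
A minimal sketch of how sourced, strength-ranked facts and the budget honesty fact could fit together. The type and function names (`ReasoningFact`, `budgetHonestyFact`, `rankedFacts`) and the strength values are illustrative assumptions, not the shipped Bitez code.

```swift
// Hypothetical types for illustration; not taken from the Bitez codebase.
enum FactSource: String {
    case ratings = "Ratings", weather = "Weather", taste = "Your taste"
    case distance = "Distance", time = "Time", budget = "Your budget"
}

struct ReasoningFact {
    let source: FactSource
    let text: String
    let strength: Double     // higher strength is shown earlier
    var isSticky = false     // sticky facts are never replaced by AI narration
}

func priceLabel(_ tier: Int) -> String {
    String(repeating: "$", count: max(1, min(tier, 4)))
}

/// Returns a dedicated fact only when the winning tier differs from the declared budget.
func budgetHonestyFact(pickTier: Int, budgetTier: Int) -> ReasoningFact? {
    guard pickTier != budgetTier else { return nil }
    return ReasoningFact(
        source: .budget,
        text: "You said \(priceLabel(budgetTier)) but \(priceLabel(pickTier)) won",
        strength: 1.0,       // assumed: the honesty fact outranks everything else
        isSticky: true)
}

/// Orders facts for display, strongest first.
func rankedFacts(_ facts: [ReasoningFact]) -> [ReasoningFact] {
    facts.sorted { $0.strength > $1.strength }
}

// Usage with made-up facts.
var facts = [
    ReasoningFact(source: .ratings, text: "4.5 stars", strength: 0.8),
    ReasoningFact(source: .distance, text: "12 min walk", strength: 0.6),
]
if let honesty = budgetHonestyFact(pickTier: 2, budgetTier: 4) {
    facts.append(honesty)
}
let ordered = rankedFacts(facts)
```

The design choice worth noticing: the honesty fact is produced by plain code from two integers, so the model has no way to soften or drop it.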

[ MODEL READS · CODE DECIDES · MODEL NARRATES ]

03 / TWO AI MOMENTS, SHOWN

Both AI touchpoints, captured in the live app.

Apple's on-device LLM has two narrow, high-leverage jobs in Bitez. Below are real frames from the live App Store build — the model isn't simulated, the screenshots aren't mockups. Each one shows exactly what the LLM contributes and what's still being decided by deterministic code underneath.

Bitez calibration screen — Found 20 spots near 113564, Filtering by $$$$ budget, Matching French, Locking in Spicy lover

Moment 01 · PARSE

Free-text dish intent → structured cuisine, mood, and flags.

The calibration screen replays everything the user just said. The last row — "Locking in Spicy lover" — is the Foundation Models output. The user didn't pick "Spicy" from a list; they typed it (or "curry-something hearty", or "fish that doesn't smell"), and Apple's on-device LLM extracted it into a typed DishIntent the recommender knows how to use.

User input: "curry-something hearty"
LLM output: { dish: "curry", cuisine: "Indian", isHearty: true }

The model never picks the restaurant. It only converts messy phrasing into structured fields. A keyword dictionary of 110+ entries acts as the safety net: it answers common cases instantly, and the LLM only runs when the dictionary can't pin a cuisine. The result feels like the app understood you.
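
The dictionary-first flow can be sketched roughly like this. The `DishIntent` fields mirror the example output above; the keyword entries, the `parseIntent` name, and the stub standing in for the on-device model are assumptions for illustration.

```swift
import Foundation

struct DishIntent: Equatable {
    var dish: String?
    var cuisine: String?
    var isHearty = false
}

// Tiny stand-in for the keyword dictionary; entries are illustrative.
let cuisineKeywords: [(keyword: String, cuisine: String)] = [
    ("curry", "Indian"), ("ramen", "Japanese"), ("taco", "Mexican"),
]

/// Fast path: returns an intent only when a known keyword pins a cuisine.
func dictionaryIntent(for text: String) -> DishIntent? {
    let lowered = text.lowercased()
    for entry in cuisineKeywords where lowered.contains(entry.keyword) {
        return DishIntent(dish: entry.keyword, cuisine: entry.cuisine,
                          isHearty: lowered.contains("hearty"))
    }
    return nil
}

/// Dictionary first; the model closure only runs when the dictionary has no answer.
func parseIntent(_ text: String, model: (String) -> DishIntent?) -> DishIntent? {
    dictionaryIntent(for: text) ?? model(text)
}

// Usage: a stub closure stands in for the on-device LLM call.
final class CallCounter { var count = 0 }
let counter = CallCounter()
let stubModel: (String) -> DishIntent? = { _ in
    counter.count += 1
    return DishIntent(dish: "ramyon", cuisine: "Korean")   // pretend model output
}
let fast = parseIntent("curry-something hearty", model: stubModel)
let slow = parseIntent("ramyon", model: stubModel)
```

In this sketch "curry-something hearty" never reaches the model, while "ramyon" misses the dictionary and falls through to it.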

Apple Foundation Models · @Generable schema · Keyword fallback (110+ entries) · 5s timeout race

Bitez pick view — Nami Nori West Village, friend-voice narration above, budget honesty fact below explaining why $$ won when user said $$$$

Moment 02 · NARRATE

Sourced reasoning facts → one warm sentence the user actually wants to read.

The line above the restaurant name — "Jestaz, Out of everything nearby, Nami Nori West Village is the one. Trust me." — is the model speaking. But it's not free-styling. It's reading the deterministic reasoning facts (rating, distance, walking time, mood, weather, recently-presented penalty, the budget-honesty fact below) and rewriting them in one sentence.

Hallucination guard: the line is rejected if it mentions none of the restaurant name, the cuisine, or the user's named dish. If it fails, the template version takes over and the user never knows.
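
A rough sketch of that guard, assuming "rejected" means the line mentions none of the three anchors. The function name `guardedNarration` and its parameters are hypothetical; the shipped check may be stricter.

```swift
import Foundation

/// Accepts the model's line only if it mentions at least one grounded anchor
/// (restaurant name, cuisine, or the user's named dish); otherwise returns the template.
func guardedNarration(aiLine: String?, restaurant: String, cuisine: String,
                      dish: String?, template: String) -> String {
    guard let line = aiLine?.trimmingCharacters(in: .whitespacesAndNewlines),
          !line.isEmpty else { return template }
    let haystack = line.lowercased()
    let anchors = [restaurant, cuisine, dish ?? ""]
        .map { $0.lowercased() }
        .filter { !$0.isEmpty }
    let grounded = anchors.contains { haystack.contains($0) }
    return grounded ? line : template
}

// Usage with made-up lines.
let template = "Nami Nori West Village is my pick for you."
let accepted = guardedNarration(
    aiLine: "Out of everything nearby, Nami Nori West Village is the one.",
    restaurant: "Nami Nori West Village", cuisine: "Japanese", dish: nil,
    template: template)
let rejected = guardedNarration(
    aiLine: "You should try the new bistro downtown!",
    restaurant: "Nami Nori West Village", cuisine: "Japanese", dish: nil,
    template: template)
```

A nil or empty model result takes the same path as a rejected one, which is what keeps the fallback invisible to the user.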

Underneath, the deterministic reasoning is still visible — every fact is sourced (BUDGET in this frame), strength-ranked, and auditable. The AI adds warmth; it doesn't replace the math. When a user changes their budget and the same place wins, the reasoning row labelled "Still your best match" is sticky — AI narration can't overwrite the honest fact underneath.

@Generable AINarrationSchema · Hallucination guard · Grounded on facts · Sticky reasoning

Two bounded jobs. Both about language — reading user phrasing, writing the explanation. Neither one ever decides anything that touches money, distance, or the actual pick.

04 / THE LEARNING LOOP

Bitez gets smarter every visit — and the data never leaves your phone.

The privacy promise and the personalization promise usually fight. A normal app says: "to recommend better, we need your behaviour." Then it ships that behaviour to a server, models you, and the more it knows the better it pretends to be. Bitez does the opposite: every "I'm going" tap is a private signal recorded in a counter on the device. After five "I'm going" taps the recommender knows what you actually pick, without anyone else knowing anything.

EXPLICIT (what you typed)

Onboarding declaration.

cuisines: [italian, korean] — the chips you tapped at sign-up
budget: $$ — the tier you said you live at
dietary: [vegetarian] — declared filters
maxDistance: nearby — how far you said you'll walk

These are what you said. Treated as ground truth at cold start, because the app doesn't have anything else to go on yet.

IMPLICIT (what you actually do)

Local on-device counts.

likedCuisines: ["japanese": 5, "italian": 1]
likedPriceLevels: [3: 4, 2: 1]
recentVisits: [last 10 "I'm going" snapshots]
seen: [skipped IDs for current session]

These are what you actually picked. Live counters, never reset on app close, never synced anywhere. The user who declared $$ but keeps choosing $$$ gets a quiet $$$ bias from then on — without ever being told "you're a $$$ person now."

The math is deliberately bounded. Implicit signals are +bonus on top of the explicit framework, never a replacement for it. A user who declared "no Japanese" gets that respected even if they accidentally tapped one Japanese place — the explicit veto wins. Implicit cuisine gets up to +18 in Special mood; price-tier learning caps at +6 as a tiebreaker. The recommender can lean into what you do, but it can't override what you said.
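
One way to picture the bounded math, as a sketch. The caps (+18 for cuisine, +6 for price tier) come from the paragraph above; the per-visit step sizes, the `Profile` shape, and the `implicitBonus` name are assumptions, and the mood condition on the +18 cap is left out.

```swift
struct Profile {
    var vetoedCuisines: Set<String>      // explicit "no X" declarations
    var likedCuisines: [String: Int]     // implicit "I'm going" counts
    var likedPriceLevels: [Int: Int]
}

/// Returns nil when an explicit veto applies; otherwise a bonus that can never exceed the caps.
func implicitBonus(cuisine: String, priceLevel: Int, profile: Profile,
                   cuisineStep: Int = 4, priceStep: Int = 2,   // assumed step sizes
                   cuisineCap: Int = 18, priceCap: Int = 6) -> Int? {
    // What the user said always beats what the user did.
    if profile.vetoedCuisines.contains(cuisine) { return nil }
    let cuisineBonus = min(cuisineCap, (profile.likedCuisines[cuisine] ?? 0) * cuisineStep)
    let priceBonus = min(priceCap, (profile.likedPriceLevels[priceLevel] ?? 0) * priceStep)
    return cuisineBonus + priceBonus
}

// Usage with the counters shown earlier in this section.
let profile = Profile(
    vetoedCuisines: ["thai"],
    likedCuisines: ["japanese": 5, "italian": 1],
    likedPriceLevels: [3: 4, 2: 1])
let japaneseBonus = implicitBonus(cuisine: "japanese", priceLevel: 3, profile: profile)
let italianBonus = implicitBonus(cuisine: "italian", priceLevel: 2, profile: profile)
let vetoed = implicitBonus(cuisine: "thai", priceLevel: 2, profile: profile)
```

However many visits pile up, the bonus saturates at the caps, so learned behaviour stays a nudge on top of the declared profile.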

What the app deliberately doesn't learn is just as important: no negative dampening on cuisines you skipped (that's a confirmation-bias loop where you stop seeing whole categories), no time-of-day × cuisine cross-tables (would overfit in 1.0 with 3-5 data points and pin a user to one mood-shape forever), no cross-user collaborative filtering at all (would require an account and a server — kills the privacy promise dead). Each one is queued as a 2.0 candidate when there's enough usage data to make the math honest.

Where the user sees it.

Settings → Recent visits. Five most recent "I'm going" taps, tap to re-open in Google Maps or Apple Maps. The only visible surface of the learning data — everything else lives quietly inside the scorer. Users who want to see the model see it; users who don't, don't.

Where the design discipline shows up.

Reset everything in Settings is one tap. The "you're being learned about" anxiety is mitigated by making the data visibly cheap to throw away. The privacy promise isn't a clause buried in the EULA — it's an action button two screens deep.

05 / TWO MODES, TWO POOLS

A user looking for dinner and a user looking for a cafe are different products. The toggle reflects that.

Most restaurant-finder apps put a "Cafes" filter chip on the same list and call it a feature. That's wrong, because the success criteria genuinely differ — a dinner pick is rated on food quality, price tier, occasion fit; a cafe pick is rated on dwell-time signals (real seating, wifi-mention rate, noise level, room to spread out a laptop for two hours). Same word, "place to eat," very different shopping list. Bitez doesn't share the pool. The corner toggle is the visible surface of an architectural split.

EAT MODE

"I want a meal."

Google types pulled: cuisine-specific *_restaurant types based on the user's saved cuisines (or generic restaurant when none)
Excluded at API: cafe, coffee_shop, tea_house primaries — Starbucks structurally cannot appear in Eat results
Rank by: POPULARITY — surfaces the strong, well-reviewed options
Scoring branch: dish keywords, cuisine match, mood-shifted budget, dietary tags, dish-intent text from the search box
UI: Mood chips (Quick / Special), dish-intent search button, Cuisines + Dietary visible in Settings

CAFE MODE

"I want to sit and stay."

Google types pulled: cafe, coffee_shop, bakery, breakfast_restaurant, brunch_restaurant (cafes registered as breakfast spots are common in NYC)
Excluded at API: restaurant, bar, fast_food_restaurant primaries — actual restaurants don't sneak into Cafe results just because they serve coffee
Rank by: DISTANCE — popularity drags in Magnolia/Levain 30 min away; the local independent that's actually walkable wins
Scoring branch: review-count as capacity proxy, dineIn for real seating, editorialSummary scanned for "cozy / quiet / intimate," price tier INVERTED so $-$$ wins
UI: Mood chips hidden (no Quick/Special — irrelevant to working from a cafe); dish intent disabled; "Cafe extras" Settings (Real seating preferred, Outdoor preferred, Quieter places preferred)

The cache layer follows the split. Each fetch signature carries the context as its first axis ("dining" vs "cafe"), so the two pools never overlap or evict each other in the 8-signature LRU bundle. A user who toggles back and forth doesn't lose what they had — the cafe pool sleeps while the dining pool is active and vice-versa. The Settings sheet adapts: cuisines and dietary sections vanish when the context is Cafe (they describe meals, not work spots); "Cafe extras" appears in their place. Two contexts, two scorers, two Settings shapes — one navigation shell.
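
A minimal sketch of a context-first signature feeding a small LRU, under stated assumptions: the `FetchSignature` fields beyond the context and the `PoolCache` type are hypothetical, and this version uses one shared LRU of 8 entries, so it does not by itself guarantee that one context can never evict the other.

```swift
enum AppContext: String { case dining, cafe }

struct FetchSignature: Hashable {
    let context: AppContext     // first axis: dining and cafe keys can never be equal
    let cuisines: [String]
    let budgetTier: Int
}

struct PoolCache<Value> {
    private(set) var order: [FetchSignature] = []    // least recently used first
    private var storage: [FetchSignature: Value] = [:]
    let capacity: Int

    init(capacity: Int = 8) { self.capacity = capacity }

    mutating func value(for key: FetchSignature) -> Value? {
        guard let hit = storage[key] else { return nil }
        touch(key)
        return hit
    }

    mutating func insert(_ value: Value, for key: FetchSignature) {
        storage[key] = value
        touch(key)
        if order.count > capacity {
            let evicted = order.removeFirst()
            storage[evicted] = nil
        }
    }

    private mutating func touch(_ key: FetchSignature) {
        order.removeAll { $0 == key }
        order.append(key)
    }
}

// Usage: toggling contexts leaves both pools in place.
var cache = PoolCache<[String]>()
let dinner = FetchSignature(context: .dining, cuisines: ["italian"], budgetTier: 2)
let cafe = FetchSignature(context: .cafe, cuisines: [], budgetTier: 2)
cache.insert(["Lucia's"], for: dinner)
cache.insert(["Corner Cafe"], for: cafe)
let dinnerPool = cache.value(for: dinner)
let cafePool = cache.value(for: cafe)
```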

The design question that made this section worth writing: when should a product give the user a context switch vs a filter? A filter says "here's the same data, sliced differently." A context says "here's a different question, asked differently, answered differently." The threshold I held: if the WINNING signals change shape — not just weights, but which signals enter the math at all — then it's a context, not a filter. Dinner picks weight dish keywords; cafe picks weight dwell-time signals. Different shape. Two modes earned the split.

06 / DESIGN SYSTEM

A design system lifted from the code — not drawn beside it.

Bitez ships with its own design system, and every value in it was extracted straight from the shipped SwiftUI — so the source of truth is the app, and the design file can't drift away from what users actually see. It's structured the way a production library should be: raw primitives feed semantic tokens (with real Light/Dark modes), which feed a component set, which assemble into screen patterns. Screens are built only by composing component instances — never hand-drawn — so a token change ripples everywhere at once.

35 · raw color primitives
23×2 · semantic tokens · Light + Dark modes
~30 · components · 46 icons · variants + slots
9 · screen patterns · 100% instance-built

→ LAYER 1 · PRIMITIVES

Raw values

The literal palette from BBColor — brand reds, warm accents, ink darks, neutrals, alphas. No semantics yet.

35 colors

→ LAYER 2 · SEMANTIC

Meaning + modes

surface / text / border / brand / status. Each aliases a primitive per mode, so Light and Dark are one switch.

23 tokens × 2 modes

→ LAYER 3 · TYPE + NUMBER

Rhythm

A 30-style Inter ramp grounded in real iOS sizes, plus a half-step spacing scale, radius, sizing and stroke tokens.

30 text styles · 42 number tokens

→ LAYER 4 · COMPONENTS → PATTERNS

Assembly

Buttons, chips, inputs, list rows, pills, the pick-screen blocks, the driving suite — composed into 9 full screens.

~30 components → 9 patterns

Color · brand + accent + status (the values the app actually paints with)

brand/tint · #FF3B30
brand/tint-press · #E0271C
brand/accent · #FF9500
accent/peach · #FFD5A8
accent/sand · #F5E6D3
accent/tan · #D4A574
status/success · #34C759
status/warning · #FFCC00
status/severe · #FF9500
ink-950 · #0E0E12
ink-800 · #1F1F26
gray-50 · #F2F2F7

◐ Light mode · semantic surfaces: surface/bg · surface/card · text/primary · brand/tint

◑ Dark mode · same tokens, re-aliased: surface/bg · surface/card · text/primary · brand/tint

Type · Inter ramp, real iOS sizes (display → label)

display/lg · 40 · "I'll pick. You eat."
heading/sm · 22 · "Good evening, Jestaz"
friend-line · 22 · "Lucia's pours pizza by the slice."
body/md · 14 · "Hand-stretched dough, tomato, fresh basil."
label/sm · 10 · "Order this · why I picked it"

Spacing · half-step scale

space/1 · 4

space/2 · 8

space/3 · 12

space/4 · 16

space/6 · 24

space/8 · 32

space/10 · 40

Radius · sm → full

sm·8

xl·12

3xl·16

4xl·18

full

Component library · every reusable surface, as a real variant component

Button ×6 · IconButton ×4 · Chip ×5 · Tile ×4 · Input ×5 · Toggle ×3 · ToggleRow · StatusPill ×5 · Banner ×3 · ProgressBar · ContextToggle · ListRow ×3 · PickTopBar · ActionsBar · FriendLine · RestaurantIdentity · FirstPickBadge · DishCard · ViewFoodCTA · ReasoningBlock · ReviewCard · MoodTile · DrivingTopBar · DrivingStatsRow · DrivingMicButton ×3 · CalibrationStep ×3 · Icons ×46

The point isn't the swatch grid — it's the discipline. Because the system was reverse-built from shipped code and screens are assembled only from instances, the design file is a faithful mirror of the product, not a hopeful sketch of it. One token edit re-themes every screen; one component fix propagates to every place it appears.
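
To make the primitive → semantic layering concrete, here is a small sketch. The hex values for brand/tint, ink-950, ink-800 and gray-50 are the ones listed above; the `white` primitive, the type names, and which primitive each token aliases in each mode are assumptions.

```swift
// Layer 1: raw values, no meaning attached.
enum Primitive: String {
    case red = "#FF3B30"      // brand/tint in the palette above
    case ink950 = "#0E0E12"
    case ink800 = "#1F1F26"
    case gray50 = "#F2F2F7"
    case white = "#FFFFFF"    // assumption: not listed in the palette above
}

enum Mode { case light, dark }

// Layer 2: meaning plus modes. Each token aliases one primitive per mode.
enum SemanticToken {
    case surfaceBg, surfaceCard, brandTint

    /// The mapping below is a guess for illustration, not the shipped aliasing.
    func primitive(in mode: Mode) -> Primitive {
        switch (self, mode) {
        case (.surfaceBg, .light):   return .gray50
        case (.surfaceBg, .dark):    return .ink950
        case (.surfaceCard, .light): return .white
        case (.surfaceCard, .dark):  return .ink800
        case (.brandTint, _):        return .red
        }
    }
}

let darkBackground = SemanticToken.surfaceBg.primitive(in: .dark).rawValue
let lightBackground = SemanticToken.surfaceBg.primitive(in: .light).rawValue
```

Components only ever ask for a semantic token, so changing one alias in the switch re-themes every screen that uses it.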

07 / THE TRANSPARENCY PROBLEM

When the user moves but the recommendation doesn't.

A user opens Settings, drags budget from $$ to $$$$, taps Save & refresh. The recommender re-ranks. The same restaurant wins. Mathematically the algorithm is right: a 4.5★ place 12 minutes away beats every $$$$ option in the user's radius on signal weight. Perceptually the app is broken. The user changed the input. They expect the output to change. When it doesn't, the algorithm has three seconds to defend itself out loud — or it loses the user.

This is the design problem most "smart" recommenders lose on. Yelp shows fifty options so the user feels in control. A single-pick product can't do that — and that's exactly the problem worth solving. This is the signature moment of the case study because it's the test of whether transparent reasoning is a real design discipline or just a buzzword.

Pick view defending an off-tier choice with five visible design moves

The fix · five visible layers

Same pick. Visibly defended on five fronts, in one screen.

The feedback came from a real tester. Direct quote, not paraphrased: "my friend keep saying after you change the budget range, it's giving you the same restaurant." One line of feedback, five layered design moves in response. The same place wins — but everything around the place changes loudly enough that the user understands the algorithm did run, and did hear them.

→ 01 The price-tier badge gets visibly louder. "$$ below your $$$$" rendered inline under the rating row. The single most direct answer to "did my $$$$ change even take effect?" — yes, we read $$$$, and yes, we picked $$ anyway. The user knows the signal landed before they read another line. UI EMPHASIS

→ 02 The budget honesty fact. A sticky reasoning row, surfacing every time pick.priceLevel != profile.budget: "You said $$$$ but $$ won — it beats your higher-tier options here on rating, distance, or what you actually keep eating." Sticky means AI narration cannot overwrite it. The reasoning shows its work, in plain language, every time. NARRATIVE

→ 03 "Still your strongest match" chip. When the user-driven refresh re-ranks to the same restaurant, a small confirmation chip surfaces: "This is my strongest match for you right now — trust it." Tells the user explicitly the algorithm ran AND landed here again — not "nothing happened". RUN CONFIRMATION

→ 04 Recently-presented penalty. Inside the recommender: a −25 score penalty against any restaurant the user has been shown in the recent past. Doesn't reshape the whole ranking, but gives the algorithm a small variety nudge so cold relaunches don't always land on yesterday's pick. Pure code, invisible to the user — but the user feels the variety. ALGORITHM NUDGE

→ 05 Post-save confirmation toast. "Updated — showing your top $$$$ picks" appears briefly after Save & refresh, before the user has read anything else on screen. The change is acknowledged the moment it happens, not retroactively inferred from a different pick that never came. ACK

The recommendation didn't change. Everything around it did. Five design moves, four on screen and one inside the recommender, ranging from a single inline phrase to a score penalty, all in service of one outcome: the user understands the algorithm heard them, even when the result lands in the same place. The friend who'd called it broken never raised that complaint again.
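
Three of the moves above reduce to very small pieces of logic. A hedged sketch, with hypothetical function names; the badge wording and the −25 value come from the list above, everything else is assumed.

```swift
func tierLabel(_ level: Int) -> String {
    String(repeating: "$", count: max(0, level))
}

/// Move 01: inline badge, shown only when the pick sits below the declared budget.
func priceTierBadge(pickLevel: Int, budgetLevel: Int) -> String? {
    pickLevel < budgetLevel
        ? "\(tierLabel(pickLevel)) below your \(tierLabel(budgetLevel))"
        : nil
}

/// Move 03: the confirmation chip fires when a user-driven refresh lands on the same place.
func showsStrongestMatchChip(previousPickID: String?, newPickID: String,
                             userDriven: Bool) -> Bool {
    userDriven && previousPickID == newPickID
}

/// Move 04: flat penalty for anything shown in the recent past.
func varietyAdjusted(score: Double, id: String, recentlyShown: Set<String>) -> Double {
    recentlyShown.contains(id) ? score - 25 : score
}

let badge = priceTierBadge(pickLevel: 2, budgetLevel: 4)
let chip = showsStrongestMatchChip(previousPickID: "nami-nori", newPickID: "nami-nori",
                                   userDriven: true)
let nudged = varietyAdjusted(score: 90, id: "nami-nori", recentlyShown: ["nami-nori"])
```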

[ ALGORITHM HOLDS · UI DEFENDS · USER UNDERSTANDS ]

08 / FEEDING THE AI WITHOUT PAYING TWICE

On-device AI is free. The ground truth it anchors to isn't.

Apple's Foundation Models run free on the user's phone. Google Places (New) — the data the recommender grounds every reasoning fact against — does not. A naive build hits the API on every screen open and burns the unit economics inside a week. Three small caching choices, each protecting the AI's accuracy without paying for it more than once a day.

01 · POOLS

Multi-pool cache.

Eight LRU pools by query signature: toggling Italian → Korean → Italian costs two fetches, and the return to Italian is served from cache.

02 · TIME

Local "open now" math.

Fetch the weekly schedule once, compute open / closed locally against the device clock. The pool stays valid 24 hours, not 30 minutes.

03 · EMERGENCY

Weather-triggered refetch.

Severe-weather alerts at the user's coordinate enter the cache signature → automatic miss → live refetch right when local hours actually shift.

[ NAIVE: 1 API CALL / APP OPEN → STAGED: ~1 / DAY / USER ]
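
The local "open now" math can be sketched as a pure function over a cached weekly schedule. `OpeningPeriod`, `clockPosition` and `isOpen` are hypothetical names, the schedule values are made up, and overnight periods are deliberately left out.

```swift
import Foundation

/// One same-day opening period; minutes are counted from midnight.
struct OpeningPeriod {
    let weekday: Int       // 1 = Sunday ... 7 = Saturday, matching Calendar
    let openMinute: Int
    let closeMinute: Int   // overnight periods are out of scope for this sketch
}

/// Converts a date into the (weekday, minute-of-day) pair used below.
func clockPosition(of date: Date, calendar: Calendar = .current) -> (weekday: Int, minute: Int) {
    let parts = calendar.dateComponents([.weekday, .hour, .minute], from: date)
    return (parts.weekday ?? 1, (parts.hour ?? 0) * 60 + (parts.minute ?? 0))
}

/// True when any cached period for that weekday covers the given minute.
func isOpen(_ schedule: [OpeningPeriod], weekday: Int, minute: Int) -> Bool {
    schedule.contains {
        $0.weekday == weekday && $0.openMinute <= minute && minute < $0.closeMinute
    }
}

// Friday 11:00 to 23:30, fetched once and reused all day.
let schedule = [OpeningPeriod(weekday: 6, openMinute: 11 * 60, closeMinute: 23 * 60 + 30)]
let fridayNight = isOpen(schedule, weekday: 6, minute: 23 * 60)
let saturdayNoon = isOpen(schedule, weekday: 7, minute: 12 * 60)
```

In the app the pair would come from `clockPosition(of: Date())`; the example passes fixed values so the result doesn't depend on when it runs.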

09 / TESTER-DRIVEN ITERATION

Every beta build was paired to a specific tester observation.

Iteration is the proof that the product is being validated by real people, not assumed from the inside. Each release below pairs an actual quote from a beta tester with the design / engineering response that shipped in answer. The point isn't speed — it's that the loop is closed: feedback in, fix out, ship, repeat. Nothing reverse-engineered for the case study.

01 "This app keeps recommending closed restaurants." Hard-filter rule: drop only confirmed-closed places from Google's openNow; keep unknown hours in the pool but bias them lower. Pool got smaller, pool got more honest. Three days later this got smarter: Stage 2 computes "open now" locally from the cached weekly schedule.

02 "It just showed me a strip club." Three-layer venue filter: primary-type denylist (night_club, casino, adult_entertainment), secondary-type denylist (any matching tag), and a name-token blocklist for places whose primary type lies but whose name doesn't. Plus excludedTypes pushed to Google at fetch time.

03 "After I changed the budget, the same restaurant came back." Four moves at once: heavier price-tier weight in the scorer, a much larger visible $$ badge, a "Still your top pick" confirmation chip when re-ranking legitimately picks the same place, and a -25 recently-presented penalty so cold relaunches don't always land on yesterday's pick.

04 "Why did you pick $$ when I said $$$$?" The budget honesty fact: a dedicated reasoning row appears when the pick is off-tier — "You said $$$$ but $$ won — it beats your higher-tier options on rating, distance, or what you actually keep eating." Sticky (AI narration can't replace it), visible the entire time the pick stays on screen.

05 "I kept tapping Skip and the screen froze." Two-layer freeze defense: 5-second timeout race around every AI call (TaskGroup against a sleep), and a 2.5-second LoaderOverlay kill-switch that force-hides any stuck spinner. The unhappy path now has the same care as the happy one.
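
A sketch of a timeout race built on a task group, in the spirit of "TaskGroup against a sleep". The helper name `withTimeout` is an assumption. One caveat of this shape: the group still waits for the losing child to finish, so the wrapped work has to respect cancellation for the timeout to return promptly.

```swift
/// Runs `work` against a sleeping timer and returns whichever finishes first.
/// A timeout and a failed AI call both surface as nil.
func withTimeout<T: Sendable>(
    seconds: Double,
    _ work: @escaping @Sendable () async -> T?
) async -> T? {
    await withTaskGroup(of: T?.self) { group in
        group.addTask { await work() }
        group.addTask {
            // If the work wins, cancellation ends this sleep early.
            try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            return nil
        }
        let first = await group.next() ?? nil   // flatten T?? into T?
        group.cancelAll()                       // stop the loser
        return first
    }
}
```

A call site might look like `await withTimeout(seconds: 5) { await narrate(pick) }`, where `narrate` is a hypothetical AI call; a nil result means "use the template".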

06 "What's this 'Range requires lowerBound <= upperBound' crash?" Triple-layer markdown defense: escape every dynamic field with a markdownSafe extension, balance-check ** / _ counts before the string reaches AttributedString(markdown:), and sanitize AI-generated reasons before they reach the parser. The same crash had recurred; this fix stopped it from shipping again.

07 "The more I use it, the less API budget should burn." Cost optimization Stages 1 + 2: multi-pool cache (8 LRU signatures) so toggling cuisines is free after the first visit, plus 24-hour cache backed by locally-computed open/closed math. Combined cumulative API savings: ~60-80% depending on session pattern.

08 "What if there's a storm — won't hours be wrong?" Stage 3 — WeatherKit emergency refetch. Severe / extreme weather alerts at the user's coordinate enter the cache signature, automatically busting the 24-hour cache exactly when local schedules become unreliable. Home banner: "Winter Storm Warning — hours may vary." Honest signal, free API.

09 "App crashed in Foundation Models again." Crash-crumb pattern. UserDefaults flag set before each AI call, cleared on normal return. Next launch sees it stuck → previous run died inside Apple's framework → disable AI for this session, escalate to permanent after two strikes. App never crashes inside the same Apple bug twice in a row.

10 "I want to find a cafe to sit and work from — not a restaurant." Cafe as a parallel app context. Eat / Cafe pill at the top of Home switches the entire experience. The two modes share the UI shell and almost nothing else: different Google fetch (cafe / coffee_shop / bakery / breakfast_restaurant types, with restaurant excluded by excludedPrimaryTypes only so cafes carrying "restaurant" as a SECONDARY tag still pass), a separate cafeScore branch (review count proxies capacity, dineIn confirms real seating, editorialSummary scanned for "quiet / cozy / intimate", price tier inverted so $-$$ wins), a separate cache pool, and DISTANCE ranking instead of POPULARITY — so the local independent beats Magnolia Bakery 30 minutes away. Mood and dish intent are disabled in Cafe context: they describe meals, not work spots.

11 "I switched Eat → Cafe → Eat and the same cafe showed up as my dinner pick — with a warning sticker." Mock-data fallback excised, in-memory pool wiped on every context switch. The first ship had a defensive layer that fell back to MockData (Greenleaf Kitchen et al.) when Google returned empty, fronted by a "Showing sample picks" sticker. Honest in dev mode — dishonest the moment a real user sees a place that doesn't exist. Removed the fallback, removed the banner, and tightened the lifecycle so any context flip clears restaurants = [] before the new fetch can race. No pool leakage between modes; failures now produce a true empty state with a clear "set your city" CTA, not fake picks dressed up as real ones.

12 "It recommended Bob's Donuts at 11 PM — closes at 11:30." Hours math became a first-class scoring concern. Added minutesUntilClose() to the Restaurant model — runway computed locally from regularHours + device clock, no extra API call. Recommender now penalizes <30 min runway by -40 (effectively buries it), 30–60 by -22, 60–120 by -8 — in both Eat and Cafe modes. Pick card swaps the generic "Open now" for a concrete "Open until 11 PM", and when runway drops under an hour a red "Closing in 35 min" pill renders right under the meta row with a clock-with-exclamation glyph. Same data drives the score and the warning; no chance of disagreement.
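
The runway tiers translate directly into a small function. The penalty values and the under-an-hour pill come from the entry above; the function names and the exact boundary handling (half-open ranges) are assumptions.

```swift
/// Score penalty by minutes of runway left before closing.
func closingPenalty(minutesUntilClose minutes: Int) -> Int {
    switch minutes {
    case ..<30:    return -40   // effectively buries the place
    case 30..<60:  return -22
    case 60..<120: return -8
    default:       return 0
    }
}

/// The red pill only renders when runway drops under an hour.
func closingPillText(minutesUntilClose minutes: Int) -> String? {
    minutes < 60 ? "Closing in \(minutes) min" : nil
}

// The tester's case: 11:00 PM with an 11:30 PM close leaves 30 minutes.
let runway = 30
let penalty = closingPenalty(minutesUntilClose: runway)
let pill = closingPillText(minutesUntilClose: 35)
```

Feeding both functions the same `minutes` value is what keeps the score and the warning from disagreeing.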

13 "Some guy named Shakil Ahmad just showed up as a bakery 25 miles away." Two-layer junk gate. Pattern observed in the wild: someone creates a Google Maps "business" profile with their own personal name, tags it "Bakery", uploads an office-building exterior photo, 0 reviews, 0 rating, "Claim this business" flag — and Google's API surfaces it ranked by distance. The (rating == 0 && reviewCount == 0) combo is now a hard reject at the provider layer. Plus a defensive distance gate that drops any computed result beyond 1.5× the requested search radius regardless of name or quality — Google occasionally ships places with miscoded lat/lng that pass the API's own radius check.

14 "Allow 'Bitez
这个icon 现在怎么有两个缺口?
食物也有缺口?
Bitez' to use your location?" The submission-day catch. The CFBundleDisplayName key in Info.plist had been silently corrupted: chat-debug content from an earlier session had embedded itself into the production app name. iOS injects that string verbatim into the system permission popup. Caught in real-device testing the day before App Store submission; shipping it would have meant a guaranteed Apple review rejection. Fixed both Debug and Release configurations and replaced the location-permission description with the proper brand copy. Confidence in real-device dogfooding restored.

15 "It said 4 min walk but Maps took me 18 minutes — there's a highway between us." MKDirections for the recommended pick only. After the recommender selects a place, fire ONE MKDirections call (Apple Maps' free pedestrian routing API) for that specific pick. Single-flight — any new pick cancels the previous request, no TaskGroup races (the parallel version had crashed under iOS 26 beta). Only fires when straight-line haversine says ≥5 min away, since that's where freeways, water, and bridge crossings make geometry lie. The walk-time label on the card updates the moment Apple's route returns — typically 200–800ms. Haversine stays the cheap baseline; MKDirections is the truth pass.
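
A sketch of the cheap baseline and the gate that decides when the single routing call is worth firing. The haversine formula is standard; the walking speed of 80 m/min and the function names are assumptions, and the MKDirections call itself is not shown.

```swift
import Foundation

/// Great-circle distance in meters between two coordinates given in degrees.
func haversineMeters(lat1: Double, lon1: Double, lat2: Double, lon2: Double) -> Double {
    let earthRadius = 6_371_000.0
    let dLat = (lat2 - lat1) * .pi / 180
    let dLon = (lon2 - lon1) * .pi / 180
    let a = sin(dLat / 2) * sin(dLat / 2)
        + cos(lat1 * .pi / 180) * cos(lat2 * .pi / 180) * sin(dLon / 2) * sin(dLon / 2)
    return 2 * earthRadius * atan2(sqrt(a), sqrt(1 - a))
}

/// Straight-line walking estimate; 80 m/min is an assumed pace, not a Bitez constant.
func estimatedWalkMinutes(meters: Double, metersPerMinute: Double = 80) -> Double {
    meters / metersPerMinute
}

/// Only picks that look at least 5 minutes away get the real routing pass.
func needsRouteCheck(meters: Double) -> Bool {
    estimatedWalkMinutes(meters: meters) >= 5
}

let oneDegreeOfLatitude = haversineMeters(lat1: 0, lon1: 0, lat2: 1, lon2: 0)
let nearby = needsRouteCheck(meters: 200)
let farther = needsRouteCheck(meters: 900)
```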

16 "I typed 11364 in Settings and got 'Set your city' — but Onboarding accepted the same zip just fine." Onboarding / Settings parity for location input. Centralized the city → coordinates translation in one resolveCityToCoordinates method called by both surfaces. Three-layer geocode: raw input → 5-digit ZIP gets a ", USA" suffix (CLGeocoder's well-known blind spot) → 1.1-second pause + retry to recover from rate-limit throttling. Process-cache so repeated lookups of the same text never hit Apple. Plus a stricter rule on top: both surfaces now require the city field to contain at least one letter — pure-digit ZIPs are rejected at validation with an inline "Type Bayside instead of 11364" hint. Stops the failure before it can happen.
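
The two rules in this entry, the ", USA" suffix for bare ZIPs and the stricter at-least-one-letter validation, can be sketched as two pure functions. The names are hypothetical, and the retry and process-cache layers are omitted.

```swift
import Foundation

/// Validation rule shared by both surfaces: the field must contain at least one letter.
func isValidCityInput(_ text: String) -> Bool {
    text.contains { $0.isLetter }
}

/// Geocode layer: a bare 5-digit ZIP gets a ", USA" suffix before it reaches the geocoder.
func geocodeQuery(for rawInput: String) -> String {
    let trimmed = rawInput.trimmingCharacters(in: .whitespacesAndNewlines)
    let isFiveDigitZip = trimmed.count == 5
        && trimmed.allSatisfy { $0.isASCII && $0.isNumber }
    return isFiveDigitZip ? "\(trimmed), USA" : trimmed
}

let zipQuery = geocodeQuery(for: " 11364 ")
let cityQuery = geocodeQuery(for: "Bayside")
let zipAccepted = isValidCityInput("11364")
let cityAccepted = isValidCityInput("Bayside")
```

Because both Onboarding and Settings would call the same two functions, the surfaces can't drift apart again.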

17 "Is this 'WHAT REVIEWERS LOVE' card AI-generated, or actual review text? I can't tell." Source-attributed summary, three paths, three badges. One review-summary card body, three possible sources, three visually distinct attribution rows so the user always knows what they're reading. On-device AI synthesis → ✦ APPLE INTELLIGENCE · ON-DEVICE (sparkle icon, accent purple). Google's editorial blurb verbatim → FROM GOOGLE'S EDITORIAL SUMMARY (warm orange). First positive review unedited → FROM A RECENT REVIEW. A negative-tone token list ("terrible", "avoid", "would not recommend" ...) is scanned at every layer so the "what reviewers love" promise on the card title can't be falsified by a 1-star rant slipping through.

18 "Every time I open the app I have to look at 'What kind of bite tonight?' before I can see a pick. Annoying." Mood Gate removed from the auto-show paths. Two paths used to dump the user onto the full-screen mood selector — post-onboarding entry and re-entry after a 30-minute idle window. Both were redundant: Home already shows the mood chips and the dish-text search button right under the greeting. Removed both auto-show triggers; the gate code stays in the project for users who specifically want the screen via the search button, but it never auto-appears. Fewer screens, no functionality lost — the kind of removal that's harder to justify than an addition.

19 "Why is Starbucks showing up when I asked for dinner?" Eat / Cafe pool separation tightened at the API layer. Added cafe, coffee_shop, and tea_house to dining mode's excludedPrimaryTypes — Google never returns a coffee-shop primary place when the user picked Eat, so the Starbucks-as-dinner case is structurally impossible. The reverse already excludes restaurant/bar/fast-food primaries from cafe mode. Bakery stays in Eat (Levain, Tartine, etc. are real food destinations). The two contexts are now truly disjoint pools — not just differently-scored views of the same pool.

20 "After one AI timeout I lost AI for the rest of the session — and after two crashes I lost AI for good. Way too aggressive." AI auto-disable removed; replaced with user-controlled toggle. The old "AI failed once → disable session" / "AI failed twice → disable forever" defense was meant to stop crash loops on iOS 26 beta, but the cure was worse: a single transient timeout could lose the user AI for hours. Every nil result is now just logged; every new pick gets a fresh AI attempt. The only ways AI gets disabled now are (a) the explicit "Skip AI summaries" Settings toggle (the user picks when iOS reliability isn't worth the wait), or (b) the in-session breadcrumb that protects against an immediate crash loop. A migration on first launch of this build clears stale "permanently disabled" UserDefaults flags from older builds, so users whose AI got stuck off in earlier sessions get it back automatically.

21 "I keep trying to use this at red lights but the buttons are tiny and the friend line is hard to read at a glance." Driving mode as a whole UI shell, not a layout tweak. Detection aggregates three independent signals — CarPlay scene connect (UISceneSession.Role raw-string match, no CarPlay entitlement needed), AVAudioSession current route containing bluetoothA2DP / bluetoothHFP / carAudio, and CMMotionActivityManager reporting automotive at non-low confidence. Any one fires, the entire PickView swaps for DrivingPickView: pure-black background, 54pt restaurant name, 140pt Skip / I'm going buttons, friend line as the deterministic template (AI's 25-second timeout doesn't fit a stoplight). Auto-narrate via AVSpeechSynthesizer with .duckOthers + voicePrompt mode plays the friend line over CarPlay audio. Voice input via SFSpeechRecognizer (on-device when available) — tap mic, say "ramen", get a fresh pick. Settings → Diagnostics has a sticky Force driving view toggle for demoing without an actual car. False positives (Bluetooth headphones on a walk) get a "Use normal view" pill in the top-right.

22 "View their menu just bounces me to a sketchy website. I want to see what the food actually looks like first." Killed the external-link CTA. Built an in-app gallery ranked by Apple Vision. "See the place" sheet renders every photo Google has for the restaurant (up to 8) as a 2-column grid. Each image runs through VNClassifyImageRequest on-device: 60+ food-related identifier tokens (food, dish, pizza, ramen, salad, burger, ...) compute a max food confidence; anti-tokens (storefront, facade, sign, person, building) above 0.5 zero it out. Photos sort food-first so the burger photo floats above the storefront, but storefronts stay visible — the user wanted to "see the place," not just the food. Zero network round-trips for classification, zero permission prompts, ~25–80ms per image on A15+. The original website still lives as "View their store" at the bottom of the sheet — one tap further, no longer the front door.
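
The ranking rule on top of the classifier output can be sketched without Vision itself. Here each photo carries a plain dictionary of label confidences, standing in for the classifier's results; the token lists are shortened and the type and function names are assumptions.

```swift
struct Photo {
    let id: String
    let labels: [String: Double]   // label identifier → confidence, from the classifier
}

// Shortened stand-ins for the token lists described above.
let foodTokens: Set<String> = ["food", "dish", "pizza", "ramen", "salad", "burger"]
let antiTokens: Set<String> = ["storefront", "facade", "sign", "person", "building"]

/// Max confidence over food tokens, zeroed when any anti-token exceeds 0.5.
func foodConfidence(_ photo: Photo) -> Double {
    let blocked = photo.labels.contains { antiTokens.contains($0.key) && $0.value > 0.5 }
    if blocked { return 0 }
    return photo.labels.filter { foodTokens.contains($0.key) }.values.max() ?? 0
}

/// Food floats to the top; nothing is dropped, so storefronts stay visible.
func foodFirst(_ photos: [Photo]) -> [Photo] {
    photos.sorted { foodConfidence($0) > foodConfidence($1) }
}

let gallery = [
    Photo(id: "storefront", labels: ["storefront": 0.8, "food": 0.6]),
    Photo(id: "burger", labels: ["burger": 0.9, "food": 0.7]),
    Photo(id: "interior", labels: ["dish": 0.2]),
]
let sortedIDs = foodFirst(gallery).map { $0.id }
```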

23 "The review on this card — is that a real review? Can I see the other ones?" Review-summary card became an expand surface. Source attribution stayed (editorial / review badges), but when the source is a positive review snippet and Google returned 2+ usable ones, a tap-anywhere "SHOW N REVIEWS" toggle reveals all of them. Each expanded review gets its OWN quote glyph + body + thin separator — the first prototype shared a single quote icon at the top-left of the card and the two stacked reviews read like one giant run-on quote. Empty-string snippets are filtered out before counting, so "SHOW 3 REVIEWS" can't disagree with the number you see on expand. Single-snippet and editorial cards stay tap-disabled but render at full opacity — replacing the previous Button + .disabled implementation that dimmed the whole card and looked broken on the very places that didn't need a toggle.
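The counting rule is small enough to sketch directly. The function and enum names here are hypothetical, and trimming whitespace before the emptiness check is an added assumption (the text only mentions empty strings). The label is built from the same filtered array the expanded view would render, so the two numbers cannot drift apart.

```swift
import Foundation

enum ReviewSource {
    case editorial
    case review
}

// Drop snippets that are empty (or, by assumption, whitespace-only).
func usableSnippets(_ raw: [String]) -> [String] {
    raw.map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }
}

// Returns the toggle label, or nil when the card should stay tap-disabled.
func expandLabel(source: ReviewSource, snippets raw: [String]) -> String? {
    guard source == .review else { return nil }
    let usable = usableSnippets(raw)
    return usable.count >= 2 ? "SHOW \(usable.count) REVIEWS" : nil
}
```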

24 "Why are random places with 5 ratings showing up? Those don't feel real." Quality gate tightened at the provider layer. Old rule rejected only when rating==0 AND reviewCount<3. New rule rejects ANY place with reviewCount<3 regardless of star rating — the same gate that caught the "Shakil Ahmad - Bakery" unclaimed-listing pattern now also catches the more subtle "1× 5-star review from an owner-aligned account" pattern that flatters a brand-new listing into looking established. Applied upstream of type filtering so it kicks in for Eat AND Cafe modes. Cache version v16 → v17 so existing users' first session on the new build repopulates the pool with the new floor — no waiting for the 24-hour cache to age out.
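The difference between the two gates is easiest to see side by side. This is a minimal sketch with a hypothetical `Place` type; only the two predicates are taken from the description above.

```swift
// Hypothetical minimal place record.
struct Place {
    let name: String
    let rating: Double
    let reviewCount: Int
}

// Old rule: reject only when rating == 0 AND reviewCount < 3.
func passesOldGate(_ place: Place) -> Bool {
    !(place.rating == 0 && place.reviewCount < 3)
}

// New rule: reject ANY place with reviewCount < 3, regardless of rating.
func passesNewGate(_ place: Place) -> Bool {
    place.reviewCount >= 3
}
```

A listing with a single 5-star review passes the old gate and fails the new one, which is the pattern the tester noticed.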

25 A multi-day Foursquare experiment ending in a delete. The story is the discipline.

Why I tried Foursquare in the first place. Google Places, the only data source the app uses, has documented gaps. There's no structured noise level ("is this place quiet enough to read in?"), no structured wifi info ("can I open a laptop here?"), and no review-source voice distinct from Google's own reviewer base. For the Cafe mode I'd just shipped (builds 10 / 11), those three gaps are exactly what users care about. Foursquare publicly advertised all three as structured fields with a generous 180k-call / month free tier. On paper the integration looked free: a single per-pick enrichment call gives me the missing axes, and "FROM A FOURSQUARE TIP — DIFFERENT CROWD" surfaces a non-Google review voice as a separate card so the user sees the place through two crowds, not one.

Did the full build: created a Service API key, switched to the new places-api.foursquare.com host, added Bearer-prefixed auth + the required X-Places-Api-Version date header, modeled the per-pick enrichment (foursquareNoiseLevel / foursquareHasWifi / foursquareTips / foursquareVerified), wired it into cafeScore as the strongest noise signal (quiet → +24, very_loud → −28), and shipped the blue-accented review-source card.

Why I deleted it. Foursquare's free tier returned 429 "no API credits remaining" on every app request, but a curl test from the same machine, same key, same minute returned 200 with 179,960 monthly requests still on the counter. The diff between the two requests was the fields= parameter: attributes and tips, the exact two fields I was integrating Foursquare to get, are categorized as Premium Data and require per-call paid credits even when free-tier quota is untouched. Without those fields, the integration provided exactly one usable signal: the verified boolean. One boolean is not worth a per-pick HTTP round-trip, the 429 log noise, the extra cache-invalidation surface, and the schema risk on the Restaurant model. Two options: pay for Premium credits to keep the original value proposition, or accept that the free-tier limitation made the integration a net loss. I chose the second. Deleted the enricher, the AppCoordinator hook, the four Restaurant fields, the Recommender wiring, and the blue tip card. Cache version bumped v17 → v18 so the on-disk shape matches the in-memory one. Left a stub file with this rationale so a future me doesn't try the same experiment twice.

What's documented here is the deletion, not the build. Shipping a feature and removing it once the economics break is a separate discipline from never building it.
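The cache-version bumps in builds 24 and 25 rely on a simple check: a cached pool is reused only when its version matches the current build and it is younger than 24 hours. The sketch below assumes a hypothetical `CachedPool` shape and constant names; the version number and the 24-hour window are from the text.

```swift
import Foundation

// Hypothetical on-disk shape of a cached candidate pool.
struct CachedPool: Codable {
    let version: Int
    let savedAt: Date
    let placeNames: [String]
}

let currentCacheVersion = 18                 // bumped whenever the stored shape or gate changes
let maxCacheAge: TimeInterval = 24 * 60 * 60 // the 24-hour cache window

// A version mismatch invalidates immediately; otherwise the age decides.
func isUsable(_ cache: CachedPool, now: Date = Date()) -> Bool {
    cache.version == currentCacheVersion
        && now.timeIntervalSince(cache.savedAt) < maxCacheAge
}
```

Bumping the constant makes every existing cache fail the first condition, so the next session refetches instead of waiting for the age check.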

26 A recurring fatal-error "Range requires lowerBound <= upperBound" crash, traced into libswiftCore on a background queue inside FoundationModels. FoundationModels turned off until iOS 26.6. Hard-killed both narrate() (friend-line + reasoning rewrite) and the AI fallback inside parseDishIntent. Every escalation of input sanitization over the past few weeks (asciiOnly, markdownBalanced, escape-all-markdown, AttributedString parser short-circuit) bought roughly a week of stability before a new edge case landed in the tokenizer. The trap is a Swift precondition inside Apple's framework, on a background queue, uncatchable by try? and unreachable from our code. It is friendlier and more honest to route around it: ReasoningEngine's deterministic friend-line templates and the 80+ entry keyword-dictionary dish parser cover the same UX surface (warmer copy lost, but no crashes). Re-enabling is two return nil deletions once iOS 26.6+ ships a tokenizer fix. The honest story is choosing reliability over a marginally warmer sentence, the same kind of trade-off as build 25's delete.
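The routing-around can be sketched as two functions whose AI paths simply return nil, leaving the deterministic paths to answer. Everything below is illustrative: the `DishIntent` field, the four dictionary entries, the template wording, and the keyword-first ordering are assumptions, not the app's actual parser or templates.

```swift
import Foundation

// Hypothetical, reduced intent type.
struct DishIntent: Equatable {
    let cuisine: String
}

// Tiny stand-in for the 80+ entry keyword dictionary.
let keywordDictionary: [String: String] = [
    "curry": "indian",
    "ramen": "japanese",
    "ramyon": "korean",
    "taco": "mexican",
]

// AI path, disabled: deleting this `return nil` would re-enable the model call.
func aiParseDishIntent(_ text: String) -> DishIntent? {
    return nil
}

// Deterministic path: first dictionary keyword (in sorted order) found in the text.
func keywordParse(_ text: String) -> DishIntent? {
    let lowered = text.lowercased()
    for (keyword, cuisine) in keywordDictionary.sorted(by: { $0.key < $1.key })
    where lowered.contains(keyword) {
        return DishIntent(cuisine: cuisine)
    }
    return nil
}

func parseDishIntent(_ text: String) -> DishIntent? {
    keywordParse(text) ?? aiParseDishIntent(text)
}

// AI rewrite, disabled the same way.
func narrate(_ facts: [String]) -> String? {
    return nil
}

// Falls back to a deterministic template when narrate() yields nothing.
func friendLine(name: String, facts: [String]) -> String {
    let template = "\(name): " + facts.joined(separator: ", ")
    return narrate(facts) ?? template
}
```

With both AI paths returning nil, callers never touch the framework, so the crash site is unreachable by construction.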

Starbucks Coffee Company appearing in Eat mode — pre-fix

BEFORE BUILD 19 · Starbucks in Eat mode

The Eat / Cafe boundary was a scoring suggestion, not a structural rule.

Picture left: Eat mode, $$$$ budget, recommender picks Starbucks Coffee Company. Even with the "below your $" honesty fact rendered, this is a category miss — the user asked for dinner. The cafe pool and the dining pool used to share a Google search and differ only in scoring weights. Build 19 split them at the API layer: cafe primaries are excluded from Eat fetches, restaurant primaries are excluded from Cafe fetches, and the two contexts become truly disjoint pools.
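The structural split can be illustrated with a client-side filter, though the text says the real exclusion happens in the fetch itself. The type strings and the `Venue` and `Mode` types are assumptions for the sketch.

```swift
enum Mode {
    case eat
    case cafe
}

// Hypothetical venue record carrying a primary type string.
struct Venue {
    let name: String
    let primaryType: String
}

// Assumed set of primary types that count as cafes.
let cafePrimaries: Set<String> = ["cafe", "coffee_shop"]

// Each mode sees only its own side of the split, so the pools cannot overlap.
func pool(for mode: Mode, from venues: [Venue]) -> [Venue] {
    venues.filter { venue in
        let isCafe = cafePrimaries.contains(venue.primaryType)
        return mode == .cafe ? isCafe : !isCafe
    }
}
```

Because membership is decided before scoring, no weighting can put a coffee shop into an Eat pick.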

↓ THIS IS NOW STRUCTURALLY IMPOSSIBLE ↓

"Closest match in your range" pill on the card was correct — Starbucks really was the closest 4-min-walk venue. The bug was the pool, not the scorer.

Settings screen showing Skip AI summaries toggle and Try AI again now button

AFTER BUILD 20 · User-controlled AI

Reading this top-to-bottom is the design discipline: every fix paid down a specific human moment. Not a sprint plan, not a roadmap — feedback in, fix out, ship.

Mood Gate v1 — 4 mood cards, no AI input

EARLY BUILD · 4 moods, no AI

The mood gate went from a 4-card menu to a 2-card + free-text input.

Day-zero hypothesis: pick a mood, get a place. Day-ten reality: testers wanted to say what they wanted. Removing "I'm hungry" and "Comfort food" cut the mood taxonomy in half; adding the free-text input gave the LLM its first job (PARSE) and let users phrase real cravings — "ramyon", "fish that doesn't smell", "curry-something hearty".

↓ AI ENTERED HERE ↓

The "Try 'curry' or 'something spicy'" placeholder is a hint and a contract: speak normally, the app understands. Apple Foundation Models parses, the keyword dictionary is the safety net.

Mood Gate v3 — 2 mood cards plus an AI free-text input

CURRENT BUILD · 2 moods + AI input

09·5 / ADAPTIVE LAYERS

The product reshapes around you — never the reverse.

Most "smart" food apps make the user adapt to the product: pick a filter, scroll a list, calibrate to what the UI offers. Bitez does the opposite — the app watches what's around the user (CarPlay connected, Bluetooth audio paired, motion classifier says automotive) and reshapes the entire surface in response. The same discipline extends to travel mode and environment preferences. Three examples of one belief: the product owes the user flexibility, not the other way around.

Bitez driving mode — black background, 54pt restaurant name, big skip / I'm going buttons, voice mic

A · IN-CAR SHELL

CarPlay or Bluetooth audio paired → entire UI swaps.

Detection aggregates three independent signals: CarPlay scene role (raw-string match, no entitlement), AVAudioSession route containing bluetoothA2DP / bluetoothHFP / carAudio, and CMMotionActivityManager reporting automotive at non-low confidence. When any one fires, PickView swaps for DrivingPickView: pure-black, 54pt restaurant name, 140pt buttons, voice-only input (SFSpeechRecognizer), auto-narrated friend line over CarPlay audio (AVSpeechSynthesizer with .duckOthers). False positive, such as Bluetooth headphones on a walk → "Use normal view" pill in the top-right escapes back.

B · TRAVEL MODE

4 modes, real ETAs from Apple Maps.

Walk / Bus & Subway / Bike / Drive. The selected tab decides which transport type goes to MKDirections (.walking / .transit / .automobile), giving real Apple Maps ETAs at no API cost. Bike has no native Apple API, so it falls back to a tuned speed-multiplier estimate. The same selection reshapes the candidate-pool filter: a 100-minute walk preference becomes a 100-minute drive radius when the user picks Drive, so a place 2 hours' walk away (30 min by car) enters the queue. The same Wider preference means different things to different users, and that's the point.
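The way one time budget reshapes the pool per mode can be sketched with precomputed ETAs. In this illustration the ETAs are plain numbers rather than MKDirections results, and the bike estimate is derived from the walking ETA with a multiplier of 3; both the derivation and the value are assumptions, since the text only says "a tuned speed-multiplier estimate".

```swift
enum TravelMode {
    case walk
    case transit
    case bike
    case drive
}

// Hypothetical candidate with ETAs (in minutes) already fetched per mode.
struct Candidate {
    let name: String
    let minutes: [TravelMode: Double]
}

// Assumed multiplier: cycling treated as three times walking speed.
let bikeSpeedMultiplier = 3.0

func bikeMinutes(fromWalkingMinutes walking: Double) -> Double {
    walking / bikeSpeedMultiplier
}

// Keep candidates whose ETA in the selected mode fits the same time budget.
func reachable(_ candidates: [Candidate], mode: TravelMode, maxMinutes: Double) -> [Candidate] {
    candidates.filter { candidate in
        let eta: Double?
        if mode == .bike {
            eta = candidate.minutes[.walk].map { bikeMinutes(fromWalkingMinutes: $0) }
        } else {
            eta = candidate.minutes[mode]
        }
        // A missing ETA is treated as unreachable.
        return (eta ?? Double.infinity) <= maxMinutes
    }
}
```

With a 100-minute budget, a place that is 120 minutes on foot and 30 by car is filtered out in Walk and admitted in Drive, matching the example in the paragraph above.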

Bitez Settings extras — Pet-friendly preferred, Easy parking matters, Skip AI summaries, Try AI again now, Force driving view diagnostic

C · ENVIRONMENT & CONTROL

Pet · parking · AI on/off · force-on driving view.

Environment toggles bias scoring without burning extra API calls (Google returns the same pool either way — the recommender just lifts pet-friendly and easy-parking places higher when these are on). The Apple Intelligence block lets the user kill AI summaries with one tap if iOS is acting up, AND retry mid-session — auto-disable was removed because a single transient timeout used to lose AI for hours. Bottom: Force driving view in Diagnostics, persisting across launches so the in-car layout can be demoed without an actual car. Every control the user might need is a single toggle deep.
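The "bias scoring, same pool" idea can be sketched as additive boosts applied at ranking time. The boost values, field names, and types below are hypothetical; the text only establishes that the toggles lift matching places without triggering another fetch.

```swift
// Hypothetical user preferences mirroring the two environment toggles.
struct EnvironmentPrefs {
    var petFriendlyPreferred = false
    var easyParkingMatters = false
}

// Hypothetical place record with attributes already present in the fetched pool.
struct Spot {
    let name: String
    let baseScore: Double
    let allowsDogs: Bool
    let hasEasyParking: Bool
}

// Illustrative boost sizes, not the app's tuned weights.
let petBoost = 10.0
let parkingBoost = 8.0

func score(_ spot: Spot, prefs: EnvironmentPrefs) -> Double {
    var total = spot.baseScore
    if prefs.petFriendlyPreferred && spot.allowsDogs { total += petBoost }
    if prefs.easyParkingMatters && spot.hasEasyParking { total += parkingBoost }
    return total
}

// The pool is unchanged; only the order shifts with the toggles.
func ranked(_ spots: [Spot], prefs: EnvironmentPrefs) -> [Spot] {
    spots.sorted { score($0, prefs: prefs) > score($1, prefs: prefs) }
}
```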

Three context layers, one principle: read the user's situation, reshape to match it, never the reverse. The in-car shell reads the car. The travel mode picker reads the user's intent. The environment toggles read declared preference. Each one removes an instance of "you'd think this app cared, but it doesn't", the kind of small misalignment that adds up to the user closing the app.

10 / APP WALKTHROUGH

The actual product, the screens, the live App Store build.

Everything above explains why Bitez is built the way it is. What follows is what it actually looks like — the real screens, the interaction patterns, and how to try it yourself. The app is live on the App Store — download it and try the real thing.

Bitez welcome screen — I'll pick. You eat.

Bitez onboarding step 3 — cuisine preferences with Surprise me button

Bitez calibration screen showing live Google data

Bitez pick view with budget honesty fact

Try Bitez on your iPhone
Live on the App Store · iOS 17+ · free download.

Download on App Store ↗

Detailed walkthrough notes: the mood gate logic, the calibration animation with live restaurant count, the reasoning chip behavior on a budget mismatch, the recently-presented penalty for variety, the offline / mock-data banner system, the Apple Maps fallback for food deserts, and the in-app implicit history learning loop.

11 / WHAT I LEARNED

Product AI isn't about adding more model. It's about deciding what the model isn't allowed to do.

The flip side of that rule is deciding precisely where the model adds value the rest of the system can't. In Bitez the model parses messy human input into typed data and rewrites deterministic facts in human voice, both irreducibly language tasks. Everything else (which restaurant, which budget, which hours, which signals matter most tonight) is plain Swift the user can audit. That separation is what made the AI feel present without ever being in charge.

A single confident pick. Two AI moments — read intent, write warmth. Reasoning the user can audit. Infrastructure that doesn't cost more than it earns. That's the product. The case study is a tour of the design decisions holding those four things together.

Bitez · on-device AI for what to eat.