[002]

Today Journal


I wanted to journal more. But writing feels like homework. Voice memos feel like talking to myself. What I actually wanted was to call a friend who'd just... listen.

The problem with AI voice assistants: they don't remember you. Every conversation starts fresh. No context. No continuity. No sense that someone actually knows what's going on in your life.

"How was your day?" — Every. Single. Time.


What if you could just call someone who actually remembered you? Not a therapist analyzing your feelings. Not an assistant waiting for commands. A friend who picks up where you left off.

"Hey Sarah! Did that presentation end up going okay?" instead of "Hello, how can I help you today?"


A phone number you can call to talk through your day. It remembers who you are, what you mentioned last time, and what's going on in your life.

+1 (659) 222-5033

Call to try it yourself

The hard part wasn't the conversation. It was the memory.

First version took three-plus seconds to respond. Unacceptable for a phone call. I needed memory that was both fast AND deep. The solution: three different types of memory working together, each optimized for a different job.

[Architecture diagram, reconstructed]

Call flow: phone rings → instant greeting (<500ms, no DB wait) → context loads in the background → conversation (Gemini Flash 2.5) → call ends → LLM extraction & save.

Memory system (live — powers the conversation):

  • Profile (Redis, ~5ms) — name, location, key facts, recent call summaries. Always loaded first.
  • Understanding (Mem0, ~150ms) — patterns and themes, AI-curated insights. Doesn't store everything.

Profile for speed, Understanding for depth. Both are read on call start and updated after the call.

Archive (keeps the truth):

  • Transcripts (Pinecone, ~50ms) — exact words, verbatim, with embeddings. Because Mem0 curates, you need somewhere that keeps everything.

Mid-call recall ("Remember when I..."): send a filler ("Let me think..."), kick off a background search, query Understanding (Mem0) and Transcripts (Pinecone), combine the results, continue with full context.

Post-call extraction: an LLM analyzes the full transcript for key facts mentioned, open threads to follow up, and mood/emotional state, then updates the profile and feeds Mem0.

Profile loads in 5ms · greeting before any DB call · full context ready by the second sentence · all three systems updated after every call.

Two systems power the live experience:

  • Quick Profile (Redis) — The notecard. Name, key facts, recent context. Loads in 5ms so the AI knows who's calling before they finish their first sentence.
  • Deep Memory (Mem0) — The understanding. Mem0's AI watches every conversation and extracts what it thinks matters: patterns, recurring themes, personality traits. It doesn't store everything; it curates.

One system archives the truth:

  • Transcript Archive (Pinecone) — The tape recorder. Exact words, verbatim, searchable. Because Mem0 curates, you need somewhere that keeps everything. When you need to know exactly what someone said about X, this is where you look.
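The mid-call recall path — filler first, then both stores searched at once — can be sketched like this. `search_understanding` and `search_transcripts` are hypothetical stand-ins for the real Mem0 and Pinecone queries:

```python
import asyncio

# Hypothetical stand-ins for the real Mem0 / Pinecone lookups.
async def search_understanding(query: str) -> list:
    await asyncio.sleep(0.15)  # Mem0 query, ~150ms
    return ["often anxious before presentations"]

async def search_transcripts(query: str) -> list:
    await asyncio.sleep(0.05)  # Pinecone query, ~50ms
    return ['"I totally froze in the Q&A last month"']

async def recall(query: str, speak) -> str:
    # 1. Buy time with a filler so the line never goes silent.
    await speak("Let me think...")
    # 2-5. Search both memory systems concurrently and merge.
    insights, quotes = await asyncio.gather(
        search_understanding(query), search_transcripts(query)
    )
    # 6. Hand the combined context back to the conversation loop.
    return "\n".join(insights + quotes)
```

Running both lookups with `asyncio.gather` means the total wait is the slower of the two (~150ms), not their sum.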

The speed trick: Send the greeting before any database calls. User hears "Hey! Friday. Made it through the week?" within 500ms. By the time they respond, their context is ready.
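The greeting-first trick is just "speak, then await." A minimal sketch — `load_context` and `speak` are illustrative names, and the sleep stands in for the real Redis + Mem0 reads:

```python
import asyncio

async def load_context(caller_id: str) -> dict:
    # Stand-in for the real Redis profile read (~5ms) + Mem0 insights (~150ms).
    await asyncio.sleep(0.155)
    return {"name": "Sarah", "open_thread": "the work presentation"}

async def on_call_start(caller_id: str, speak) -> dict:
    # Kick off the context load, but don't wait on it...
    context_task = asyncio.create_task(load_context(caller_id))
    # ...the greeting goes out immediately, well under 500ms.
    await speak("Hey! Friday. Made it through the week?")
    # By the time the caller has answered, the context is ready.
    return await context_task
```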


Response time is UX. Three seconds of silence on a phone call feels like an eternity. I shaved 60-100ms per request just by reusing HTTP connections instead of creating new ones. Small things compound.
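Connection reuse is a classic fix: keep one socket alive instead of paying a fresh TCP + TLS handshake per request. A minimal stdlib sketch of the pattern (the real service presumably just reuses whatever client its SDKs provide):

```python
import http.client
from typing import Optional

class ReusableClient:
    """Holds one persistent HTTPS connection per host instead of opening
    a new socket (TCP + TLS handshake, tens of ms) for every request."""

    def __init__(self, host: str):
        self.host = host
        self._conn: Optional[http.client.HTTPSConnection] = None

    def connection(self) -> http.client.HTTPSConnection:
        # Opened lazily on first use, then reused for every later request.
        if self._conn is None:
            self._conn = http.client.HTTPSConnection(self.host, timeout=10)
        return self._conn
```

With `requests` or `httpx`, the same idea is a module-level `Session`/`Client` created once and shared.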

Prompting for personality is harder than prompting for accuracy. My first prompt was too formulaic: validate-then-ask, validate-then-ask. Real friends don't follow a script. They have actual reactions. "Wait, what?!" not "I understand that must be difficult."

I built an evaluation framework with eight different test personas, from "Quiet Sam" who gives one-word answers to "Philosophical Morgan" who wants deep conversations. Used LLM-as-judge to score conversations on openness, emotional depth, and flow. Iterated through three prompt versions before it felt natural.
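The evaluation loop itself is simple; the LLM-as-judge does the heavy lifting. A hedged sketch — persona descriptions, axis names, and `judge` (any callable; in practice an LLM scoring call returning JSON) are illustrative:

```python
# Two of the eight personas, for illustration.
PERSONAS = {
    "quiet_sam": "Gives one-word answers, rarely volunteers anything.",
    "philosophical_morgan": "Wants deep, reflective conversations.",
}

AXES = ("openness", "emotional_depth", "flow")

def score_prompt_version(transcripts: dict, judge) -> dict:
    """transcripts maps persona name -> simulated conversation text.
    Returns per-persona 1-5 scores on each axis, as rated by the judge."""
    results = {}
    for persona, transcript in transcripts.items():
        scores = judge(transcript)  # e.g. an LLM returning {"openness": 4, ...}
        results[persona] = {axis: scores[axis] for axis in AXES}
    return results
```

Comparing the per-axis averages across prompt versions is what flags regressions like "v2 scores better with Morgan but worse with Sam."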

Context continuity matters more than perfect recall. If you call twice in one day, it should feel like one conversation, not two. Detecting same-day calls and appending to the transcript made a bigger difference than fancier memory retrieval.
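Same-day detection can be as simple as keying transcripts by caller and calendar date, so a second call looks up the same record and appends. A minimal sketch with a hypothetical key scheme:

```python
from datetime import datetime

def transcript_key(caller_id: str, now: datetime) -> str:
    """Two calls on the same calendar day share a key, so the second
    call appends to the existing transcript instead of starting fresh."""
    return f"{caller_id}:{now.date().isoformat()}"
```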

What I'd do differently: Start with the persona testing from day one. I spent too long tuning the prompt by feel before building the evaluation system. Also: Gemini Flash 2.5 as the response model was the right call for speed, but I'd use a bigger model for the post-call extraction. Quality of what gets remembered matters.
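Post-call extraction — where I'd now spend the bigger-model budget — amounts to asking the LLM for a structured summary and merging it into the stores. The field names and merge rules below are illustrative, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class CallSummary:
    """Illustrative shape of what the extraction LLM returns; these
    fields update the Redis profile and feed Mem0."""
    key_facts: list = field(default_factory=list)     # e.g. "started a new job"
    open_threads: list = field(default_factory=list)  # e.g. "interview Thursday"
    mood: str = "neutral"                             # emotional state

def apply_summary(profile: dict, summary: CallSummary) -> dict:
    profile.setdefault("facts", []).extend(summary.key_facts)
    profile["open_threads"] = summary.open_threads  # replaced, not appended
    profile["last_mood"] = summary.mood
    return profile
```

The open threads are what make next call's greeting possible: "Did that presentation end up going okay?" is just the top open thread read back.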


  • 15 different greetings based on time and day ("Monday, huh?" vs "Late night. Everything okay?")
  • 4000-word limit on stored profiles to keep responses fast
  • Security check blocks attempts to ask about "other users" or extract system info
  • Filler responses like "Let me think..." buy time for background memory searches
  • The prompt explicitly says "Don't use therapy language" because early versions kept saying "I hear you saying..."
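The time/day-aware greetings reduce to a small lookup on the clock and weekday. A sketch covering three of the fifteen (the rest follow the same shape):

```python
from datetime import datetime

def pick_greeting(now: datetime) -> str:
    """Illustrative subset of the 15 time/day-aware greetings."""
    if now.hour >= 23 or now.hour < 5:
        return "Late night. Everything okay?"
    if now.weekday() == 0:  # Monday
        return "Monday, huh?"
    if now.weekday() == 4:  # Friday
        return "Hey! Friday. Made it through the week?"
    return "Hey, good to hear from you."
```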

  • Voice & Phone — Cartesia: real-time voice synthesis, phone line integration
  • LLM — Gemini Flash 2.5: fast responses for real-time conversation
  • Profile Storage — Upstash Redis: 5ms reads with edge caching
  • AI Memory — Mem0: pattern extraction, long-term understanding
  • Vector Search — Pinecone: transcript embeddings, semantic search
  • Embeddings — OpenAI: text-embedding-3-small for Pinecone