Today Journal
I wanted to journal more. But writing feels like homework. Voice memos feel like talking to myself. What I actually wanted was to call a friend who'd just... listen.
The problem with AI voice assistants: they don't remember you. Every conversation starts fresh. No context. No continuity. No sense that someone actually knows what's going on in your life.
"How was your day?" — Every. Single. Time.
What if you could just call someone who actually remembered you? Not a therapist analyzing your feelings. Not an assistant waiting for commands. A friend who picks up where you left off.
"Hey Sarah! Did that presentation end up going okay?" instead of "Hello, how can I help you today?"
A phone number you can call to talk through your day. It remembers who you are, what you mentioned last time, and what's going on in your life.
+1 (659) 222-5033
Call to try it yourself
The hard part wasn't the conversation. It was the memory.
The first version took three-plus seconds to respond. Unacceptable for a phone call. I needed memory that was both fast AND deep. The solution: three different types of memory working together, each optimized for a different job.
Two systems power the live experience:
- Quick Profile (Redis) — The notecard. Name, key facts, recent context. Loads in 5ms so the AI knows who's calling before they finish their first sentence.
- Deep Memory (Mem0) — The understanding. Mem0's AI watches every conversation and extracts what it thinks matters: patterns, recurring themes, personality traits. It doesn't store everything; it curates. (A rough sketch of both live-path lookups follows this list.)
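The sketch below assumes mem0's open-source Python `Memory` client and a JSON notecard keyed by phone number; the key scheme and helper names are illustrative, not the project's actual code.

```python
import json

import redis
from mem0 import Memory  # the hosted MemoryClient exposes a similar search/add interface

r = redis.Redis(decode_responses=True)
deep_memory = Memory()

def load_quick_profile(caller_id: str) -> dict:
    """The notecard: one Redis GET for name, key facts, recent context (~5 ms)."""
    raw = r.get(f"profile:{caller_id}")          # illustrative key scheme
    return json.loads(raw) if raw else {}

def load_deep_memory(caller_id: str, topic: str) -> list:
    """The understanding: Mem0 semantic search over curated memories."""
    results = deep_memory.search(topic, user_id=caller_id)
    # Newer mem0 releases wrap hits in {"results": [...]}; older ones return a plain list.
    return results.get("results", results) if isinstance(results, dict) else results
```

The profile read is a single key lookup, which is why it stays in single-digit milliseconds; the Mem0 search is slower, so it runs while the caller is still talking (see the speed trick below).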
One system archives the truth:
- Transcript Archive (Pinecone) — The tape recorder. Exact words, verbatim, searchable. Because Mem0 curates, you need somewhere that keeps everything. When you need to know exactly what someone said about X, this is where you look (sketched below).
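Assuming the v3+ Pinecone Python client, that sketch might look like this; the index name, metadata fields, and the `embed` text-to-vector function are placeholders:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("call-transcripts")   # index name is illustrative

def archive_turn(caller_id: str, call_id: str, turn_no: int, text: str, embed) -> None:
    """Store one verbatim conversation turn, exactly as spoken."""
    index.upsert(vectors=[{
        "id": f"{call_id}:{turn_no}",
        "values": embed(text),
        "metadata": {"caller": caller_id, "call": call_id, "text": text},
    }])

def exact_recall(caller_id: str, question: str, embed, k: int = 5) -> list[str]:
    """'What exactly did they say about X?': semantic search over the raw turns."""
    hits = index.query(vector=embed(question), top_k=k,
                       filter={"caller": {"$eq": caller_id}}, include_metadata=True)
    return [m.metadata["text"] for m in hits.matches]
```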
The speed trick: Send the greeting before any database calls. User hears "Hey! Friday. Made it through the week?" within 500ms. By the time they respond, their context is ready.
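In code, the trick is just ordering. Here's an asyncio sketch where `say`, `listen`, `load_context`, and `respond` stand in for the telephony and memory layers (hypothetical names, not the project's):

```python
import asyncio

async def handle_call(say, listen, load_context, respond, greeting: str):
    """Speak first; fetch memory while the caller is still talking."""
    await say(greeting)                                  # out the door in ~500 ms
    context_task = asyncio.create_task(load_context())   # Redis + Mem0 in the background
    first_utterance = await listen()                     # caller talks while we fetch
    context = await context_task                         # usually already finished here
    await say(await respond(first_utterance, context))
```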
Response time is UX. Three seconds of silence on a phone call feels like an eternity. I shaved 60-100ms per request just by reusing HTTP connections instead of creating new ones. Small things compound.
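The fix is the standard one: create one HTTP client at startup and reuse it, so each request rides an existing keep-alive connection instead of paying for DNS, TCP, and TLS setup again. A minimal illustration with httpx (the post doesn't say which HTTP stack the project actually uses):

```python
import httpx

# One shared client pools connections across requests. A new client per request
# forces a fresh handshake every time, which is roughly the 60-100 ms being saved.
client = httpx.AsyncClient(timeout=10.0)

async def call_api(url: str, payload: dict) -> dict:
    resp = await client.post(url, json=payload)
    resp.raise_for_status()
    return resp.json()
```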
Prompting for personality is harder than prompting for accuracy. My first prompt was too formulaic: validate-then-ask, validate-then-ask. Real friends don't follow a script. They have actual reactions. "Wait, what?!" not "I understand that must be difficult."
I built an evaluation framework with eight different test personas, from "Quiet Sam" who gives one-word answers to "Philosophical Morgan" who wants deep conversations. Used an LLM-as-judge setup to score conversations on openness, emotional depth, and flow. Iterated through three prompt versions before it felt natural.
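A stripped-down version of the judging step might look like the following. It assumes an OpenAI-style chat API as the judge, which the post doesn't specify, and the rubric wording is mine:

```python
import json

from openai import OpenAI  # judge model and provider are assumptions

judge = OpenAI()

RUBRIC = ("Score this conversation 1-5 on each of: openness (does the persona keep "
          "sharing?), emotional_depth, and flow (does it feel like a friend, not a "
          "script?). Reply as JSON: {\"openness\": n, \"emotional_depth\": n, \"flow\": n}")

def score_conversation(transcript: str) -> dict:
    resp = judge.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": RUBRIC},
                  {"role": "user", "content": transcript}],
    )
    return json.loads(resp.choices[0].message.content)
```

One way to use this is to run each prompt version against all eight personas and compare the average scores rather than tuning by feel.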
Context continuity matters more than perfect recall. If you call twice in one day, it should feel like one conversation, not two. Detecting same-day calls and appending to the transcript made a bigger difference than fancier memory retrieval.
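One way to get that behavior is to key the day's transcript by caller and calendar date, so a second call the same day appends to the existing record instead of opening a new one. A sketch (the key scheme is illustrative, and a real version would use the caller's local timezone rather than UTC):

```python
from datetime import datetime, timezone

def transcript_key(caller_id: str, now: datetime | None = None) -> str:
    """One transcript per caller per calendar day."""
    now = now or datetime.now(timezone.utc)    # simplification: UTC, not caller-local time
    return f"transcript:{caller_id}:{now:%Y-%m-%d}"

def append_turn(r, caller_id: str, speaker: str, text: str) -> None:
    """r is a redis.Redis client; RPUSH keeps turns in order within the day."""
    r.rpush(transcript_key(caller_id), f"{speaker}: {text}")
```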
What I'd do differently: Start with the persona testing from day one. I spent too long tuning the prompt by feel before building the evaluation system. Also: Gemini 2.5 Flash as the response model was the right call for speed, but I'd use a bigger model for the post-call extraction. Quality of what gets remembered matters.
- 15 different greetings based on time and day, like "Monday, huh?" vs "Late night. Everything okay?" (sketched after this list)
- 4000-word limit on stored profiles to keep responses fast
- Security check blocks attempts to ask about "other users" or extract system info
- Filler responses like "Let me think..." buy time for background memory searches
- The prompt explicitly says "Don't use therapy language" because early versions kept saying "I hear you saying..."
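For flavor, the greeting picker can be as simple as a lookup on local time and weekday. An illustrative subset of the 15 variants:

```python
from datetime import datetime

def pick_greeting(now: datetime | None = None) -> str:
    """Choose a canned opener from the time of day and day of week."""
    now = now or datetime.now()
    if now.hour >= 23 or now.hour < 5:
        return "Late night. Everything okay?"
    if now.weekday() == 0:                      # Monday
        return "Monday, huh?"
    if now.weekday() == 4 and now.hour >= 17:   # Friday evening
        return "Hey! Friday. Made it through the week?"
    return "Hey, good to hear from you. What's going on?"
```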