<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://jyc11.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://jyc11.github.io/" rel="alternate" type="text/html" /><updated>2026-03-22T16:05:00+00:00</updated><id>https://jyc11.github.io/feed.xml</id><title type="html">Smart and Witty Blog Title</title><subtitle>A blog where I post stuff</subtitle><author><name>Jaeyoon Cho</name></author><entry><title type="html">Current LLM Workflow Setup</title><link href="https://jyc11.github.io/blog/2026/03/23/current-LLM-workflow-setup" rel="alternate" type="text/html" title="Current LLM Workflow Setup" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2026/03/23/current-LLM-workflow-setup</id><content type="html" xml:base="https://jyc11.github.io/blog/2026/03/23/current-LLM-workflow-setup"><![CDATA[<p>An updated snapshot of my LLM-assisted development setup, as of March 2026. The <a href="/blog/2026/02/26/current-LLM-workflow-setup.html">previous snapshot</a> was from late February. A lot has changed — new tools, more skills, a proper planning pipeline, and significantly expanded permissions. Same deal as before: Claude examined its own configuration and I peppered in my commentary.</p>

<hr />

<h2 id="environment">Environment</h2>

<table>
  <thead>
    <tr>
      <th> </th>
      <th> </th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Terminal</strong></td>
      <td>Ghostty</td>
    </tr>
    <tr>
      <td><strong>IDE</strong></td>
      <td>IntelliJ (Rust development), Zed (blogging)</td>
    </tr>
    <tr>
      <td><strong>LLM Tool</strong></td>
      <td>Claude Code CLI (Claude Opus, 1M context)</td>
    </tr>
    <tr>
      <td><strong>Task Management</strong></td>
      <td><a href="https://github.com/JYC11/filament">Filament</a> (replaced beads_rust)</td>
    </tr>
    <tr>
      <td><strong>Code Generation</strong></td>
      <td><a href="https://github.com/JYC11/jujo">Jujo</a></td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p><strong>What changed:</strong> Upgraded from beads_rust to filament for task tracking. Filament adds a knowledge graph, inter-agent messaging, and a TUI — all in one Rust binary. Added jujo for deterministic code generation from templates. The 1M context window is a game changer — no more context exhaustion mid-session.</p>
</blockquote>

<hr />

<h2 id="claude-code-configuration">Claude Code Configuration</h2>

<h3 id="plugins">Plugins</h3>

<table>
  <thead>
    <tr>
      <th>Plugin</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>rust-skills</strong></td>
      <td>Rust-specific guidance — ownership, concurrency, error handling, domain patterns, crate research, daily news</td>
    </tr>
    <tr>
      <td><strong>rust-analyzer-lsp</strong></td>
      <td>LSP integration for go-to-definition, find references, symbol analysis</td>
    </tr>
    <tr>
      <td><strong>code-review</strong></td>
      <td>Structured PR code review</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p><strong>What changed:</strong> Added the code-review plugin since February.</p>
</blockquote>

<h3 id="skills-20-installed">Skills (20 installed)</h3>

<p>Custom skills loaded from <code class="language-plaintext highlighter-rouge">~/.claude/skills/</code>:</p>

<table>
  <thead>
    <tr>
      <th>Skill</th>
      <th>Source</th>
      <th>What it does</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>project-context</strong></td>
      <td>local</td>
      <td>Reads CLAUDE.md files for onboarding; updates them after changes</td>
    </tr>
    <tr>
      <td><strong>skill-creator</strong></td>
      <td>local</td>
      <td>Guide for writing effective Claude Code skills</td>
    </tr>
    <tr>
      <td><strong>filament</strong></td>
      <td>local</td>
      <td>Task lifecycle, knowledge graph, lesson capture via <code class="language-plaintext highlighter-rouge">fl</code> CLI</td>
    </tr>
    <tr>
      <td><strong>jujo</strong></td>
      <td>local</td>
      <td>Code generation from Tera templates via <code class="language-plaintext highlighter-rouge">jujo</code> CLI</td>
    </tr>
    <tr>
      <td><strong>pattern-analyzer</strong></td>
      <td>local</td>
      <td>Analyze codebase patterns → generate jujo templates</td>
    </tr>
    <tr>
      <td><strong>research</strong></td>
      <td>local</td>
      <td>GitHub repo exploration and web article fetching via Go CLI</td>
    </tr>
    <tr>
      <td><strong>handoff</strong></td>
      <td>local</td>
      <td>Structured session handoff summaries</td>
    </tr>
    <tr>
      <td><strong>cleanup</strong></td>
      <td>local</td>
      <td>Scan and remove stale files across /tmp, ~/.claude, project dirs</td>
    </tr>
    <tr>
      <td><strong>datastar</strong></td>
      <td>local</td>
      <td>Datastar hypermedia framework patterns</td>
    </tr>
    <tr>
      <td><strong>spec-driven-dev</strong></td>
      <td>local</td>
      <td>Three-phase workflow: Research → Plan → Implement with human checkpoints</td>
    </tr>
    <tr>
      <td><strong>grill-me</strong></td>
      <td>matt</td>
      <td>Interview user relentlessly about a plan until shared understanding</td>
    </tr>
    <tr>
      <td><strong>write-a-prd</strong></td>
      <td>matt</td>
      <td>Create PRD through user interview, submit as GitHub issue</td>
    </tr>
    <tr>
      <td><strong>prd-to-plan</strong></td>
      <td>matt</td>
      <td>Turn PRD into multi-phase implementation plan with tracer bullets</td>
    </tr>
    <tr>
      <td><strong>prd-to-issues</strong></td>
      <td>matt</td>
      <td>Break PRD into independently-grabbable GitHub issues</td>
    </tr>
    <tr>
      <td><strong>triage-issue</strong></td>
      <td>matt</td>
      <td>Triage bugs: search filament lessons → investigate → fix plan with TDD</td>
    </tr>
    <tr>
      <td><strong>review</strong></td>
      <td>gstack</td>
      <td>Pre-landing PR review against project checklist</td>
    </tr>
    <tr>
      <td><strong>code-eng-review</strong></td>
      <td>gstack</td>
      <td>Eng manager-mode review of implemented code</td>
    </tr>
    <tr>
      <td><strong>plan-eng-review</strong></td>
      <td>gstack</td>
      <td>Eng manager-mode plan review with architecture focus</td>
    </tr>
    <tr>
      <td><strong>retro</strong></td>
      <td>gstack</td>
      <td>Engineering retrospective with trend tracking</td>
    </tr>
    <tr>
      <td><strong>library</strong></td>
      <td>fork</td>
      <td>Private skill distribution via YAML catalog + git sync</td>
    </tr>
  </tbody>
</table>

<p>Sources: local = original, matt = <a href="https://github.com/mattpocock/skills">mattpocock/skills</a>, gstack = <a href="https://github.com/garrytan/gstack">garrytan/gstack</a>, fork = <a href="https://github.com/disler/the-library">disler/the-library</a></p>

<blockquote>
  <p><strong>What changed:</strong> Went from 4 skills to 20. The biggest additions are the planning pipeline (write-a-prd → prd-to-plan → prd-to-issues), the spec-driven-dev meta-workflow, and the library catalog for managing skills across devices. All skills now have filament integration for task tracking and lesson capture. Removed br and bd-to-br-migration skills.</p>
</blockquote>

<h3 id="hooks">Hooks</h3>

<table>
  <thead>
    <tr>
      <th>Event</th>
      <th>Hook</th>
      <th>What it does</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>UserPromptSubmit</td>
      <td><code class="language-plaintext highlighter-rouge">log-prompt.sh</code></td>
      <td>Captures every user prompt to a daily session log file. Now works across multiple projects</td>
    </tr>
    <tr>
      <td>PostToolUse (Bash)</td>
      <td>cargo check after build</td>
      <td>Runs <code class="language-plaintext highlighter-rouge">cargo check</code> after any cargo/make command to catch compile errors immediately</td>
    </tr>
  </tbody>
</table>
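<p>Roughly, the prompt logger looks something like this. This is a simplified sketch, not the real <code class="language-plaintext highlighter-rouge">log-prompt.sh</code>; in the actual hook the prompt arrives from Claude Code as JSON on stdin, which is simulated here with a plain variable:</p>

```shell
# Simplified sketch of log-prompt.sh (not the real script): append each
# incoming prompt to a per-day log file. The real hook reads the prompt
# out of Claude Code's JSON hook input; here it is simulated directly.
prompt="fix the flaky daemon test"
LOG_FILE="/tmp/claude-prompts-$(date +%Y-%m-%d).log"
printf '[%s] %s\n' "$(date +%H:%M:%S)" "$prompt" >> "$LOG_FILE"
tail -n 1 "$LOG_FILE"
```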

<blockquote>
  <p><strong>What changed:</strong> The prompt logger now works across multiple projects (previously Koupang-only). Added the PostToolUse hook for Koupang that runs cargo check after build commands — catches compilation errors before I even look at the output.</p>
</blockquote>
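<p>The gating logic in the cargo-check hook is simple. A sketch (simplified; the real hook also has to parse the tool-use JSON that Claude Code pipes in on stdin):</p>

```shell
# Sketch of the PostToolUse gating logic (simplified; the real hook also
# parses the tool-use JSON from Claude Code to get the command string).
should_check() {
  # Only re-run `cargo check` after build-related commands.
  case "$1" in
    cargo\ *|make|make\ *) return 0 ;;
    *) return 1 ;;
  esac
}

should_check "cargo build --release" && echo "would run: cargo check"
should_check "ls -la" || echo "skipped: ls -la"
```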

<h3 id="permissions">Permissions</h3>

<p>The permissions list has grown significantly. The philosophy is: anything that only reads or only modifies local project files should be auto-allowed.</p>

<p><strong>Explicitly allowed (no confirmation needed):</strong></p>

<ul>
  <li><strong>Read-only shell</strong>: <code class="language-plaintext highlighter-rouge">ls</code>, <code class="language-plaintext highlighter-rouge">cat</code>, <code class="language-plaintext highlighter-rouge">find</code>, <code class="language-plaintext highlighter-rouge">grep</code>, <code class="language-plaintext highlighter-rouge">rg</code>, <code class="language-plaintext highlighter-rouge">tree</code>, <code class="language-plaintext highlighter-rouge">stat</code>, <code class="language-plaintext highlighter-rouge">which</code>, <code class="language-plaintext highlighter-rouge">file</code>, <code class="language-plaintext highlighter-rouge">wc</code>, <code class="language-plaintext highlighter-rouge">sort</code>, <code class="language-plaintext highlighter-rouge">uniq</code>, <code class="language-plaintext highlighter-rouge">diff</code>, <code class="language-plaintext highlighter-rouge">basename</code>, <code class="language-plaintext highlighter-rouge">dirname</code>, <code class="language-plaintext highlighter-rouge">realpath</code>, <code class="language-plaintext highlighter-rouge">jq</code>, <code class="language-plaintext highlighter-rouge">cut</code>, <code class="language-plaintext highlighter-rouge">tr</code>, <code class="language-plaintext highlighter-rouge">awk</code>, <code class="language-plaintext highlighter-rouge">sed</code>, <code class="language-plaintext highlighter-rouge">xargs</code>, <code class="language-plaintext highlighter-rouge">tee</code></li>
  <li><strong>Git read</strong>: <code class="language-plaintext highlighter-rouge">status</code>, <code class="language-plaintext highlighter-rouge">log</code>, <code class="language-plaintext highlighter-rouge">diff</code>, <code class="language-plaintext highlighter-rouge">branch</code>, <code class="language-plaintext highlighter-rouge">show</code>, <code class="language-plaintext highlighter-rouge">tag</code>, <code class="language-plaintext highlighter-rouge">remote</code>, <code class="language-plaintext highlighter-rouge">stash</code>, <code class="language-plaintext highlighter-rouge">blame</code>, <code class="language-plaintext highlighter-rouge">rev-parse</code></li>
  <li><strong>Git write</strong>: <code class="language-plaintext highlighter-rouge">add</code>, <code class="language-plaintext highlighter-rouge">commit</code>, <code class="language-plaintext highlighter-rouge">checkout</code>, <code class="language-plaintext highlighter-rouge">switch</code>, <code class="language-plaintext highlighter-rouge">merge</code>, <code class="language-plaintext highlighter-rouge">rebase</code>, <code class="language-plaintext highlighter-rouge">fetch</code>, <code class="language-plaintext highlighter-rouge">pull</code>, <code class="language-plaintext highlighter-rouge">push</code>, <code class="language-plaintext highlighter-rouge">cherry-pick</code></li>
  <li><strong>Cargo</strong>: <code class="language-plaintext highlighter-rouge">check</code>, <code class="language-plaintext highlighter-rouge">build</code>, <code class="language-plaintext highlighter-rouge">test</code>, <code class="language-plaintext highlighter-rouge">clippy</code>, <code class="language-plaintext highlighter-rouge">fmt</code>, <code class="language-plaintext highlighter-rouge">run</code>, <code class="language-plaintext highlighter-rouge">add</code>, <code class="language-plaintext highlighter-rouge">tree</code>, <code class="language-plaintext highlighter-rouge">doc</code>, <code class="language-plaintext highlighter-rouge">metadata</code>, <code class="language-plaintext highlighter-rouge">install</code>, <code class="language-plaintext highlighter-rouge">update</code>, <code class="language-plaintext highlighter-rouge">clean</code>, <code class="language-plaintext highlighter-rouge">bench</code>, <code class="language-plaintext highlighter-rouge">fix</code></li>
  <li><strong>Docker</strong>: <code class="language-plaintext highlighter-rouge">compose up/down/ps/logs/build/exec</code>, <code class="language-plaintext highlighter-rouge">images</code>, <code class="language-plaintext highlighter-rouge">logs</code>, <code class="language-plaintext highlighter-rouge">build</code>, <code class="language-plaintext highlighter-rouge">ps</code></li>
  <li><strong>Make</strong>: all <code class="language-plaintext highlighter-rouge">make</code> targets</li>
  <li><strong>Custom CLIs</strong>: <code class="language-plaintext highlighter-rouge">fl</code> (filament), <code class="language-plaintext highlighter-rouge">jujo</code>, <code class="language-plaintext highlighter-rouge">sqlx</code>, <code class="language-plaintext highlighter-rouge">rustup</code>, <code class="language-plaintext highlighter-rouge">research</code> (Go CLI)</li>
  <li><strong>File ops</strong>: <code class="language-plaintext highlighter-rouge">mkdir</code>, <code class="language-plaintext highlighter-rouge">touch</code>, <code class="language-plaintext highlighter-rouge">cp</code>, <code class="language-plaintext highlighter-rouge">mv</code></li>
  <li><strong>Shell</strong>: <code class="language-plaintext highlighter-rouge">printf</code>, <code class="language-plaintext highlighter-rouge">read</code>, <code class="language-plaintext highlighter-rouge">echo</code>, <code class="language-plaintext highlighter-rouge">whoami</code>, <code class="language-plaintext highlighter-rouge">id</code>, <code class="language-plaintext highlighter-rouge">env</code>, <code class="language-plaintext highlighter-rouge">date</code>, <code class="language-plaintext highlighter-rouge">uname</code></li>
</ul>

<p><strong>Explicitly denied:</strong></p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">rm</code>, <code class="language-plaintext highlighter-rouge">sudo</code>, <code class="language-plaintext highlighter-rouge">curl</code>, <code class="language-plaintext highlighter-rouge">wget</code>, <code class="language-plaintext highlighter-rouge">chmod</code>, <code class="language-plaintext highlighter-rouge">chown</code>, <code class="language-plaintext highlighter-rouge">kill</code>, <code class="language-plaintext highlighter-rouge">killall</code>, <code class="language-plaintext highlighter-rouge">pkill</code>, <code class="language-plaintext highlighter-rouge">dd</code>, <code class="language-plaintext highlighter-rouge">mkfs</code></li>
  <li><code class="language-plaintext highlighter-rouge">git push --force</code>, <code class="language-plaintext highlighter-rouge">git reset --hard</code>, <code class="language-plaintext highlighter-rouge">git clean -f</code></li>
  <li><code class="language-plaintext highlighter-rouge">docker rm</code>, <code class="language-plaintext highlighter-rouge">docker rmi</code></li>
  <li><code class="language-plaintext highlighter-rouge">WebSearch</code>, <code class="language-plaintext highlighter-rouge">WebFetch</code></li>
  <li>Output redirection (<code class="language-plaintext highlighter-rouge">&gt; *</code>)</li>
</ul>
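<p>In Claude Code these rules live in <code class="language-plaintext highlighter-rouge">settings.json</code> as allow/deny patterns. A fragment in that shape (heavily abridged, with illustrative entries rather than a copy of my exact file) looks like:</p>

```json
{
  "permissions": {
    "allow": [
      "Bash(git status:*)",
      "Bash(cargo check:*)",
      "Bash(fl:*)",
      "Bash(jq:*)"
    ],
    "deny": [
      "Bash(rm:*)",
      "Bash(sudo:*)",
      "Bash(git push --force:*)",
      "WebFetch",
      "WebSearch"
    ]
  }
}
```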

<blockquote>
  <p><strong>What changed:</strong> Massively expanded from ~25 allowed commands to ~80+. Added all the text processing tools (jq, awk, sed, etc.), file operations (mkdir, cp, mv), more git commands (cherry-pick, blame), the full cargo suite, and custom CLIs (fl, jujo, research). The goal was to reduce the number of “approve this?” prompts to near-zero for normal development work. It mostly worked — I rarely see permission prompts now unless Claude is doing something genuinely unusual.</p>
</blockquote>

<h3 id="claudemd-files">CLAUDE.md Files</h3>

<p>Same hierarchical structure as before, but now with more content:</p>

<ul>
  <li><strong>Root</strong> (<code class="language-plaintext highlighter-rouge">koupang/CLAUDE.md</code>) — workspace structure, tech stack, ADR summary, key imports, scripts</li>
  <li><strong>STYLE.md</strong> — coding style guide adopted from TigerBeetle’s TIGER_STYLE, customized for this project. Covers: data-oriented programming, value objects, assertions, error handling, naming, function size</li>
  <li><strong>Per-service</strong> (<code class="language-plaintext highlighter-rouge">identity/CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">catalog/CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">shared/CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">order/CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">payment/CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">cart/CLAUDE.md</code>) — detailed architecture, endpoints, domain models, test structure</li>
  <li><strong>Reference docs</strong> (<code class="language-plaintext highlighter-rouge">.plan/</code>) — detailed implementation plans, test standards</li>
</ul>

<blockquote>
  <p><strong>What changed:</strong> Added STYLE.md which is now the single source of truth for coding conventions. Added CLAUDE.md files for order, payment, and cart services. The STYLE.md adoption was a turning point — it gives Claude a concrete reference for what “good code” looks like rather than relying on vibes.</p>
</blockquote>

<hr />

<h2 id="development-cycle">Development Cycle</h2>

<h3 id="for-new-features-spec-driven-dev-workflow">For new features (spec-driven-dev workflow)</h3>

<ol>
  <li><strong>Research</strong> — <code class="language-plaintext highlighter-rouge">/spec-driven-dev</code> triggers filament lesson search for prior knowledge, then codebase exploration</li>
  <li><strong>Plan</strong> — <code class="language-plaintext highlighter-rouge">/grill-me</code> to stress-test the design, then <code class="language-plaintext highlighter-rouge">/plan-eng-review</code> for architecture review</li>
  <li><strong>Create tasks</strong> — break plan into filament tasks with dependency chains</li>
  <li><strong>Implement</strong> — work through tasks with <code class="language-plaintext highlighter-rouge">fl task ready</code> to find unblocked work</li>
  <li><strong>Review</strong> — <code class="language-plaintext highlighter-rouge">/code-eng-review</code> for structured code review against STYLE.md</li>
  <li><strong>Capture lessons</strong> — <code class="language-plaintext highlighter-rouge">fl lesson add</code> for gotchas and patterns discovered</li>
</ol>
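<p>The “find unblocked work” part of step 4 boils down to a dependency check. As a toy model (not filament’s actual implementation), a task is ready when it is open and every one of its dependencies is done:</p>

```shell
# Toy model of "ready" tasks (not filament's real data model):
# a task is ready when it is open and every dependency is done.
# Format per line: id  deps(comma-separated, "-" for none)  status
tasks='A - done
B A open
C B open
D A open'
done_ids=$(printf '%s\n' "$tasks" | awk '$3 == "done" { print $1 }')
ready=$(printf '%s\n' "$tasks" | while read -r id deps status; do
  [ "$status" = "open" ] || continue
  ok=yes
  if [ "$deps" != "-" ]; then
    for d in $(printf '%s' "$deps" | tr ',' ' '); do
      printf '%s\n' "$done_ids" | grep -qx "$d" || ok=no
    done
  fi
  [ "$ok" = "yes" ] && printf 'ready: %s\n' "$id"
done)
echo "$ready"
```

<p>Here A is done, so B and D (which depend only on A) are ready, while C stays blocked behind B.</p>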

<h3 id="for-boilerplatescaffolding">For boilerplate/scaffolding</h3>

<ol>
  <li><strong>Analyze patterns</strong> — <code class="language-plaintext highlighter-rouge">/pattern-analyzer</code> to find repeated structures in the codebase (done once or when patterns need updates)</li>
  <li><strong>Generate templates</strong> — export to jujo generator with <code class="language-plaintext highlighter-rouge">jujo init</code> + template files</li>
  <li><strong>Stamp out code</strong> — <code class="language-plaintext highlighter-rouge">jujo generate</code> for deterministic scaffolding</li>
  <li><strong>Customize</strong> — Claude fills in AI customization markers for business logic</li>
</ol>
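<p>The idea in miniature (a toy illustration, not jujo’s actual CLI or template set, though jujo’s Tera templates use similar <code class="language-plaintext highlighter-rouge">{{name}}</code>-style placeholders): stamp out the deterministic skeleton mechanically and leave a marker where the LLM should fill in the unique parts.</p>

```shell
# Toy illustration of deterministic scaffolding (not jujo itself):
# stamp a skeleton out of a template, leaving a marker for the LLM
# to fill in the unique business logic afterwards.
template='pub struct {{name}};

impl {{name}} {
    // AI-CUSTOMIZE: business logic goes here
}'
name="CartService"
printf '%s\n' "$template" | sed "s/{{name}}/$name/g"
```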

<h3 id="for-bug-fixes">For bug fixes</h3>

<ol>
  <li><strong>Triage</strong> — <code class="language-plaintext highlighter-rouge">/triage-issue</code> searches filament lessons first, then investigates</li>
  <li><strong>Fix</strong> — TDD approach with the fix</li>
  <li><strong>Capture</strong> — lesson recorded in filament for future reference</li>
</ol>

<h3 id="what-makes-this-work">What makes this work</h3>

<ul>
  <li><strong>STYLE.md</strong> gives Claude a concrete reference for code quality, not vibes</li>
  <li><strong>Filament</strong> provides persistent context across sessions — lessons, tasks, and knowledge graph survive session boundaries</li>
  <li><strong>Jujo</strong> eliminates token waste on boilerplate — deterministic code gen for repetitive patterns, Claude only handles the unique parts</li>
  <li><strong>The planning pipeline</strong> (PRD → plan → grill → review → implement) prevents wasted work on under-specified features</li>
  <li><strong>Expanded permissions</strong> make the flow nearly frictionless — I rarely see “approve?” prompts</li>
  <li><strong>1M context window</strong> means I can do planning + implementation + review in a single session without context exhaustion</li>
  <li><strong>The prompt logger</strong> captures everything for blog posts and retrospectives</li>
</ul>

<blockquote>
  <p><strong>My take:</strong> The February setup was functional but ad-hoc. The March setup feels like a proper workflow. The biggest wins were: (1) filament replacing beads with a knowledge graph that accumulates project wisdom across sessions, (2) STYLE.md giving Claude a codified standard to follow, and (3) the planning pipeline preventing the “just start coding” impulse that led to problems in the Filament sprint. I’m still not at the “fully autonomous multi-agent” level but I’m getting more comfortable delegating larger chunks of work to single sessions with good context. I also found that STYLE.md helped a lot in getting the agent to produce better code.</p>
</blockquote>

<h2 id="closing-thoughts">Closing Thoughts</h2>
<p>Since this is all setup locally and I will have to use work computers, I will need a handy way to package all this and export it to other computers. I’ll get around to doing that.</p>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[An updated snapshot of my LLM-assisted development setup, as of March 2026. The previous snapshot was from late February. A lot has changed — new tools, more skills, a proper planning pipeline, and significantly expanded permissions. Same deal as before: Claude examined its own configuration and I peppered in my commentary.]]></summary></entry><entry><title type="html">Getting Gud at LLMs Pt6</title><link href="https://jyc11.github.io/blog/2026/03/23/getting-gud-at-llms-pt6" rel="alternate" type="text/html" title="Getting Gud at LLMs Pt6" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2026/03/23/getting-gud-at-llms-pt6</id><content type="html" xml:base="https://jyc11.github.io/blog/2026/03/23/getting-gud-at-llms-pt6"><![CDATA[<p>In <a href="/blog/2026/03/04/getting-gud-at-llms-pt5.html">Part 5</a>, I went fast and broke things building Filament. This time, I bounced between 5 projects over 18 days — shipped Filament v1.0, built and shipped Jujo v1.0 from scratch, pushed Koupang’s order saga to completion, attempted a C++ to Rust port, and started learning Haskell. I also overhauled my entire skill and workflow setup. Here’s a snapshot of my current setup which is now quite evolved: <a href="/blog/2026/03/23/current-LLM-workflow-setup.html">Current LLM Workflow Setup (March 2026)</a>.</p>

<hr />

<h2 id="the-numbers">The Numbers</h2>

<ul>
  <li><strong>~35 sessions</strong>, <strong>~300+ prompts</strong> over ~18 days (Mar 4 – Mar 22)</li>
  <li><strong>5 projects</strong> touched: Filament, Koupang, Jujo, RLedger, Haskell learning</li>
  <li><strong>87 git commits</strong> across all projects (36 Koupang, 31 Filament, 20 Jujo)</li>
  <li>Filament: v1.0.0 released, 31 commits, extensive QA</li>
  <li>Koupang: full order saga implemented, STYLE.md adopted, 36 commits</li>
  <li>Jujo: built from scratch to v1.0.0 in ~3 days</li>
  <li>Haskell: levels 1–9 completed (exercises generated by Claude), plus additional Exercism exercises</li>
</ul>

<hr />

<h2 id="filament">Filament</h2>

<p>Finishing this took longer than I thought. When I last left off, the TUI was nearly done, but not quite, and then I got distracted by my perfectionism. I did some extensive QA work, which I thought was good to experience, and managed to release v1.0.0. By the end of this 5-day sprint I was absolutely exhausted from how much focus I put in. It was a very different kind of tiredness, one I find hard to put into words. In the frenzy of prompting it felt extremely exhilarating to just Get Shit Done, but my need for full knowledge and perfect verification also led me to do lots of code reading and constant re-examining of priorities. I was just “On” for so many hours of the day and obsessed with getting it done. I don’t particularly wanna get back into this state again.</p>

<h3 id="git-history-mar-422">Git History (Mar 4–22)</h3>

<p>The filament sessions weren’t captured by the prompt logger since it only ran in the Koupang directory at the time. But the git history tells the story — 31 commits from Mar 4 to Mar 22:</p>

<ul>
  <li>TUI enhancements: message detail pane, keyset pagination, reply-to-message, 95 TUI tests</li>
  <li>Pre-v1 code review: 18 fixes across all 4 crates, typed entity DTOs, Clearable<T> enum</T></li>
  <li>QA: 7 QA sessions (22/22 tests passing, 0 bugs), roleplay simulations for multi-agent coordination</li>
  <li>Bug fixes: flaky daemon test race condition, error exit codes, circular dependency prevention</li>
  <li>v1.0.0: CI/CD pipeline, curl install script, distributable skills</li>
</ul>

<blockquote>
  <p><strong>My take:</strong> The filament git log is dense. 31 commits in ~18 days but the actual work was front-loaded into maybe 5-6 days of focused sessions. The roleplay simulations (where Claude pretended to be multiple agents using filament concurrently) were genuinely quite useful and I think using LLMs for QA has a lot of potential.</p>
</blockquote>

<hr />

<h2 id="koupang">Koupang</h2>

<p>I went back to Koupang after some rest where I did nothing. Instead of charging in headfirst at full speed like I did with Filament, I decided to take things slower and more deliberately. I went over the generated code and revised the plans multiple times. Eventually I iterated enough on the plans, and read enough of the code, that I had a good understanding of what was going on and what to do, and I began implementing. At this point I also started doing LLM-assisted code reviews. I was still recovering from the mad sprint while doing all this.</p>

<p>I think the code that came out was pretty decent and well tested. I learned a fair bit (mostly big picture) about the outbox pattern and Kafka setup, which was good. The very fine-grained implementation details elude me, to be honest, but since these are one-off things rather than repetitive basic CRUD work, I think they would take far longer to internalize. The full order flow is now complete; I just need to fully verify it works as expected.</p>

<h3 id="kafka-consumerdlq-planning--outbox-review-mar-5">Kafka Consumer/DLQ Planning &amp; Outbox Review (Mar 5)</h3>

<p>Reviewed and critiqued the Kafka consumer/DLQ plan for edge cases and production quality. Then turned the same rigor on the existing outbox implementation. Created improvement plans for both. Also got book recommendations for practical Kafka knowledge.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's go with bd-3sv
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I would like for you to critique it more for edge cases and production quality
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>could you also critique the current outbox implementation with the same rigor (edge cases and production readiness)?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the improved kafka plan and this outbox improvement plan, we should track them in beads
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> This was the deliberate pace I wanted after my Filament burnout. Plan critique → outbox critique → track in beads. No rushing to implement. I was still using beads at this point. Idk how other people go full yolo for long periods of time. I don’t have it in me.</p>
</blockquote>

<h3 id="migrate-from-beads-to-filament-mar-10">Migrate from Beads to Filament (Mar 10)</h3>

<p>Migrated all beads tasks and project documentation into filament. Verified migrated data, filled in service entities, and updated global skills to reference filament instead of br.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament I recently completed the filament project and I want to migrate the beads in /br into filament along with project documentation. I want to actively use this tool in this project.
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>each plan needs a doc so the minimal mvp-milestone.md could be filled in in the future so leave it. the other stuff I think we can leave for record purposes
otherwise, just do a double check of ALL the migrated data
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>update the project_context and handoff skills to reference filament instead of br. they are located in the global .claude directory
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> Eating my own dog food. I built filament for this exact purpose and it was satisfying to actually use it. The migration from beads was straightforward because I designed filament to handle the same concepts. I was slightly worried things might not work well, but as I kept using it, it wasn’t that bad.</p>
</blockquote>

<h3 id="kafka-implementation-sprint-mar-1314">Kafka Implementation Sprint (Mar 13–14)</h3>

<p>Added gstack review skills and made them generic. Switched Kafka to <code class="language-plaintext highlighter-rouge">apache/kafka-native</code> image for smaller footprint. Implemented Kafka event consumer with DLQ support. Did outbox improvements. Worked through shared infrastructure tasks (caching, tracing). Ended with a shared code cleanup sweep.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>before we go further, I want to add these skills: https://github.com/garrytan/gstack
only the ones relevant to us
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament we will continue with the kafka implementation as we have planned
I would like to switch the kafka image we are using with this: https://hub.docker.com/r/apache/kafka-native
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>can you explain how adding to the dlq results in a retry?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament I want to work on the next shared tasks before we move on to payment and orders
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament I know there are additional tasks but I want a cleanup sweep of the shared code that was implemented. add tests, fix bugs and do a code review
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> The Kafka DLQ discussion was educational: I genuinely didn’t understand the retry mechanism at first, and Claude walked me through it clearly. I’m glad I asked instead of just accepting the code. The gstack skills turned out to be useful for structured code reviews. I particularly like the stop-and-offer-options style; it forces me to sit and think instead of just waiting for output.</p>
</blockquote>

<h3 id="the-big-koupang-day-mar-15">The Big Koupang Day (Mar 15)</h3>

<p>This was a marathon session. Refactored service structs from OO-style to free functions. Researched a friend’s payment project and articles from Toss (a fintech company). Adopted TigerBeetle’s TIGER_STYLE.md and consolidated it into a project STYLE.md. Created a cleanup skill. Did STYLE.md compliance refactoring across the codebase. Started order/payment service scaffolding with DOP business rules and property testing.</p>
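<p>The OO-to-free-functions refactor is easier to see in miniature. A hypothetical before/after with illustrative names (not the actual Koupang code):</p>

```rust
// Before: a DI-style service struct that holds its dependencies.
// After: a free function that takes its dependencies as arguments,
// which makes call sites and tests more explicit.

struct Repo; // stand-in for a real repository / connection pool

impl Repo {
    fn find_price(&self, _sku: &str) -> u64 {
        1000 // fixed price for the sketch
    }
}

// Before: OO-style struct with injected dependencies.
struct OrderService {
    repo: Repo,
}

impl OrderService {
    fn quote(&self, sku: &str, qty: u64) -> u64 {
        self.repo.find_price(sku) * qty
    }
}

// After: a free function; no struct, no hidden state.
fn quote(repo: &Repo, sku: &str, qty: u64) -> u64 {
    repo.find_price(sku) * qty
}

fn main() {
    let repo = Repo;
    let svc = OrderService { repo: Repo };
    // Both styles compute the same result; only the wiring differs.
    assert_eq!(svc.quote("sku-1", 3), quote(&repo, "sku-1", 3));
    println!("ok");
}
```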

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament before we continue, I want to do see if we can make the various "service" structs less OO. Right now, it's structs with DI and it's "fine" it works but if there is a way to avoid OO-ness then I would like to
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament before we go onto payments let's do some research. this project is being done by a friend and I think it contains some useful things we may not have considered
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want to adopt this: https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TIGER_STYLE.md where relevant and keep a version of this in our project repo. use the /research skill
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's make a stale-files-cleanup skill since I want to do this semi-frequently
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's start from the P1 tasks and work through them one by one
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>all the quick wins should be done, for outbox only do redis dedup for relay edge case (add test for this case) and others defer
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> This was one of the most productive single days. The OO → free functions refactor was something I’d been thinking about and Claude handled it cleanly. Researching my friend’s payment project gave me ideas I wouldn’t have had otherwise. Adopting TIGER_STYLE was a pivotal decision. It gave me a concrete reference for what “good code” means in this project rather than vague directions from me. The STYLE.md compliance refactoring that followed was extensive but worth it. I’ll probably keep doing this for future projects.</p>
</blockquote>

<h3 id="integration-tests--saga-wiring-mar-16">Integration Tests &amp; Saga Wiring (Mar 16)</h3>

<p>Implemented integration tests and Kafka consumer wiring for the order/payment saga. Did a refactor so that event handlers properly use a PgConnection within transactions. Code review of the full saga. Documentation and filament knowledge-graph audit.</p>
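<p>The transaction refactor follows a common shape: a helper owns begin/commit/rollback, and the handler only borrows the connection, so the handler’s writes and the outbox insert share one transaction. A toy sketch with stand-in types (the real code uses sqlx’s PgConnection):</p>

```rust
// Stand-in connection type that records writes and whether a commit happened.
struct Conn {
    committed: bool,
    writes: Vec<String>,
}

impl Conn {
    fn execute(&mut self, sql: &str) {
        self.writes.push(sql.to_string());
    }
}

// The helper owns the transaction lifecycle; the handler closure only
// borrows the connection for the duration of the transaction.
fn with_transaction<F>(conn: &mut Conn, handler: F) -> Result<(), String>
where
    F: FnOnce(&mut Conn) -> Result<(), String>,
{
    // BEGIN (implicit in this sketch)
    match handler(conn) {
        Ok(()) => {
            conn.committed = true; // COMMIT
            Ok(())
        }
        Err(e) => {
            conn.writes.clear(); // ROLLBACK: discard the handler's writes
            Err(e)
        }
    }
}

fn main() {
    let mut conn = Conn { committed: false, writes: vec![] };
    let result = with_transaction(&mut conn, |tx| {
        tx.execute("UPDATE orders SET status = 'paid'");
        tx.execute("INSERT INTO outbox (event) VALUES ('order_paid')");
        Ok(())
    });
    assert!(result.is_ok());
    assert!(conn.committed);
    assert_eq!(conn.writes.len(), 2); // state change and outbox event commit together
    println!("ok");
}
```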

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament start the integration test tasks as well as the kafka consumer/producer tasks for the order/payment saga
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>do the refactor where the event handlers properly use pgconnection and wraps things within a transaction
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>shouldn't we be using process with retry? not sure why there are two try process once and process with retry and the conditions for using each.
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/code-eng-review of the saga that we implemented
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I would like to add a task for implementing a background jobs feature from the shared crate with persisted jobs
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I tried to understand what the hell was going on because this part was genuinely the most interesting part of the project so far. I really wanted to make this production quality. The engineering review skill paid off by catching issues I wouldn’t have noticed in a manual read-through. There could still be issues I don’t know about, but this is a learning project (and I know for a fact that many “real” codebases don’t have “high” quality code either, due to how software projects are usually run: rushed, under-resourced, and untested/unreviewed).</p>
</blockquote>

<h3 id="jujo-generators-in-koupang-mar-19">Jujo Generators in Koupang (Mar 19)</h3>

<p>Used the newly built jujo tool to analyze existing code patterns in Koupang and generate service scaffolding templates. Tested by generating a shipping service, verified the output, then removed it.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament /jujo /pattern-analyzer let's analyze the existing repeated code patterns and then add them to jujo
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>generate some for shipping like before and then remove them later after I finished reviewing
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>okay remove the generated shipping files
can you estimate token savings using this tool vs reading files + finding patterns + generating code?
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> This was the payoff. Jujo generated a full service scaffold that would have taken Claude many thousands of tokens to analyze patterns and produce from scratch. I’m very satisfied with this. Adding more determinism for LLMs is the way to go. I would like to add more going forward.</p>
</blockquote>

<h3 id="docker-sandbox--permissions-research-mar-21">Docker Sandbox &amp; Permissions Research (Mar 21)</h3>

<p>Explored running Claude Code in isolated Docker sandbox mode. Researched vibebox and various sandbox approaches. Also expanded the allowed bash commands to reduce permission prompts.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using this: https://docs.docker.com/ai/sandboxes/get-started/
I ran claude in docker sandbox mode but all of the local setup I have (skills, CLIs, global CLAUDE.md) don't carry over
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>can you also research this: https://github.com/robcholz/vibebox
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I still have to frequently allow safe bash commands to run which I want to remove
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oh then nevermind about this project, I will have to look for a new solution to run claude code dangerously skip permissions in an isolated way
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> The sandbox research was a dead end for now. None of the solutions carry over my skills, CLIs, and config cleanly. The permissions expansion was more immediately useful: I went from constantly approving safe commands to rarely seeing permission prompts. I may have to return to this later and just build my own solution (isolated VM + exported LLM configs). Building a good harness for LLMs is proving to be an interesting challenge that keeps coming up: first Filament, then Jujo, and now this.</p>
</blockquote>

<hr />

<h2 id="rledger">RLedger</h2>

<p>When I heard about ledger-cli I was intrigued by its ability to handle many different kinds of “assets” in a single double-entry accounting format. I shelved this fascination and mostly forgot about it. Then, in the middle of Koupang, I decided to try porting the code from C++ to Rust to understand how ledger-cli worked. I got through 7 of the 8 planned phases and only had the test cases left to convert when I got distracted again and did not finish lol. I also barely read the code this time, since I wanted to read the codebase in full once it was all done. I can return any time to finish it off and just read and learn. This was also an experiment to see how well Claude would do at porting a codebase I knew nothing about. Not a super high priority, but I will return to it. I decided to keep this local-only because I think doing a “heist” of open source software by rewriting it in another language for no good reason is kind of a shitty move.</p>

<p>No session logs for this one; I did it without the prompt logger running. It was a spontaneous detour. The port seemed to go well but I’m not sure how “good” it is. This was also done in preparation to see if I could use Claude Code to port Vinyl Cache to Rust cleanly.</p>

<hr />

<h2 id="jujo">Jujo</h2>

<p>I originally had an idea for a Ruby-on-Rails-like meta-framework, but in Rust with SQLite and Datastar. It was to be a simple boilerplate-generating SaaS template. I got the idea a while back, but real-life work, and the activation energy of trial-and-erroring my way across multiple unfamiliar libraries after a long day at work, pushed it to the side. I eventually decided to have a go at it. While in the research and ideation phase with Claude, I realised that the “code generation” part was the more interesting problem to solve. I had noticed in Koupang that the LLM was spending a lot of tokens just reading the existing codebase, understanding its patterns, and re-implementing them. This felt like a waste of time and tokens, so I extracted that part out of the project and pursued it instead.</p>

<p>With my learnings from Filament and Koupang (both very large and ambitious projects), I decided to keep this short and very simple: do one thing very well. So I planned and researched with Claude, got it done in one long session (thanks to the new 1M-token context window update), and then used a few shorter sessions afterwards to polish and release it. I am happy with my restraint and the fact that I got it done quite fast without burning out again. I also did not pay much attention to the code, because I cared more about getting it done and because it was quite simple.</p>

<h3 id="ideation--naming-mar-17-evening">Ideation &amp; Naming (Mar 17 evening)</h3>

<p>Spitballed the code generation idea. Explored Korean words for “stamp” as potential names. Tried “dojang”, “gakin”, and others before settling on “jujo” (Korean for casting/mold).</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament I was thinking how to make working with LLMs more efficient
I already realized that consistent patterns across the codebase is very important
Why not make this cli codegen tool generalizable?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's call it `dojang` which is korean for stamp but romanized
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's get synonyms for stamp first, get the romanized korean versions and check if they exist on crates
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do jujo, gakin could be pronounced as gay-kin which is wrong
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> The naming discussion was fun. I wanted a Korean word that non-Korean speakers could actually pronounce. “Jujo” worked out perfectly. It’s short, memorable, and the crate name was available.</p>
</blockquote>

<h3 id="planning--v10-in-one-long-session-mar-17-night">Planning → v1.0 in One Long Session (Mar 17 night)</h3>

<p>Used /spec-driven-dev and /grill-me for the planning phase. Explored Tera templating. Implemented all 5 phases (TOML parse + Tera render → field parsing → injection/dry-run → discovery commands → AI customization markers) in a single continuous session. Did a code review. Added formatter hooks. Ran multiple live demos in /tmp directories between phases.</p>
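<p>For context, Tera is a Jinja2-style template engine for Rust: you hand it template text containing <code class="language-plaintext highlighter-rouge">{{ placeholder }}</code> markers plus a context of values, and it renders the output. Here is a tiny hand-rolled stand-in for the core idea (the real crate also supports filters, loops, and template inheritance, none of which this sketch covers):</p>

```rust
// Minimal stand-in for what a template engine does at its core:
// substitute `{{ key }}` placeholders in template text with values.
// The real Tera crate is far more capable; this only shows the concept.

fn render(template: &str, vars: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (key, value) in vars {
        // Build the literal placeholder "{{ key }}" and replace it.
        out = out.replace(&format!("{{{{ {} }}}}", key), value);
    }
    out
}

fn main() {
    // A code-generation flavored example: render a struct skeleton.
    let template = "pub struct {{ name }} {\n    pub id: {{ id_type }},\n}";
    let rendered = render(template, &[("name", "Shipping"), ("id_type", "Uuid")]);
    println!("{rendered}");
}
```

In jujo, templates like this are actual code files with slots, so the LLM can generate a whole service scaffold without re-deriving the patterns from the codebase each time.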

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/spec-driven-dev let's continue with the planning phase /grill-me as well
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for the templates itself, I was thinking of intellij style codegen. you can add templates which are actual code but with areas you can slot in variables
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I don't quite understand what Tera is and how it fits into all of this to be honest
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I would like to see phase 1 in action in a directory somewhere step by step
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>commit and move on, also clean up tmp directory of stuff
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/code-eng-review
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I would like a live demo of the formatter hook in action
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> This was a nice test run of the new workflow (described further down the blog). Planning → implementation → demos → review → ship, all in one evening. The 1M context window made this possible. In earlier sessions I would have run out of context mid-implementation. I did live demos between each phase which gave me confidence the code actually worked before moving on.</p>
</blockquote>

<h3 id="polish--v10-release-mar-1819">Polish &amp; v1.0 Release (Mar 18–19)</h3>

<p>Added HTML/CSS language support. Ran QA with Claude across multiple languages. Added CI/CD, install/uninstall scripts, Makefile. Published v1.0.0 with curl install support. Registered skills in the library catalog.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament let's add html and css support then update the qa plan to include them
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/filament let's begin the qa
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's release v1 and also give an option to install with curl
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's install jujo from source and use it in other projects
but before we do, let's do a review of the skills we will be using and add it to the local /library
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> Keeping scope small paid off. Filament took ~5 days of intense work. Jujo took ~3 days of relaxed work. Both shipped v1.0. The difference was scope discipline. Jujo does one thing (code generation from templates) and does it well.</p>
</blockquote>

<hr />

<h2 id="meta-stuff">Meta stuff</h2>

<p>Across the 4–5 projects I worked on, I kept noticing friction points in my workflow. I wrote them down and then actually tackled most of them in a single marathon session:</p>

<ul>
  <li><strong>Done:</strong> formalized review and planning skills (grill-me, plan-eng-review, code-eng-review, spec-driven-dev)</li>
  <li><strong>Done:</strong> central skill library with catalog (forked disler/the-library)</li>
  <li><strong>Done:</strong> proper coding STYLE.md (adopted from TigerBeetle’s TIGER_STYLE)</li>
  <li><strong>Done:</strong> combined all skills into a spec-driven-dev workflow</li>
  <li><strong>Done:</strong> proper research skill using a Go CLI instead of random Python scripts</li>
  <li><strong>Done:</strong> expanded allowed bash commands to near-zero permission prompts</li>
  <li><strong>Partial:</strong> more rigorous testing (added property testing, mutation and fuzz testing still TODO)</li>
  <li><strong>Partial:</strong> .md file auditing (did cleanup passes but this is ongoing)</li>
  <li><strong>Dead end:</strong> running Claude Code in “dangerously skip permissions” mode in an isolated environment (Docker sandbox and vibebox both didn’t work the way I wanted)</li>
</ul>

<h3 id="skill-ecosystem-overhaul-mar-17">Skill Ecosystem Overhaul (Mar 17)</h3>

<p>Massive meta session. Researched mattpocock/skills (16 skills) and garrytan/gstack (15 skills), compared to my 18 existing skills. Installed 4 planning skills (grill-me, write-a-prd, prd-to-plan, prd-to-issues). Forked disler/the-library for private skill distribution. Created spec-driven-dev workflow (Research → Plan → Implement with human checkpoints). Wove filament into all skills. Installed and customized triage-issue. Removed br and bd-to-br-migration skills.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>research this https://github.com/mattpocock/skills for skills we can add
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>research this https://github.com/disler/the-library
research and think about how to implement this
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>this is from a video by a netflix engineer about using AI effectively in large codebases but I think it can be generalized to a meta skill that refers to multiple skills to define a workflow
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I would like to weave in filament into all this because filament has tasks, plans, lessons etc.
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I think we should install the bug fix skill and weave in filament into that as well. make sure it uses lessons
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> This session transformed my setup. Going from 18 ad-hoc skills to 20 integrated skills with a library catalog, a proper planning pipeline (PRD → plan → issues → implement), and filament woven throughout was a step change (which I hope fixes the issue where Claude wasn’t using filament as much as I wanted). The spec-driven-dev workflow in particular has become my default way to start any non-trivial feature. See the <a href="/blog/2026/03/23/current-LLM-workflow-setup.html">workflow snapshot</a> for the full current setup. I came across these things through YouTube recommendations rather than seeking them out. I also thought that too many skills might become cumbersome, but they were surprisingly easy to manage.</p>
</blockquote>

<hr />

<h2 id="llms-for-learning">LLMs for learning</h2>

<p>In preparation for a new position (which also required a lot of paperwork), I turned to Claude Code to help me learn a new language for the role: Haskell. I decided I wanted to learn to code by hand and go more slowly before I start using LLMs for work (although I plan to use them slightly more conservatively there). Having an endlessly patient tutor who will answer any dumb question I have is proving very useful. I strictly told it not to give me full answers, only hints. It’s quite good at that, but sometimes Haskell stumps me so much that I just needle it until it gives me the semblance of an answer. I also used it to generate practice questions for me.</p>

<h3 id="haskell-via-exercism-mar-2122">Haskell via Exercism (Mar 21–22)</h3>

<p>Worked through levels 5–9 of custom Haskell exercises. Topics: pattern matching on ADTs, type classes (Show, Functor, Describable), Maybe/Either error handling, State monad (stack calculator), Writer monad, IO (guessing game, address book, word counter). Also set up a “build-your-own-dkv” distributed key-value store project as another TDD learning exercise.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I am doing level 5 in exercises
I'm not sure how to get the required data from the shape for the area and perimeter functions
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for prettyPrint, I want to add addParens as a helper function inside prettyPrint itself. how do I do so?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I don't quite understand the state type
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for stackCalc, I don't understand where the operator is coming from
</code></pre></div>  </div>

  <p>For the build-your-own-dkv project:</p>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>this is a LEARNING project I want to emphasize that
you should heavily bias towards providing hints and resources I can read
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> The pattern of “I try → I get stuck → I ask for a hint → I try again → I ask for more help” works really well for learning. Haskell’s type system is genuinely hard and having a tutor that can look at my code and tell me exactly where my type error is without giving me the answer is invaluable. The State monad section was particularly painful but educational. I went from “what is this?” to implementing a stack calculator with it in one session. The previous sessions were not recorded but they followed the same format.</p>
</blockquote>

<p>I plan to use Claude Code for more learning projects in the future. CodeCrafters is cool, but I think even if I pay for it, I will just forget to do it because of my propensity for getting distracted. So I plan to use Claude Code like CodeCrafters, as a learning tool. This time, I will code by hand as a learning experience, but with a competent(?) tutor.</p>

<hr />

<h2 id="observations">Observations</h2>

<p>I think meta-cognition about your own thought and work processes, and vigilance about how the LLM works, is the biggest takeaway. I often hear people complain about LLMs not doing things the right way and going off the rails, but I wonder how much they tailor their context and split up their tasks so that the LLM is most likely to output the “right” thing.</p>

<p>Considering how rapidly LLMs improve, all of the stuff I am doing may be useless in a couple of months. Hell, I found that Claude can already work with worktrees and subagents, which is what Filament is supposed to help with via its file-locking feature and the build coordination I planned to implement. It also ships more primitive versions of Filament’s ideas in the form of various .md files, and I can see Anthropic and other AI companies converging on some graph-like knowledge-management structure. I also found <a href="https://github.com/dgraph-io/dgraph">dgraph</a> and <a href="https://github.com/abhigyanpatwari/GitNexus">GitNexus</a>, which do what Filament does but more specialized. In addition, all of the tips and tricks I learned date from when the context window was quite small. Now that the context window is 1M tokens, the precautions I have to take are lessened, though not completely irrelevant. Using Claude Code got easier because I don’t have to constantly close and reopen sessions.</p>

<hr />

<h2 id="going-forward">Going Forward</h2>

<p>If all goes well, I will be far too busy to continue many of the side projects I have going on now. I will most likely continue Koupang to see how big the codebase can get, and do the CodeCrafters thing on my own. I have a fundamental unwillingness to completely let go and not understand the code being produced, at least for codebases I consider important, and I genuinely want to improve as a software engineer. The Filament experience was eye-opening in how draining it is to just full send, which was a big inflection point for me. I could bounce between 2 or 3 projects manually, but I don’t think I’m comfortable going full Gas Town industrial code factory level.</p>

<p>But one thing I do want to share is tackling a refactor of a legacy codebase using Claude Code. There are so many legacy codebases in real life, but I rarely see actual examples of how they can be tackled with LLM tools, so I think I will do that and share. Future posts:</p>
<ul>
  <li>Overengineering Koupang for Fun and Profit pt{n} -&gt; Koupang obviously</li>
  <li>Just Refactor It Dude pt{n} -&gt; for refactoring legacy codebases</li>
  <li>smaller LLM posts (maybe I go off the deep end with Gas Town or figure out how to isolate LLMs better?)</li>
  <li>any miscellaneous thoughts I may have</li>
</ul>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[In Part 5, I went fast and broke things building Filament. This time, I bounced between 5 projects over 18 days — shipped Filament v1.0, built and shipped Jujo v1.0 from scratch, pushed Koupang’s order saga to completion, attempted a C++ to Rust port, and started learning Haskell. I also overhauled my entire skill and workflow setup. Here’s a snapshot of my current setup which is now quite evolved: Current LLM Workflow Setup (March 2026).]]></summary></entry><entry><title type="html">Getting Gud at LLMs Pt5</title><link href="https://jyc11.github.io/blog/2026/03/04/getting-gud-at-llms-pt5" rel="alternate" type="text/html" title="Getting Gud at LLMs Pt5" /><published>2026-03-04T00:00:00+00:00</published><updated>2026-03-04T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2026/03/04/getting-gud-at-llms-pt5</id><content type="html" xml:base="https://jyc11.github.io/blog/2026/03/04/getting-gud-at-llms-pt5"><![CDATA[<p>In <a href="/blog/2026/02/26/getting-gud-at-llms-pt4.html">Part 4</a>, I finished planning for the big order orchestration saga and then started implementing. I am in the middle of implementing it but I decided to go on a little side quest. Things were getting complacent and I wanted to shake things up.</p>

<hr />

<h2 id="problem-and-solution">Problem and Solution</h2>

<p>I was already falling into established patterns because I was working how I would work at a real, serious full-time job: using LLMs to enhance my speed by outsourcing typing, but not going all in. I found a pace I was comfortable with. But I was definitely not satisfied. The fact that human intervention is needed, or that LLMs speed you up, is not revolutionary; plenty of other people are saying it too. Also, I wanted to challenge myself more with the single-agent setup. So I changed 2 variables:</p>

<ol>
  <li>I sped up the pace of work. I was going at a reasonably slow-ish pace compared to other devs who have embraced this fully, so I had room to improve there.</li>
  <li>The amount of human review and intervention needed to be reduced. I still meticulously read the code generated in Koupang and directed refactoring efforts.</li>
</ol>

<p>I didn’t want to apply this to Koupang, so I decided on a new project. Tangent: <a href="https://en.wikipedia.org/wiki/Jevons_paradox">Jevons Paradox</a> “is said to occur when technological improvements that increase the efficiency of a resource’s use lead to a rise, rather than a fall, in total consumption of that resource”. This applies because now, if I have an idea, I can just execute. I started the second project because the cost of starting is now so low that I can afford to take a quick jaunt at something completely unrelated and come back to the original project without losing much time.</p>

<p>As I developed Koupang, I felt that task management was solved by <a href="https://github.com/Dicklesworthstone/beads_rust">beads_rust</a>, but other things like knowledge management were not so convenient or agent-first. I looked into <a href="https://github.com/Dicklesworthstone#the-agentic-coding-flywheel">Flywheel</a>, but I didn’t really want to install 10+ tools and learn all of them. So instead, I decided to build my own: <a href="https://github.com/JYC11/filament">Filament</a>, a single Rust binary with all the tools I needed. I got Claude to research the tools and codebases that inspired me, and with some of my input, we began building.</p>

<h2 id="process-and-outcomes">Process and Outcomes</h2>

<p>I took an evening or so to plan and research. The next day, I started executing. I gave myself a day to get as much done as possible, since I wanted to go back to Koupang ASAP. Thus, I went fast and reduced human intervention (the 2 variables I talked about above). Cracks immediately started to show. The cracks were:</p>

<ul>
  <li>programming shortcuts taken (e.g. N+1 queries, where the LLM didn’t just write a new query to fetch things in batch)</li>
  <li>code quality issues (e.g. god functions, circular dependencies)</li>
  <li>bugs (too numerous to count)</li>
</ul>

<p>There is quite a lot to cover, so I got Claude to summarize the interactions below. In the end, I mostly managed to finish all the features I wanted. I will need to do some extensive QA, but afterwards I plan to integrate this into my workflow. As I write this blog post, I am planning and executing an aggressive QA phase of development where I will try to break what I built using Claude.</p>

<h2 id="the-numbers">The Numbers</h2>

<ul>
  <li><strong>40 sessions</strong>, <strong>227 prompts</strong> over ~1.5 days (Mar 2 evening → Mar 4 morning)</li>
  <li><strong>42 commits</strong>, <strong>~15,000 lines of Rust</strong> across 4 crates</li>
  <li><strong>235 tests</strong> (120 core + 58 CLI + 39 daemon + 10 MCP + 8 TUI)</li>
  <li><strong>20 ADRs</strong> documenting architecture decisions</li>
  <li><strong>5 phases completed</strong> (Core, CLI, Daemon+MCP, Agent Dispatching, TUI)</li>
  <li><strong>~6 code review sessions</strong>, <strong>2+ manual QA rounds</strong></li>
  <li>Multiple context window exhaustions (sessions continued from summaries)</li>
</ul>

<h2 id="prompt--progress-summaries-by-ai-with-my-takes-in-between-as-usual">Prompt &amp; Progress summaries by AI with my takes in between as usual</h2>

<h3 id="sessions-12-planning--architecture-decisions-mar-2-evening">Sessions 1–2: Planning &amp; Architecture Decisions (Mar 2 evening)</h3>

<p>Copied the Makefile and util-scripts from Koupang as a starting point. Wrote 6+ Architecture Decision Records. Key decisions made: messages are NOT graph nodes (separate inbox/outbox pattern), file reservations and agent runs also not graph nodes, single-binary architecture (installable via <code class="language-plaintext highlighter-rouge">curl</code> not just <code class="language-plaintext highlighter-rouge">cargo</code>), and per-project <code class="language-plaintext highlighter-rouge">.filament/</code> directory with local SQLite + Unix socket.</p>
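<p>The inbox/outbox decision can be sketched in a few lines. This is only an illustrative toy under my own assumptions: the <code>MessageBus</code> name, the fields, and the explicit delivery pass are guesses at the shape, not filament’s real types.</p>

```rust
// In-memory sketch of the inbox/outbox idea: messages live in plain
// per-agent queues rather than as nodes in the knowledge graph.
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone)]
struct Message {
    from: String,
    to: String,
    body: String,
}

#[derive(Default)]
struct MessageBus {
    // outbox: messages sent but not yet delivered
    outbox: VecDeque<Message>,
    // inboxes: delivered messages keyed by recipient agent
    inboxes: HashMap<String, VecDeque<Message>>,
}

impl MessageBus {
    fn send(&mut self, from: &str, to: &str, body: &str) {
        self.outbox.push_back(Message {
            from: from.into(),
            to: to.into(),
            body: body.into(),
        });
    }

    // A delivery pass drains the outbox into per-recipient inboxes.
    fn deliver(&mut self) {
        while let Some(msg) = self.outbox.pop_front() {
            self.inboxes.entry(msg.to.clone()).or_default().push_back(msg);
        }
    }

    fn receive(&mut self, agent: &str) -> Option<Message> {
        self.inboxes.get_mut(agent)?.pop_front()
    }
}

fn main() {
    let mut bus = MessageBus::default();
    bus.send("planner", "coder", "implement phase 2");
    bus.deliver();
    let msg = bus.receive("coder").unwrap();
    println!("{} -> {}: {}", msg.from, msg.to, msg.body);
}
```

<p>Keeping messages out of the graph means delivery is just an append and a pop, with no graph traversal or node lifecycle to manage.</p>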

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/project-context some chore work, let's copy over the makefile and util-scripts from koupang
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's record the current architecture decision records
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for inter-agent messaging are the messages stored as graphs as well? that doesn't seem to make sense to me
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>should we adopt an inbox/outbox type structure for messages?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do single-binary because I want it to be installable via curl as well, not just cargo
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I decided to do a significant amount of planning. What isn’t captured here are the other sessions with the LLM where I explicitly planned knowledge graph tools and multi-agent management with bash scripts, did tool research/brainstorming, and named the project. They have been left out, but yeah, I did them.</p>
</blockquote>

<h3 id="sessions-34-phase-1--core-library-mar-3-morning">Sessions 3–4: Phase 1 — Core Library (Mar 3 morning)</h3>

<p>Researched beads_rust’s JSONL/flush design for comparison. Implemented the core library: models, errors, schema, store, graph, connection, protocol. Value objects (Priority, Weight, NonEmptyString) to make invalid states unrepresentable. Code review + test review against the test guide, marked Phase 1 complete.</p>
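<p>The value-object pattern is simple to sketch. <code>Priority</code> and <code>NonEmptyString</code> are the names from this phase, but the exact bounds and the error type here are my assumptions, not filament’s actual definitions.</p>

```rust
// Newtype wrappers with private fields: the only way to obtain a value
// is through a checked constructor, so invalid states can't exist.
#[derive(Debug, PartialEq)]
enum ValidationError {
    OutOfRange,
    Empty,
}

// A bounded, non-negative priority (range 0..=4 assumed here).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Priority(u8);

impl Priority {
    fn new(value: u8) -> Result<Self, ValidationError> {
        if value <= 4 {
            Ok(Priority(value))
        } else {
            Err(ValidationError::OutOfRange)
        }
    }
    fn get(self) -> u8 {
        self.0
    }
}

// A string that is guaranteed non-blank after trimming.
#[derive(Debug, Clone, PartialEq)]
struct NonEmptyString(String);

impl NonEmptyString {
    fn new(value: impl Into<String>) -> Result<Self, ValidationError> {
        let value = value.into();
        if value.trim().is_empty() {
            Err(ValidationError::Empty)
        } else {
            Ok(NonEmptyString(value))
        }
    }
    fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() {
    let p = Priority::new(2).unwrap();
    let name = NonEmptyString::new("filament-core").unwrap();
    println!("priority={} name={}", p.get(), name.as_str());
}
```

<p>Downstream code that accepts a <code>Priority</code> never needs to re-validate it; the type itself is the proof.</p>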

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>could you assume any reasons why beads_rust uses jsonl and a flush feature?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's start implementing
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>can you review the tests based on the test guide? also mark phase 1 as complete in the plans
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>look through the codebase and see if there are more opportunities to use value objects
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I mostly directed as I developed this part. I wanted it to be airtight because it is literally the core. I implemented the things I talked about in previous posts (ADTs, value objects, etc.) for correctness and hoped that maybe Claude would follow the pattern as it continued the project (it didn’t lol). I still read the code Claude generated.</p>
</blockquote>

<h3 id="sessions-57-phase-1-polish--code-reviews-mar-3-12001330">Sessions 5–7: Phase 1 Polish &amp; Code Reviews (Mar 3 ~12:00–13:30)</h3>

<p>Two rounds of code review — found 3 bugs, made 4 improvements. Added ADR-018 for value types/newtypes. Created a gotchas document since the project was already accumulating pitfalls (sqlx custom newtypes, thiserror v2 <code class="language-plaintext highlighter-rouge">source</code> field behavior, petgraph 0.7 API changes).</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I don't think priority should be i32 because I don't want negative priority. Also, I want invalid states to be unrepresentable
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do a code review session for phase 1
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>what about the macros for the repetitive code?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do another code review over phase 1 just in case
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I was still quite slow during these sections. I wanted things to be airtight.</p>
</blockquote>

<h3 id="koupang-interlude-1-outbox--shared-module-code-review-mar-3-1330">Koupang Interlude 1: Outbox &amp; Shared Module Code Review (Mar 3 ~13:30)</h3>

<p>Switched back to Koupang briefly. Did a code review of the outbox implementation, fixed 8 issues. Then reviewed the shared module, found and fixed 9 more issues. Context window ran out mid-session.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do a code review of the current outbox implementation
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's fix all of the issues from 1 to 8
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do some more code review on the shared module, we can skip outbox since that was done.
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's fix them all 1 to 9
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> After realizing I could even offload the review work to Claude, I tried it with Koupang as a trial run to see how it was. I think I’m somewhat satisfied with the performance.</p>
</blockquote>

<h3 id="sessions-89-phase-2--cli-mar-3-13401400">Sessions 8–9: Phase 2 — CLI (Mar 3 ~13:40–14:00)</h3>

<p>Implemented all CLI commands: entity, task, relation, query, message, reserve. 27 integration tests. This was one of the fastest phases — two sessions, plan then execute.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do phase 2
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: Phase 2: CLI Implementation Plan...
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> Still reading the code and I still have a good idea of the codebase.</p>
</blockquote>

<h3 id="sessions-1011-phase-2-code-reviews--manual-qa-mar-3-14001500">Sessions 10–11: Phase 2 Code Reviews + Manual QA (Mar 3 ~14:00–15:00)</h3>

<p>Two code review rounds — fixed 5 bugs, 3 architecture improvements, 18 new tests. Created a manual QA skill in <code class="language-plaintext highlighter-rouge">.claude/skills/</code> for structured end-to-end testing. Ran 50 manual test cases. Decided on dual-track project management: keep <code class="language-plaintext highlighter-rouge">.md</code> files as committed source of truth AND use filament’s own knowledge graph for live tracking.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do a codereview. bugs, test coverage, test cases, architecture improvements should be the focus
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>before actually using filament for self-tracking, I want you to do some manual QA with some dummy information
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the manual qa should be a proper SKILL.md in .claude/skills
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the test results should have date-time in the file name as well in case of multiple tests within the same day
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I started off by being quite aggressive with reviews, fixing and refactoring. This changes as time passes.</p>
</blockquote>

<h3 id="sessions-1214-phase-3--daemon-mar-3-15001600">Sessions 12–14: Phase 3 — Daemon (Mar 3 ~15:00–16:00)</h3>

<p>Implemented Unix socket daemon with NDJSON protocol. 9 daemon integration tests. CLI now routes through daemon when running, falls back to direct DB access otherwise.</p>
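<p>The daemon protocol can be illustrated with a tiny std-only round trip: newline-delimited JSON over a Unix socket, one object per line. The socket path, the envelope shape, and the echo behavior here are placeholders of mine, not filament’s actual protocol.</p>

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

// One request/response round trip using NDJSON-style line framing.
// Returns the daemon's single reply line.
fn roundtrip(request: &str) -> std::io::Result<String> {
    let path = std::env::temp_dir().join(format!("ndjson-{}.sock", std::process::id()));
    let _ = std::fs::remove_file(&path);
    let listener = UnixListener::bind(&path)?;

    // "Daemon" side: read one line (one JSON object), answer with one line.
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (stream, _) = listener.accept()?;
        let mut reader = BufReader::new(stream.try_clone()?);
        let mut line = String::new();
        reader.read_line(&mut line)?;
        let mut stream = stream;
        writeln!(stream, r#"{{"ok":true,"echo":{}}}"#, line.trim_end())
    });

    // "CLI" side: send the request, wait for the reply line.
    let mut client = UnixStream::connect(&path)?;
    writeln!(client, "{}", request)?;
    let mut reader = BufReader::new(client);
    let mut response = String::new();
    reader.read_line(&mut response)?;
    server.join().unwrap()?;
    let _ = std::fs::remove_file(&path);
    Ok(response.trim_end().to_string())
}

fn main() -> std::io::Result<()> {
    let reply = roundtrip(r#"{"cmd":"list_tasks"}"#)?;
    println!("daemon replied: {}", reply);
    Ok(())
}
```

<p>Line framing is what makes the fallback cheap: whether a reply came from the daemon or from a direct DB call, the CLI just consumes one record at a time.</p>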

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>final round of manual qa and code review before importing project docs, tasks, context into filament
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do phase 3
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: Phase 3: Daemon Implementation Plan...
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> This is where I started going much faster, I think. I started reading the code less.</p>
</blockquote>

<h3 id="sessions-1518-phase-3-bug-fixes--dogfooding-mar-3-16301730">Sessions 15–18: Phase 3 Bug Fixes &amp; Dogfooding (Mar 3 ~16:30–17:30)</h3>

<p>This is where things got interesting. Manual QA of the daemon revealed that neither <code class="language-plaintext highlighter-rouge">create_entity</code> nor <code class="language-plaintext highlighter-rouge">update_entity_status</code> were creating events — the store layer just didn’t do it. I accidentally deleted the tmp directory at one point. Started using filament to track its own tasks (dogfooding). Refactored the daemon handler “god function” into 7 domain sub-modules. Added multi-agent concurrency tests.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Neither create_entity nor update_entity_status creates events. Events must be created separately.
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>add that as a task on filament
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I have accidentally deleted the tmp directory
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want you to add test cases of cli connecting to the daemon for multiple agents using multithreading
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I got the idea that Claude could probably do the manual QA itself, and that it should, because this is an agent-first tool, so I did it. I’m glad I did. It exposed bugs and shortcuts Claude was taking (not fully implementing the event store feature), so it was a good call. Another effect of going fast.</p>
</blockquote>

<h3 id="sessions-1920-phase-3-refactoring-mar-3-17501830">Sessions 19–20: Phase 3 Refactoring (Mar 3 ~17:50–18:30)</h3>

<p>Handler refactoring and code review of Phase 3. Deduplicated gotchas from MEMORY.md into a proper gotchas document + filament knowledge graph. Context window ran out during this — had to continue from summary.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want to do some refactoring of the big handler function
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want you to deduplicate some content in .md files. In memory.md there are a bunch of gotchas, those can be in a proper gotchas markdown file and in filament as well
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> A consequence of going too fast. I noticed a god function and cruft building up in .md files. I decided to slow down and fix things.</p>
</blockquote>

<h3 id="sessions-2124-mcp-server-mar-3-18252000">Sessions 21–24: MCP Server (Mar 3 ~18:25–20:00)</h3>

<p>Planned and implemented MCP server using the <code class="language-plaintext highlighter-rouge">rmcp</code> crate — 12 tools via stdio transport. Code audit: removed dead code, fixed clippy warnings, added 4 new MCP tools. Manual QA of MCP implementation.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do the MCP server, start planning
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>put the plan into filament as tasks
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do a code review and try to also fix warnings given by clippy that got bypassed with allows
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do some manual testing of the mcp implementation
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> Making fast progress but doing more reviews and manual QA as needed.</p>
</blockquote>

<h3 id="sessions-2528-major-refactoring--slugs--entity-adt-mar-3-20002130">Sessions 25–28: Major Refactoring — Slugs + Entity ADT (Mar 3 ~20:00–21:30)</h3>

<p>This was the biggest cross-cutting change. Identified the name collision problem — entities were looked up by name which could overlap. Switched to 8-char base36 slug identity (ADR-019). Then refactored <code class="language-plaintext highlighter-rouge">Entity</code> from a flat struct to a tagged enum (<code class="language-plaintext highlighter-rouge">Task | Module | Service | Agent | Plan | Doc</code>) with typed variants and <code class="language-plaintext highlighter-rouge">TypeMismatch</code> errors for compile-time safety (ADR-020). Then further type-safety improvements, replacing runtime <code class="language-plaintext highlighter-rouge">is_task()</code> checks with pattern matching.</p>
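<p>The flat-struct-to-tagged-enum move can be sketched compactly. The variant payloads and the <code>TypeMismatch</code> error below are simplified guesses at the shape described in ADR-020, not filament’s real definitions.</p>

```rust
// Entity as a tagged enum: each variant carries only the fields that
// make sense for it, so "a Doc with a priority" cannot be constructed.
#[derive(Debug, PartialEq)]
enum EntityError {
    TypeMismatch { expected: &'static str, got: &'static str },
}

#[derive(Debug)]
enum Entity {
    Task { title: String, priority: u8 },
    Doc { title: String, path: String },
    Agent { name: String },
}

impl Entity {
    fn kind(&self) -> &'static str {
        match self {
            Entity::Task { .. } => "task",
            Entity::Doc { .. } => "doc",
            Entity::Agent { .. } => "agent",
        }
    }

    // Instead of a runtime is_task() flag check, callers get either the
    // task payload or a typed error naming the mismatch.
    fn as_task(&self) -> Result<(&str, u8), EntityError> {
        match self {
            Entity::Task { title, priority } => Ok((title.as_str(), *priority)),
            other => Err(EntityError::TypeMismatch {
                expected: "task",
                got: other.kind(),
            }),
        }
    }
}

fn main() {
    let task = Entity::Task { title: "ship phase 4".into(), priority: 1 };
    let doc = Entity::Doc { title: "ADR-020".into(), path: "docs/adr-020.md".into() };
    println!("{:?} is a {}", task.as_task(), doc.kind());
}
```

<p>The compiler now forces every call site to handle the non-task case, which is exactly the class of bug the refactor was chasing.</p>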

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the entities are related/found using names which I think could overlap. beads_rust uses a randomly generated slug to identify and match
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: Refactor: Slug-Based Identity + Entity ADT...
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>we did a big ADT refactoring all over, let's do an analysis of the codebase on similar issues
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the features require task and knowledge graph management both but it seems that it only resolves to task and agent, is this correct?
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I knew this part would be massively breaking and would set back progress, but I didn’t want problems in the future, so I did it. It meant I had to stop using filament to self-track filament (easier and faster to just trash everything). This is, I think, where the bigger cracks started to show in my workflow. I was going fast and not reviewing the code well because the amount of code was overwhelming. Maybe I could have prompted better or provided better context.</p>
</blockquote>

<h3 id="sessions-2931-phase-4--agent-dispatching-mar-3-21302230">Sessions 29–31: Phase 4 — Agent Dispatching (Mar 3 ~21:30–22:30)</h3>

<p>Implemented the dispatch engine: spawn subprocess via <code class="language-plaintext highlighter-rouge">std::process</code>, monitor via <code class="language-plaintext highlighter-rouge">tokio::spawn</code>, parse <code class="language-plaintext highlighter-rouge">AgentResult</code> JSON, route messages, death cleanup (revert task, release reservations, refresh graph). Agent roles: Coder, Reviewer, Planner, Dockeeper with compiled-in prompts and tool whitelists. 23 new tests.</p>
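<p>A hedged sketch of the dispatch lifecycle: spawn an agent as a subprocess, wait for it, and trigger cleanup when it dies abnormally. The real engine monitors asynchronously via tokio; this sequential version only illustrates the flow, and the command, outcome type, and cleanup comments are placeholders of mine.</p>

```rust
use std::process::Command;

#[derive(Debug, PartialEq)]
enum AgentOutcome {
    Finished(String),  // stdout of a successful run (would be AgentResult JSON)
    Died(Option<i32>), // exit code of a failed run, if any
}

// Spawn the agent process, wait for it, and classify the result.
fn dispatch(program: &str, args: &[&str]) -> std::io::Result<AgentOutcome> {
    let output = Command::new(program).args(args).output()?;
    if output.status.success() {
        Ok(AgentOutcome::Finished(
            String::from_utf8_lossy(&output.stdout).trim_end().to_string(),
        ))
    } else {
        // Death cleanup would go here: revert the task, release file
        // reservations, refresh the graph.
        Ok(AgentOutcome::Died(output.status.code()))
    }
}

fn main() -> std::io::Result<()> {
    // "echo" stands in for launching a real agent process.
    match dispatch("echo", &[r#"{"status":"done"}"#])? {
        AgentOutcome::Finished(out) => println!("agent result: {}", out),
        AgentOutcome::Died(code) => println!("agent died with {:?}", code),
    }
    Ok(())
}
```

<p>The useful property is that a dead agent is just another enum variant: the caller cannot forget to handle it.</p>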

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do phase 4 and before we start, let's make sure the docs are up to date
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: Phase 4: Agent Dispatching...
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do the next priorities that are in MEMORY.md
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I was getting tired but pushing through. I let Claude plan and just do it, and decided to catch bugs later. At this point, I was barely reading the code. A mistake, but a deliberate one, I guess. I wanted to go fast. That was the purpose of this experiment/project.</p>
</blockquote>

<h3 id="session-32-p1-bug-deep-dive-mar-3-2300">Session 32: P1 Bug Deep Dive (Mar 3 ~23:00)</h3>

<p>The most complex debugging session. The dispatch engine had a child-reaping race condition: when the server batched multiple agent dispatches, child processes could be reaped by the wrong handler. The fix: keep <code class="language-plaintext highlighter-rouge">std::process</code> but remove server-side batch dispatch; the CLI <code class="language-plaintext highlighter-rouge">dispatch-all</code> now loops over individual <code class="language-plaintext highlighter-rouge">dispatch_agent</code> RPCs instead.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's fix the P1 bug and before you do, explain to me the dispatch code logic/overall structure and how the P1 bug occurs in detail
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> This part was genuinely technically challenging because multithreading/multiprocessing is not an area I am well versed in. I made an executive decision to just do things sequentially instead of chasing this. I will have to sit down and study this part in detail in the future (I hope).</p>
</blockquote>

<h3 id="sessions-3334-phase-5--tui-mar-3-23102330">Sessions 33–34: Phase 5 — TUI (Mar 3 ~23:10–23:30)</h3>

<p>Implemented ratatui-based TUI with task, agent, and reservation views. 7 TUI tests. Fastest phase to implement.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do phase 5
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: Phase 5: TUI Implementation Plan...
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> Final push to get shit done.</p>
</blockquote>

<h3 id="sessions-3538-code-reviews--bug-fixes-mar-3-2340--mar-4-0200">Sessions 35–38: Code Reviews &amp; Bug Fixes (Mar 3 23:40 – Mar 4 02:00)</h3>

<p>Phase 5 code review fixed N+1 query patterns in both CLI and TUI. Created <code class="language-plaintext highlighter-rouge">batch_get_entities</code> API in core to eliminate them. Then a comprehensive code smell analysis: where to use ADTs/value objects to make illegal states unrepresentable, where to simplify code. Created a 15-task code review plan. Completed all items: type-strengthened DTOs, CLI args, dedup utils, broke a bidirectional dependency.</p>
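<p>The N+1 fix is worth a sketch: instead of one lookup per id (what the CLI and TUI were doing), fetch all ids in a single batch call. <code>batch_get_entities</code> is the name from this session, but this in-memory store with a round-trip counter is my stand-in for the real SQLite-backed one.</p>

```rust
use std::collections::HashMap;

struct Store {
    entities: HashMap<u64, String>,
    queries_run: u32, // counts round trips, to show the difference
}

impl Store {
    fn get_entity(&mut self, id: u64) -> Option<String> {
        self.queries_run += 1; // one query per call: the N+1 pattern
        self.entities.get(&id).cloned()
    }

    fn batch_get_entities(&mut self, ids: &[u64]) -> HashMap<u64, String> {
        self.queries_run += 1; // a single query covers the whole batch
        ids.iter()
            .filter_map(|id| self.entities.get(id).map(|e| (*id, e.clone())))
            .collect()
    }
}

fn main() {
    let mut store = Store {
        entities: (1..=3).map(|i| (i, format!("task-{i}"))).collect(),
        queries_run: 0,
    };

    // N+1 style: three round trips for three ids.
    for id in 1..=3 {
        let _ = store.get_entity(id);
    }
    let n_plus_one = store.queries_run;

    store.queries_run = 0;
    // Batched: one round trip for the same three ids.
    let batch = store.batch_get_entities(&[1, 2, 3]);
    println!("n+1: {} queries, batch: {} query, {} rows", n_plus_one, store.queries_run, batch.len());
}
```

<p>In a list view rendering N rows, this turns N+1 database round trips into two, which is the whole point of pushing the batch API down into core.</p>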

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want to see if there are any more inefficient SQL usage patterns in the UI (cli and tui) and I want them fixed
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>where can we use ADTs/value objects to make illegal states unrepresentable?
where can we make the code simpler?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fix the bugs and code smells from session34 in .plan
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I decided to do a review of the code with Claude. I was quite tired and honestly could not read everything myself. Instead, I directed Claude to look for things that would be code smells and bugs and got Claude to fix them. This made me think deeply about “what is good code?” and “how do I properly balance speed and quality in software development?”</p>
</blockquote>

<h3 id="koupang-interlude-2-dev-rules--autonomy-discussion-mar-4-0110">Koupang Interlude 2: Dev Rules &amp; Autonomy Discussion (Mar 4 ~01:10)</h3>

<p>Switched back to Koupang again late at night. Added development rules to Koupang’s CLAUDE.md and had a discussion about code style preferences — god functions vs overly fragmented code. Then explored what additional subagents/skills/MCP tools would help Claude be more autonomous. Did some housekeeping on context-affecting <code class="language-plaintext highlighter-rouge">.md</code> files.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I have added some development rules into CLAUDE.md. Let's have a discussion about them
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for a god function/class I think of a function/class where I have to scroll thousands of lines of code. for overly fragmented functions/classes I think of a function/class I have to jump around between dozens of files...
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do some more guides to help you be more autonomous if needed, what other subagents/skills/mcp whatever would be useful. 1. in general? 2. for this project?
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>should we do some housekeeping on the context affecting .md files?
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> After thinking deeply about software and guidelines, I decided to go back to Koupang and put them in the CLAUDE.md. It forced me to introspect and clearly communicate what I want out of software, which I think was good. This also made it clear to me why I had resisted simply adopting other people’s LLM workflows: I didn’t fully know what those people valued or what I would be feeding the LLM. I wanted full control and customization.</p>
</blockquote>

<h3 id="sessions-3940-readme-license--aggressive-qa-mar-4-10001100">Sessions 39–40: README, License &amp; Aggressive QA (Mar 4 ~10:00–11:00)</h3>

<p>Wrote a comprehensive README with installation and usage guide. Added MIT license and an inspiration section crediting beads_rust and Flywheel. Planned aggressive QA rounds targeting concurrency, state corruption, and edge cases. Pushed to GitHub.</p>

<details>
  <summary>Prompts</summary>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I need you to write a comprehensive readme on installation instructions and how to use all of the features
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>added MIT license, also put in section about inspirations. I was directly inspired by beads_rust and flywheel but wanted one tool in rust that did everything for me
</code></pre></div>  </div>

  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want to do some aggressive QA where the goal is to not conservatively test but break stuff
</code></pre></div>  </div>

</details>

<blockquote>
  <p><strong>My take:</strong> I still don’t think I caught as many bugs as I hoped. I kind of expected this, but the goal was to go fast and find the problems within my workflow, and I think I achieved that. As I write this blog post, I am conducting aggressive QA with Claude on the filament tool. Overall, this was a productive experiment. It showed me where the limitations of my LLM-assisted workflows are and made me think about what I consider good software (a deeply contentious topic in the industry). It also reminded me that you should challenge yourself more often, because it’s easy to fall into patterns.</p>
</blockquote>

<h2 id="observations">Observations</h2>

<ol>
  <li>Reducing reviews led to more bugs and worse code. No surprise there. I also lost some understanding of the later parts of the code. Those can be rectified.</li>
  <li>The speed felt sustainable at first but as things progressed, it did not. I was in a “vibe coding” (ugh, I hate that phrase) stupor and was just focused on shipping like I was at a super early stage startup.</li>
  <li>I don’t think I found the right balance in speed and review with this single agent setup. Maybe I will discover it later.</li>
  <li>“Good code” is whatever you (or a team if you are working in a team) find comfortable to work in I think. I think my thoughts on good code will change as I progress in my career and meet more people and work on different projects. Right now, I am doing things alone and am sticking to what I am comfortable with and following principles I agree with (“A Philosophy of Software Design” is my preferred overall software guide).</li>
  <li>Claude is definitely better than me at catching small, subtle bugs, while I am better at spotting bigger code smells and setting code direction. I should try to get better at what Claude is doing (catching small bugs), but that could be difficult considering the amount of code produced. I’ve noticed that big PRs get a quick LGTM! from reviewers, and yeah, that ain’t gonna change soon.</li>
  <li>Going fast on Filament made me appreciate the slower, more deliberate pace of Koupang. Working on Koupang felt like proper engineering while Filament felt like a startup. I can do both, but I learned that LLMs just amplify existing tendencies. If you go fast and break things, LLMs will let you go even faster and break even more things. If you go slow and deliberate, LLMs will speed you up while helping you be more deliberate. It really is all up to the user.</li>
</ol>

<h2 id="whats-next">What’s next</h2>

<p>I will finish Filament and maybe add a couple more features on the TUI side, then I will return to Koupang (I was implementing the outbox, which is quite crucial, before I got distracted). Stay tuned.</p>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[In Part 4, I finished planning for the big order orchestration saga and then started implementing. I am in the middle of implementing it but I decided to go on a little side quest. I was getting complacent and wanted to shake things up.]]></summary></entry><entry><title type="html">Current LLM Workflow Setup</title><link href="https://jyc11.github.io/blog/2026/02/26/current-LLM-workflow-setup" rel="alternate" type="text/html" title="Current LLM Workflow Setup" /><published>2026-02-26T00:00:00+00:00</published><updated>2026-02-26T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2026/02/26/current-LLM-workflow-setup</id><content type="html" xml:base="https://jyc11.github.io/blog/2026/02/26/current-LLM-workflow-setup"><![CDATA[<p>A snapshot of my current setup for LLM-assisted development, as of February 2026. This is what I use to build <a href="https://github.com/JYC11/koupang">Koupang</a>, the Rust e-commerce project documented in the <a href="/blog/2026/02/23/getting-gud-at-llms-pt1.html">Getting Gud at LLMs</a> series.</p>

<p>I got Claude to look at its own configuration, write a summary of that configuration, and format the post nicely. Like pt3, I peppered in my commentary in block quotes. I thought this could be a separate post on its own as it is quite meta and mostly informational.</p>

<hr />

<h2 id="environment">Environment</h2>

<table>
  <thead>
    <tr>
      <th> </th>
      <th> </th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Terminal</strong></td>
      <td>Ghostty</td>
    </tr>
    <tr>
      <td><strong>IDE</strong></td>
      <td>IntelliJ (Rust development), Zed (blogging)</td>
    </tr>
    <tr>
      <td><strong>LLM Tool</strong></td>
      <td>Claude Code CLI (Claude Opus)</td>
    </tr>
    <tr>
      <td><strong>Task Management</strong></td>
      <td><a href="https://github.com/Dicklesworthstone/beads_rust">beads_rust (br)</a></td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="claude-code-configuration">Claude Code Configuration</h2>

<h3 id="plugins">Plugins</h3>

<table>
  <thead>
    <tr>
      <th>Plugin</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>rust-skills</strong></td>
      <td>Rust-specific guidance — ownership, concurrency, error handling, domain patterns, crate research, daily news</td>
    </tr>
    <tr>
      <td><strong>rust-analyzer-lsp</strong></td>
      <td>LSP integration for go-to-definition, find references, symbol analysis</td>
    </tr>
  </tbody>
</table>

<h3 id="skills">Skills</h3>

<p>Custom skills loaded from <code class="language-plaintext highlighter-rouge">~/.claude/skills/</code>:</p>

<table>
  <thead>
    <tr>
      <th>Skill</th>
      <th>Trigger</th>
      <th>What it does</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>br</strong></td>
      <td><code class="language-plaintext highlighter-rouge">/br</code>, mentions of tasks/issues/backlog</td>
      <td>Task lifecycle management via beads_rust CLI — create, query, update, close, dependency tracking</td>
    </tr>
    <tr>
      <td><strong>project-context</strong></td>
      <td><code class="language-plaintext highlighter-rouge">/project-context</code>, session start</td>
      <td>Reads CLAUDE.md files for onboarding; updates them after significant changes</td>
    </tr>
    <tr>
      <td><strong>skill-creator</strong></td>
      <td>Creating new skills</td>
      <td>Guide for writing effective Claude Code skills</td>
    </tr>
    <tr>
      <td><strong>bd-to-br-migration</strong></td>
      <td>Migrating from beads to beads_rust</td>
      <td>Command mapping and migration patterns from <code class="language-plaintext highlighter-rouge">bd</code> to <code class="language-plaintext highlighter-rouge">br</code></td>
    </tr>
  </tbody>
</table>

<h3 id="hooks">Hooks</h3>

<p>One hook configured on <code class="language-plaintext highlighter-rouge">UserPromptSubmit</code>:</p>

<p><strong>Prompt logger</strong> (<code class="language-plaintext highlighter-rouge">log-prompt.sh</code>) — automatically captures every user prompt to a daily session log file (<code class="language-plaintext highlighter-rouge">session-log-YYYY-MM-DD.md</code>). Only activates within the Koupang project directory. Filters out system/command messages and empty prompts. These logs feed directly into the blog posts — it’s how I track session counts and reproduce exact prompts.</p>

<blockquote>
  <p><strong>My take</strong>: I am quite surprised I need so few skills, plugins and hooks to be this productive. I may try to branch out further with more stuff to see if I’m missing out. I’m also very satisfied with having Claude generate its own summaries, because this saves a lot of time when writing posts where the content is mostly factual and self-documenting. Making Claude write stuff that it will later use kinda reminds me of metaprogramming. If LLMs <em>were</em> a deterministic programming language, I think it would be a combination of Ada (the “English”-like syntax) and Lisp (a language capable of a lot of metaprogramming).</p>
</blockquote>

<h3 id="permissions">Permissions</h3>

<p>Explicitly allowed (no confirmation needed):</p>

<ul>
  <li><strong>Read-only shell</strong>: <code class="language-plaintext highlighter-rouge">ls</code>, <code class="language-plaintext highlighter-rouge">cat</code>, <code class="language-plaintext highlighter-rouge">find</code>, <code class="language-plaintext highlighter-rouge">grep</code>, <code class="language-plaintext highlighter-rouge">tree</code>, <code class="language-plaintext highlighter-rouge">git status/log/diff/show/tag</code>, etc.</li>
  <li><strong>Git write</strong>: <code class="language-plaintext highlighter-rouge">git add</code>, <code class="language-plaintext highlighter-rouge">commit</code>, <code class="language-plaintext highlighter-rouge">checkout</code>, <code class="language-plaintext highlighter-rouge">merge</code>, <code class="language-plaintext highlighter-rouge">rebase</code>, <code class="language-plaintext highlighter-rouge">push</code></li>
  <li><strong>Cargo</strong>: <code class="language-plaintext highlighter-rouge">check</code>, <code class="language-plaintext highlighter-rouge">build</code>, <code class="language-plaintext highlighter-rouge">test</code>, <code class="language-plaintext highlighter-rouge">clippy</code>, <code class="language-plaintext highlighter-rouge">fmt</code>, <code class="language-plaintext highlighter-rouge">run</code>, <code class="language-plaintext highlighter-rouge">add</code>, <code class="language-plaintext highlighter-rouge">doc</code></li>
  <li><strong>Docker</strong>: <code class="language-plaintext highlighter-rouge">docker compose up/down/ps/logs</code></li>
  <li><strong>Make</strong>: all <code class="language-plaintext highlighter-rouge">make</code> targets</li>
</ul>

<p>Explicitly denied:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">rm</code>, <code class="language-plaintext highlighter-rouge">sudo</code>, <code class="language-plaintext highlighter-rouge">curl</code>, <code class="language-plaintext highlighter-rouge">wget</code>, <code class="language-plaintext highlighter-rouge">chmod</code>, <code class="language-plaintext highlighter-rouge">chown</code>, <code class="language-plaintext highlighter-rouge">kill</code></li>
  <li><code class="language-plaintext highlighter-rouge">git push --force</code>, <code class="language-plaintext highlighter-rouge">git reset --hard</code>, <code class="language-plaintext highlighter-rouge">git clean -f</code></li>
  <li><code class="language-plaintext highlighter-rouge">docker rm</code>, <code class="language-plaintext highlighter-rouge">docker rmi</code></li>
  <li><code class="language-plaintext highlighter-rouge">WebSearch</code>, <code class="language-plaintext highlighter-rouge">WebFetch</code></li>
</ul>

<p>Everything else prompts for confirmation.</p>

<blockquote>
  <p><strong>My take</strong>: “Supposedly” allowed, but I still have to manually approve some of the “safe” commands. Gotta figure out how to configure the allow list properly.</p>
</blockquote>

<h3 id="claudemd-files">CLAUDE.md Files</h3>

<p>The project uses hierarchical CLAUDE.md files:</p>

<ul>
  <li><strong>Root</strong> (<code class="language-plaintext highlighter-rouge">koupang/CLAUDE.md</code>) — workspace structure, tech stack, ADR summary, key imports, scripts</li>
  <li><strong>Per-service</strong> (<code class="language-plaintext highlighter-rouge">identity/CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">catalog/CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">shared/CLAUDE.md</code>) — detailed architecture, endpoints, domain models, test structure</li>
  <li><strong>Reference docs</strong> (<code class="language-plaintext highlighter-rouge">.plan/</code>) — bootstrap recipe, code patterns, test standards (loaded on-demand, not auto-loaded)</li>
</ul>

<p>These are the primary onboarding mechanism — a new session reads them first and skips redundant exploration.</p>

<blockquote>
  <p><strong>My take</strong>: Sometimes Claude just reads all of the CLAUDE.md files and all the plan files when it loads up. Seems inconsistent, but I’m doing my best to prevent context bloat.</p>
</blockquote>

<hr />

<h2 id="development-cycle">Development Cycle</h2>

<p>This is the general loop I follow for each feature:</p>

<ol>
  <li><strong>Plan</strong> — enter plan mode, let Claude explore the codebase, design the approach</li>
  <li><strong>Iterate on the plan</strong> — review, ask questions, refine until the plan is solid</li>
  <li><strong>Put the plan into beads</strong> — create <code class="language-plaintext highlighter-rouge">br</code> tasks with priorities and dependencies so Claude has a structured work queue</li>
  <li><strong>Generate code</strong> — Claude works through tasks, launching subagents for parallel work when possible</li>
  <li><strong>Review code</strong> — read through what was generated, check for scope creep and pattern violations</li>
  <li><strong>Commit code</strong> — stage and commit with meaningful messages</li>
  <li><strong>Clean up</strong> — second pass to catch anything missed: redundant code, missing tests, stale docs</li>
</ol>

<h3 id="what-makes-this-work">What makes this work</h3>

<ul>
  <li><strong>CLAUDE.md files</strong> keep context cheap. A new session doesn’t waste 10 prompts re-discovering the architecture.</li>
  <li><strong>beads_rust</strong> gives structure to multi-step work. Instead of one giant prompt, break it into tasks with dependencies and let Claude work through them.</li>
  <li><strong>The prompt logger</strong> means I never lose track of what I asked. Blog posts practically write themselves from the logs.</li>
  <li><strong>Strict permissions</strong> prevent accidents. No force-pushes, no <code class="language-plaintext highlighter-rouge">rm</code>, no silent <code class="language-plaintext highlighter-rouge">curl</code> calls. Everything destructive requires confirmation.</li>
  <li><strong>Plan mode first</strong> prevents wasted work. Getting alignment on the approach before writing code is always worth the extra 5 minutes.</li>
</ul>

<blockquote>
  <p><strong>My take</strong>: I am currently very happy with this workflow and can see myself using it in the future and in jobs.</p>
</blockquote>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[A snapshot of my current setup for LLM-assisted development, as of February 2026. This is what I use to build Koupang, the Rust e-commerce project documented in the Getting Gud at LLMs series.]]></summary></entry><entry><title type="html">Getting Gud at LLMs Pt4</title><link href="https://jyc11.github.io/blog/2026/02/26/getting-gud-at-llms-pt4" rel="alternate" type="text/html" title="Getting Gud at LLMs Pt4" /><published>2026-02-26T00:00:00+00:00</published><updated>2026-02-26T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2026/02/26/getting-gud-at-llms-pt4</id><content type="html" xml:base="https://jyc11.github.io/blog/2026/02/26/getting-gud-at-llms-pt4"><![CDATA[<p>In <a href="/blog/2026/02/25/getting-gud-at-llms-pt3.html">Part 3</a>, I finished the catalog service, compacted CLAUDE.md files, and kicked off planning for the order/payment phase. Today was less about writing code and more about optimizing what exists and planning what comes next. I also wrote a separate post about my <a href="/blog/2026/02/26/current-LLM-workflow-setup.html">current LLM workflow setup</a> if you’re curious about the tooling side. I also write about the direction of blog posts since I got to thinking about what was the end goal.</p>

<hr />

<h2 id="direction-of-blog-posts">Direction of blog posts</h2>

<p>The blog posts so far cover 3 topics:</p>

<ul>
  <li>using LLMs</li>
  <li>the Koupang project</li>
  <li>my existential dread + musings about the software industry</li>
</ul>

<h3 id="git-gud">Git Gud</h3>

<p>My LLM usage stabilized once I figured out planning, beads and context management. According to Steve Yegge’s <a href="https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04">Gas Town</a>, I am somewhere around level 5. Personally, I haven’t reached the stage in the codebase where multiple agents are justifiably needed. But when I have multiple microservices developed and many features to develop at once (I have started adding more backlog tasks besides this order saga to fill things up), I will start experimenting with multi-agent stuff. I am still a big proponent of human-in-the-loop, and I can still see areas where the LLM messes up, so I need to challenge that part as well. I want to see if I can create nearly comprehensive guidelines that many LLMs can follow so that I can grow to “trust” their output. Trust is a funny word considering LLMs are just fancy matrix multiplication calculators.</p>

<p>When I reach maybe level 7/8 (or whatever the highest possible level I can reach) from that Gas Town post, I think that will be a nice place to end this series.</p>

<h3 id="koupang">Koupang</h3>

<p>On the topic of the project itself, I decided that the MVP will be when the full order cycle is completed and is in a deployable state. Once that is done, the functional and non-functional challenges will become more difficult.</p>

<ul>
  <li>On the functional side, I will try to leverage multi-agents to develop many features at once.</li>
  <li>On the non-functional side, I will also try to leverage multi-agents to make this service ready for “production” grade traffic and up-time.</li>
</ul>

<p>These updates will be a bit less frequent and will mostly focus on non-functional stuff. The “feature” work can be outsourced to LLMs since a lot of it is mostly CRUD, but on the “tech” side I want to take a bit more time to get it right. I think the posts will be titled “Overengineering Koupang for Fun and Profit pt{n}”.</p>

<h3 id="trauma-dumping-on-main">Trauma dumping on main</h3>

<p>Honestly have no clue about this. May suddenly decide to randomly drop a 10k-word essay or whatever.</p>

<hr />

<h2 id="ai-summary-of-work">AI summary of work</h2>

<p>Same deal as pt3 — I got Claude to summarize the work from its own session logs and git history. My commentary is in block quotes.</p>

<h3 id="by-the-numbers">By the numbers</h3>

<p>5 sessions, ~34 prompts across Feb 25 evening and Feb 26:</p>

<ul>
  <li>4 git commits since <code class="language-plaintext highlighter-rouge">v0.3-catalog-complete</code></li>
  <li>19 files changed, +879 lines added, -2,523 lines removed (net -1,644 lines)</li>
  <li>Tests: 207 → 235 (deduped many redundant tests, added shared module tests)</li>
  <li>4 plan files revised with inline comments</li>
  <li>35 beads tasks created with full dependency DAG</li>
  <li>1 new doc: <code class="language-plaintext highlighter-rouge">.plan/test-standards.md</code></li>
</ul>

<p>Git tag: <code class="language-plaintext highlighter-rouge">v0.4-test-optimization-and-planning</code></p>

<p>Most of the work fell into two categories: test optimization and order/payment planning.</p>

<hr />

<h3 id="test-optimization-2-sessions-13-prompts">Test optimization (2 sessions, ~13 prompts)</h3>

<p>Analyzed the test suites across identity and catalog and found significant redundancy. The same CRUD operations were being tested at every layer (router, service, repository) with full end-to-end Postgres containers each time.</p>

<p>Key changes:</p>

<ul>
  <li><strong>Shared container infrastructure</strong> — instead of spinning up a separate Postgres/Redis container per test, containers are now shared per test binary. This cut test setup overhead significantly.</li>
  <li><strong>Test deduplication</strong> — removed 107 redundant tests across identity and catalog (identity 115 → 82, catalog 209 → 135). Coverage maintained — the removed tests were asserting the same behavior at multiple layers.</li>
  <li><strong>Extracted test helpers to shared</strong> — auth fixtures (<code class="language-plaintext highlighter-rouge">seller_user()</code>, <code class="language-plaintext highlighter-rouge">admin_user()</code>), HTTP request builders (<code class="language-plaintext highlighter-rouge">authed_json_request</code>, <code class="language-plaintext highlighter-rouge">authed_get</code>), and pagination unit tests all moved to the shared module so every future service gets them for free.</li>
  <li><strong>Test standards doc</strong> — created <code class="language-plaintext highlighter-rouge">.plan/test-standards.md</code> defining what each test layer should cover, preventing redundancy from creeping back in.</li>
</ul>

<p>Net result: -2,523 lines of test code removed, +879 added (mostly shared infrastructure), and the test suite runs faster.</p>
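<p>The per-binary sharing can be sketched with <code class="language-plaintext highlighter-rouge">std::sync::OnceLock</code> (the setup function below is a stand-in; the real code starts a Postgres container via testcontainers): the first test to ask for the handle pays the startup cost, and every later test in the same binary reuses it.</p>

```rust
use std::sync::OnceLock;

// Stand-in for a real container handle; in the actual test suite this
// would hold the running Postgres container and its connection URL.
#[derive(Debug)]
struct DbHandle {
    url: String,
}

static DB: OnceLock<DbHandle> = OnceLock::new();

/// Returns the shared handle, initializing it exactly once per test binary.
fn shared_db() -> &'static DbHandle {
    DB.get_or_init(|| {
        // Imagine spinning up Postgres in Docker here; with OnceLock this
        // closure runs only once no matter how many tests call shared_db().
        DbHandle {
            url: "postgres://localhost:5432/test".into(),
        }
    })
}
```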

<blockquote>
  <p><strong>My take:</strong> This was necessary and quite quickly done. It would have taken me ages to type otherwise, so I’m glad I got the LLM to do it. It emphasizes the importance of actually reading what the LLM generates. I made the AI create a <code class="language-plaintext highlighter-rouge">test-standards.md</code> file so that it can refer back to these standards repeatedly and prevent this kind of redundancy from creeping back in.</p>
</blockquote>

<hr />

<h3 id="orderpayment-mega-planning-3-sessions-21-prompts">Order/payment mega-planning (3 sessions, ~21 prompts)</h3>

<p>This was the big one. The order/payment phase touches multiple services and needs careful coordination. The planning was split across multiple sessions:</p>

<p><strong>Session 1 — Initial 4-plan structure (from pt3)</strong>
Claude explored the entire codebase and created 4 detailed plan files:</p>

<ol>
  <li><strong>Shared infrastructure</strong> — Kafka KRaft, transactional outbox (<code class="language-plaintext highlighter-rouge">outbox-core</code>), event system with <code class="language-plaintext highlighter-rouge">rdkafka</code>, distributed tracing with Jaeger</li>
  <li><strong>Cart service</strong> — Redis-only, 6 endpoints, 30-day TTL</li>
  <li><strong>Order + Payment</strong> — choreography saga, state machines, mock payment gateway, inventory reservation, compensation flows</li>
  <li><strong>Workflow docs</strong> — ADRs 010-013, CLAUDE.md files, saga flow documentation</li>
</ol>

<p><strong>Session 2 — Plan review with my comments</strong>
I left inline comments on plans 1-3 (titled “Comment on [relevant part]”), then walked through each one with Claude to revise:</p>

<ul>
  <li>Plan 1: Added <code class="language-plaintext highlighter-rouge">ServiceBuilder</code> pattern, typed event enums, DLQ topics, programmatic Kafka topic creation</li>
  <li>Plan 2: Changed cart to display-only totals, added <code class="language-plaintext highlighter-rouge">/validate</code> endpoint, seller order endpoint</li>
  <li>Plan 3: Added <code class="language-plaintext highlighter-rouge">PaymentTimedOut</code> handling, <code class="language-plaintext highlighter-rouge">sku_availability</code> view</li>
</ul>
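<p>The typed event enums from the plan 1 revision look roughly like this (a sketch with made-up variant names, not the real event set): the exhaustive <code class="language-plaintext highlighter-rouge">match</code> means adding a new event variant forces every consumer to handle it before the code compiles.</p>

```rust
// Sketch of typed events: instead of stringly-typed topics and payloads,
// each event is an enum variant, so the compiler knows the full event set.
#[derive(Debug, Clone, PartialEq)]
enum OrderEvent {
    OrderPlaced { order_id: u64, total_cents: i64 },
    PaymentTimedOut { order_id: u64 },
}

/// Maps each event to its Kafka topic. The match is exhaustive: adding a
/// variant to OrderEvent breaks the build here until a topic is chosen.
fn topic_for(event: &OrderEvent) -> &'static str {
    match event {
        OrderEvent::OrderPlaced { .. } => "order.placed",
        OrderEvent::PaymentTimedOut { .. } => "order.payment_timed_out",
    }
}
```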

<p><strong>Session 3 — Double-entry accounting discussion</strong>
I pushed for a double-entry accounting ledger in the payment service, inspired by <a href="https://news.alvaroduran.com/p/engineers-do-not-get-to-make-startup">Alvaro Duran’s article</a> about big tech companies begrudgingly building their own double-entry payment ledgers. Claude revised plan 3 to adopt this approach and added a note about platform commission (out of scope for now but will affect the ledger design).</p>

<p>After the plans were finalized, Claude created 35 beads tasks with a full dependency DAG across all 4 plans, plus 3 MVP milestone tasks (docker-compose deployment, seed script, API walkthrough) and a standalone non-functional requirements task for high traffic/uptime planning.</p>

<blockquote>
  <p><strong>My take:</strong> Still very planning heavy, and I could have decided to research way more before going through amendments, but I think I need to execute and learn through pain and suffering. The double-entry accounting stuff was my call because I know from experience that money records need a special kind of care/domain knowledge. I predict the Kafka stuff will bite me in the ass but oh well.</p>
</blockquote>

<h2 id="whats-next">What’s next</h2>

<p>The 4 plans are reviewed and waiting for implementation. Next time, I’ll try to push my current setup to its limits by handling this huge set of requirements with me managing 1 agent.</p>

<p>Here’s the full beads dependency tree — this is what Claude will work through:
(I got Claude to use the beads_rust skill to read from beads_rust and format this. This is so very convenient lmao)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PLAN 1: SHARED INFRASTRUCTURE (foundation for everything)
──────────────────────────────────────────────────────────
bd-na8  Event types + typed enums (EventType, AggregateType, EventEnvelope)
├── bd-1j2  KafkaEventPublisher (rdkafka) ← also needs bd-3ga
│   ├── bd-3sv  KafkaEventConsumer with DLQ support
│   │   ├──→ [Plan 3: Order schema, Payment schema, Catalog inventory]
│   │   └── bd-2v4  ADR-010 Event-driven architecture ← also needs bd-1ee, bd-5y4
│   └── bd-27z  Kafka health check
├── bd-5y4  ServiceBuilder composable bootstrap
│   ├──→ [Plan 2: Cart value objects]
│   └──→ bd-2v4 (see above)
└── bd-1fo  MockEventPublisher (test infra)

bd-3ga  Docker compose additions (Kafka KRaft, Kafka UI, Jaeger)
├──→ bd-1j2 (see above)
└── bd-2g4  Programmatic topic creation (AdminClient)

bd-3ej  Research outbox-core crate API compatibility
└── bd-1ee  Outbox integration via outbox-core + migration templates
    ├──→ [Plan 3: Order schema, Payment schema]
    └──→ bd-2v4 (see above)

bd-8fc  Distributed tracing OTLP exporter (independent)


PLAN 2: CART SERVICE (Redis-only)
─────────────────────────────────
bd-337  Cart value objects (Quantity, PriceSnapshot, Currency) ← blocked by bd-5y4
└── bd-2aq  Cart domain model (CartItem, Cart) + Redis data model
    ├── bd-1tw  Cart DTOs (request, validated, response) ──┐
    └── bd-3sf  Cart repository (Redis ops) + tests ───────┤
                                                           ▼
                                              bd-1ra  Cart service + tests
                                              └── bd-aqs  Cart routes + router tests
                                                  └── bd-1p7  Cart bootstrap (lib.rs, main.rs)
                                                      └── bd-305  cart/CLAUDE.md


PLAN 3: ORDER + PAYMENT + INVENTORY
────────────────────────────────────
Order chain:                              Payment chain:
bd-32f  Order schema ← bd-1ee, bd-3sv    bd-1tp  Payment double-entry schema ← bd-1ee, bd-3sv
└── bd-lwv  Order value objects           ├── bd-2cj  Payment gateway trait + mock
    └── bd-186  Order repository          └── bd-1te  Payment repository
        └── bd-2ac  Order service             └── bd-1rk  Payment service ← needs both
            └── bd-3pu  Order routes              └── bd-3of  Payment routes
                └── bd-a1p  Order outbox
                    └── bd-1a3  Order Kafka consumers

Inventory chain:
bd-1yx  Catalog inventory migration ← bd-3sv
└── bd-xsn  Inventory service + repository
    └── bd-2h3  Inventory Kafka consumer

              ┌─── bd-1a3 (order) ──────┐
All three ──► │    bd-3of (payment) ────►├──► bd-b02  Wire Kafka consumers in all main.rs
              └─── bd-2h3 (inventory) ──┘
                                         └── bd-1zd  Saga integration tests
                                             ├── bd-jp3  order/payment CLAUDE.md
                                             └──► MVP track (below)


PLAN 4: DOCS
─────────────
bd-jp3  order/CLAUDE.md + payment/CLAUDE.md ──┐
bd-305  cart/CLAUDE.md ────────────────────────┤
                                              ▼
                              bd-1jp  ADRs 010-014
                              bd-m7m  Saga flow docs ← bd-jp3
                              └── bd-2ne  Progress summary pt3


MVP MILESTONES
──────────────
bd-1zd  Saga integration tests
└── bd-32d  MVP: Docker Compose (all services + infra)
    └── bd-o7r  MVP: Seed data script
        └── bd-9mm  MVP: API walkthrough / Postman collection


BACKLOG (independent, no blockers)
──────────────────────────────────
P2: bd-2yx  Redis caching for product reads → bd-2sh Search engine planning
P3: bd-1yk  Bulk product/SKU CSV processing
P3: bd-3jn  Brands list keyset pagination
P3: bd-1dh  Image upload for products/SKUs
P3: bd-v0a  Plan for high traffic and uptime (NFRs)
P3: bd-dsh  Resilient auth (gRPC + Redis cache + circuit breaker)
P3: bd-2kq  Evaluate repository trait pattern for mockable tests
P4: bd-x38  Advertisements service planning
P4: bd-7m8  Discounts/coupons planning
P4: bd-dj9  Evolve domain FK refs to embedded domain objects
</code></pre></div></div>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[In Part 3, I finished the catalog service, compacted CLAUDE.md files, and kicked off planning for the order/payment phase. Today was less about writing code and more about optimizing what exists and planning what comes next. I also wrote a separate post about my current LLM workflow setup if you’re curious about the tooling side. I also write about the direction of blog posts since I got to thinking about what was the end goal.]]></summary></entry><entry><title type="html">Getting Gud at LLMs Pt3</title><link href="https://jyc11.github.io/blog/2026/02/25/getting-gud-at-llms-pt3" rel="alternate" type="text/html" title="Getting Gud at LLMs Pt3" /><published>2026-02-25T00:00:00+00:00</published><updated>2026-02-25T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2026/02/25/getting-gud-at-llms-pt3</id><content type="html" xml:base="https://jyc11.github.io/blog/2026/02/25/getting-gud-at-llms-pt3"><![CDATA[<p>In <a href="/blog/2026/02/24/getting-gud-at-llms-pt2.html">Part 2</a>, I finished the catalog service CRUD, reflected on backend development with LLMs, and predicted that complex cross-service features would be where things get hard. Since then, I’ve been building out the catalog further and planning the next major phase.</p>

<hr />

<p>I realized that the summary of what I did can be automated with Claude Code, so that’s what I did. I’ve read through the summary and can verify it’s accurate. I’ll pepper in my own commentary in between (those sections are in blockquotes). When the AI summary says “I,” it’s Claude writing as me.</p>

<h2 id="by-the-numbers">By the numbers</h2>

<p>About 12 sessions and ~50 user prompts across Feb 24 afternoon and Feb 25:</p>

<ul>
  <li>11 git commits since pt2</li>
  <li>Tests: 209 → 207 (net -2 from deduplicating redundant tests, coverage maintained)</li>
  <li>2 new features: brands + categories with ltree hierarchy, keyset pagination with filters</li>
  <li>4 major plans created for the next phase</li>
  <li>CLAUDE.md files compacted significantly (catalog 250→88, identity 125→68, shared 187→82 lines)</li>
  <li>ADR count: 8 → 9 (added ADR-009 for ltree categories)</li>
</ul>

<p>The work fell into a few categories: building new catalog features, managing LLM context, testing and code quality, and planning for the order/payment phase.</p>

<hr />

<h2 id="building-out-the-catalog">Building out the catalog</h2>

<h3 id="category--brand-planning-3-prompts">Category &amp; brand planning (3 prompts)</h3>

<p>Added two new br tasks for categories and brands. Entered plan mode specifically for categories because:</p>

<ul>
  <li>Tree data structures in a relational DB are tricky (chose Postgres <code class="language-plaintext highlighter-rouge">ltree</code>)</li>
  <li>Brand-category validation was needed (e.g. a car brand can’t appear on food products)</li>
</ul>
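<p>For context, <code class="language-plaintext highlighter-rouge">ltree</code> stores each node’s full materialized path, so descendant queries become prefix matches instead of recursive joins. A rough in-memory Rust analogy of that query shape (the category names are made up):</p>

```rust
// ltree-style paths: "electronics.phones.android" is a descendant of
// "electronics". In Postgres this is `path <@ 'electronics'`; here it is
// just a prefix check (a node counts as its own descendant, as in ltree).
fn is_descendant_of(path: &str, ancestor: &str) -> bool {
    path == ancestor || path.starts_with(&format!("{ancestor}."))
}

/// Filters a set of paths down to the subtree rooted at `ancestor`.
fn descendants<'a>(paths: &'a [&'a str], ancestor: &str) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| is_descendant_of(p, ancestor))
        .collect()
}
```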

<h3 id="brand--category-implementation-7-prompts-across-2-sessions">Brand &amp; category implementation (7 prompts across 2 sessions)</h3>

<p>Executed the plan. Implemented brands CRUD, categories with ltree hierarchy, and a brand-category association table. Asked Claude to work on tasks in parallel. Reminded it to reuse value objects (e.g. <code class="language-plaintext highlighter-rouge">HttpUrl</code> for brand logo URL). Ended the session when context started filling up.</p>

<blockquote>
  <p><strong>My take:</strong> I decided to experiment with two coupled features that have interdependencies with each other and are dependent on the existing product code, to see how Claude handles additions to existing code and manages a set of parallel intertwined tasks. I’m quite happy with how it handled itself once the plan was in place and shoved into beads. I got it to launch subagents to do the work as well when it could, to speed up task completion.</p>
</blockquote>

<h3 id="domain-model-clarification--scope-control-5-prompts">Domain model clarification + scope control (5 prompts)</h3>

<p>Clarified to Claude what I meant by “domain objects” — rich domain models where business logic lives, not just FK validation helpers. Claude suggested building FK traversal (following foreign key references to load related domain objects automatically), which I correctly flagged as ORM scope creep. Pushed it to a P4 backlog “nice to have.”</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>to clarify, this is going into the realm of reimplementing orm features which
dramatically increases the scope of the task/project to unmanageable levels
</code></pre></div></div>

<blockquote>
  <p><strong>My take:</strong> The concept of “correct” code appeals to me greatly due to my work experience. When I worked in not-ideal legacy code environments or with fast-paced schedules, I would often skip writing tests and code as fast as possible. That obviously leads to poor and buggy code. Besides writing tests — which is an entire ordeal itself if the code is sloppy or very legacy — I wanted techniques to write code that <em>can’t</em> be wrong (assuming I have the correct understanding of the requirements).</p>

  <p>This led me to shop around for techniques, which landed me on several topics:</p>

  <ol>
    <li>Functional Core, Imperative Shell</li>
    <li>Making illegal states unrepresentable</li>
  </ol>

  <p>The first concept taught me that code can be split into Logic (CPU-bound tasks or Calculation) and Side Effects (I/O or Data production/consumption). So whenever I look at or think about code, I decompose it into these two broad categories, which lets me reason about how to organize/debug code and choose technology.</p>

  <p>The second concept helped me understand that compilers can encode business logic. I learned that certain language features can be used as guards and as a representation for business logic — that the compiler can be made to encode and “understand” business logic. This was the specific “technique” I was looking for.</p>

  <p>Since LLMs can always make mistakes, I wanted another pre-emptive “layer” of validation besides what the compiler provides. If the LLM accidentally hallucinates something that isn’t just technical and wrong in a business-logic sense, the compiler will be able to catch that as well. This specific case hasn’t happened yet but I’m putting it in there just in case.</p>
</blockquote>
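<p>To make the second concept concrete, here is a minimal Rust sketch (hypothetical types, not actual Koupang code): a zero quantity simply cannot be constructed, and because the order states are a single enum, “refunded but never paid” has no representation at all.</p>

```rust
/// A quantity that can never be zero: the invariant is checked once at
/// construction, so every Quantity in the system is valid by type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Quantity(u32);

impl Quantity {
    fn new(n: u32) -> Result<Self, String> {
        if n == 0 {
            Err("quantity must be at least 1".into())
        } else {
            Ok(Quantity(n))
        }
    }
}

/// Order lifecycle as an enum: each state carries only the data that can
/// exist in that state, so invalid combinations don't compile.
#[derive(Debug)]
enum Order {
    Pending { items: u32 },
    Paid { items: u32, payment_id: String },
    Refunded { payment_id: String },
}

impl Order {
    /// Only a paid order can transition to refunded; any other state is
    /// handed back unchanged as an error.
    fn refund(self) -> Result<Order, Order> {
        match self {
            Order::Paid { payment_id, .. } => Ok(Order::Refunded { payment_id }),
            other => Err(other),
        }
    }
}
```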

<h3 id="keyset-pagination-3-prompts">Keyset pagination (3 prompts)</h3>

<p>Planned and implemented keyset cursor pagination using UUID v7 ordering for product listing endpoints. Added a thumbnail image (sort_order=1) to the paginated product response. Skipped pagination for images and SKUs since they’re low cardinality and only appear inside product detail views.</p>
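<p>A rough in-memory sketch of the keyset idea (simplified hypothetical types; the real implementation runs this as SQL against Postgres): the client sends the last id it saw instead of an offset, and because UUID v7 ids are time-ordered, "id &gt; cursor" pages through rows in creation order.</p>

```rust
// Keyset (cursor) pagination: no OFFSET scans; each page continues
// strictly after the last id the client saw. u128 stands in for a UUID v7.
#[derive(Debug, Clone, PartialEq)]
struct Product {
    id: u128,
    name: String,
}

struct Page {
    items: Vec<Product>,
    next_cursor: Option<u128>,
}

/// Assumes `all` is sorted by id ascending, as the DB index would be.
fn list_products(all: &[Product], cursor: Option<u128>, limit: usize) -> Page {
    let items: Vec<Product> = all
        .iter()
        .filter(|p| cursor.map_or(true, |c| p.id > c))
        .take(limit)
        .cloned()
        .collect();
    // A full page means there may be more rows; hand back the last id
    // as the cursor for the next request.
    let next_cursor = if items.len() == limit {
        items.last().map(|p| p.id)
    } else {
        None
    };
    Page { items, next_cursor }
}
```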

<h3 id="product-filters-4-prompts">Product filters (4 prompts)</h3>

<p>Implemented <code class="language-plaintext highlighter-rouge">ProductFilterQuery</code>/<code class="language-plaintext highlighter-rouge">ProductFilter</code> for <code class="language-plaintext highlighter-rouge">GET /api/v1/products</code>. Filters by category, brand, price range, search (ILIKE), and status. Fixed 4 SQL linter bugs that Claude introduced (double WHERE / AND WHERE issues). 5 new router filter tests. All 207 tests passing.</p>
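<p>The usual fix for that class of bug is to never write the <code class="language-plaintext highlighter-rouge">WHERE</code> keyword inside a conditional: collect the predicates first, then join them once. A sketch of the pattern (hypothetical field names; real code would bind parameters rather than interpolate values):</p>

```rust
// Dynamic filter clause builder: WHERE appears exactly once, or not at
// all, no matter which combination of filters is set. This avoids the
// "double WHERE / AND WHERE" bugs that show up when each filter appends
// its own keyword.
struct ProductFilter {
    brand_id: Option<u64>,
    min_price_cents: Option<i64>,
    status: Option<String>,
}

fn where_clause(f: &ProductFilter) -> String {
    let mut preds: Vec<String> = Vec::new();
    if let Some(b) = f.brand_id {
        preds.push(format!("brand_id = {b}"));
    }
    if let Some(p) = f.min_price_cents {
        preds.push(format!("price_cents >= {p}"));
    }
    if let Some(s) = &f.status {
        preds.push(format!("status = '{s}'"));
    }
    if preds.is_empty() {
        String::new()
    } else {
        format!("WHERE {}", preds.join(" AND "))
    }
}
```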

<hr />

<h2 id="testing-and-code-quality">Testing and code quality</h2>

<h3 id="integration-tests-5-prompts">Integration tests (5 prompts)</h3>

<p>Implemented 33 brand &amp; category repository tests (17 brand, 16 category). All 144 tests passing at this point.</p>

<h3 id="test-refactoring-6-prompts">Test refactoring (6 prompts)</h3>

<p>Extracted test helpers to the shared module: auth fixtures (<code class="language-plaintext highlighter-rouge">test_auth_config</code>, <code class="language-plaintext highlighter-rouge">test_token</code>, <code class="language-plaintext highlighter-rouge">seller_user</code>/<code class="language-plaintext highlighter-rouge">buyer_user</code>/<code class="language-plaintext highlighter-rouge">admin_user</code>), HTTP request builders (<code class="language-plaintext highlighter-rouge">json_request</code>, <code class="language-plaintext highlighter-rouge">authed_json_request</code>, <code class="language-plaintext highlighter-rouge">authed_get</code>, <code class="language-plaintext highlighter-rouge">authed_delete</code>). Removed redundant pagination integration tests and duplicated constructor functions from catalog router tests. Net result: -520 lines removed, +337 added. Compacted shared/CLAUDE.md from 187 to 82 lines.</p>

<blockquote>
  <p><strong>My take:</strong> LLMs seem to default to avoiding preemptive abstraction and are prone to repeating code. So between the Product Filters and Test Refactoring work, I spent some time looking over the code Claude generated and identified points for improvement. I’m sure I could spend more time on this, but I’d like to get to the complicated parts and see how my architecture decisions and LLM management skills hold up.</p>
</blockquote>

<hr />

<h2 id="managing-llm-context">Managing LLM context</h2>

<h3 id="claudemd-optimization--skill-creation-7-prompts">CLAUDE.md optimization &amp; skill creation (7 prompts)</h3>

<p>Asked Claude to suggest ways to improve the CLAUDE.md files as “onboarding docs” for new LLM sessions. Created a <code class="language-plaintext highlighter-rouge">project-context</code> skill that handles session onboarding (gathering context efficiently) and documentation maintenance (updating CLAUDE.md files after significant work). Pruned redundant info from root CLAUDE.md. I think this is the most useful skill I’ve created so far.</p>

<h3 id="context-optimization-5-prompts">Context optimization (5 prompts)</h3>

<p>Explored how to minimize the context that gets loaded at the start of every session. Moved code patterns and bootstrap recipe to on-demand <code class="language-plaintext highlighter-rouge">.plan/</code> files instead of always-loaded CLAUDE.md. Cut always-loaded context by roughly 50%. Discussed knowledge graph tools vs markdown for managing project knowledge — decided to stay with markdown. Added deduplication rules so MEMORY.md and CLAUDE.md don’t drift apart with redundant info.</p>

<h3 id="claudemd-compaction--adr-009-5-prompts">CLAUDE.md compaction + ADR-009 (5 prompts)</h3>

<p>Compacted catalog CLAUDE.md from 250 to 88 lines, identity CLAUDE.md from 125 to 68 lines. Created ADR-009 for the ltree categories decision.</p>

<h3 id="housekeeping-8-prompts">Housekeeping (8 prompts)</h3>

<p>Created <code class="language-plaintext highlighter-rouge">v0.2-catalog-crud</code> git tag and pushed to remote. Wrote a <code class="language-plaintext highlighter-rouge">make run SERVICE=&lt;name&gt;</code> command with a <code class="language-plaintext highlighter-rouge">run.sh</code> script, debugged it for both services. Small things, but the kind that save time every day.</p>

<blockquote>
  <p><strong>My take:</strong> I frequently checked the root <code class="language-plaintext highlighter-rouge">.claude</code> folder and kept seeing things pile up, which concerned me. Context management has become a first-class part of the workflow — every session now follows: <code class="language-plaintext highlighter-rouge">/project-context</code> → <code class="language-plaintext highlighter-rouge">/br</code> → work → flush to memory → end session.</p>
</blockquote>

<hr />

<h2 id="planning-the-next-phase">Planning the next phase</h2>

<h3 id="orderpayment-mega-planning-session-4-prompts">Order/payment mega-planning session (4 prompts)</h3>

<p>The big one. Created 4 detailed implementation plans:</p>

<ol>
  <li><strong>Shared infrastructure</strong> — Kafka (KRaft mode, no Zookeeper), transactional outbox pattern, event system with rdkafka, Jaeger for distributed tracing, Redis-only service bootstrap function</li>
  <li><strong>Cart service</strong> — Redis-only microservice, 6 endpoints, 30-day TTL, max 50 SKUs per cart</li>
  <li><strong>Order + Payment services</strong> — choreography saga with state machines, mock payment gateway (Stripe/PayPal), inventory reservation, compensation flows, ~190 tests planned</li>
  <li><strong>Workflow documentation</strong> — ADRs 010-013, CLAUDE.md updates, saga flow documentation</li>
</ol>
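<p>To make the state-machine part of plan 3 concrete, an order lifecycle can be modeled as a pure transition function that an event consumer applies per event. The states, events, and transitions below are my illustrative assumptions, not the actual planned saga:</p>

```rust
// Minimal order-lifecycle state machine sketch for a choreography saga.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OrderState {
    Pending,
    InventoryReserved,
    Paid,
    Failed,    // terminal: compensation (release the reservation) has run
    Completed, // terminal
}

#[derive(Debug, Clone, Copy)]
enum OrderEvent {
    InventoryReserved,
    PaymentSucceeded,
    PaymentFailed,
    Shipped,
}

// Illegal transitions return None so the consumer can dead-letter the
// event instead of corrupting order state.
fn apply(state: OrderState, event: OrderEvent) -> Option<OrderState> {
    use OrderEvent as E;
    use OrderState as S;
    match (state, event) {
        (S::Pending, E::InventoryReserved) => Some(S::InventoryReserved),
        (S::InventoryReserved, E::PaymentSucceeded) => Some(S::Paid),
        (S::InventoryReserved, E::PaymentFailed) => Some(S::Failed), // compensation path
        (S::Paid, E::Shipped) => Some(S::Completed),
        _ => None,
    }
}

fn main() {
    use OrderEvent as E;
    use OrderState as S;
    let s = apply(S::Pending, E::InventoryReserved).unwrap();
    assert_eq!(apply(s, E::PaymentFailed), Some(S::Failed));
    assert_eq!(apply(S::Pending, E::Shipped), None); // out-of-order event rejected
    println!("ok");
}
```

<p>Keeping the transition function pure should make the planned tests cheap: no broker or database is needed to cover every legal and illegal transition.</p>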

<blockquote>
  <p><strong>My take:</strong> I’m going to spend some time reviewing, researching independently, and iterating on the plans with Claude. Infrastructure (specifically Kafka) and saga orchestration are areas where I’m weakest in terms of experience and knowledge, so I’ll have to tread carefully. I personally hate microservices, but this is what I specifically planned for — to push both myself and the LLM.</p>
</blockquote>

<hr />

<h2 id="observations">Observations</h2>

<blockquote>
  <p><strong>My take:</strong> These are actually Claude’s observations. They’re pretty accurate.</p>
</blockquote>

<h3 id="scope-creep-needs-a-human-check">Scope creep needs a human check</h3>

<p>Claude suggested building automatic FK traversal for domain objects. Sounds nice in theory but it’s basically reimplementing an ORM — massive scope increase for questionable benefit. Pushing back on LLM suggestions is a skill that matters.</p>

<h3 id="llms-dont-refactor-proactively">LLMs don’t refactor proactively</h3>

<p>Test infrastructure and shared helpers only got cleaned up because I reviewed the code and pointed out duplication. As the project grows, I don’t want test times ballooning from duplicated setup code and tests that verify the same generic pagination behavior repeatedly.</p>

<h3 id="the-mega-planning-session-validates-my-pt2-predictions">The mega-planning session validates my pt2 predictions</h3>

<p>I predicted that orders, payments, and cross-service coordination would be where complexity explodes. The fact that I needed 4 separate plans just to approach this phase — before writing a single line of code — confirms that. The planning covered saga patterns, state machines, transactional outboxes, compensation flows, and distributed tracing. This is a fundamentally different challenge from “implement CRUD endpoints.”</p>

<hr />

<h2 id="why-i-started-programming">Why I started programming</h2>

<p>I started programming because at my first job, I spent hours on Google Sheets copy-pasting meaningless bullshit. After an agonizing amount of time doing that, developing unwanted muscle memory and getting tired of the farce*, I decided to look for solutions to this Sisyphean task. The answer was programming with — ugh — JavaScript in an online scripting environment integrated into Google Workspace. It was a terrible dev experience. It had none of the convenience features that modern IDEs provide. All I had were my unending fury, hatred of manual work, googling skills, and bottomless willpower. This was around 2020-2021, for context.</p>

<p>When I got my custom macros to work and automated the entire boring task, it was an amazing feeling. The act of programming itself was addictive, and I kept doing it until it became my full-time job.</p>

<p>That’s why when I got my jobs at startups, I enthusiastically threw myself into coding-heavy repetitive tasks (mostly refactoring and test writing without much of the fun “intellectual” domain object stuff). This helped me develop taste and opinions, but it also tired me out physically and mentally. I realized that programming, like any job, requires manual, boring, repetitive work.</p>

<p>Now that LLMs are here to automate the act of writing code itself, I don’t know if I’ll enjoy programming like I used to. Right now, it’s so comfortable to get the LLM to do things for me, and to be honest, I’m not sure sometimes what the point of writing code faster even is. Right now, the shiny new toy is very interesting and I want this blog to be a showcase of my skills so that I don’t need to do silly leetcode-style tests. That also makes me think: is being a good employee my true end goal? My thoughts are complicated and I need to think about it more. Nevertheless, I am still excited about the project.</p>

<p>* Looking back now, I understand why work was done that way (making interns do spreadsheet manual labour), but I still hate it with a great passion.</p>

<h2 id="git-tag">Git Tag</h2>

<ul>
  <li><code class="language-plaintext highlighter-rouge">v0.3-catalog-complete</code></li>
</ul>

<hr />

<h2 id="whats-next">What’s next</h2>

<p>The 4 plans are created and waiting for review. Next is implementing them in order — shared infrastructure first (Kafka, outbox, tracing), then cart, then order+payment. This is where the real test begins.</p>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[In Part 2, I finished the catalog service CRUD, reflected on backend development with LLMs, and predicted that complex cross-service features would be where things get hard. Since then, I’ve been building out the catalog further and planning the next major phase.]]></summary></entry><entry><title type="html">Getting Gud at LLMs Pt2</title><link href="https://jyc11.github.io/blog/2026/02/24/getting-gud-at-llms-pt2" rel="alternate" type="text/html" title="Getting Gud at LLMs Pt2" /><published>2026-02-24T00:00:00+00:00</published><updated>2026-02-24T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2026/02/24/getting-gud-at-llms-pt2</id><content type="html" xml:base="https://jyc11.github.io/blog/2026/02/24/getting-gud-at-llms-pt2"><![CDATA[<p>In <a href="/blog/2026/02/23/getting-gud-at-llms-pt1.html">Part 1</a>, I built the identity service for <a href="https://github.com/JYC11/koupang/tree/main">Koupang</a> and was surprised at how well Claude handled Rust and niche crates. Now I’m tackling the catalog service and experimenting with a task management workflow.</p>

<hr />

<h2 id="using-a-task-manager">Using a task manager</h2>

<p>I figured out how to get Claude to use beads_rust. The key insight was splitting planning and execution into separate sessions so that the execution session starts with a clean context and only has the task list to work from.</p>

<p>The workflow:</p>

<ol>
  <li>Load in the beads_rust skill at the start of the session</li>
  <li>Start plan mode</li>
  <li>Give requirements</li>
  <li>Iterate plan</li>
  <li>When Claude asks to go ahead with the plan, I “reject” the execution and get it to put the tasks with dependencies into beads_rust</li>
  <li>Close the planning Claude session</li>
  <li>Start a new Claude session and load in beads_rust skill</li>
  <li>Tell Claude to use beads_rust to look at what it needs to do and to execute it</li>
</ol>

<p>The reason I close the planning session and start fresh is to prevent context bloat. The planning conversation can get long, and I don’t want all of that history polluting the execution phase. By writing the plan into beads_rust, the new session can pick up exactly what it needs to do without carrying the baggage of the planning discussion.</p>

<hr />

<h2 id="results-and-thoughts">Results and Thoughts</h2>

<h3 id="first-impressions">First impressions</h3>

<ul>
  <li>It followed the beads it created well
    <ul>
      <li>To be fair, it also kept notes in its internal MEMORY.md file about the next task (catalog service) and which bead to use</li>
      <li>I should consider clearing the memory and then trying a new task that it put in beads to see how well it performs without that crutch</li>
    </ul>
  </li>
  <li>It does simple CRUD very well, so I was not surprised that it handled the basic CRUD endpoints perfectly</li>
  <li>Unfortunately, it just queries the entire table on list endpoints instead of paginating, despite common pagination support modules being in the context (maybe it got lost?)</li>
  <li>It properly used value objects (I added stuff about value objects to the identity service) without me having to mention it</li>
  <li>It appropriately suggested claims-based authentication. When I pushed back on using the gRPC server and a circuit breaker pattern, it suggested those could be enhancements for later</li>
  <li>It didn’t do router tests and only implemented them when I told it to</li>
  <li>It implemented dynamic updates based on optional parameters in the product update request DTO without me telling it to</li>
  <li>Catalog service is currently very CRUD-y which makes sense considering there isn’t much “business logic” yet so that’s fine for now</li>
  <li>This saved me a TON of time typing
    <ul>
      <li>I would say 2-3 days of repetitive typing and debugging got reduced to 2-ish hours of planning and waiting for the LLM to generate code</li>
    </ul>
  </li>
</ul>

<h3 id="deeper-reflections-after-stepping-away-for-lunch">Deeper reflections (after stepping away for lunch)</h3>

<ul>
  <li>It properly planned out + implemented slugs for products, sku code, non-zero skus and other such domain specific requirements correctly without me explicitly saying so</li>
  <li>Something I forgot was database locking. I used to do very conservative pessimistic locking on records before updating them when there were strong consistency requirements. Claude didn’t suggest anything like that, and I should at least have considered it.</li>
  <li>The get-product-detail endpoint was done with 3 separate queries (1 for the product, 1 for SKUs, 1 for images). I personally would have used a join with some JSON aggregation to do it in one query, but this is acceptable; people hold many different opinions about the best way to interface with a database from application code.</li>
  <li>Not using an ORM was, I think, a good choice (although ORMs are a fantastic tool <strong>if</strong> used well and with intention). Before LLMs, I shied away from raw SQL because of the finicky nature of string manipulation, having to map between code and SQL result sets by casting <code class="language-plaintext highlighter-rouge">Any</code> types to concrete types, and the annoyance of keeping code and SQL in sync. LLMs make these things trivial: they are quite decent at SQL, and they don’t have to learn some potentially niche ORM library with few examples.</li>
  <li>The LLM still handles the current size of the codebase well.</li>
</ul>

<hr />

<h2 id="predictions">Predictions</h2>

<ul>
  <li>I am building simple foundations for more complex features so things are going well so far</li>
  <li>I expect dealing with orders, shipping, payment, refunds (things surrounding products) to massively increase complexity. These features require cross-service coordination, state machines (order lifecycle, payment states), and careful handling of failure scenarios (partial refunds, failed shipments). I would like to see how I can use LLMs to handle them well but I expect the LLM to struggle here.</li>
  <li>Certain features more directly related to the catalogs like dynamic pricing based on algorithms and discount features, searching for products, handling high traffic while keeping track of stock could also challenge me and LLMs. These involve algorithmic thinking and concurrency concerns that go beyond CRUD patterns, so I expect the LLM to struggle in this regard as well.</li>
  <li>My Claude Code install is fresh so there isn’t much cruft in the various config and memory files on my computer (I checked). As this grows, I expect Claude to be a bit more confused even if I keep the context fresh each session. Stale or accumulated memories from past sessions could mislead future ones, so I think I would have to clear those regularly.</li>
</ul>

<hr />

<h2 id="thoughts-about-backend-development">Thoughts about backend development</h2>

<p>A lot of backend development is just plumbing data in my humble opinion. You receive some kinda data through the network, you shove it into some kinda persistence layer, you retrieve something from the persistence layer, you throw it out into the network, repeat ad infinitum. There are lots of established patterns that I just need to re-implement again and again.</p>

<p>“Difficulty” in backend development came from:</p>

<ol>
  <li>Not knowing programming well</li>
  <li>Not understanding how to use the libraries/packages</li>
  <li>Writing bad code and suffering from it</li>
  <li>Working with code that others have written</li>
  <li>Crazy deadlines and fluctuating requirements</li>
</ol>

<p>But as time passed:</p>

<ol>
  <li>Programming by hand solved problem 1 as I developed intuition and muscle memory for the language</li>
  <li>Reading documentation and looking up guides solved the library/package issue</li>
  <li>Writing bad code got solved when I was forced to refactor my own code to make it testable, and I developed intuition for writing simpler, more testable code</li>
  <li>Working with code written by other people is still difficult</li>
  <li>I can’t do anything about deadlines and fluctuating requirements — but faster coding helps with both</li>
</ol>

<h3 id="how-llms-change-this">How LLMs change this</h3>

<ul>
  <li>LLMs solve issue 1 and 2 (syntax and library/package issues). They have tons of training data and can look things up online.</li>
  <li>LLMs don’t really solve issue 3 (bad code) definitively. It really depends on what you feed it.</li>
  <li>LLMs <em>could</em> solve issue 4 (working with code written by other people) but I haven’t used LLMs in a context where I’m completely new to the codebase and there is no one to onboard me.</li>
  <li>LLMs don’t solve issues about deadlines and requirements directly, but they make working with fluctuating requirements easier because writing and rewriting code is much faster.</li>
</ul>

<h3 id="skill-atrophy-and-the-next-generation">Skill atrophy and the next generation</h3>

<p>I have experience writing bad code, improving it, and unraveling my cocoon of ignorance on various programming topics, which arguably makes me effective at structuring code, providing samples, and fixing code produced by AI. But I wonder whether these skills will atrophy as I use LLMs more. I also wonder whether the newer generation of software engineers will develop different kinds of intuition.</p>

<h3 id="problems-i-havent-faced-yet">Problems I haven’t faced yet</h3>

<p>There are also problems I haven’t faced much yet, such as dealing with extremely high traffic, working under tight hardware constraints, maintaining very high uptime, and dealing with distributed systems. I never got to develop experience and intuition for these “the old way” because I worked at smaller companies, so I wonder how I will develop as an engineer when I hopefully get to tackle them with an LLM in the future.</p>

<hr />

<h2 id="prompt-log-planning-the-catalog-service">Prompt Log: Planning the Catalog Service</h2>

<p>One thing I wanted to do with these posts was show the actual interaction flow, not just the results. Here’s the exact sequence of prompts I used to plan the catalog microservice with Claude Code.
This section was done with Claude’s help parsing the jsonl session files, and I have to say it did this VERY well. I expected mild prompt injection as it read through previous prompts, but it didn’t?? I also automated this step: Claude suggested putting reminders in the CLAUDE.md file, and I added a hook using a script Claude created. I am quite impressed.</p>

<h3 id="1-check-task-board-state">1. Check task board state</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/br
</code></pre></div></div>

<p>Checks current beads_rust task board. Confirms starting from a clean slate.</p>

<h3 id="2-clean-up-stale-memory">2. Clean up stale memory</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>remove plan 5 from your memory wherever that is because it has been partially implemented
</code></pre></div></div>

<p>Housekeeping — removes outdated progress entries from auto-memory before starting new work.</p>

<h3 id="3-enter-plan-mode-with-detailed-requirements">3. Enter plan mode with detailed requirements</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/plan
I want to start working on the catalog microservice now
The catalog microservice will allow buyers and admins to upload and manage products
The 2 main important tables will be Product and Sku. Sku is a child table of Product.
1 Product can have many Skus. A Sku is a variation of a Product
Eg: A product can be a shoe and a sku can be the shoe sizes
The granularity of the sku can be something we discuss
The catalog service will also need to keep track of the inventory levels as well
For the initial iteration of the catalog service, we will assume that image files are
handled somehow and we receive links to images. Detailed image/media handling can be
implemented later
We also need to keep track of prices here. When dealing with money, we need to make sure
to use the correct types because floating point numbers do not reflect money behaviour
accurately.
Do the plan first and then I will take a look
</code></pre></div></div>

<p>Enters plan mode. Provides high-level requirements with open questions (SKU granularity). Explicitly says “do the plan first” to let Claude explore and design before execution.</p>
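<p>The money point in that prompt can be made concrete: hold prices as integer minor units (cents) on the SKU rather than floats. A stdlib-only sketch of the shapes described above (names and fields are illustrative, not the actual schema):</p>

```rust
// Illustrative Product/Sku shapes: 1 product owns many SKUs, and prices
// are integer cents so arithmetic stays exact.
#[derive(Debug, PartialEq)]
struct Sku {
    code: String,
    attributes: Vec<(String, String)>, // flexible key/value variant attributes
    price_cents: i64,                  // never f64 for money
    stock: u32,
}

#[derive(Debug)]
struct Product {
    name: String,
    slug: String,
    skus: Vec<Sku>, // e.g. one shoe product, one SKU per size
}

fn main() {
    let shoe = Product {
        name: "Runner".into(),
        slug: "runner".into(),
        skus: vec![Sku {
            code: "RUN-42".into(),
            attributes: vec![("size".into(), "42".into())],
            price_cents: 89_900, // 899.00 represented exactly
            stock: 5,
        }],
    };
    assert_eq!(shoe.skus[0].price_cents, 89_900);
    println!("ok");
}
```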

<h3 id="4-answer-design-questions">4. Answer design questions</h3>

<p>Claude asked 3 targeted questions:</p>

<ul>
  <li><strong>SKU variant attributes model?</strong> → Selected: “Flexible JSON attributes (Recommended)”</li>
  <li><strong>Price and inventory on SKU level?</strong> → Selected: “Yes, both on SKU (Recommended)”</li>
  <li><strong>Who can create/manage products?</strong> → Selected: “Sellers and Admins (Recommended)”</li>
</ul>

<h3 id="5-challenge-a-design-decision">5. Challenge a design decision</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>can you explain the claims based auth decision?
</code></pre></div></div>

<p>Rejected the initial plan approval to ask about a specific architectural choice. Claude explains the claims-based vs gRPC auth trade-off. This is a key technique — rejecting plan approval doesn’t lose work, it just lets you dig deeper.</p>

<h3 id="6-propose-alternative-with-nuance">6. Propose alternative with nuance</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I prefer the gRPC to identity but the point about coupling is correct. Instead, I think
adding a periodic health check to the identity grpc to determine whether to call the gRPC
service would be better and then gracefully fail to claims based. Also, caching on the
catalog service side for the gRPC identity service could work. What do you think about
these 2 suggestions?
</code></pre></div></div>

<p>Pushes back with a hybrid approach. Claude analyzes both suggestions and recommends phasing the work.</p>

<h3 id="7-agree-on-phasing">7. Agree on phasing</h3>

<p>Claude asked whether to phase the work or include everything now. I selected “Phase it (Recommended)”.</p>

<h3 id="8-log-tasks-in-beads_rust">8. Log tasks in beads_rust</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want you to use beads_rust to log the plan for future use
</code></pre></div></div>

<p>Before approving execution, asks Claude to create br tasks with dependencies so the plan is tracked in the task management system. This is the step that enables the “close session and start fresh” workflow described above.</p>

<h3 id="9-document-the-workflow">9. Document the workflow</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>do not execute yet, I want to restart the session. I also want a workflow of user input
prompts into claude cli for blogging purposes to demonstrate claude cli tool usage. Can you
log all of my user prompts and then create a repeatable workflow for future use in all
other sessions?
</code></pre></div></div>

<p>This is where I stopped the planning session and asked Claude to document everything before restarting for execution.</p>

<h3 id="key-techniques-demonstrated">Key techniques demonstrated</h3>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/plan</code> mode separates research from execution</li>
  <li>Rejecting plan approval to ask questions (doesn’t lose work)</li>
  <li>Using <code class="language-plaintext highlighter-rouge">br</code> (beads_rust) for persistent task tracking across sessions</li>
  <li>Phased delivery: start simple, enhance later</li>
  <li>Pushing back on design decisions with your own suggestions</li>
</ul>

<hr />

<h2 id="prompt-log-all-sessions-so-far">Prompt Log: All Sessions So Far</h2>

<p>I extracted the user prompts from all 26 Claude Code sessions across both the identity and catalog services. Below is the condensed version — system noise stripped out, plans summarized instead of pasted in full. This covers about 2 days of work.</p>

<h3 id="identity-service-integration-tests-sessions-1-5">Identity Service: Integration Tests (Sessions 1-5)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/plan
your task now is to plan implementing integration tests for the identity microservice.
The integration tests should use the #[sqlx::test] macro for actual database usage for
all levels of tests which are to be implemented (repository_test, service_test,
router_test). Plan test cases for me to review as well
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [full plan with 39 test cases across 3 layers,
plus 2 bug fixes found during planning]
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want you to run integration tests for the identity service using the Makefile command.
There will be a test failure for get_current_user_returns_correct_user this test. I want
you to identify the cause, explain it then fix it
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>can you fix the other issues found by the other test failures?
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>your task is to refactor the GetCurrentUser trait to make it async and fix the
implementation on the identity service side. Use the identity service integration test
to verify the fix works
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>to make myself clear, I meant make it async fn get_by_id, the current implementation
"technically" is but I want to use the async syntax for the trait
</code></pre></div></div>

<h3 id="claude-code-configuration-session-6">Claude Code Configuration (Session 6)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want help configuring claude code cli. I want to allow safe commands like ls, grep,
cat, etc bash commands for reading to be allowed while I want commands like rm, curl
(potentially accessing malicious links), etc to require permission from me
</code></pre></div></div>

<h3 id="shared-module-extraction-sessions-7-8">Shared Module Extraction (Sessions 7-8)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/plan
based on the high level description and current implementation of the identity service,
I want to identify some common code/utility things that can be put in the shared module.
Do some planning for me to review and then after planning, use beads (br skill) to put
those as tasks with proper dependencies
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [extract 6 modules to shared: observability, server
bootstrap, API responses, auth guards, health check, DTO helpers]
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cool, now update CLAUDE.md file in the root folder to record for future use what can
be reused from the shared module in a compact manner
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>there are other pieces of code in the shared module that aren't mentioned in the
CLAUDE.md file, put those in as well
</code></pre></div></div>

<h3 id="auth-flows-sessions-9-14">Auth Flows (Sessions 9-14)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/plan
read the .plans/critical-user-flows.md 8 Auth Flows and do some planning 1 at a time.
I will review each plan 1 by 1 and then you will add them to beads
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Plan #1: Email Interface — trait + mock]
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's move to phase 2 and remember when running tests, refer to the makefile for the
test running commands
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Plan #2: Email Verification on Registration]
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>when making migrations, use the make migration command (refer to the makefile for
details) to create the migration file, continue with phase 2
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's do phase 3 now
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Plan #3: Password Reset Flow]
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>go ahead with the plan for phase 3
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>now let's do phase 4
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/br
create br issues for remaining plan#5 AFTER you make the plan and let me approve
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Plan #5: gRPC + Redis Caching]
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the grpc_service currently has a build error, the generated type and the implementation
seems to be not matching despite it using the same type
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>now I need you to abstract away the redis connecting and grpc server bootstrapping to
the shared module as I can see it being used commonly in many services
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Abstract Redis + gRPC bootstrapping into shared]
</code></pre></div></div>

<h3 id="testing-infrastructure-sessions-15-18">Testing Infrastructure (Sessions 15-18)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I need you to implement integration tests for the grpc service in the identity service
user package
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>is there a way to actually run the grpc server and then call from a grpc client for
the test?
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/plan
the current GetCurrentUser implementation and test gracefully handles the non-presence
of an actual redis client, however I want to use an actual redis client for the test
for more comprehensive integration testing. Explore options on how to do this so that
this setup can be abstracted away and reused in many cases
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Real Redis Integration Tests via Testcontainers]
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>there are several test utils like starting a grpc server and starting a redis
testcontainers instance that can be put in shared for common use across all
microservices, do so
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Extract Reusable Test Utilities to Shared]
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>now, while the sqlx-test macro is convenient to use it requires a running db instance
to run tests. I don't want a dependency on a running db managed outside of the testcode
in case these tests runs in CI. Now that we established that redis testcontainers work,
refactor the tests to use postgres testcontainers and remove the use of sqlx-test macro
and put the postgres testcontainers setup in the test utils and make it be used in the
integration tests for identity
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Replace #[sqlx::test] with Postgres Testcontainers — 82 tests]
</code></pre></div></div>

<h3 id="refinements-sessions-19-22">Refinements (Sessions 19-22)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>while we are in the early stages of the project, I want to set stricter roles using
enums. The roles should be Seller, Buyer and Admin. Make changes to the identity and
shared modules referring to role which is a String right now
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Refactor Role from String to Enum]
</code></pre></div></div>
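<p>For illustration, here is a minimal sketch of the shape such a refactor aims for. The exact names, error type and the lowercase database representation are my assumptions, not necessarily what ended up in the shared crate:</p>

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical names for illustration; the real shared crate may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Seller,
    Buyer,
    Admin,
}

#[derive(Debug, PartialEq, Eq)]
pub struct RoleParseError(String);

impl FromStr for Role {
    type Err = RoleParseError;

    // Parse the string stored in the DB back into the enum; unknown
    // values become an error instead of silently flowing through.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "seller" => Ok(Role::Seller),
            "buyer" => Ok(Role::Buyer),
            "admin" => Ok(Role::Admin),
            other => Err(RoleParseError(other.to_string())),
        }
    }
}

impl fmt::Display for Role {
    // The lowercase form is what gets written to the DB column.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            Role::Seller => "seller",
            Role::Buyer => "buyer",
            Role::Admin => "admin",
        };
        f.write_str(s)
    }
}

fn main() {
    let role: Role = "admin".parse().expect("known role");
    println!("{}", role);
    assert!("moderator".parse::<Role>().is_err());
}
```

<p>The payoff is that an invalid role string coming out of the database or a request body fails loudly at the parse boundary instead of leaking into business logic as a stray <code class="language-plaintext highlighter-rouge">String</code>.</p>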

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the claude.md file is getting quite large, I can see that the information about the
shared module can be split up and put into a new claude.md file. Put the compact overview
of the shared module in the shared crate and update the shared module Claude.md so that
it reflects the most current code in the shared module. Point towards the shared module
Claude.md in the Claude.md in the root folder.
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>create a CLAUDE.md for the identity service module and make the root folder CLAUDE.md
reference it. Make sure the identity service module CLAUDE.md is compact
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I want you to look at the code so far (mainly identity, shared modules) and the various
scripts and other stuff created to create a summary. I wrote a blog post about using AI
and I want to add a summary about what was built so far.
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let's put the summary of what is there so far in a .md file, I will copy paste to the
blog at a later point. Now let's do the ADR and git tags as you have suggested.
</code></pre></div></div>

<h3 id="value-objects--validation-sessions-23-24">Value Objects &amp; Validation (Sessions 23-24)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/plan
in the identity microservice, there needs to be validation for username, password,
email and phone strings
email -&gt; research well known email regexes
password -&gt; research "strong" password regexes
phone -&gt; assume that we store country code with phone numbers, allow - characters
username -&gt; no empty strings and minimum 3 characters, no profanities (nice to have)
make value objects, parse the strings into valid value objects where the new function
in the struct impl validates with regex
write unit tests for these value objects
make sure when writing into the db that the strings are valid
create a new validated struct ValidUserReq that uses the value objects and replace the
use of UserCreateReq, UserUpdateReq
also update the password flows to use password value objects
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Implement the following plan: [Value Objects &amp; Input Validation — 4 value objects, ~35 unit tests]
</code></pre></div></div>
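<p>To make the "parse, don't validate" idea concrete, here is a hedged sketch of one such value object. The actual plan used regex-based validation and covered email, password and phone as well; the character whitelist below is purely my assumption for illustration:</p>

```rust
// Hypothetical sketch of the username value object; names and the
// allowed character set are assumptions, not the generated code.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Username(String);

#[derive(Debug, PartialEq, Eq)]
pub enum UsernameError {
    TooShort,
    InvalidChar(char),
}

impl Username {
    // Parse, don't validate: the only way to obtain a Username is
    // through this constructor, so the DB layer can take &Username
    // and never sees a raw, unchecked String.
    pub fn new(raw: &str) -> Result<Self, UsernameError> {
        let trimmed = raw.trim();
        if trimmed.chars().count() < 3 {
            return Err(UsernameError::TooShort);
        }
        // Assumed whitelist: ASCII alphanumerics plus underscore.
        if let Some(c) = trimmed
            .chars()
            .find(|c| !c.is_ascii_alphanumeric() && *c != '_')
        {
            return Err(UsernameError::InvalidChar(c));
        }
        Ok(Username(trimmed.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() {
    assert!(Username::new("ab").is_err());
    let u = Username::new("jaeyoon_cho").unwrap();
    println!("{}", u.as_str());
}
```

<p>A <code class="language-plaintext highlighter-rouge">ValidUserReq</code> built from value objects like this carries its invariants in the type, which is why it can replace the raw request structs at the service boundary.</p>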

<h3 id="catalog-service-execution-sessions-25-26">Catalog Service Execution (Sessions 25-26)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/br
start work on bd-jx9 and close issues as you go
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make a task for router tests for products in br, implement the product router tests,
close the br issue
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>add following tasks to br
implementing pagination for listing product endpoints (there are several which just
return the entire list)
implementing caching for read endpoints
planning search engine implementation
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>to specify, I meant keyset pagination. update the bead about pagination
</code></pre></div></div>
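<p>For context, keyset (cursor) pagination seeks directly past the last row seen instead of counting rows with <code class="language-plaintext highlighter-rouge">OFFSET</code>, so deep pages stay cheap and results stay stable while rows are being inserted. A rough sketch of the query shape, with the table, column names and cursor layout assumed for illustration:</p>

```rust
// Hypothetical cursor for a product listing; the real schema may differ.
struct Cursor {
    created_at: i64, // e.g. epoch millis of the last row on the previous page
    id: i64,         // tiebreaker so the ordering is total
}

// Builds the SQL and its positional parameters. The row-value comparison
// (created_at, id) < ($1, $2) is Postgres syntax for "strictly after the
// cursor position" under the DESC, DESC ordering.
fn list_products_sql(cursor: Option<&Cursor>, limit: u32) -> (String, Vec<i64>) {
    match cursor {
        Some(c) => (
            "SELECT id, name, created_at FROM products \
             WHERE (created_at, id) < ($1, $2) \
             ORDER BY created_at DESC, id DESC LIMIT $3"
                .to_string(),
            vec![c.created_at, c.id, limit as i64],
        ),
        // First page: no cursor yet, just the ordered prefix.
        None => (
            "SELECT id, name, created_at FROM products \
             ORDER BY created_at DESC, id DESC LIMIT $1"
                .to_string(),
            vec![limit as i64],
        ),
    }
}

fn main() {
    let (sql, params) = list_products_sql(None, 20);
    println!("{sql} {params:?}");
    let cursor = Cursor { created_at: 1_700_000_000_000, id: 42 };
    let (sql, params) = list_products_sql(Some(&cursor), 20);
    println!("{sql} {params:?}");
}
```

<p>The endpoint then returns the <code class="language-plaintext highlighter-rouge">(created_at, id)</code> of the last row as the next cursor, and the client never sees a page number.</p>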

<h3 id="claudes-observations-from-the-logs">Claude’s Observations from the logs</h3>

<ul>
  <li><strong>76 total user prompts across 26 sessions</strong> over roughly 2 days</li>
  <li>Most prompts are short (1-3 sentences). The longest ones are the plan pastes and initial requirements.</li>
  <li>The pattern is very consistent: <code class="language-plaintext highlighter-rouge">/plan</code> → review → paste plan into new session → implement → follow-up corrections</li>
  <li>Corrections tend to be brief and specific (“I meant keyset pagination”, “make it async fn”)</li>
  <li>I spent more time on testing infrastructure than I expected — sessions 15-18 are entirely about making tests self-contained with testcontainers</li>
</ul>

<hr />

<h2 id="git-tag">Git Tag</h2>

<ul>
  <li><code class="language-plaintext highlighter-rouge">v0.2-catalog-crud</code></li>
  <li>This is the new tag created for the latest batch of generated code.</li>
</ul>

<hr />

<h2 id="whats-next">What’s next</h2>

<p>In Part 3, I want to push into the more complex services — starting with orders — and see if my predictions about LLM struggles hold up.</p>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[In Part 1, I built the identity service for Koupang and was surprised at how well Claude handled Rust and niche crates. Now I’m tackling the catalog service and experimenting with a task management workflow.]]></summary></entry><entry><title type="html">Getting Gud at LLMs Pt1</title><link href="https://jyc11.github.io/blog/2026/02/23/getting-gud-at-llms-pt1" rel="alternate" type="text/html" title="Getting Gud at LLMs Pt1" /><published>2026-02-23T00:00:00+00:00</published><updated>2026-02-23T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2026/02/23/getting-gud-at-llms-pt1</id><content type="html" xml:base="https://jyc11.github.io/blog/2026/02/23/getting-gud-at-llms-pt1"><![CDATA[<h2 id="background">Background</h2>

<p>Around this time last year, I was quite skeptical of LLM usage for programming. I remember having a false sense of superiority because I programmed “the old way”. Still, I kept an eye on the progress of LLMs and gradually used them more at work.</p>

<p>I originally used the web interfaces (ChatGPT, Gemini, Claude, Qwen, Deepseek) to one-shot bash scripts, Golang scripts or SQL that I couldn’t be bothered to write. Then I used the Canvas feature in Gemini to actually code/refactor certain features. My main programming use was providing samples of test code that I was happy with and getting the LLM to copy that format for other tests. Eventually I caved and got the AI subscription from JetBrains to use the LLMs within the IDE, because copy-pasting to the browser was getting annoying.</p>

<p>After a while of deliberation, I decided to give this LLM thing an actual shot. Far too many people around me, as well as people I’ve seen on YouTube, have said that the industry is changing. Of the two possible futures (the software engineering industry changes fundamentally, or it doesn’t), I decided to give in to what seems like a changing tide. I set out to challenge my own assumptions — mainly that LLMs are bad at dealing with large, complex codebases and are only good at basic programming tasks.</p>

<p>Another sentiment I kept hearing, from staff-level and above engineers (via Reddit and YouTube), was that LLMs boosted their productivity enormously and that they almost never code by hand anymore. I imagine that very experienced engineers deal with far more difficult, larger and more nebulous problems that require a lot of thought, planning and task subdivision — which, from what I’ve heard, is exactly what you need to do to use LLMs effectively. Since I am quite early in my career and don’t deal with such difficult issues, I may not see as large a productivity boost from LLMs. So I figured the best way to understand why all these more experienced engineers are lauding LLMs was to try something larger and more difficult myself.</p>

<hr />

<h2 id="the-experiment">The experiment</h2>

<p>I wanted to build something simple first to get the hang of things, and I also wanted to use Rust for the heck of it. That became <a href="https://github.com/JYC11/workout-util">Workout Util</a> — I already knew that LLMs were good at SQL and basic forms/pagination stuff due to the likely abundance of training data, so this project was a breeze. My main thoughts/reflections on using LLMs are in the readme of that project.</p>

<p>Now that I was done with a simple example, I decided to do something more complicated: <a href="https://github.com/JYC11/koupang/tree/main">Koupang</a>, an ecommerce backend. The domain is well established, I am reasonably familiar with it, and I have worked on ecommerce backends before, so I decided to tackle this one with Rust as well.</p>

<p>To facilitate much heavier use of LLMs, I decided to just pay for the <strong>Claude Code Max</strong> plan and work through the CLI. I originally scaffolded the entire project with Qwen (web interface), shoved that into a markdown file, and then handed it to Claude Code and started working on it.</p>

<hr />

<h2 id="what-surprised-me">What surprised me</h2>

<p>And so far… Claude seems to be doing really well. The first thing I worked on was the identity microservice. Arguably, the project is still small, and identity/auth is a well-established topic, so the LLM would be expected to do well here. But I am incredibly optimistic about this project, because Claude Code let me make ridiculously fast progress despite my only using it in a basic way. I am excited to see where this project goes and how the LLM handles a larger codebase.</p>

<h3 id="specific-points-where-i-was-surprised">Specific points where I was surprised</h3>

<ul>
  <li>I assumed LLMs would be poor at using Rust due to there not being a large amount of Rust training data… I was wrong?
    <ul>
      <li>To be fair, I am using quite basic Rust and nothing too crazy</li>
    </ul>
  </li>
  <li>The code produced by LLMs is not crazy spaghetti
    <ul>
      <li>It’s not perfect code; I often have to suggest better abstractions or pull code out into the common shared package, but that’s fine</li>
    </ul>
  </li>
  <li>It’s surprisingly good at handling niche crates like testcontainers and tonic/prost, which I’m pretty sure don’t have a huge amount of usage data. My assumption about lack of training data is being slowly chipped away.</li>
  <li>I did barely any coding by hand.</li>
</ul>

<hr />

<h2 id="claude-code-setupusage-notes">Claude Code setup/usage notes</h2>

<p>Nothing fancy yet and no multiple sub-agents:</p>

<ul>
  <li>Just basic prompting through the CLI after using <code class="language-plaintext highlighter-rouge">/plan</code> mode</li>
  <li>Often restarting sessions to clear context</li>
  <li>Trying to get the LLM to use beads_rust, but it doesn’t really seem to be listening</li>
  <li>Being conservative with Bash script permissions just in case</li>
  <li>Being quite strict about human in the loop and reviewing the code it generates</li>
  <li>Scoping tasks quite tightly so context doesn’t bloat</li>
</ul>

<p><strong>Plugins:</strong></p>

<ul>
  <li><a href="https://github.com/actionbook/rust-skills">rust-skills</a>
    <ul>
      <li>Kinda freaky how easy it was to add this from the Claude Code CLI, considering prompt injection risks</li>
      <li>Made sure to read through the skills to ensure they’re not malicious, but the more I use Claude Code, the more lax I will likely become</li>
    </ul>
  </li>
</ul>

<p><strong>Skills:</strong></p>

<ul>
  <li>Create Skills meta skill from Anthropic</li>
  <li>Beads Rust skill that I made Claude create</li>
</ul>

<hr />

<h2 id="why-write-about-it">Why write about it</h2>

<p>Even as I heard good things about using LLMs for projects, I was never really able to see the outcomes. Most people work on closed source codebases at their jobs, so I couldn’t see the actual code being produced or how they had their tools set up — the prompting strategies, the configurations, the actual workflow. So this blog post is my attempt at showing the process for others who may be skeptical or curious. If there are transparent examples (actual complex codebases and well documented workflows), please let me know. I am eager to learn.</p>

<hr />

<h2 id="progress-log">Progress Log</h2>

<p>I’ve also started keeping ADRs (Architecture Decision Records) in the repo to capture the why behind technical choices, and tagging milestones like <code class="language-plaintext highlighter-rouge">v0.1-identity-auth</code> so I can easily diff what changed between blog posts.</p>

<hr />

<h2 id="whats-next">What’s next</h2>

<p>In Part 2, I want to push Koupang further: I plan to work on the catalog service next and see if the LLM can keep up as the codebase grows. Stay tuned.</p>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[Background]]></summary></entry><entry><title type="html">Is this working?</title><link href="https://jyc11.github.io/blog/2024/11/19/test" rel="alternate" type="text/html" title="Is this working?" /><published>2024-11-19T00:00:00+00:00</published><updated>2024-11-19T00:00:00+00:00</updated><id>https://jyc11.github.io/blog/2024/11/19/test</id><content type="html" xml:base="https://jyc11.github.io/blog/2024/11/19/test"><![CDATA[<p>Very original and interesting take here.</p>

<h2 id="subheading">Subheading</h2>

<ul>
  <li>Bullet points</li>
  <li>More points</li>
</ul>

<ol>
  <li>Numbered lists</li>
  <li>Work too</li>
</ol>]]></content><author><name>Jaeyoon Cho</name></author><category term="blog" /><summary type="html"><![CDATA[Very original and interesting take here.]]></summary></entry></feed>