So yeah, I vibe-coded a log colorizer—and I feel good about it – Ars Technica

In December, Lee Hutchinson used an LLM-assisted workflow to build a small Python log colorizer while investigating an intermittent caching bug on Space City Weather, his colleague Eric Berger’s WordPress site hosted on an AWS EC2 t3a.large instance and fronted by Cloudflare. The custom tool—a roughly 400-line single-file script created with Claude Code—helped reveal a race condition: Apple News retrieval bots sometimes requested newly published posts before Discourse had attached comment threads, and Cloudflare cached that transient state. A short mu-plugin that forces “do not cache” headers until Discourse confirms attachment mitigated the symptom; the colorizer delivered the definitive root-cause evidence.

Key takeaways

  • The site: Space City Weather (self-hosted WordPress + Discourse), serving about 20,000–30,000 unique visitors per day, running on an AWS EC2 t3a.large instance behind Cloudflare APO.
  • The investigation produced a bespoke Python log colorizer (~400 lines) written with Claude Code over two sessions; initial working output arrived in minutes, with most of the time spent on iterative tweaking.
  • The colorizer supports multiple log formats, 256-color ANSI output, IPv4/IPv6 filtering, column-aligned host/IP display, configurable status/cache coloring, and per-IP highlight rules.
  • Root cause: a race condition where AppleNewsBot GETs sometimes requested a freshly published page before Discourse had created and attached the comment thread; Cloudflare then cached that incomplete page.
  • Symptom mitigation: a small mu-plugin forces “DO NOT CACHE ME” headers until Discourse reports the comment thread is attached; this prevents stale edge caches.
  • Agentic LLMs accelerated the build but required careful scoping and human oversight; they suggested performance optimizations but cannot replace domain knowledge.

Background

Space City Weather uses WordPress as the CMS and Discourse for comments via the WP-Discourse plugin; Discourse was integrated in August 2025. The site sits behind Cloudflare, using Cloudflare’s Automatic Platform Optimization (APO), which can cache full pages at the edge to reduce origin load. Cloudflare’s WordPress plugin usually triggers cache invalidation on new posts, but an intermittent failure pattern emerged: occasionally, newly published posts were cached without Discourse’s comment area, showing the old WordPress comment block instead.

Intermittent bugs of this kind are notoriously hard to reproduce. The issue appeared sporadically—dormant for long stretches, then recurring in clusters of days—making root-cause analysis difficult without detailed, real-time log inspection. Existing tools (ccze, lnav, Splunk) can colorize or parse logs, but off-the-shelf options lacked the precise customizability Lee wanted, so he turned to an LLM-assisted coding workflow to build a tailored, production-safe colorizer.

Main event

Working in VS Code and feeding sanitized access.log examples into an agentic LLM (Claude Code), Lee asked for an efficient, readable solution and prioritized low runtime overhead because the tool would run live on the server. The LLM recommended Python for robust regex support and produced a working colorizer quickly; the project remained contained within the LLM’s context window, which simplified iteration. Early builds were fast; the bulk of time went into refining colors, column alignment, IPv4/IPv6 filtering, status/cache highlighting, and ensuring efficient regex and loop behavior.
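The core of such a tool fits in a few dozen lines. The sketch below is a minimal illustration of the approach, not the actual ~400-line script: the regex, color palette, and status-code rules are all hypothetical stand-ins.

```python
import re
import sys

# Simplified pattern for an Nginx combined-format access-log line
# (a hypothetical stand-in for the script's real multi-format parsers).
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def color(text, code):
    """Wrap text in a 256-color ANSI escape sequence."""
    return f"\x1b[38;5;{code}m{text}\x1b[0m"

def colorize(line):
    """Color the HTTP status code: green for 2xx, yellow for 3xx, red otherwise."""
    m = LINE_RE.search(line)
    if not m:
        return line
    status = m.group("status")
    code = 46 if status.startswith("2") else 226 if status.startswith("3") else 196
    s, e = m.span("status")
    return line[:s] + color(status, code) + line[e:]

if __name__ == "__main__":
    # Stream stdin line by line so the tool can tail a live log with low overhead.
    for raw in sys.stdin:
        sys.stdout.write(colorize(raw))
```

Precompiling the pattern once at module load, rather than per line, is the kind of regex-and-loop efficiency concern mentioned above for a tool running live on the server.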

With the colorizer tailing Nginx access logs in real time, Lee watched a short sequence—about a dozen seconds—that made the failure mode obvious. The log snippet showed Eric’s POST to publish, Discourse’s callbacks creating and attaching a comment thread, and then two rapid GETs from AppleNewsBot immediately after publish. On occasions when AppleNewsBot’s GETs reached the site before Discourse had completed its attachment, Cloudflare cached the prematurely served HTML that lacked Discourse comments. When that happened, hundreds of visitors saw the cached page with the old WordPress comment area until the cache was manually purged or expired by APO rules.
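The timing fingerprint described above can be expressed as a small check. The sketch below assumes pre-parsed event tuples and hypothetical path markers for the publish request and the Discourse callback; the real evidence came from reading the colorized log directly.

```python
from datetime import datetime

# Hypothetical event tuples: (timestamp, method, path, user_agent).
FMT = "%d/%b/%Y:%H:%M:%S"

def find_race(events):
    """Return AppleNewsBot GET timestamps that land between the publish POST
    and the Discourse attach callback -- the window in which Cloudflare can
    cache a comment-less page."""
    publish = attach = None
    bot_hits = []
    for ts, method, path, ua in events:
        t = datetime.strptime(ts, FMT)
        if method == "POST" and "wp-admin" in path:
            publish = t
        elif "wp-discourse" in path:
            attach = t
        elif "AppleNewsBot" in ua and method == "GET":
            bot_hits.append(t)
    if publish is None or attach is None:
        return []
    return [t for t in bot_hits if publish <= t < attach]
```

Any non-empty result marks a publish cycle where the bot beat Discourse to the page, matching the dozen-second sequence Lee observed.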

Lee had previously implemented a mu-plugin that sets temporary “no-cache” headers on new posts until Discourse reports completion; that preventative fix masked the underlying race. The colorizer supplied the missing proof: the timing fingerprints in the access log showing AppleNewsBot arriving in the narrow window between publish and Discourse completion. Once the sequence was clear, the mu-plugin was validated as an effective mitigation for the symptom, and the true cause—a timing-based race—was documented.
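One way to spot-check a mitigation like this from the outside is to inspect the response headers a freshly published URL returns. The helper below is a hedged sketch: the header names follow Cloudflare's conventions, and the URL and user-agent are placeholders, not values from the article.

```python
from urllib.request import Request, urlopen

def is_edge_cacheable(headers):
    """True if nothing in Cache-Control forbids edge caching.
    During the mu-plugin's no-cache window, this should return False."""
    cc = (headers.get("cache-control") or "").lower()
    return not any(tok in cc for tok in ("no-cache", "no-store", "private", "max-age=0"))

def cache_status(url):
    """Fetch a URL and return the headers relevant to Cloudflare edge caching."""
    req = Request(url, headers={"User-Agent": "cache-check/0.1"})  # placeholder UA
    with urlopen(req) as resp:
        return {
            "cf-cache-status": resp.headers.get("cf-cache-status"),
            "cache-control": resp.headers.get("Cache-Control"),
        }
```

Polling a just-published post with `cache_status` and feeding the result to `is_edge_cacheable` would show the no-cache window opening at publish and closing once Discourse confirms attachment.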

Analysis & implications

This episode illustrates two linked themes: first, well-scoped LLM assistance can lower practical barriers for non-developers to prototype and iterate quickly; second, LLMs are not a substitute for domain expertise. The colorizer was the right-sized tool for Lee—small, auditable, and contained within the model context—enabling him to surface a precise timing artifact that manual spot-checks had missed. The quick feedback loop between developer intent and LLM output made iterative refinement pleasurable and productive.

But agentic LLMs have limits. They will dutifully fulfill underspecified prompts, and they do not autonomously know constraints you do not state. In this case, the LLM produced a horizontal-scroll viewport tool that conceptually worked but imposed unacceptable CPU load on the server; addressing that required deeper performance engineering and ultimately a shift of workload to a more capable client machine. That sequence underscores how LLMs can take you quickly to a working prototype but not necessarily to a production-grade, resource-optimal system without competent human oversight.

Operationally, the case reinforces best practices: instrument and monitor (logs are canonical evidence), prefer defensive defaults around caching, and treat edge-caching behavior as part of the deployment model for any content pipeline that includes third-party distributors (Apple News, social platforms, etc.). For organizations, the lesson is pragmatic: LLMs accelerate work for those who already understand their problem space; they can create a misleading sense of capability for users who cannot validate outputs.

Comparison & data

| Tool | Customizability | Typical effort | Performance cost |
| --- | --- | --- | --- |
| ccze | Low–medium (config files, legacy regex) | Moderate to high to customize | Low |
| lnav | Medium (built-in parsers, filters) | Low to medium | Low–medium |
| Custom Python colorizer | High (fully bespoke) | Small script (~400 lines), iterative tuning | Low if optimized; risk if additional UI features added |

The table summarizes pragmatic trade-offs Lee considered. Off-the-shelf tools reduce build time but may not meet special-case visualization needs; a small custom script gives maximum control but requires iterative tuning to avoid unintended load or complexity.

Reactions & quotes

Lee and his colleagues framed the experience as both empowering and cautionary. Two distinctive short excerpts capture the mood and the technical realism:

“LLMs are not the Enterprise-D’s computer—powerful for prompts, but not a substitute for explicit domain constraints.”

Lee Hutchinson (author)

“The bottleneck is screen redraw plus full-width scans on each scroll—zero-CPU impact isn’t achievable; low-impact mitigations are possible.”

Claude Code (LLM, paraphrased)

Unconfirmed

  • Whether rate-limiting or configuration options in the Apple News plugin could be tuned to delay retrieval long enough to prevent the race remains unverified without controlled tests.
  • The precise distribution of how often AppleNewsBot beats Discourse varies by publish-time conditions and server load; exact frequency was observed informally but not quantified with long-term telemetry.

Bottom line

Building a focused diagnostic tool with LLM assistance can be an efficient route to evidence that clarifies intermittent production bugs—so long as you scope the task tightly and maintain the ability to validate outputs. In this case, a roughly 400-line Python colorizer written with Claude Code surfaced a timing fingerprint that explained a months-long intermittent caching problem; the symptom is now mitigated with a small mu-plugin that enforces conservative caching until Discourse confirms thread attachment.

However, the episode is also a reminder of limits: agentic LLMs reduce friction but do not replace understanding. They excel when guided by humans who can judge correctness and adapt direction; they struggle when given vague goals or when the human cannot validate the system-level trade-offs being made. Treat LLMs as accelerants, not autopilots—use them, but instrument, test, and retain operational guardrails.
