LoopKit — your first loop engineering starter kit

After 15 versions of kompress, 8 teachers, 4 data sources, 3 architectures, and one council that said RETRAIN three times in a row — we extracted the pattern.

It's called LoopKit. And it's yours.


What is LoopKit?

A monorepo starter kit for building self-improving systems. The same four-phase loop that produced every kompress model — plan, execute, evaluate, decide — wrapped in a box that anyone can clone and extend.

git clone https://github.com/peterlodri-sec/loopkit
cd loopkit
python -m loops.hello.loop
# 5 iterations. SHIP. You just ran your first loop.
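
To make the four phases concrete, here is a minimal sketch of what a loop of this shape could look like. It is not LoopKit's actual code: the `Loop`, `Iteration`, and `HelloLoop` names, the method signatures, and the "ship"/"retrain" strings are assumptions for illustration, and the real loops/base.py may look different.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Iteration:
    """One pass through plan -> execute -> evaluate -> decide."""
    plan: Any
    result: Any
    score: float
    decision: str  # hypothetical vocabulary: "ship" or "retrain"


class Loop(ABC):
    """Hypothetical base class; the real loops/base.py may differ."""

    def __init__(self, max_iterations: int = 5) -> None:
        self.max_iterations = max_iterations
        self.history: list[Iteration] = []

    @abstractmethod
    def plan(self) -> Any: ...

    @abstractmethod
    def execute(self, plan: Any) -> Any: ...

    @abstractmethod
    def evaluate(self, result: Any) -> float: ...

    @abstractmethod
    def decide(self, score: float) -> str: ...

    def run(self) -> list[Iteration]:
        # Repeat the four phases until the decision is "ship"
        # or the iteration budget runs out.
        for _ in range(self.max_iterations):
            plan = self.plan()
            result = self.execute(plan)
            score = self.evaluate(result)
            decision = self.decide(score)
            self.history.append(Iteration(plan, result, score, decision))
            if decision == "ship":
                break
        return self.history


class HelloLoop(Loop):
    """Toy loop: step a number up by 0.2 until it reaches the target."""
    target = 1.0

    def plan(self) -> float:
        # Propose one step beyond the previous result.
        last = self.history[-1].result if self.history else 0.0
        return last + 0.2

    def execute(self, plan: float) -> float:
        # Rounding keeps floating-point noise out of the toy example.
        return round(plan, 1)

    def evaluate(self, result: float) -> float:
        return result / self.target

    def decide(self, score: float) -> str:
        return "ship" if score >= 1.0 else "retrain"


if __name__ == "__main__":
    for i, it in enumerate(HelloLoop(max_iterations=10).run(), start=1):
        print(i, it.result, it.decision)
```

The point of the split is that each phase can be swapped independently: a new experiment usually changes `plan` and `execute`, while `evaluate` and `decide` stay fixed so results remain comparable across iterations.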

▶️ Open in Colab


The pattern that produced 15 models

Every kompress model was one iteration of a loop:

| Version | What we tried | Heretic | Decision |
|---|---|---|---|
| v2 | | 0.975 | Baseline established |
| v4 | Self-labels | 0.943 | Override internalized — ship |
| v6 | Agent-distribution | 0.962 | Dead end — pivot |
| v8 | Qwen2.5 teacher | 0.955 | Production — ship |
| v9 | C3-only | 0.921 | Overfit — retrain with diversity |
| v11 | Larger encoder | 0.906 | Capacity ≠ precision — pivot |
| v14 | Council training | 0.882 | Concept proven — retrain |

Each row is plan → execute → evaluate → decide. The outer loop — the decision about what to try next — was us: a human and an AI agent, reviewing results, brainstorming ideas, launching experiments.

LoopKit automates the outer loop so you can scale it.


What's in the box

loopkit/
├── GUIDE.md              ← The full guide (tiered: quick start → deep dive)
├── README.md             ← You are here
├── pyproject.toml
├── bot/                  ← Telegram bot (your outer loop operator)
│   ├── main.py           ← /new, /run, /decide, natural chat
│   ├── memory.py         ← SQLite — remembers across restarts
│   └── council.py        ← LLM reviews results, suggests next actions
├── loops/
│   ├── base.py           ← Abstract Loop class
│   ├── hello/            ← Minimal example (5 lines of logic)
│   ├── template/         ← cp -r template myloop → start building
│   └── kompress/         ← The full 15-model pipeline
├── concepts/             ← Reference implementations
│   ├── self_labeling.py
│   ├── evaluator_optimizer.py
│   └── council.py
├── evals/
│   └── heretic.py        ← Portable adversarial benchmark
└── notebooks/
    └── loopkit_hello.ipynb  ← Colab-ready

The Telegram bot — your outer loop in chat

User: /new kompress-v15
Bot:   ✅ Created loop kompress-v15.

User: /run kompress-v15
Bot:   🔄 Running kompress-v15...
       ✅ kompress-v15-001 complete
       Results: heretic 0.961, keep_rate 0.85
       Decision: Council says SHIP 🚀

User: /history kompress-v15
Bot:   📜 kompress-v15 — 1 experiment
       🚀 v15-001: ship — "Beats v8 (0.955), ready to deploy"

User: my model regressed, what should I try?
Bot:   Regression happens! Here's what I'd check:
       1. Label quality — is your teacher too aggressive?
       2. Data diversity — are you mixing in generic data?
       3. Epochs — 3 was the sweet spot, more = overfitting

The bot remembers everything in SQLite, uses an LLM council (GLM-5.1 by default, configurable to anything), and falls back to heuristic rules when no LLM is available. It's the outer loop operator — the thing that asks "what next?" and then executes it.
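
As a rough sketch of those two ideas (persistent memory, and a council that degrades to heuristics), something like the following would work. Everything here is an assumption for illustration: the table schema, the function names, the `ship_threshold` parameter, the "ship"/"retrain"/"pivot" vocabulary, and the convention that a higher score is better. The `llm` argument stands in for whatever model client you use; no real LLM API is called.

```python
import sqlite3
from typing import Callable, Optional

VALID_DECISIONS = {"ship", "retrain", "pivot"}


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the experiment log. Pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS experiments (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               loop TEXT NOT NULL,
               score REAL NOT NULL,
               decision TEXT NOT NULL,
               note TEXT NOT NULL DEFAULT ''
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, loop: str, score: float,
           decision: str, note: str = "") -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO experiments (loop, score, decision, note) "
            "VALUES (?, ?, ?, ?)",
            (loop, score, decision, note),
        )


def history(conn: sqlite3.Connection, loop: str) -> list:
    rows = conn.execute(
        "SELECT score, decision, note FROM experiments "
        "WHERE loop = ? ORDER BY id",
        (loop,),
    )
    return rows.fetchall()


def heuristic_council(score: float, best_so_far: Optional[float],
                      ship_threshold: float) -> str:
    """Rule-based fallback. Assumes a higher score is better."""
    if score >= ship_threshold:
        return "ship"
    if best_so_far is not None and score < best_so_far:
        return "pivot"  # regression: try a different idea
    return "retrain"    # progress, but not there yet


def council(score: float, best_so_far: Optional[float], ship_threshold: float,
            llm: Optional[Callable[[str], str]] = None) -> str:
    """Ask the LLM first; fall back to rules if it fails or answers oddly."""
    if llm is not None:
        try:
            prompt = (f"score={score}, best_so_far={best_so_far}. "
                      "Reply with one word: ship, retrain, or pivot.")
            answer = llm(prompt).strip().lower()
            if answer in VALID_DECISIONS:
                return answer
        except Exception:
            pass  # LLM unavailable: use the heuristic below
    return heuristic_council(score, best_so_far, ship_threshold)
```

Validating the LLM's answer against a fixed vocabulary matters more than it looks: it means the loop never acts on free-form text, and a malformed reply takes the same path as an outage.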


Group Engineering meets Loop Engineering

Anthropic's Engineering Groups of AI Agents (June 2025) describes how multiple agents collaborate in structured groups. Loop engineering is the temporal version:

| Group Engineering | Loop Engineering |
|---|---|
| Multiple agents collaborate in parallel | Multiple iterations build on each other |
| Coordinator delegates tasks | Council decides next experiment |
| Agents have specialized roles | Each iteration has a hypothesis |
| Results merge into a solution | Results converge toward a target |

The council is the coordinator. The loops are the agents. Time is the orchestrator.


Patterns you can use today

The GUIDE.md documents five patterns extracted from the kompress loop:

  1. Self-Labeling — model labels its own training data
  2. Evaluator-Optimizer — stronger teacher corrects student's mistakes
  3. C3 Self-Distillation — Collect → Curate → Compress on real-world data
  4. Council — LLM reviews results and decides what to try next
  5. The Loop Pattern — combine all four into a self-improving pipeline

Each has a reference implementation in concepts/. Each is production-tested on real models.
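
To give a feel for how patterns 1 and 2 fit together, here is a toy sketch: a student labels its own data, then a stronger teacher overrides the labels it disagrees with. This is not the code in concepts/. The function names, the "keep"/"drop" labels, and the two toy labelers are all made up for illustration; in practice both would be model calls.

```python
from typing import Callable, Iterable

Labeler = Callable[[str], str]


def self_label(student: Labeler, texts: Iterable[str]) -> list:
    """Pattern 1: the student produces labels for its own training data."""
    return [(text, student(text)) for text in texts]


def evaluator_optimizer(student: Labeler, teacher: Labeler,
                        texts: Iterable[str]) -> tuple:
    """Pattern 2: the teacher corrects the student's labels where they differ.

    Returns the corrected dataset and the number of labels the teacher changed.
    """
    corrected = []
    n_fixed = 0
    for text, label in self_label(student, texts):
        teacher_label = teacher(text)
        if teacher_label != label:
            label = teacher_label
            n_fixed += 1
        corrected.append((text, label))
    return corrected, n_fixed


# Toy labelers (hypothetical): the student only looks at length,
# the teacher also keeps anything containing a digit.
def toy_student(text: str) -> str:
    return "keep" if len(text.split()) > 2 else "drop"


def toy_teacher(text: str) -> str:
    has_digit = any(ch.isdigit() for ch in text)
    return "keep" if has_digit or len(text.split()) > 2 else "drop"


if __name__ == "__main__":
    data, fixed = evaluator_optimizer(
        toy_student, toy_teacher,
        ["port 8080", "ok", "restart the service now"],
    )
    print(data, fixed)
```

The count of corrected labels is a useful signal in its own right: if it stays high across iterations, the student is not absorbing the teacher's corrections.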


The Loop Engineering Ecosystem

LoopKit didn't emerge in a vacuum. It's part of a growing movement sparked by Addy Osmani's Loop Engineering essay, the canonical text that defined the 5 building blocks every loop needs (automations, worktrees, skills, plugins, and sub-agents), plus memory.

Here's how the pieces connect:

| Piece | What it is | Link |
|---|---|---|
| Addy Osmani's essay | The canonical text — 5 building blocks + memory, practical patterns | addyosmani.com |
| Cobus Greyling's reference impl | npm tools (loop-audit, loop-init, loop-cost), 7 patterns, pattern picker, goal engineering | github.com/cobusgreyling |
| LangChain: The Art of Loop Engineering | 4 stacked loops (Agent → Verification → Event-Driven → Hill Climbing), "loopcraft" | langchain.com |
| LoopKit (this post) | Python-native starter kit, Telegram bot, council, Colab notebook, kompress case study | github.com/peterlodri-sec |

The key quotes that drive this:

"You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." — Peter Steinberger

"I don't prompt Claude anymore. I have loops running that prompt Claude. My job is to write loops." — Boris Cherny (Head of Claude Code, Anthropic)

The 4 stacked loops (LangChain)

LangChain's framework describes loop engineering as stacking four levels of loops:

  1. Agent Loop — model calls tools until done (LoopKit: Loop.run())
  2. Verification Loop — grader checks output, retries on failure (LoopKit: loops/verification/)
  3. Event-Driven Loop — webhooks/cron trigger agents (LoopKit: Telegram bot)
  4. Hill Climbing Loop — analysis agent reviews traces, rewrites the harness (LoopKit: Council + Ralph)

The insight from level 4: the return arrow "reaches inside and updates the agent loop directly." The loop that watches the loop that watches the loop. Meta-stability through recursion.
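
Level 2 is the easiest of the four to show in code. The sketch below is a generic verification loop, not LangChain's or LoopKit's implementation: `verification_loop`, `generate`, `grade`, and `max_attempts` are hypothetical names, and the toy generator and grader stand in for an agent call and a real check such as a test suite.

```python
from typing import Callable, Optional, Tuple


def verification_loop(
    generate: Callable[[int, Optional[str]], str],
    grade: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate, grade, and retry with the grader's feedback.

    Returns (output, attempts_used); output is None if every attempt failed.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        output = generate(attempt, feedback)
        ok, feedback = grade(output)
        if ok:
            return output, attempt
    return None, max_attempts


# Toy stand-ins (hypothetical): the generator only improves once it has feedback.
def toy_generate(attempt: int, feedback: Optional[str]) -> str:
    return "draft" if feedback is None else "draft (tested)"


def toy_grade(output: str) -> Tuple[bool, str]:
    ok = "tested" in output
    return ok, ("" if ok else "add tests")


if __name__ == "__main__":
    print(verification_loop(toy_generate, toy_grade))
```

Feeding the grader's message back into the next attempt is what separates this from a plain retry: each failure narrows what the generator tries next.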

7 battle-tested patterns (Cobus Greyling)

Cobus's reference implementation includes 7 production patterns with real win/failure stories.

Every pattern follows the same loop: discover → plan → execute → verify → ship.

Interactive Docs Site

We built a single-page guide site for LoopKit — same style as cobusgreyling.github.io/loop-engineering. It includes the full loop pattern visualization, kompress results table, ecosystem links, 5 loops, 7 production patterns, and the Telegram bot setup — all on one page.

→ loopkit docs


The Full Stack: How LoopKit + Cobus Greyling Work Together

LoopKit and Cobus Greyling's loop-engineering are complementary. Here's how they fit:

| Layer | Cobus Greyling | LoopKit |
|---|---|---|
| Audit & Planning | loop-audit — scores loop readiness, suggests improvements | Council — LLM reviews results, decides next action |
| Scaffolding | loop-init — scaffolds from 7 proven patterns | loops/template/ — cp -r template myloop |
| Cost Estimation | loop-cost — estimates token spend before running | Budget tracking in state.json |
| Execution | Grok/Claude Code/Codex native loops | Python Loop.run() — plan→execute→evaluate→decide |
| Persistence | Markdown files (LOOP.md, STATE.md, loop-run-log.md) | SQLite + state.json per loop |
| Monitoring | GitHub Actions, loop-audit dogfood | Ralph Loop — loop watching loops with OpenTelemetry |
| Sharing | Stories directory — real wins + failures | HuggingFace Datasets — experiment history as queryable datasets |

Use Cobus's tools to plan and audit your loops. Use LoopKit to run and scale them. Together they form the complete loop engineering stack.
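
The "Budget tracking in state.json" row is worth a quick illustration. The sketch below shows one way a per-loop state file could enforce a token budget. The schema (`budget_tokens`, `spent_tokens`, `runs`) and the function names are assumptions, not LoopKit's actual state.json format.

```python
import json
from pathlib import Path


def init_state(path: Path, budget_tokens: int) -> dict:
    """Create a fresh state file with a token budget (hypothetical schema)."""
    state = {"budget_tokens": budget_tokens, "spent_tokens": 0, "runs": 0}
    path.write_text(json.dumps(state, indent=2))
    return state


def load_state(path: Path) -> dict:
    return json.loads(path.read_text())


def charge(path: Path, tokens: int) -> dict:
    """Record one run's token spend; refuse it if the budget would be exceeded."""
    state = load_state(path)
    if state["spent_tokens"] + tokens > state["budget_tokens"]:
        raise RuntimeError(
            f"budget exceeded: {state['spent_tokens']} spent, "
            f"{tokens} requested, {state['budget_tokens']} allowed"
        )
    state["spent_tokens"] += tokens
    state["runs"] += 1
    path.write_text(json.dumps(state, indent=2))
    return state
```

Checking the budget before writing means a rejected run leaves the file untouched, so the loop can stop cleanly and hand the decision back to the operator.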

Cobus's key patterns we've adopted:


Why this matters

Most ML experimentation is ad-hoc. You try something, get a result, and think "what next?" The loop pattern makes it systematic. Every experiment has a hypothesis. Every result has a decision. Every decision feeds into the next plan.

LoopKit gives you the scaffolding. You bring the idea. The loop does the rest.

GitHub: peterlodri-sec/loopkit
Guide: GUIDE.md
Colab: loopkit_hello.ipynb
Models: PeetPedro on HuggingFace
The kompress story


This post is part of the LoopKit project. See also: the kompress heretic eval, all kompress models on HuggingFace, the ultrawhale training repo, and headroom.
