
CI Is the Bottleneck Now

Agentic development shifts the quality gate entirely onto CI. That's a good instinct — but most CI pipelines weren't built for the throughput that's coming.

Something shifted in the last year that most teams haven't fully processed yet.

It used to be that the bottleneck in shipping software was writing code. A developer would think, plan, write, revise — and the pace of that loop set a natural ceiling on how fast any codebase could change. CI existed to catch mistakes that slipped through, and it was sized accordingly. Pipelines taking twenty or forty minutes felt reasonable because humans don't push code every ten minutes.

That's no longer the constraint.

The throughput problem

AI coding agents can produce meaningful code continuously. A single developer running an agent loop can generate, test, and iterate on changes at a pace that would have required a team six months ago. Early adopters aren't cutting corners on quality to do it — they're being thoughtful, reviewing diffs carefully, treating the agent's output like they'd treat a contractor's work. But the sheer volume of code moving through review, into CI, and toward production has changed.

The math is uncomfortable. If your pipeline takes forty-five minutes and your agents are proposing changes every twenty, you're feeding work into a system that can't keep up. The agents idle. The feedback loops lengthen. The value that made agentic development attractive — tight iteration cycles — gets eaten by CI queue depth.
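A back-of-the-envelope sketch of that queue, assuming a single serial runner and the hypothetical numbers above (a forty-five-minute pipeline, a new change every twenty minutes):

```python
# Back-of-the-envelope queue model: one serial CI runner.
# The numbers are the hypothetical ones from the text above.
PIPELINE_MINUTES = 45
ARRIVAL_INTERVAL_MINUTES = 20

def queue_depth_after(hours: float) -> int:
    """Changes still waiting after `hours` of sustained agent output."""
    minutes = hours * 60
    arrived = int(minutes // ARRIVAL_INTERVAL_MINUTES)   # changes proposed so far
    completed = int(minutes // PIPELINE_MINUTES)         # runs finished so far
    return max(arrived - completed, 0)

print(queue_depth_after(8))  # backlog after one working day → 14
```

At that arrival rate the backlog grows without bound; the only real fixes are a faster pipeline or more concurrent runners.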

CI as the last gate

There's a reason teams lean harder on CI as they adopt AI tooling. When code is produced faster than any human can read every line, tests become the primary mechanism for knowing whether a change is correct. Not just integration tests, but the whole suite: unit tests, linting, type checking, security scans, end-to-end flows. The instinct to trust CI more is exactly right.

This puts CI in a position it was never designed to hold. CI was built assuming that humans were the check and the pipeline was the backstop. Now CI is increasingly the check, and there is no backstop behind it. Flaky tests that were a minor annoyance before are now actively dangerous — not because they let bad code through, but because they inject noise into the signal that agents and developers depend on to know whether something worked. A test that fails one in ten runs means that one in ten agent iterations produces ambiguous results. At scale, that's a lot of wasted cycles and a lot of second-guessing.
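The compounding effect is easy to quantify. A minimal sketch, assuming each pipeline run flakes independently with a fixed probability (both numbers are illustrative):

```python
def p_ambiguous_loop(n_runs: int, p_flake: float) -> float:
    """Probability that at least one of n_runs CI runs fails spuriously,
    assuming independent runs with per-run flake probability p_flake."""
    return 1 - (1 - p_flake) ** n_runs

# A test that fails one run in ten, across a ten-iteration agent loop:
print(round(p_ambiguous_loop(10, 0.10), 3))  # → 0.651
```

Nearly two loops in three hit at least one ambiguous result, which is why a "minor" flake rate turns into a sustained tax at agentic throughput.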

Slow pipelines are worse. A forty-five-minute feedback loop doesn't just slow humans down — it breaks the planning horizon for an agent entirely. Agents work best when they can observe the result of a change and decide what to do next. A pipeline that takes as long as a meeting is a pipeline that turns agentic development into a batch job.

The visibility gap

Here's what makes this hard: most teams can't tell you why their CI is slow.

They know the number — forty-five minutes, thirty, twenty-two. They may have a rough sense that "the test suite is slow" or "something in the build step takes a while." But they don't have what they'd expect from any other part of their infrastructure: a trace, a timeline, a view of what was actually running on the machine and when.

This mattered less when CI was a backstop. If the pipeline took forty-five minutes, that was frustrating but bounded. You could tolerate a certain amount of mystery because the thing mostly worked and humans weren't waiting on it continuously.

Now it's the critical path. If CI is the primary quality gate for agentic development, then CI is production — and you should treat it like production. You should know which jobs are slow and why. You should know when something regressed. You should know whether your test suite is getting slower week over week before it becomes a crisis.

You should know, specifically, which job is the bottleneck.

What this looks like in practice

Teams that are shipping well with agentic workflows tend to have a few things in common.

They've trimmed flaky tests ruthlessly. Not because they have lower standards — because they understand that flakiness is a multiplier on iteration cost. A 5% flake rate at high throughput is a sustained tax on every agent loop.

They've parallelized and partitioned their test suites. A suite that runs in eight minutes per shard beats a suite that runs in forty minutes sequentially, even if the latter has higher machine utilization. Latency matters more than throughput when the feedback loop is the bottleneck.
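One simple way to partition a suite is to hash test IDs deterministically, sketched below with made-up test names. Real runners (pytest-xdist, CI-native sharding) usually balance shards by historical timings instead, which this sketch doesn't attempt:

```python
import zlib

def shard_for(test_id: str, n_shards: int) -> int:
    """Deterministically map a test to a shard by hashing its ID,
    so every CI machine computes the same partition without coordination."""
    return zlib.crc32(test_id.encode("utf-8")) % n_shards

# Hypothetical test IDs; in practice these would be discovered by the runner.
tests = ["test_login", "test_checkout", "test_search", "test_signup"]
n_shards = 2
partition = {i: [t for t in tests if shard_for(t, n_shards) == i]
             for i in range(n_shards)}
print(partition)
```

The trade-off: hash-based partitioning is stable and stateless, but uneven shards are possible; timing-based balancing fixes that at the cost of keeping historical data around.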

They monitor CI the way they monitor their API. They have dashboards. They know their P90 build time. They notice when something changes.
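Knowing your P90 takes nothing more than the build durations you already have. A minimal sketch using nearest-rank percentiles, with made-up durations:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value at the smallest rank
    covering at least fraction p of the sorted samples."""
    s = sorted(samples)
    rank = math.ceil(p * len(s))          # 1-based nearest rank
    return s[max(rank, 1) - 1]

# Hypothetical durations (minutes) for the last ten builds.
durations = [12.1, 11.8, 13.5, 12.0, 40.2, 12.4, 11.9, 12.6, 13.1, 12.2]
print(percentile(durations, 0.90))  # → 13.5
```

Note how the 40.2-minute outlier hides entirely below P90 here: tracking the tail (P99, max) alongside P90 is what catches a single stuck or retried run.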

Most importantly: they've made CI fast enough that agents can actually close a feedback loop within a reasonable window. The exact number depends on context, but somewhere around ten minutes seems to be where "fast enough to iterate" and "slow enough to be nervous about" diverge. Above that, you're batching. Below it, you're flowing.

The harder question

None of this is a knock on AI tooling. The throughput gains are real and the teams using them well are genuinely moving faster. The issue is that CI hasn't kept up — not in speed, and not in observability.

The hard question is whether that gap is intentional. Most CI pipelines were built incrementally, adding jobs as new checks were needed, without much architecture for performance. Nobody sat down and designed a pipeline for high-throughput agentic development because that wasn't the use case when the pipeline was built. It's the use case now.

Redesigning a CI pipeline is uncomfortable work. It requires you to actually understand what's in it — which steps matter, which are vestigial, where the time is actually going. That understanding requires visibility that most teams don't have.

CI observability isn't a nice-to-have anymore. It's the prerequisite for answering the question that agentic development puts squarely in front of every engineering team: can your quality gates keep up with the rate you're trying to ship?

If you don't know where your pipeline is slow, you can't make it faster. And if you can't make it faster, you're leaving most of the value of agentic development on the table.