The AI Productivity J Curve Is Real. Are You Measuring the Wrong Side of It?

A lot of leaders are still talking about AI productivity as if it should show up like a clean line on a dashboard. Turn on copilots. Generate more output. Ship faster. Save money.

That is not how this plays out in real organizations.

AI creates a J curve for productivity. You often get a burst of visible activity first, then a messy dip in how the system actually performs, and only after that do you get durable gains. The companies that understand this will build an advantage. The ones that do not will spend the next 18 months confusing motion with progress.

The mistake is simple. Most executives measure the first thing AI increases, which is output. More code. More tickets closed. More documents drafted. More options generated. But output is not the same as throughput, and it is definitely not the same as business value.

That distinction matters more than ever.

In engineering, AI usually improves local work before it improves the full delivery system. Google Cloud’s 2024 DORA research is one of the clearest signals here. It found that increased AI adoption was associated with better documentation quality, better code quality, and faster code reviews and approvals. At the same time, it also found lower delivery throughput and lower delivery stability. It even found a strange human wrinkle: developers reported more flow, job satisfaction, and productivity, while also reporting less time spent on valuable work and no meaningful reduction in toil.

That is the J curve in one paragraph.

AI makes parts of engineering feel better before it makes engineering work better as a system.

Why? Because the first thing AI scales is change volume. Teams can produce more code, more experiments, more drafts, more branches, more ideas. But if your architecture is brittle, your test strategy is weak, your review standards are inconsistent, or your release discipline is soft, then AI does not create leverage. It creates inventory. And inventory in software is not an asset. It is latent risk.

This is why so many teams feel faster while their organizations feel less stable. The team is not imagining the gain. The gain is real. It is just local. The loss is systemic.
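A toy model makes the local-versus-systemic point concrete. The sketch below is illustrative only: the function name, the two-stage pipeline, and every rate in it are assumptions chosen for the example, not figures from the DORA research. It treats delivery as a coding stage feeding a review, test, and release stage with a fixed weekly capacity.

```python
def simulate_pipeline(coding_rate, review_rate, weeks):
    """Toy two-stage delivery pipeline (hypothetical numbers).

    coding_rate: changes produced per week
    review_rate: most changes that review/test/release can absorb per week
    Returns (shipped, waiting): changes delivered, and changes still
    sitting unreviewed at the end of the period.
    """
    waiting = 0
    shipped = 0
    for _ in range(weeks):
        waiting += coding_rate            # new changes join the queue
        done = min(waiting, review_rate)  # downstream stage is capacity-bound
        shipped += done
        waiting -= done
    return shipped, waiting


# Before AI: coding is the slower stage, so nothing piles up.
before = simulate_pipeline(coding_rate=10, review_rate=12, weeks=8)

# With AI: coding output doubles, review capacity stays the same.
after = simulate_pipeline(coding_rate=20, review_rate=12, weeks=8)

print(before)  # (80, 0)
print(after)   # (96, 64)
```

In this made-up scenario, coding output doubles but shipped work rises only from 80 to 96 changes, while 64 changes sit in the queue. That queue is the inventory: work that felt productive to create and has not yet been validated.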

Product management goes through its own version of the same curve, and I would argue it is even more misunderstood.

Many leaders assume AI will make product managers better because it can help write PRDs, summarize research, cluster feedback, generate roadmaps, and speed up discovery artifacts. All true. But that is not where product management earns its keep. Product management earns its keep in judgment. What market are we really in? Which problem is actually worth solving? Which customer signal matters? What tradeoff are we willing to own? What do we refuse to build?

AI expands the option set before it improves the decision quality.

That sounds helpful, until you live with it. Suddenly teams have ten plausible strategies instead of three. Every planning cycle becomes easier to start and harder to finish. More narratives sound credible. More prototypes look good enough. More weak ideas survive longer because AI can polish them into something that appears strategic.

This is exactly why the BCG experiment on generative AI remains so useful. Participants working with GPT-4 on tasks inside the model’s frontier of competence performed much better. But on business problem solving outside that frontier, they performed worse than the group without AI. The lesson is not that AI is good or bad. The lesson is that performance depends heavily on the task shape, and leaders consistently overestimate where that frontier actually is. (BCG Global)

That should hit product leaders especially hard.

AI is excellent at helping teams produce artifacts around strategy. It is much less reliable at making the strategic call itself. If your product organization cannot distinguish between insight generation and judgment, AI will make your roadmap noisier before it makes it better.

The business layer of the J curve is where this becomes expensive.

Early on, AI often creates the appearance of efficiency because it raises the floor. Less experienced people get better faster. Routine work gets cleaner. Ramp time compresses. That is real value. The well-known field study on generative AI in customer support found a 15 percent productivity lift overall, with much larger gains for less experienced and lower-skilled workers. In some cases, newer workers with AI support performed more like tenured workers. But the same study also found little benefit for the highest-skilled workers, and some evidence of quality decline for the very best.

That pattern shows up everywhere. AI is often strongest as a capability diffuser before it becomes a capability amplifier.

In business terms, that means AI can help standardize execution, reduce onboarding time, and lift the median performer relatively quickly. But it does not automatically produce elite decisions, better portfolio bets, or sharper operating discipline. In fact, it can increase the load on your strongest people because they become the reviewers, exception handlers, evaluators, and system designers for a larger volume of machine-generated work.

So the executive question is not, “How much labor can we remove?”

The better question is, “Where will work be redistributed before value is compounded?”

That is the middle of the J curve, and it is where many AI programs stall. Engineering leaders discover they need stronger architecture and test discipline. Product leaders discover they need tighter problem framing and evaluation criteria. Business leaders discover that governance, decision rights, and operating metrics have to change before savings become durable.

This is why I think the most important AI productivity metric over the next two years will not be raw output per person. It will be how fast an organization can turn AI-generated activity into trusted decisions and reliable outcomes.

That is a very different game.

If you are leading through this transition, the practical takeaway is simple. Do not promise a straight line. Expect the dip. Plan for the dip. Measure the system, not just the artifact count. In engineering, watch stability, defect escape, and cycle time, not just code volume. In product, watch decision quality, not just document velocity. In the business, watch margin contribution after review cost, rework, and coordination overhead, not just time saved on a single task.
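The business-level measure can be sketched as simple arithmetic. This is a minimal illustration under stated assumptions: the `TaskHours` class, its field names, and the hour figures are all hypothetical, and a real version would use your own time tracking and cost data.

```python
from dataclasses import dataclass


@dataclass
class TaskHours:
    """Hours per task, all hypothetical values for illustration."""
    drafting_saved: float      # time saved producing the first draft with AI
    review_added: float        # extra review time on AI-generated work
    rework_added: float        # extra time fixing issues found later
    coordination_added: float  # extra alignment and handoff time

    def net_saved(self) -> float:
        """Time saved after subtracting the downstream costs."""
        overhead = self.review_added + self.rework_added + self.coordination_added
        return self.drafting_saved - overhead


example = TaskHours(
    drafting_saved=6.0,
    review_added=2.5,
    rework_added=2.0,
    coordination_added=1.0,
)

print(example.drafting_saved)  # 6.0 hours: the number that shows up on a dashboard
print(example.net_saved())     # 0.5 hours: the number the business actually keeps
```

The gap between the two printed numbers is the point. A task-level metric reports six hours saved, while the system-level view in this invented example nets out to half an hour.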

The winners will not be the companies with the most AI output. They will be the companies that redesign management, product, and engineering around AI’s actual behavior.

That is the real J curve.

The upside is not in using AI to do the same work faster. The upside is in surviving the messy middle long enough to build a better operating model on the other side.