Going AI Native: From Execution to Governance
Why the Software Development Lifecycle is becoming the Value Delivery Lifecycle
I built something recently that was surprisingly delightful.
It’s a simple GitHub integration. Open an issue, and Claude automatically triages it, deciding whether it needs clarification or contains enough detail to implement. The outcome is either a mini spec with open questions or a working pull request. My role in this loop isn’t writing code but deciding what gets merged. I still open the issue, though; I’m still the one initiating.
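If you’re curious what that looks like, here’s a minimal sketch of the triage step (not the actual implementation). It assumes a Flask webhook receiver and the Anthropic Python SDK; the prompt is illustrative, and the downstream routing is left as a comment.

```python
# A minimal sketch of the triage step, assuming a Flask webhook receiver
# and the Anthropic Python SDK. The prompt and routing are illustrative.
import anthropic
from flask import Flask, request

app = Flask(__name__)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TRIAGE_PROMPT = (
    "You triage GitHub issues. Decide whether the issue below contains "
    "enough detail to implement. Reply with IMPLEMENT plus a plan, or "
    "CLARIFY plus a mini spec listing the open questions.\n\n"
)

@app.post("/webhook")
def on_issue_event():
    payload = request.get_json()
    # React only to newly opened issues; ignore edits, labels, and comments.
    if payload.get("action") != "opened" or "issue" not in payload:
        return "", 204
    issue = payload["issue"]
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # any current Claude model ID works
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": TRIAGE_PROMPT + f"{issue['title']}\n\n{issue.get('body') or ''}",
        }],
    )
    decision = response.content[0].text
    # Downstream (not shown): post CLARIFY output as an issue comment, or hand
    # the IMPLEMENT plan to a coding agent that opens a pull request.
    print(decision)
    return "", 204
```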
What’s coming next is stranger: systems that don’t wait for the issue to be filed. They watch how users struggle, infer what’s missing, and make the fix. The human role shifts again, from initiation to governance.
When code stops being the bottleneck, a lot of what we built our organizations around stops making sense. And this shift won’t be incremental.
The Bottleneck Has Moved
For decades, engineering time was the constraint software teams planned around. We built entire methodologies around this scarcity. Backlogs were prioritization queues for limited developer time. Sprints were commitment devices to protect focus. Story points were a currency for negotiating what could fit into the constraint. The whole apparatus assumed that building was the hard part.
That assumption is evaporating.
The theory of constraints tells us that optimizing anything other than the bottleneck is a waste. When code generation becomes cheap—and it’s getting cheap fast—the bottleneck doesn’t disappear, but relocates. The constraint shifts from “what can we build” to “what should we build” and, more pressingly, “what should we release.”
Consider what happens when you can auto-generate a working POC from a feature idea mentioned in Slack. The PM role doesn’t disappear, but it transforms. It becomes almost editorial—deciding what makes contact with users, not what gets approved for development. Evaluation speed matters more than execution speed.
This has implications for how teams are structured. There will be fewer pure coders and more editors. Value will shift toward people with product sense, taste, and architectural judgment. The mythical “10x coder” gives way to something like a “10x clarifier”—someone who can specify intent precisely enough for generation and evaluate output quickly enough to keep pace.
I suspect most organizations aren’t ready for this. In practice, they handle judgment today through people, not systems: senior engineers, PMs, and leaders applying tacit standards case by case. The backlog, in its current form, becomes a judgment queue. And judgment doesn’t scale the same way execution does.
The Explore/Exploit Bifurcation
There’s a useful framework from organizational theory—the explore/exploit tradeoff—that becomes newly relevant here. In exploit mode, you’re optimizing a known system. In explore mode, you’re searching for new possibilities without clear signal about what’s valuable. Most product work sits somewhere on this spectrum.
These two modes will require fundamentally different operating models.
Exploit mode collapses into automated loops. When you’re optimizing against known metrics—conversion rates, engagement, error rates—the feedback signal is already defined. Agents can observe user behavior, generate hypotheses, implement fixes, and measure results. Humans set the frameworks and criteria; agents apply them continuously. Traditional PM work in this mode dissolves into the system itself.
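To make that concrete, here’s what the shape of such a loop could look like. Everything below is a stand-in (the telemetry, the proposing agent, the deployment tooling would be whatever a team actually runs); the structural point is that humans author the `allowed` criteria once and the loop applies them on every pass.

```python
# A schematic exploit-mode loop. All callables are injected stand-ins for
# real telemetry, agents, and deployment tooling; only the shape matters.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hypothesis:
    description: str   # e.g. "a shorter checkout form raises conversion"
    metric: str        # the known signal being optimized (exploit mode)
    min_lift: float    # the smallest improvement worth keeping

@dataclass
class Result:
    lift: float        # measured change in the target metric

def exploit_loop(
    observe: Callable[[], dict],                # snapshot of user behavior
    propose: Callable[[dict], Hypothesis],      # agent-generated hypothesis
    allowed: Callable[[Hypothesis], bool],      # human-set criteria, pre-authored
    implement: Callable[[Hypothesis], object],  # agent writes the change
    measure: Callable[[object, str], Result],   # holdout or A/B measurement
    promote: Callable[[object], None],          # ship the winner
    roll_back: Callable[[object], None],        # discard the rest
) -> None:
    # Observe -> hypothesize -> gate -> implement -> measure, continuously.
    while True:
        snapshot = observe()
        hypothesis = propose(snapshot)
        if not allowed(hypothesis):   # judgment applied by the system,
            continue                  # not by a human in the loop
        change = implement(hypothesis)
        result = measure(change, hypothesis.metric)
        if result.lift >= hypothesis.min_lift:
            promote(change)
        else:
            roll_back(change)
```

Note where the human shows up: only in authoring `allowed` and choosing the metric. That’s the sense in which traditional PM work dissolves into the system.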
Explore mode stays human-intensive, but augmented. Here, there’s no user signal to optimize against—you’re inventing the criteria. This looks more like research or venture incubation. Higher tolerance for ambiguity. More room for intuition and taste.
The military’s OODA loop (observe, orient, decide, act) offers a useful lens here. In exploit mode, the “Orient” phase can be largely pre-set, encoded in frameworks and principles. The loop runs without humans. In explore mode, Orient is still the work—constantly reframing, building the worldview, deciding what matters. You can’t automate orientation when you don’t know what you’re orienting toward.
The messy implication is that most organizations will need to run both modes simultaneously, with different processes, different metrics, and possibly different people. That’s hard. And I think it explains why the transition will be rougher for some companies than others. This is a classic Innovator’s Dilemma pattern: organizations optimized for execution struggle when exploration becomes the constraint.
Building the AI-Native Organization
“Cloud native” wasn’t just about where your code runs. It changed the economics of software delivery, favoring microservices, CI/CD, and DevOps as a discipline. An organizational shape then evolved in response to those new economics—not all at once, but iteratively, as teams discovered what the new infrastructure made possible and what it demanded.
“AI native” will follow a similar pattern.
The shift isn’t “we use Copilot” or “we have agents in production.” It’s the transition from Software Delivery to Value Delivery—where every well-specified intent can get implemented, and human judgment relocates to where it actually matters. In exploit mode, that’s downstream: not “can we build this” but “what does releasable look like.” In explore mode, it’s upstream: “what’s worth trying before we have signal.”
Code becomes an implementation detail. The organization restructures around judgment at both ends. Here’s what else will change:
Product work. The real human output becomes the scoring criteria, the architectural principles, the articulation of “what good looks like.” These artifacts describe what value needs to be delivered; code is generated downstream from them. It’s analogous to investment theses or editorial style guides—they govern what gets through. The frameworks become the product, in a sense.
Process. Continuous governance replaces periodic planning. The stage-gate methodology gets repurposed: gates protect integration cost, user attention, and product coherence—not build cost. Automated checkpoints run continuously against codified criteria (see the sketch below). The Software Delivery Lifecycle becomes one step in a broader Value Delivery Lifecycle that manages flow from intent to user impact.
People. Hiring shifts toward product sense, taste, and architectural judgment. Explore roles look more like researchers or venture investors, with a high tolerance for ambiguity and the ability to pattern-match across domains. Exploit roles look more like editors or portfolio managers—people who can consistently apply defined criteria under time pressure.
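Here’s the sketch promised above: what “codified criteria” might look like as an automated gate. The checks are invented for illustration; real gates would wrap eval suites, linters, and product heuristics. The structure is the point: each criterion pairs a machine-checkable test with the human rationale it encodes.

```python
# A minimal sketch of codified release criteria. The checks are invented;
# the structure (machine-checkable test + explicit rationale) is the point.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str
    check: Callable[[dict], bool]  # candidate change in, pass/fail out
    rationale: str                 # the judgment this encodes, made explicit

RELEASE_GATE = [
    Criterion("coherence",
              lambda change: change["touched_surfaces"] <= 2,
              "One change should not sprawl across unrelated product areas."),
    Criterion("integration_cost",
              lambda change: change["new_dependencies"] == 0,
              "New dependencies need a human decision, not an agent's."),
    Criterion("user_attention",
              lambda change: not change["adds_notification"],
              "User attention is a cost the gate protects, as build cost once was."),
]

def gate_failures(change: dict) -> list[str]:
    # Runs continuously against every candidate change. Failures carry the
    # rationale, so agents (and humans) see which principle was violated.
    return [c.rationale for c in RELEASE_GATE if not c.check(change)]
```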
The Context Window Problem
There’s a practical constraint that makes all of this concrete: the context window is effectively the new bottleneck on judgment.
It started with code. Which files? Which functions? What’s relevant to this task? As AI capabilities have grown, so has what fits in the window. But the context window will keep expanding, and it will start to include things that used to live only in someone’s head. Product principles. Architectural criteria. The judgment calls that a senior engineer makes instinctively.
In exploit mode, that context will be what governs the automated loops. In explore mode, articulating it clearly enough to include will be the work itself.
Either way, the implication is the same: what stays tacit stays outside the window. And what’s outside the window won’t scale.
This is probably somewhat obvious once you say it out loud, but I think it has real consequences for how teams operate today. The organizations that will thrive in an AI-native world are the ones building the muscle now—getting explicit about their product principles, their quality criteria, their architectural constraints. Not because agents need documentation (they’ll work with whatever you give them) but because you need to know what you actually believe in order to govern what gets built without your direct involvement.
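Concretely, “getting explicit” can start as something this simple: principles that live in files, assembled into every agent run. The file names and helper below are invented for illustration; the point is that only judgment that has been written down travels with the agent.

```python
# Illustrative only: explicit principles as files that ride along in the
# context window. The paths and helper name are invented for this sketch.
from pathlib import Path

PRINCIPLE_DOCS = [
    "docs/product-principles.md",     # what good looks like, in writing
    "docs/architecture-criteria.md",  # the senior engineer's instincts, codified
    "docs/quality-bar.md",            # the criteria the release gates enforce
]

def build_context(task: str) -> str:
    # Anything not in these files stays tacit; tacit judgment never makes
    # it into the window, and what's outside the window doesn't scale.
    principles = "\n\n".join(Path(p).read_text() for p in PRINCIPLE_DOCS)
    return f"{principles}\n\n# Task\n{task}"
```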
The Problems Point the Way
This is all highly speculative territory. Did anyone in 2010 fully anticipate what “cloud native” would mean by 2025? The organizational implications emerged through experimentation, failure, and iteration.
And today, people are experiencing real problems. AI slop. Hidden bugs that pass cursory review. A creeping sense that something is being lost when you’re no longer close to the code. The idea of more autonomy—hands-free coding, agents shipping without human initiation—can feel somewhere between reckless and absurd given what teams are actually dealing with.
I don’t think that’s wrong, exactly. But I think it’s pointing at the wrong problem.
The slop and the hidden bugs aren’t evidence that we should pump the brakes on autonomy. They’re evidence that we’ve been scaling generation without scaling governance. We got better at producing code before we got better at evaluating it. The bottleneck moved and we didn’t move with it.
Which means the path to healthy autonomy runs directly through what we’ve been talking about here. You don’t get there by having humans manually review every line of code indefinitely. That doesn’t scale, and it’s not where humans add the most value anyway. You get there by building the judgment infrastructure: the explicit criteria, the architectural guardrails, the product principles clear enough that they can govern without you in the loop.
The variance in what teams can do right now is enormous. Some are already running autonomous loops. Others are drowning in slop. But what separates them is rarely the AI capability. It’s whether the team knows what good looks like, and whether that understanding of ‘good’ is shared and explicit, rather than living only in individual intuition.
We’re moving from Software Delivery to Value Delivery. Software becomes an implementation detail, and the organizations that thrive will be the ones structured around judgment, curation, and coherence.
The problems people are experiencing today aren’t a detour from that future. They’re the rough edge of the transition. And the only way through is to build the muscle that’s been missing all along.