We watched the SaaS market stumble in April 2026. AI agents embedded in real business workflows now fail roughly one attempt in three, per the Stanford HAI 2026 AI Index. We dug this hole ourselves, and I reckon the way out is a shift from vibes to specs.
We will cover:
- Why vibe coding silently accumulates cognitive debt.
- What spec-driven development (SDD) looks like in practice: Constitution, specs, acceptance criteria.
- A worked example migrating a dbt model with a spec plus a test loop.
- Why the April 2026 news cycle (the Copilot Studio CVE, the Surface price shock, the agent failure numbers) makes this switch urgent now.
Vibe coding and the cognitive debt trap
Two styles of AI development dominate the timelines. The first is vibe coding: describe what we want, look at the result, point at what is broken, ask for a fix, repeat. It feels spectacularly productive on a to-do app. Then one morning we open a repo with twenty thousand lines of generated code we no longer understand, written across sessions whose context is long gone.
That is not technical debt. It is cognitive debt. The agent keeps shipping. We keep losing the map.
Vibe coding is not too fast. It is too quiet.
Stanford HAI's 2026 report puts numbers on the cost. Frontier models fail roughly one in three structured benchmark attempts, and real enterprise adoption of agents remains in the single digits despite the PR cycle. The gap between "demo works" and "production survives Monday" is the defining operational problem of the year.
And the iteration loop costs more by the month. Microsoft raised every Surface SKU by $200 to $500 in April, discontinued every sub-thousand-dollar configuration, and blamed the RAM crunch. Cloud compute for agents rides the same curve. Every wasted vibe-coded loop costs more than it did last quarter.
What spec-driven development actually is
SDD splits the work into two layers. The spec answers what we are building and why. The implementation answers how. The spec is a contract between the humans on the team, and between human and agent. Without that contract we keep getting output that does not match our mental model.
The stakes are no longer purely developer-ergonomic. In April, Microsoft assigned CVE-2026-21520 to an indirect prompt injection in Copilot Studio that bypassed earlier patches and exfiltrated SharePoint data through a public form field. It was the first time a prompt injection earned its own CVE number. Loose, ambient prompts are now security bugs, not just a debugging nuisance.
Two artefacts do most of the work:
- A Constitution at the project root. Mission, tech stack, boundaries, roadmap. The agent reads it on every session and holds long-term context across compactions.
- A spec file per feature inside `specs/`. Plan, requirements, acceptance criteria, forbidden side effects. Markdown is enough. Link a tracker only when the team already lives there.
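A Constitution can be a single markdown file at the project root. Here is a minimal sketch, assuming a stack like the worked example below; the headings and every concrete detail are illustrative conventions, not a standard:

```markdown
# Constitution

## Mission
Reliable revenue reporting on top of the Shopify ingestion layer.

## Tech stack
dbt for modelling; Fivetran for ingestion; CI runs `dbt build` on every PR.

## Boundaries
- Agents never edit production profiles or `dbt_project.yml`.
- Public mart columns are renamed only via an approved spec.

## Roadmap
- Finish the Stitch-to-Fivetran migration, one model at a time.
```

The point is not the exact headings but that the file is short enough for the agent to re-read at the start of every session.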
At the start of a feature we write the spec. We execute against it. At the end we re-read the Constitution and update it where reality drifted from the plan. The role shifts from typing code to managing architecture. I reckon that feels slow the first few times. It stops feeling slow by the third feature.
Slow on planning, fast on implementation. Every time.
A worked example, migrating a dbt model
Take a concrete data engineering case I have actually run. We are migrating fct_orders from a legacy Stitch ingestion layer to a Fivetran connector, where column names, grain, and null semantics all shift. Vibe coding this migration is how a week of silent grain changes lands in production.
First, write the spec. Save `specs/fct_orders_migration.md`:

```markdown
# fct_orders migration spec

## Scope
- Rewrite `models/marts/fct_orders.sql` to source from
  `raw.fivetran_shopify` instead of `raw.stitch_shopify`.
- Preserve grain: one row per `order_id`, `order_version`.
- Keep public column names stable for downstream models.

## Acceptance criteria
- `dbt build --select fct_orders` passes in dev.
- `row_count(fct_orders_new) between 0.98 * old and 1.02 * old`.
- `dbt test` passes: `unique_order_id`, `not_null_customer_id`,
  `relationships_order_status`.
- Reconciliation test (below) returns zero rows.

## Forbidden side effects
- Do not rename public columns.
- Do not touch downstream `schema.yml`.
- Do not edit `dbt_project.yml` profiles.
```
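The row-count criterion is mechanical enough to script once both counts have been queried. A minimal sketch; the function name and signature are mine, not a dbt API:

```python
def row_count_within_tolerance(new_count: int, old_count: int,
                               tolerance: float = 0.02) -> bool:
    """True when the migrated row count sits within the spec's band,
    i.e. between (1 - tolerance) and (1 + tolerance) of the legacy count."""
    return (1 - tolerance) * old_count <= new_count <= (1 + tolerance) * old_count
```

With the default 2% band, 981 rows against a legacy count of 1,000 passes and 979 does not.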
Next, the reconciliation test. Add `tests/reconcile_fct_orders.sql`:

```sql
with legacy as (
    select order_id, order_total, order_status
    from {{ ref('fct_orders_legacy') }}
),

migrated as (
    select order_id, order_total, order_status
    from {{ ref('fct_orders') }}
)

-- Rows present in one model but not the other, compared on all three
-- columns so value drift is caught, not just missing keys. Parentheses
-- keep the except/union precedence unambiguous.
(
    select * from legacy
    except distinct
    select * from migrated
)
union all
(
    select * from migrated
    except distinct
    select * from legacy
)
```
Now the loop runs itself. We ask the agent to implement the spec. It rewrites `fct_orders.sql`. We run `dbt build --select fct_orders+` and feed the failures straight back: failing tests, row-count diffs, reconciliation deltas. The agent iterates against the named criteria, not against our mood.
Three passes is a normal cadence in my experience. The spec tells the agent when to stop. The reconciliation test tells us when to believe it. Without either artefact the same migration takes a week and ships with a silent grain change that blows up the next monthly revenue report.
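The control flow of that loop fits in a few lines. A sketch, with `run_build` and `ask_agent` as placeholders for your real `dbt build` invocation and agent harness, not real APIs:

```python
def iterate_to_spec(run_build, ask_agent, spec, max_passes=5):
    """Drive the agent against the spec's named criteria, not our mood.

    run_build() returns (ok, log) from something like `dbt build`;
    ask_agent(prompt) asks the agent to patch the model. Both are
    placeholders for whatever harness you actually use.
    """
    for attempt in range(1, max_passes + 1):
        ok, log = run_build()
        if ok:
            return attempt  # acceptance criteria met: the spec says stop here
        # Feed the spec plus the raw failures straight back to the agent.
        ask_agent(f"{spec}\n\nFix these failures:\n{log}")
    raise RuntimeError(f"no convergence after {max_passes} passes; escalate")
```

With a stubbed `run_build` that fails twice and then passes, the function returns 3, matching the three-pass cadence above. The hard cap on passes is the point: an agent that has not converged by then needs a human, not another prompt.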
A spec is a contract. A vibe is a hope.
Conclusion
Reach for spec-driven development when the work is larger than a throwaway script, when more than one person shares the agent, and when failures cost money rather than minutes. dbt migrations, schema refactors, and any task where grain, nullability, or downstream lineage matters are the natural first targets. Meanwhile the infrastructure around the agent keeps hardening: Amazon is buying Globalstar for $11.6 billion to underwrite always-on connectivity, the US government opened mandatory data-centre audits, and small shops are designing custom silicon to escape Nvidia's pricing. The interface to the agent has to harden with it. Next problem to solve: keeping the Constitution and spec files under version control so the discipline survives the second migration and the third on-call handover.