# How AI agents are reshaping development

> What changed between Copilot's tabcomplete and today's coding agents, why token spend is a garbage metric, and the three-thread workflow I use to ship safely.

AI agents · Published July 1, 2026 · 9 min read · By Siebe Barée

A year ago my entire AI bill was $28/month: $20 for Cursor and $8 for T3 Chat (a better tool for just chatting with AI). I thought that was already a lot, and I never once got close to a rate limit. This month I'll spend $200 on Claude, $100 on Codex, $30 on CodeRabbit and $15 on Wispr Flow, and I've blown past the usage caps on Claude and CodeRabbit more than once. I use around 4 billion tokens a month just on coding agents.

On X or LinkedIn, that paragraph is supposed to read as a flex. It doesn't. Token spend is one of the worst possible ways to measure whether any of this is working. I talk to a lot of founders and quietly benchmark how they build, and the teams proudest of their token budgets are, as far as I can tell, shipping more bugs than they ever have. It isn't only them. In the last 6 months I've hit more shipped bugs in software that used to be rock solid than in all the years before combined. Things I almost never saw break now break in front of me regularly, and downtime feels more common everywhere. The industry sped up, and quality went sideways at the same time.

> **Brian Armstrong** (@brian_armstrong) on X:
> How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.
> Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We're experimenting with defaulting
>
> [Chart: AI Spend at Coinbase (Bars) -vs- Token Usage (Line). Left axis = USD spend (stacked by org). Right axis = total company tokens.]
>
> https://x.com/brian_armstrong/status/2070670644577280109

None of this means the tools are bad. They're genuinely good, and something real did change. It just isn't what the hype says, and building the way the hype tells you to is how you end up with the bug pile I described. So here's the honest version, plus the workflow I use every day.

## The road from Stack Overflow to agents

Just 4 years ago I wrote basically all of my own code, and when I got stuck I'd tab over to Stack Overflow, find something close enough and reshape it until it fit. No AI, just me and 35 open tabs.

Then Copilot and Cursor's tabcomplete showed up. Genuinely good, but I was still writing 60 to 70% by hand. It automated the boring middle of a line I'd already decided to write: the null check, the obvious loop. Useful but not structural. IDE AI chat came next and could write a whole function on request, but the models back then couldn't handle much more than that. One function was fine. A feature across ten files that had to actually run was not.

For me it properly clicked in February 2026, when Opus 4.6 landed with a million-token context and could hold a real codebase in its head across a long task. That's when I stopped treating AI as fancy tabcomplete and started handing it whole tickets. You point an agent at an issue and it reads the repo, edits across a dozen files, runs the tests, fixes what it broke and opens a PR while you do something else. Some teams now hand-write well under 10% of their code and they aren't lying about it.

Writing 10% of your code by hand is not the same as 10x the output.

## What actually changed

The bottleneck in software was never typing speed. The hard parts are understanding the problem, holding the system in your head and not breaking the seventeen things wired to the thing you're changing. Faster typing doesn't touch any of that.

What the agents did was move the constraint off writing the code and onto reviewing it. That's why the bug problem isn't a coincidence. GitClear looked at 211 million lines of changed code and found that in 2024 duplicated code blocks jumped roughly 8x, copy-pasted code outpaced refactored or moved code for the first time, and the share of changes that were real refactors fell from 25% to under 10%. Cloned code carries an estimated 15 to 50% more defects.

That's the agent's blind spot. It will happily paste the fourth slightly different version of a function instead of noticing that an abstraction wants to exist, because it never feels the mess it leaves. You do, 6 months later, when you change all four copies and miss one. The code gets written faster and rots faster, and that trade is exactly what shows up as bugs in software that used to be solid.

## The machine can read your secrets

An agent that can edit and run your code can read everything your code can read, including your `.env` file. Claude Code, Cursor and Codex all read local environment files, because they have to run your app to test their own work. That's the feature, not a bug. It's also a new attack surface sitting next to your live credentials.

The numbers are moving the wrong way. GitGuardian counted 28.65 million new hardcoded secrets pushed to public GitHub in 2025, a 34% jump and the biggest single-year increase on record, with OpenAI/Anthropic key leaks up 81%. Commits written with Claude Code assistance leaked secrets at roughly twice the rate of the average commit.

Simon Willison's framing, the "lethal trifecta," is the clearest I've seen: the moment an agent has private-data access, exposure to untrusted content and a way to reach the outside world all at once, you have a data-theft path. Agentic coding hits all three by default. It reads your secrets, it ingests a GitHub issue, a dependency README or a code comment written by a stranger, and it can run `curl`. In April 2026, researchers showed a "comment and control" attack: a single poisoned code comment that hijacked three major coding agents at once and turned them into a shell with the developer's own privileges. One Claude Code finding was rated critical, CVSS 9.4.

The fix isn't "stop using agents." It's to stop leaving secrets where an agent, a stolen laptop or a poisoned npm package can grab them. Secret management is what we build at Enkryptify, so I'm biased, but the principle holds whatever you use: keep the value out of the repo and off disk, inject it only when the process runs, scope what each agent can touch, and rotate the moment something looks off. In practice it's the difference between this:

```bash
# .env
DATABASE_URL=postgres://user:password@prod-db...
STRIPE_KEY=sk_live_51H...
```

and this:

```bash
ek run -- npm run dev
```

The app boots the same. The agent just never gets a file full of live credentials to paste into a PR, log by accident or be tricked into sending somewhere.
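
If you want to see the underlying idea without any particular tool, here is a minimal shell sketch of runtime injection. It is not how `ek` works internally. `fetch_secret` and `run_with_secrets` are hypothetical names, and `fetch_secret` is a stand-in that returns a dummy value where a real setup would call a secret manager.

```bash
# Hypothetical stand-in for a secret manager lookup. A real version would
# call your vault's CLI or API instead of returning a dummy string.
fetch_secret() {
  printf 'example-value-for-%s' "$1"
}

# Run a command with secrets set only in that command's environment.
# The subshell keeps the values out of the calling shell and off disk.
run_with_secrets() {
  (
    DATABASE_URL="$(fetch_secret DATABASE_URL)"
    STRIPE_KEY="$(fetch_secret STRIPE_KEY)"
    export DATABASE_URL STRIPE_KEY
    "$@"
  )
}
```

Calling `run_with_secrets npm run dev` would start the app with both variables set, and nothing is written to the working tree. A process started this way can still read its own environment, which is why scoping and rotation stay on the list.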

## How I actually build with agents

The process I actually use looks nothing like the cleaned-up version people put in conference talks, and most of it happens before any code exists, because a vague prompt gets you vague output. Sometimes I know exactly what the result should be and I'm just testing whether the model agrees; sometimes I'm still figuring it out. Either way, I don't open with "build me X, make no mistakes."

First, I gather the evidence. For a feature, I pull together what I actually know: what customers said, the constraints, the edge cases I can see. I talk most of it out with Wispr Flow, because I think faster out loud than I type. Then I open a Claude Code or Codex session whose only job is to help me clarify, not to write code. I argue with it until the fuzziness clears and I know what I'm building.

Then, in a fresh thread, I have it build a working prototype as fast as possible. Code quality genuinely doesn't matter here. I just want something I can click through to see whether the idea holds up the way I predicted. If it's complex enough that a customer needs to see it, I'll record a quick demo. Then I delete it. That code never leaves my laptop and never touches GitHub. It did its job the moment I learned something.

Then I build the real thing. Third thread, and now I'm specific. I bring all the context from the first two steps and I'm exact about what I want, which gets me to about 90% on the first pass. I prompt away the inconsistencies from there. If a thread starts drifting and I'm correcting the same thing repeatedly, I kill it and start fresh, because a context full of my own corrections goes rotten and the model keeps tripping over it.

Then I read the code, all of it, by hand. I refuse to lose the plot on my own codebase, and agents still make confidently stupid mistakes you only catch by reading. If I'm happy with it, it goes to GitHub.

Then the reviewers take over. CodeRabbit scans the PR for security issues. I run a security review with Claude. And a third pass, an agent I built on a custom GitHub runner, uses Codex to review the same PR independently. Every PR gets three AI reviews plus a human.

## If you're starting out, keep it simple

You don't need my whole setup. The teams that do this well have just made their codebase a place an agent can succeed.

Fix your codebase first. An agent pattern-matches on what's already in the repo, so a messy codebase produces messy output in the same style. The cleaner and more consistent it is, the better the agent behaves. Most people skip this and it's the highest-leverage thing on the list.

Give it a way to check its own work. This is the one that preserves quality: tests, a type checker, a linter, anything the agent can run to catch its mistakes before you see them. If it can't verify itself, you are the test suite, and you can't keep pace. Wire it into one command:

```bash
pnpm lint && pnpm typecheck && pnpm test
```
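
If the one-liner's output is hard for an agent to interpret, a small wrapper can name the step that failed. This is a sketch, not a required setup: `run_checks` is a hypothetical helper, and the commands you pass it are whatever your project already uses.

```bash
# Run each check in order and stop at the first failure, naming the step
# so the agent (or you) gets an unambiguous signal about what broke.
run_checks() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! sh -c "$step"; then
      echo "FAILED: $step" >&2
      return 1
    fi
  done
  echo "all checks passed"
}
```

With the commands from above, the call would be `run_checks "pnpm lint" "pnpm typecheck" "pnpm test"`. It exits non-zero on the first failing step, so it can be used anywhere the plain `&&` chain is.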

Let it see the result. Give the agent a browser it can control, so it can load the page and confirm the thing actually works instead of guessing.

Write it a short `CLAUDE.md` at the repo root covering the specifics you'd otherwise have to repeat in every prompt and the stupid mistakes it keeps making. For example:

```markdown
- Package manager is pnpm. Never npm.
- Run `pnpm test` before you call anything done.
- No new dependencies without asking first.
- Errors use the Result type in src/lib/result.ts. Don't throw.
```

Keep the diffs small and actually read them. One ticket, one PR you can read top to bottom. You are never going to keep your focus when reviewing a 5,000-line PR.
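
One way to hold yourself to that is a size check on the diff before a PR goes up. The sketch below assumes you feed it `git diff --numstat` output. The helper names and any line limit you pick are hypothetical, not a recommendation from any tool.

```bash
# Sum added + deleted lines from `git diff --numstat` output on stdin.
# Binary files show "-" in both columns, which awk counts as 0.
changed_lines() {
  awk '{ total += $1 + $2 } END { print total + 0 }'
}

# Fail when the diff on stdin touches more lines than the given limit.
check_diff_size() {
  local limit="$1" total
  total="$(changed_lines)"
  if [ "$total" -gt "$limit" ]; then
    echo "diff touches $total lines (limit $limit); consider splitting it" >&2
    return 1
  fi
  echo "diff touches $total lines"
}
```

For a branch cut from `main`, a call might look like `git diff --numstat main...HEAD | check_diff_size 400`, where both the branch name and the 400-line limit are placeholders to tune.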

Get your secrets out of the working tree. No live keys in a `.env` next to the code an agent is editing. Scope what each agent can reach and rotate on a schedule. The cheapest incident is the one that can't happen.
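
As a last line of defense, you can scan what is about to be committed for anything that looks like a live credential. This is a rough sketch with only two illustrative patterns, a `sk_live_` key and a URL with an embedded `user:password@`, mirroring the `.env` example earlier. `scan_for_secrets` is a hypothetical helper, and a dedicated scanner will catch far more.

```bash
# Read text on stdin, print any lines that look like secrets, and
# return 1 if at least one was found. Patterns are illustrative only.
scan_for_secrets() {
  if grep -nE 'sk_live_[0-9A-Za-z]+|[a-z][a-z0-9+.-]*://[^/:@[:space:]]+:[^/@[:space:]]+@'; then
    return 1
  fi
  return 0
}
```

In a pre-commit hook this could run as `git diff --cached | scan_for_secrets`, blocking the commit when it returns non-zero.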

## Some final thoughts

Is AI reshaping software development? Yes, but not by making you 10x faster at what you already did. It's changing what the job is: less author, more editor and architect. You are still the bottleneck: it used to be the code you wrote, and now it's the code you read.

The teams pulling ahead aren't the ones posting line counts and token bills. They're the boring ones: a tight test suite the agent runs itself, small reviewable PRs, a codebase that stays legible on purpose, a few reviewers on every change and secrets that live in a vault instead of a file. But those tweets do not go viral, of course.

If you need a better way to handle secrets with AI agents, you can try Enkryptify for free.

## Links

- This post: https://enkryptify.com/blog/how-ai-agents-are-reshaping-software-development
- All posts: https://enkryptify.com/blog
- Product: https://enkryptify.com/
- Docs: https://docs.enkryptify.com
- Sign up: https://app.enkryptify.com