Transcript
Tim Williams: Hello and welcome to the second episode of Rubber Duck Radio. It's spooooooky October 31st, I'm your host Tim Williams, and joining me once again is Paul Mason, a full stack developer from Seattle. Welcome back, Paul!
Paul Mason: Hey, nice to be here again Tim.
Tim Williams: We're going to try and tackle a few topics this time: some frustrations, some how-to's, and some stories from the trenches of working with agentic tools in the software development space.
Tim Williams: If you think about it, many of the tools and techniques we're all working so hard at to make these models useful exist to address one fundamental core issue, the issue that creates the need for all these complex tools and context management solutions.
Paul Mason: I know what you're going to say, their static knowledge cutoff.
Tim Williams: Exactly, one fundamental limitation of LLMs that's a difficult nut to crack is the high cost of training. This means they're all frozen at a specific date. You know as well as I do, in software development three months is like three years worth of innovation.
Paul Mason: You blink and three brand new JavaScript frameworks have been born, it's insane.
Tim Williams: Right! When it comes down to it, tools like MCP and tool use are all extremely important and powerful because they go some way toward making that problem a little less of an issue.
Tim Williams: Then you have other context modifiers to make the AI follow what you want, like Cline rules and Cursor rules files. These are absolutely essential, but on large projects, they can be costly in terms of token overhead for every single request.
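To put a number on that overhead, here's a quick back-of-envelope sketch. The token counts and pricing below are illustrative assumptions, not real figures from the episode.

```python
# Back-of-envelope cost of prepending a rules file to every request.
# All numbers below are illustrative assumptions, not measurements.

def rules_overhead_tokens(rules_tokens: int, requests: int) -> int:
    """Total extra input tokens spent on the rules file alone."""
    return rules_tokens * requests

def rules_overhead_usd(rules_tokens: int, requests: int,
                       usd_per_million_input: float) -> float:
    """Dollar cost of that overhead at a given input-token price."""
    return rules_overhead_tokens(rules_tokens, requests) / 1_000_000 * usd_per_million_input

# A hypothetical 2,000-token rules file across 500 requests in a day:
extra = rules_overhead_tokens(2_000, 500)    # 1,000,000 extra input tokens
cost = rules_overhead_usd(2_000, 500, 3.0)   # at a made-up $3 / 1M input tokens
print(extra, round(cost, 2))
```

The point of the sketch: the rules file is paid for on every single request, so its cost scales with request volume, not with how often the rules actually matter.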
Paul Mason: You know what can help that right? Like, uh, fundamental software development methodologies, like making sure your code is broken down into small problems.
Tim Williams: If only I worked in a world where there were no legacy monoliths to maintain, but alas, I live in the real world. Come on Paul, certainly you don't work only on greenfield projects with modern, clean architecture?
Paul Mason: No, no, you're right. I work on my fair share of legacy code too.
Tim Williams: So do you have any tips or tricks for managing context on these huge projects?
Paul Mason: Here's one trick, but it has its limitations. When I'm working on a huge project where sections are somewhat well structured into folders, I'll treat one section like the whole project and open Cursor in that particular folder with its own Cursor rules.
Paul Mason: The limitation of this being, whenever concerns cross into other parts of the system, I have to do a lot more work to feed Cursor that context. Yeah, it's not a great solution, but it does help keep the context a little more manageable, you know?
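Paul's folder-scoping trick could be sketched roughly like this. The folder name and rule text are hypothetical, and the use of a root-level `.cursorrules` file is an assumption (Cursor has read `.cursorrules`, with newer versions favoring `.cursor/rules/` files):

```python
# Sketch of the folder-scoped trick: give one subsystem of a monolith its
# own rules file, then open Cursor at that folder instead of the repo root
# so it treats the subsystem like a standalone project.
from pathlib import Path

SUBSYSTEM = Path("services/billing")  # hypothetical subsystem folder

RULES = """\
# Scope: billing service only.
- Treat this folder as the whole project; do not assume code outside it.
- When a change needs types from outside this folder, ask for them to be pasted in.
- Follow the existing code style in this folder, not modern defaults.
"""

def write_scoped_rules(folder: Path, rules: str) -> Path:
    """Drop a rules file into the subsystem folder."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / ".cursorrules"
    path.write_text(rules)
    return path

print(write_scoped_rules(SUBSYSTEM, RULES))
```

The trade-off Paul mentions shows up in the second rule: cross-cutting concerns have to be fed in manually, because the scoped view deliberately hides the rest of the repo.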
Tim Williams: I haven't tried that one! I'll give it a whirl and see if I can improve on the idea at all. So here are some of the issues I run into with these projects, especially ones using more obscure libraries that don't have good representation in even these bleeding edge models, like old AngularJS code or, and here's a bear, AWS Amplify.
Paul Mason: Amplify? Not sure I've heard of that.
Tim Williams: It's a framework of tools AWS put together to make building full stack scalable apps a little easier. To be honest, it hits the mark when dealing with AWS infrastructure, but when working with frameworks on top of it like React Native or Angular, it's a little more of a Wild West.
Paul Mason: Huh, so what kind of issues do you run into on those projects?
Tim Williams: Well, there's not a lot written about these frameworks. AWS has done a decent job of writing their own docs, but one of the reasons these models are so good at things like React is the sheer volume of training data out there in the form of blog posts, tutorials, and just generally implemented code.
Tim Williams: When you're dealing with a framework that's not as well documented, you've got to stuff the context with a lot more instructions for the LLM to follow, making every request more difficult and the token cost much higher for every single request.
Paul Mason: Yeah, huh, that's exactly what I ran into on an old legacy backbone project.
Tim Williams: Right! You have to fill up that context with references to the docs, and specific rules around how to work with these frameworks. And anyone who's worked with these tools knows that the longer the context, the less intelligent the model. Even these crazy models out here with million token context windows.
Paul Mason: Then you start adding things like MCP tools into the context window and suddenly there's no room left for your task.
Tim Williams: Have you tried the new Chrome MCP? Now that I've used it I absolutely cannot live without it.
Paul Mason: I heard about it, but haven't taken the time just yet, what've you done with it?
Tim Williams: What haven't I done with it would be a better question.
Paul Mason: Ok, what haven't you done with it?
Tim Williams: OK, just kidding. But here's one particularly powerful idea I ran into while putting together a massive new feature: you can have the agent build out a full end to end test case for each use case.
Paul Mason: Wow, you don't even have to finish that and I know how I'd use it.
Tim Williams: So, quick aside: while building out this feature I've been using Sonnet 4.5 quite a bit for large planning sessions, then either Haiku or Cursor's new in-house model for implementing small parts of the feature plan.
Tim Williams: This way I can more finely control my token costs. Anyway, while building the feature out I had the LLM go back and use the Chrome MCP to run through the test case after each small implementation step in the plan, then document any bugs encountered along the way.
Tim Williams: Then I had the agent go back and fix each of the small bugs updating the testing documentation. This way I was working through a big problem in a small agile loop with the LLM.
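The plan, implement, test, document, fix loop Tim describes might be sketched like this. The step names and bug text are made up, and the agent and Chrome MCP calls are stubbed out as placeholder functions:

```python
# Runnable sketch of the agile loop: implement a small step, drive the
# end-to-end test, document any bugs, then fix them before moving on.
# implement/run_e2e/fix are stubs standing in for real agent calls.
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    implemented: bool = False
    bugs: list = field(default_factory=list)

def implement(step: Step) -> None:
    step.implemented = True              # stub: cheap model writes the code

def run_e2e(step: Step) -> list:
    # stub: Chrome MCP walks the end-to-end test case for this step
    return ["button misaligned"] if step.name == "checkout flow" else []

def fix(step: Step, bug: str) -> None:
    step.bugs.remove(bug)                # stub: agent patches and re-tests

def agile_loop(plan: list, log: list) -> None:
    for step in plan:
        implement(step)
        step.bugs = run_e2e(step)
        for bug in list(step.bugs):
            log.append(f"{step.name}: {bug}")  # document before fixing
            fix(step, bug)

plan = [Step("login form"), Step("checkout flow")]
bug_log: list = []
agile_loop(plan, bug_log)
print(bug_log)  # the surviving artifact: a record of every bug found and fixed
```

The useful property is that the bug log persists even after the fixes land, which is what lets a later session pick up root-cause analysis without replaying the whole run.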
Paul Mason: That's crazy, I'm definitely going to try this on my next sprint session. So you're basically using the LLM like a flexible test framework that also documents your bugs?
Tim Williams: Yes, and not only can it document the bugs, it can generate some plausible root cause analysis and solution suggestions to come back and clean up.
Paul Mason: Yeah, it's really easy to see the power of that. You know, another issue I run into commonly with context size is problems that seem small, but turn out to be much larger than the initial scope.
Paul Mason: Recently I was implementing an end to end agentic research pipeline in a legacy system that had a queue system in place, but not what you'd expect from a modern system.
Paul Mason: This is a legacy PHP system with a custom queue system that was written by a developer no longer on the project, and poorly documented. So I made the decision to not try to utilize that queue system.
Paul Mason: I wanted to limit the impact of this new feature on anything else in the system so that we didn't have to run a battery of load testing and make sure the new traffic through the queue wasn't going to stall other critical background processes.
Paul Mason: That's where things got difficult. A simple CRON based queue system isn't a huge scope right?
Tim Williams: Oh boy, I've been there.
Paul Mason: Yeah, and I was stubbornly trying to get this all done within the context of one thread with the agent, instead of stepping back and realizing that I need to jump back up to planning mode, then work back down to implementation.
Paul Mason: In the end, I got there, but I burned more tokens than I would've liked. I guess the moral of the story is that I need to be sensitive to these shifts in need.
Paul Mason: Like, be quick to step back and plan instead of trying to steamroll through, you know?
Tim Williams: I've done exactly the same thing myself. Part of the issue is, sometimes these models just nail super complicated things, and other times they fall flat on their face.
Tim Williams: There's a propensity to trust them beyond their capabilities.
Paul Mason: Can I ask you a question? Have you noticed this too? Why is AI SO BAD at CSS? Absolute position here, float left there. Completely out of alignment with modern CSS practices?
Tim Williams: YES! So frustrating. My theory is that not enough is written about modern CSS, or there is not a huge focus on the quality of data in that particular section of the AI's corpus.
Paul Mason: That makes sense. Come on OpenAI, figure it out! CSS is the thing I don't want to be fussing with, I want to work on the business logic and inner workings.
Paul Mason: Not spending time manually rewriting every piece of template and layout logic.
Tim Williams: Are you using frameworks with UI components when you run into a lot of this? I find that does help alleviate some of the issue, but not all of it.
Paul Mason: Agreed, the better defined and more established the framework, the better AI does with it I find.
Tim Williams: It all goes back to that training set and that context. I bet the next evolution for models will be smaller models specialized for certain stacks, or perhaps just fine tuned with specific code "habits."
Paul Mason: It would make sense for big tech to invest in that. The same way it pays off for them to invest in frameworks like React and Angular, it should pay off for them to have efficient models that help enforce their coding standards.
Tim Williams: I wonder why LoRAs haven't taken off for this sort of thing? I haven't looked into training a LoRA for an LLM, is it super expensive?
Paul Mason: Yeah, I've been wondering that too. You see LoRAs everywhere in image generation—Stable Diffusion, Qwen Image, all of that. But in LLMs? Practically crickets.
Tim Williams: Right, it's like every artist on the internet has a folder full of LoRAs for every character imaginable, but developers can't even get a fine-tuned code model for their stack.
Paul Mason: The big reason is cost and complexity. Image models are way smaller compared to LLMs, and their outputs are dense but localized. You can fine-tune a visual model on a single GPU in a few hours. LLMs are a whole other beast—hundreds of billions of parameters and an architecture that's way more sensitive to drift.
Tim Williams: So even if you're technically training a "low rank adapter," you still need a ton of VRAM just to load the base model into memory.
Paul Mason: Exactly. You can't fine-tune something like Llama-3-70B on a consumer rig. Even smaller 8B models need 24-48 GB minimum unless you quantize aggressively.
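A quick sketch of the arithmetic behind those numbers. These are rule-of-thumb, weight-only estimates; activations, gradients for the adapter, and optimizer state all add more on top:

```python
# Why LoRA on LLMs still needs serious hardware: the adapter is tiny, but
# the frozen base weights must sit in VRAM. Weight-only estimates below.

def base_weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory just to hold the frozen base model's weights."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

fp16_8b = base_weights_gb(8, 2)     # 8B model at fp16: ~14.9 GB before anything else
fp16_70b = base_weights_gb(70, 2)   # 70B model at fp16: ~130 GB, multi-GPU territory
int4_8b = base_weights_gb(8, 0.5)   # 4-bit quantized 8B (the QLoRA idea): ~3.7 GB
print(round(fp16_8b, 1), round(fp16_70b), round(int4_8b, 1))
```

This is also why QLoRA comes up later in the conversation: quantizing the frozen base to 4 bits is what pulls an 8B fine-tune back into the range of a single 24 GB card.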
Tim Williams: That's probably why we're seeing companies focus on "prompt-level fine-tuning" instead—like rules, memory, or contextual injection—because it's cheaper and easier to distribute.
Paul Mason: Yeah, LoRAs for text would make a lot more sense if you could just "snap in" a small domain adapter the way you do with images. But in practice, language modeling is fuzzier. The boundaries between domains aren't clean—business logic, syntax patterns, tone—it's all entangled.
Tim Williams: So you can't just inject a LoRA that says, "make me a better React developer," and expect it to work cleanly.
Paul Mason: Yeah, you'd get one that hallucinates JSX inside JSON and tells you it's valid.
Tim Williams: Pretty much.
Paul Mason: I think we'll get there though. There's research happening around parameter-efficient fine-tuning, like QLoRA and adapters that target specific transformer layers. Once that's easier to distribute, we'll start seeing libraries of "code-style LoRAs."
Tim Williams: That'd be amazing—like being able to load a "Django style" LoRa or a "React Native best practices" one directly into your local model.
Paul Mason: Yeah, and ideally without having to mortgage your GPU for it.
Tim Williams: So, before we move on, have you tried Cursor 2 yet? The big 2.0 release?
Paul Mason: Yeah, I did. I was actually excited when I saw "agents" in the update notes. I thought, finally, they're going to let us wire custom tools or define agentic behaviors directly in the editor.
Tim Williams: Same here! I opened it, clicked on the Agents tab, and… it's just chat.
Paul Mason: Right? It's like you've left your editor and stepped into a chatbot window. Totally breaks the flow.
Tim Williams: Exactly. As a dev, I don't want to talk to my project. I want to work in it. I don't want to be dragged into a separate tab that feels like a disconnected demo interface.
Paul Mason: Yeah, it's like they built that for the "vibe coders"—the folks who are poking around asking, "hey, make me a website." Not the people maintaining a real repo with twenty microservices.
Tim Williams: And it's weird, because Cursor nailed the whole inline chat thing—the way you could stay inside the file, fix something, iterate quickly. That was their superpower.
Paul Mason: Totally. Taking you out of that environment into an agent dashboard just feels like a regression.
Tim Williams: I hope they don't focus too much on it going forward. It's a shiny feature, but it's not what developers actually need.
Paul Mason: Yeah, give me better context awareness, better merge assistance, or better integration with my tests. Not another chat window.
Tim Williams: Exactly. The tool should feel like an extension of your hands, not a conversation partner that interrupts your work.
Paul Mason: That's the difference between building for devs and building for content creators who occasionally open VS Code.
Tim Williams: Yeah, well said. Let's hope Cursor remembers who their core audience is.
Tim Williams: You know what though, I get why Cursor's trying the agent thing. Claude Code's doing it, OpenCode's doing it—it's kind of the new fad.
Paul Mason: Yeah, I've seen that too. They all seem to be chasing that "just describe what you want, and the AI builds it for you" dream.
Tim Williams: Which sounds great if you're kicking off a little one-shot project. You know, like spinning up a prototype or a small website where you just need a rough draft fast.
Paul Mason: Exactly. For a quick greenfield start? Sure. You don't mind a little context loss. But for real day-to-day work on complex systems? Nah. It just doesn't hold up.
Tim Williams: Right. Once you're deep in the weeds of a big codebase with five frameworks and a dozen internal dependencies, switching to an "agent" view that lives outside your IDE feels like teleporting into a different dimension.
Paul Mason: Yeah, and then coming back and realizing half your imports don't line up anymore.
Tim Williams: And splitting your focus across all these cloud agents? That's a nightmare waiting to happen.
Paul Mason: Oh totally. You've got one agent generating code, another testing it, one writing docs—and they're all burning tokens like crazy.
Tim Williams: Yeah, those loops where the LLM can't figure out how to get unstuck? Each one of those costs you a couple bucks in compute and a few minutes of your sanity.
Paul Mason: "Infinite loop detected… charging your card."
Tim Williams: Exactly. And the longer they run, the worse the drift gets. The model forgets early decisions, assumptions start contradicting each other, and suddenly your build doesn't even represent the original spec.
Paul Mason: That's the thing with long contexts—they sound like a fix, but past a certain point you just get noise. Unknown unknowns start stacking up.
Tim Williams: And then the human—meaning us—has to step back in and clean up. Which is fine, but it adds this massive cognitive load.
Paul Mason: Yeah, you're juggling multiple AIs, half a dozen half-finished branches, and a project that feels like it was written by committee.
Tim Williams: So you end up babysitting agents instead of writing code.
Paul Mason: Exactly. If I wanted to be a project manager, I'd open Jira, not Cursor.
Tim Williams: I mean, it's funny because the goal of these tools is to reduce cognitive load, but if they're not careful, they'll just move it around instead of removing it.
Paul Mason: Yeah, they automate the easy part and leave us with the chaos management.
Tim Williams: Right, which is why these chat-style interfaces feel backwards. The value of AI in development isn't in talking to it—it's in building with it, right there in the flow of code.
Tim Williams: You know what really gets messy though? When you've got multiple agents working on the same project.
Paul Mason: Oh yeah, I've seen demos where people spin up five agents and say, "This one builds the frontend, this one handles the backend, this one writes tests." And I'm like—okay, but who's merging all that?
Tim Williams: Exactly! Are they opening pull requests between each other? Is there an agent acting as the tech lead?
Paul Mason: "AI number four, you're blocking AI number two's merge again."
Tim Williams: Right? Theoretically, that sounds efficient, but in practice… how's that supposed to work inside a real system? When you've got five microservices all touching the same shared models and contracts, you can't just have agents freelancing in isolation.
Paul Mason: Yeah, you're going to end up with a race condition in your repo before you even hit production.
Tim Williams: And it's not like dev teams are just sitting on their hands while these agents work. People are still writing code, pushing changes, reviewing PRs. Are we expecting the LLMs to coordinate around that in real time?
Paul Mason: I mean, unless they've secretly built a super-advanced merge conflict therapist, I don't see it happening.
Tim Williams: "Tell me where Git hurt you."
Paul Mason: Seriously though, managing concurrency between human and AI contributors is already complicated enough. Now imagine five AIs branching and merging asynchronously.
Tim Williams: It's the kind of thing that sounds futuristic until you actually try to maintain it for more than a week.
Paul Mason: Yeah, like, sure—if you're spinning up a toy project or a single-function demo, multiple agents might make sense. But day-to-day development? With moving specs, refactors, dependencies? It's not practical.
Tim Williams: Exactly. It becomes one of those ideas that demos great but scales terribly.
Paul Mason: I think there's still a sweet spot for using a single agent in a tightly scoped way—like testing, data migration, or writing docs—but full multi-agent orchestration on a real product?
Tim Williams: Not unless you've got infinite tokens and infinite patience.
Paul Mason: And infinite budget.
Tim Williams: You know, there is one place I think these cloud agents actually make sense though.
Paul Mason: Yeah? Where's that?
Tim Williams: Bugfixes.
Paul Mason: Oh yeah, I'll give you that one. That's a great fit.
Tim Williams: Think about it—bugfixes are small, self-contained problems with clear context. They don't usually need the whole codebase in view, just enough to reproduce, patch, and test.
Paul Mason: Exactly. It's the perfect kind of async work. The agent can pick up an issue, spin up a temporary workspace, patch the bug, push the branch, and make a pull request without touching anyone else's flow.
Tim Williams: Yeah, and that's actually where the big Git platforms have been leading the charge lately.
Paul Mason: Oh, totally. GitHub's Copilot Workspace, GitLab's AI assistant—they're both heading toward that "self-healing codebase" idea.
Tim Williams: Right. You file an issue, it builds the context from the code and the history, proposes a fix, and even opens the PR for you. That's the kind of asynchronous coding that actually works.
Paul Mason: Because it's bounded. It's not an AI trying to design your whole system architecture—it's an AI trying to fix one function that throws a null pointer when the moon is full.
Tim Williams: Yeah, and you can easily verify if it worked or not. It's binary—either the tests pass or they don't.
Paul Mason: Plus, it doesn't interfere with the human dev's mental model. You can glance at the PR, see what changed, and decide in thirty seconds if it's good.
Tim Williams: Exactly. That's what makes bugfixing a natural use case for distributed or cloud-based agents. It's asynchronous, modular, and measurable.
Paul Mason: And unlike greenfield projects or long-running builds, you don't end up with runaway token costs or cascading drift across multiple files.
Tim Williams: Yeah, it's small enough that context management is trivial. The AI can load just what it needs, apply the patch, test it, and move on.
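The bounded bugfix flow the two are describing could be sketched like this. The issue, branch naming convention, and the stubbed reproduce and patch steps are all hypothetical:

```python
# Sketch of a bounded bugfix agent: one issue in, at most one branch and
# pull request out, gated on a binary test result. The reproduce and
# patch steps are stubs for real agent work (git, test runner, etc.).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Issue:
    number: int
    title: str

@dataclass
class PullRequest:
    branch: str
    issue: int
    tests_passed: bool

def reproduce(issue: Issue) -> bool:
    return True                      # stub: agent confirms a failing test exists

def patch_and_test(issue: Issue) -> bool:
    return True                      # stub: agent patches, reruns the suite

def fix_bug(issue: Issue) -> Optional[PullRequest]:
    """Reproduce, patch, test; open a PR only when the suite is green."""
    if not reproduce(issue):
        return None                  # can't reproduce: hand back to a human
    if not patch_and_test(issue):
        return None                  # red suite: no PR, no noise for reviewers
    return PullRequest(branch=f"agent/fix-{issue.number}",
                       issue=issue.number, tests_passed=True)

pr = fix_bug(Issue(42, "null pointer on full moon"))
print(pr.branch if pr else "escalated")  # agent/fix-42
```

The two early returns are the point: the agent either produces a verifiable green PR or escalates, so a human can triage the result in the thirty-second glance Paul describes.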
Paul Mason: It's kind of poetic actually. The same tech that's a disaster for big architectural changes turns out to be perfect for the smallest, most annoying bugs.
Tim Williams: Yeah, it's like hiring a bunch of really fast interns who only fix lint errors and memory leaks.
Paul Mason: And you don't have to buy them pizza.
Tim Williams: You know what I still don't really get though? OpenAI's new browser agent—Atlas.
Paul Mason: Yeah, same here. Everyone's talking about it like it's the next big thing, but honestly, I don't see the point.
Tim Williams: Right? The whole idea of an agent clicking around a website through a GUI feels… redundant. Everything it's doing could be done ten times more efficiently through an MCP interface.
Paul Mason: Exactly. Why would you spin up a headless Chrome instance to scrape buttons when the same data probably lives one API call away?
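For contrast, here's roughly what that "one API call away" looks like as an MCP tool call. The envelope follows MCP's JSON-RPC 2.0 shape, while the tool name and arguments are made up for illustration:

```python
# An MCP tool call is a plain JSON-RPC 2.0 message: no headless browser,
# no DOM scraping, just a structured request the server can answer directly.
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 tools/call request for an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Instead of driving a browser through a product search page
# ("search_products" is a hypothetical tool name):
msg = mcp_tool_call(1, "search_products", {"query": "usb-c cable", "limit": 5})
print(msg)
```

One structured exchange replaces the whole click-type-wait loop, which is the efficiency argument Tim and Paul are making.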
Tim Williams: It's like teaching a robot to use a mouse when you could just hand it the keyboard.
Paul Mason: Yeah, or teaching it to drive to the grocery store instead of just calling the grocery store's delivery API.
Tim Williams: Don't get me wrong—it's technically impressive. Watching an agent move a cursor, fill a form, click a dropdown—that's cool. But it's also… fragile.
Paul Mason: Yeah, it's a stopgap for when a proper integration doesn't exist yet. As soon as a company releases an MCP or an API, the browser agent becomes obsolete.
Tim Williams: Exactly. I think that's what's going to happen over the next couple years. Every major platform—Google Workspace, Salesforce, AWS—they'll all expose direct machine interfaces. The whole "AI clicking buttons" thing will fade out.
Paul Mason: Right, we'll look back and say, "Remember when AI had to pretend to be a human user?"
Tim Williams: Yeah, the uncanny valley of productivity tools.
Paul Mason: And honestly, it's wasteful. Running browser automation eats resources, tokens, and latency for something that could be one clean JSON exchange.
Tim Williams: Exactly. The web was built for humans; MCP is being built for agents. It's the natural evolution.
Paul Mason: Totally agree. Once APIs and MCP endpoints become universal, the only real use case left for GUI-based agents will be legacy sites that never update or those weird enterprise dashboards nobody wants to maintain.
Tim Williams: The graveyard of forgotten CRMs.
Paul Mason: Exactly. You'll have one poor browser agent babysitting a 2008 SharePoint install for eternity.
Tim Williams: But for everything else—commerce, communication, research—I think MCP-style APIs will take over. They're faster, cheaper, and way more predictable.
Paul Mason: Yeah, agents don't need to browse—they need to connect.
Tim Williams: Perfectly said.
Paul Mason: So, prediction time—you think Atlas survives?
Tim Williams: Honestly? No. It's a great demo, but it's not the future. The real winners will be the developers building MCPs and the ecosystems around them.
Paul Mason: Yeah, it's going to be all about the pipes, not the puppets.
Tim Williams: The pipes, not the puppets—that's the title of this episode.
Tim Williams: Alright, I think that's a good place to wrap it up. We covered a lot of ground today.
Paul Mason: Yeah, we started off talking about context management and those never-ending token battles, wandered through legacy code, hit on why LoRA training hasn't really landed for LLMs yet
Tim Williams: and then took a nice detour through Cursor 2's new agent interface and why it kinda misses the mark for real developers.
Paul Mason: Right, more for the "vibe-code" crowd than the folks living in the trenches.
Tim Williams: Then we tore into multi-agent chaos—token burn, cognitive overload, merge nightmares—and finally landed on where cloud agents actually shine: bugfixes.
Paul Mason: Yep, GitHub and GitLab are doing that right. Async, measurable, low-risk—makes total sense.
Tim Williams: And to cap it off, we agreed that browser agents like OpenAI's Atlas might look flashy now, but MCP-style APIs are where this is really heading. The future's in direct, structured communication, not in clicking buttons on a web page.
Paul Mason: "The pipes, not the puppets." Still my favorite line of the day.
Tim Williams: I'll take it. Anyway, thanks for tuning in to this spooky-season edition of Rubber Duck Radio.
Paul Mason: Yeah, keep your agents contained, your context clean, and your tokens under control.
Tim Williams: We'll see you next time—same duck time, same duck channel.
Paul Mason: Quack you later.