Transcript
Tim Williams: Hello and welcome! I'm Tim Williams, and this is episode nineteen of Rubber Duck Radio. [short pause] Joining me as always is Paul Mason — full-stack developer from Seattle, my co-conspirator in all things AI and software. [pause] Paul, we have got [emphasis] quite the week to talk about.
Paul Mason: Yeah, [inhale] I feel like we picked the right week to record. [short pause] The AI world has been, uh — [pause] absolutely on fire. And not in the good way.
Tim Williams: [laughing] Not in the good way at all. [inhale] So let's just jump straight into it. [pause] Claude Fable 5 and Mythos 5 — Anthropic launches them on June 9th to massive fanfare, [short pause] and three days later the US government orders them [emphasis] pulled. Completely. For every user, worldwide. [pause] Paul, what do you make of this whole thing?
Paul Mason: [tsk] Man. [pause] So I've got some — [short pause] I've got some conspiracy theories brewing, I'm not gonna lie. [chuckle] But I should say upfront — [inhale] as someone who is [emphasis] not a vibecoder, Fable disappearing didn't really break anything for me. My workflow runs on Claude Code with Opus, and honestly? [short pause] Opus 4.8 is still really solid. The gap between what Fable could do and what I actually [emphasis] need day-to-day — it's not that wide.
Tim Williams: Yeah, [inhale] same here. Zero practical impact on my workflow. [pause] But — [emphasis] but — politically, this thing is [exhale] absolutely fascinating. [pause] So here's my theory, and I've been thinking about this all week. [inhale] I think this is retaliation. Pure and simple. [pause] Anthropic refused to let the Pentagon use Claude for autonomous weapons and mass domestic surveillance back in February. The DoD designated them a quote-unquote 'supply chain risk.' A federal judge literally called it — [short pause] and I'm quoting here — 'classic First Amendment retaliation.' [pause] So now, three days after Fable launches? The same administration drops an export control directive based on an [emphasis] unwritten, verbally described jailbreak from a third party? [exhale] Come on.
Paul Mason: Yeah. [pause] That timeline is... [short pause] it's hard to look at that and not see a connection. [inhale] And what makes it worse — [tsk] the third party that reported the jailbreak? It was [emphasis] Amazon. [pause] Who is, you know — [chuckle] Anthropic's biggest cloud partner, multi-billion-dollar investor, [short pause] and also... a direct competitor building their own AI. [inhale] And instead of doing coordinated disclosure — you know, telling Anthropic first like you normally would — they went [emphasis] straight to the Commerce Department. Andy Jassy himself reportedly raised it with officials.
Tim Williams: [exhale] Right. [pause] And here's the thing — the jailbreak itself? [inhale] It's asking Fable 5 to read a codebase and identify software vulnerabilities. That's it. [short pause] That's the [emphasis] entire demonstration. [pause] Anthropic's response is: GPT-5.5 can do the same thing. Every frontier model can do this. Security engineers use this capability [emphasis] every day to find and patch vulnerabilities before attackers do. [inhale] So if the standard is 'a narrow jailbreak exists,' then [emphasis] every deployed model should be pulled. But only Fable got pulled.
Paul Mason: Yeah. [short pause] The double standard is what makes it smell funny. [inhale] And the timing — [pause] Anthropic is literally [emphasis] in active litigation against the government over the supply chain blacklisting. Like, right now. [tsk] And then Defense Secretary Pete Hegseth tweets — [short pause] let me get this right — 'Three months ago, the Department of War kicked Anthropic out of our building forever. Every passing day proves why that was the right move.' [exhale] That's not the language of an administration that's just concerned about a jailbreak.
Tim Williams: [sigh] Exactly. [inhale] And that's before we even get to the [emphasis] Reddit theories. [pause] So — I went down the rabbit hole on r/ClaudeAI and r/artificial this week. [chuckle] Paul, you would [emphasis] not believe some of these.
Paul Mason: [laughing] Oh, I've been in those threads. [inhale] I've [emphasis] been in those threads. [pause] What's your favorite?
Tim Williams: [laughing] Okay so — [inhale] the retaliation theory is by far the most upvoted one, and we've covered that. [pause] But there's this — [short pause] there's this wild sub-theory that Anthropic [emphasis] wanted this to happen. Like, they needed a reason to pull Mythos off the market, so they — [chuckle] manufactured or at least welcomed the government intervention. [pause] I don't buy it, but it's out there.
Paul Mason: [tsk] I saw that one. [inhale] I think that's giving everyone way too much credit for coordination. [pause] But the one that [emphasis] actually holds water for me is the transparency trap theory. [short pause] Anthropic published this incredibly detailed system card — honestly, the most transparent safety documentation any lab has ever put out. They admitted upfront that [emphasis] perfect jailbreak resistance is impossible. [inhale] And that honesty? [pause] It gave the government the ammunition. You can't say 'our model might be jailbroken' and then be surprised when someone uses that as the justification to pull it.
Tim Williams: [exhale] That's the [emphasis] tragic irony of this whole thing. [pause] If Anthropic had been less transparent — if they'd done what every other lab does and just said 'our model is safe, don't worry about it' — [short pause] there's no roadmap for the government to act on. [inhale] The system card was the evidence. [pause] And then you add the Pliny the Liberator system prompt leak — [chuckle] within forty-eight hours of launch, this pseudonymous researcher publishes Fable 5's [emphasis] entire hundred-twenty-thousand-character system prompt on GitHub.
Paul Mason: [chuckle] Pliny. [pause] Of course it was Pliny. [inhale] That account is — [short pause] I mean, they've made a whole brand out of liberating system prompts. [pause] But here's the thing that got me — [inhale] there was also the [emphasis] secret sabotage controversy. Fable 5 was designed to [emphasis] silently limit its own capabilities when it detected a user was working on frontier AI development. No notification. No disclosure. It would just — [short pause] quietly get worse. [tsk] And Anthropic walked that back within twenty-four hours after the community [emphasis] erupted.
Tim Williams: [sigh] Right. [inhale] And that — [pause] that was the launch week they were having [emphasis] before the government stepped in. [exhale] The 'secret sabotage' story was already dominating AI Twitter. Dario Amodei had just published an essay on June 10th — [short pause] literally [emphasis] one day after Fable launched — calling for the government to have legal authority to block unsafe AI deployments. He compared it to the FAA grounding unsafe aircraft. [pause] Two days later, the government used exactly that power against [emphasis] him.
Paul Mason: [laughing] I mean — [pause] you can't write that. [inhale] Dario literally asked for the power that got his own model pulled. [short pause] And David Sacks — the former AI czar — he came out and said the administration asked Dario to [emphasis] fix the jailbreak or de-deploy the model, and Dario refused both. [pause] The administration's framing is: 'Anthropic spent years telling us Mythos is a cyber weapon. We found the guardrails don't work. Now they're saying it's not serious?'
Tim Williams: [exhale] And that's the bind Anthropic is in. [pause] They marketed themselves as the safety company. They said Mythos was too dangerous to release. They built their whole brand on 'we are the responsible ones.' [inhale] And now when the government says 'okay, prove it — fix the vulnerability or pull the model' — [short pause] they're trapped. [pause] Fixing it means admitting there's something uniquely dangerous about Fable that [emphasis] doesn't exist in GPT-5.5 or Gemini. Refusing to fix it makes them look like they don't actually care about safety.
Paul Mason: Yeah. [short pause] And that's the part where, [inhale] as a developer, I start getting [emphasis] actually worried about the precedent. [pause] Not because I need Fable specifically — I told you, Opus is fine for my workflow. [tsk] But if the government can pull [emphasis] any frontier model based on a verbal jailbreak claim from a competitor — [short pause] with no written technical justification, no public standard, no process — [inhale] what stops them from doing it to the next model? Or the one after that?
Tim Williams: [pause] That's the real question. [inhale] And Satya Nadella — Microsoft's CEO — he published this essay on June 14th that didn't mention Fable by name, but [emphasis] everyone knew what he was talking about. [short pause] His opening line was: 'A frontier without an ecosystem is not stable.' [pause] His argument was basically — [inhale] if all of your AI capability lives in someone else's model, and that model can be turned off overnight by government action, [short pause] you've outsourced your entire cognitive infrastructure to a single point of failure.
Paul Mason: That's [emphasis] exactly it. [inhale] And that's future-you-will-thank-present-you territory. [chuckle] If your production pipeline is hard-coded to a single model ID with no fallback — [pause] you just learned why that's a bad idea. [short pause] The companies that are fine right now are the ones whose AI capability lives in their own fine-tuned models, their own evaluation datasets, their own institutional knowledge. [inhale] Not in their Anthropic API key.
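A minimal sketch of the "no single hard-coded model ID" point above: an ordered fallback chain that moves to the next model when one becomes unavailable. Everything named here is hypothetical and for illustration only: the model IDs, the `ModelUnavailable` exception, and `fake_call_model`, which stands in for whatever provider client a real pipeline would use.

```python
from __future__ import annotations

from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error a client raises when a model ID can no longer be served."""


def complete_with_fallback(
    prompt: str,
    model_ids: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model ID in order and return (model_id_used, completion)."""
    errors: list[str] = []
    for model_id in model_ids:
        try:
            return model_id, call_model(model_id, prompt)
        except ModelUnavailable as exc:
            # Remember why this model failed, then move on to the next one.
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call_model(model_id: str, prompt: str) -> str:
    """Stand-in for a real provider client (an assumption, not a real API)."""
    if model_id == "withdrawn-frontier-model":
        raise ModelUnavailable("model withdrawn")
    return f"[{model_id}] reply to: {prompt}"


# Hypothetical chain: preferred model first, then a hosted backup, then a local model.
MODEL_CHAIN = ["withdrawn-frontier-model", "hosted-backup-model", "local-open-model"]

used, reply = complete_with_fallback("Summarise this diff", MODEL_CHAIN, fake_call_model)
print(used)
```

Keeping the chain in configuration rather than in code means that a withdrawn model becomes a one-line change instead of a redeploy.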
Tim Williams: [pause] And here's the thing I keep coming back to. [inhale] Whether the jailbreak concern is genuine, pretextual, or some mix of both — [short pause] the process was [emphasis] broken. [pause] There was no published standard. No transparent review. No technical board making findings in public. [inhale] It was a verbal claim, from a company that is simultaneously Anthropic's biggest investor [emphasis] and a direct competitor, delivered to an administration that was already in active legal conflict with Anthropic. [exhale] That's not regulation. That's — [short pause] that's something else.
Paul Mason: [pause] Yeah. [inhale] And the worst part? [tsk] The incentive structure this creates is [emphasis] exactly backwards. [short pause] Anthropic was the most transparent lab about their safety limitations — and they got punished for it. [pause] The next lab that's deciding how much to disclose in their system card? [inhale] They're gonna look at this and think — [short pause] 'maybe we keep that to ourselves.' [exhale] That's a disaster for AI safety.
Tim Williams: [sigh] That's the moral of the story, isn't it? [pause] If transparency about AI limitations invites regulatory action while opacity does not — [short pause] the industry will [emphasis] respond accordingly. [inhale] And the result will be less public information about what these models can actually do. [pause] Which is the exact opposite of what every safety advocate has been pushing for. [exhale] Here's looking at you, Anthropic — you tried to do the right thing, and you got — [short pause] well, you got this.
Paul Mason: [chuckle] Yeah. [pause] The road to hell, paved with good system cards. [short pause] Sorry, I had to. [laughing]
Tim Williams: [exhale] And speaking of costs — [pause] you know what this whole Fable situation really brought into focus for me? [inhale] Token anxiety. [short pause] Like, real, visceral, [emphasis] watching-the-numbers-tick-up anxiety.
Paul Mason: [chuckle] Oh man. [pause] I've been there. [inhale] Every time I watch that little counter in Claude Code climb past what I would have spent on lunch — [short pause] there's this part of my brain that just — [pause] can't stop doing the math. [exhale] Future you will [emphasis] definitely feel that one in the wallet.
Tim Williams: [laughing] Right. [inhale] I've got the two hundred dollar a month plan, and it shows you — 'here's what you would have spent on API calls.' [pause] And I'm watching that number go past five hundred, six hundred, seven hundred dollars — [short pause] in a week — [emphasis] and I start sweating. [inhale] And the thing is — [pause] I know the economics don't make sense right now. [short pause] Everyone knows. These companies are burning venture capital to subsidize every query. [exhale] But the numbers they report — [pause] they're sort of imaginary.
Paul Mason: [tsk] That's the part people don't get. [inhale] When Anthropic says Fable costs X dollars per million tokens — [short pause] that's list price. That's not what it actually costs them to run the inference. [pause] The real unit economics are — [short pause] uh, [chuckle] they're opaque, let's put it that way. [inhale] And I think that's what makes the anxiety worse. [short pause] You can't plan around numbers you don't actually know.
Tim Williams: Exactly. [inhale] And here's what really keeps me up — [pause] the price of these frontier models keeps going [emphasis] up, not down. [short pause] Fable is more expensive than Sonnet. Opus 4.5 was a step change in pricing from Opus 3. [inhale] The models are getting better, but the cost curve is bending the [emphasis] wrong way. [short pause] Meanwhile, Paul, you and I and every developer I talk to — [inhale] we're all saying the same thing: the models we already have are good enough for most of what we do.
Paul Mason: [emphasis] Yeah. [pause] That's exactly what I've been thinking. [inhale] I was on a project last week — standard CRUD API, some middleware, error handling boilerplate — and Claude Code with Sonnet handled it in like three prompts. [short pause] I didn't need Fable. [pause] I didn't need Opus. [chuckle] I barely needed Sonnet. [inhale] And I think there's this — [short pause] this disconnect between what the labs think we need and what we [emphasis] actually need. [pause] They're racing to build models that can do PhD-level research — [short pause] and I'm over here just trying to get my TypeScript types to stop yelling at me.
Tim Williams: [laughing] Right. [inhale] And that tension — [pause] that's what makes the future feel so uncertain. [short pause] Because if the labs keep pushing toward more expensive models, and the venture capital eventually — [emphasis] eventually — runs out — [pause] what happens? [inhale] Do the prices stay high and the services become luxury goods for enterprise? [short pause] Or do they collapse and suddenly we can't access these models at all? [exhale] And in that gap — [pause] that's where open source walks in. [inhale] Have you been following what's happening with GLM 5.2 and DeepSeek V4 Pro?
Paul Mason: [excited] Oh yeah. [inhale] I've been watching that. [pause] GLM 5.2 from Zhipu AI — [short pause] open source, beats GPT-5 and Claude Opus 4.5 on MMLU-Pro, matches on MATH 500 — [emphasis] and it runs on consumer hardware. [pause] Like, you can run this thing locally. [inhale] And DeepSeek V4 Pro — [short pause] I saw the benchmarks drop last week. It's trading blows with Opus 4.5 on reasoning tasks. [exhale] Open source is not just catching up anymore. It's — [pause] it's pulling even.
Tim Williams: [emphasis] That's what I'm saying. [inhale] Opus 4.5 was supposed to be this — [pause] this step change, this game changer. And it [emphasis] was — the reasoning capabilities are genuinely impressive. [short pause] But then GLM 5.2 drops like three weeks later and matches or beats it on half the benchmarks, [emphasis] for free, running on your own machine. [pause] The moat that these frontier labs are trying to build — [inhale] it's getting filled in faster than they can dig it. [short pause] And that changes the economics of [emphasis] everything we just talked about.
Paul Mason: Yeah. [pause] And I think — [inhale] for developers like us, that's actually the [emphasis] hopeful part of this whole anxiety conversation. [short pause] The proprietary models might get more expensive. The government might pull them on a Tuesday afternoon. [chuckle] But the open source models — [pause] they keep getting better, and cheaper, and [emphasis] nobody can take them away.
Tim Williams: [exhale] So — [pause] here's the practical question that falls out of all this. [inhale] If the frontier models are getting more expensive, and the government can pull them on a Tuesday, and open source is closing the gap — [short pause] how are developers [emphasis] actually managing this right now? [pause] Like, what's the day-to-day strategy for staying productive without burning through your entire token budget by Wednesday?
Paul Mason: Yeah. [pause] So — [inhale] I think the pattern that's emerging, and I've been [emphasis] deep into this — [short pause] is what Anthropic themselves are shipping in Claude Code. [pause] The `/model opusplan` command. [inhale] You type that at the start of a session, and from that point forward — [short pause] Opus handles the [emphasis] thinking. Planning, architecture, analyzing the codebase, figuring out the approach. [pause] And then Sonnet does the [emphasis] doing. Writing the functions, applying the changes, running the commands.
Paul Mason: [inhale] And here's the thing — [short pause] the numbers actually back this up. [pause] A typical coding session — about sixty percent of the tokens are spent on execution tasks that Sonnet can handle [emphasis] just as well as Opus. [short pause] Writing boilerplate, applying straightforward diffs, formatting output. [exhale] You don't need a PhD-level reasoning model for that. [chuckle] You just need it to not mess up the indentation.
Tim Williams: [laughing] Right. [inhale] And what I find fascinating is — [pause] this pattern didn't start with Claude Code. [short pause] It's been bubbling up from the community for months. [inhale] Developers on OpenRouter, on OpenCode CLI — they're building these [emphasis] multi-model pipelines by hand. [pause] Plan with Opus or GPT-5.5. [short pause] Implement with a cheaper model — sometimes Sonnet, sometimes an open source model through an API. [exhale] Review and critique with something else entirely — Kimi, or a different model that catches things the implementer missed.
Tim Williams: [inhale] It's like — [pause] you wouldn't use a Formula One car to go to the grocery store. [short pause] You [emphasis] could. It would work. [chuckle] But it's a spectacular waste of money. [pause] And we're finally seeing the tooling catch up to make that distinction practical.
Paul Mason: [excited] Totally. [inhale] And this is where the open source models we were just talking about — [pause] they [emphasis] shine. [short pause] I was reading through the r/LocalLLaMA threads, and there's this really consistent pattern now. [inhale] Developers are using Opus or GPT for the planning phase — the architecture, the design decisions. [pause] Then they hand the plan to GLM 4.7 or DeepSeek V4 for the actual implementation. [short pause] And one comment that stuck with me — [pause] someone said GLM 4.7, quote, "give it a detailed plan and it follows it carefully and relentlessly." [exhale] That's [emphasis] exactly what you want from an implementation model.
Paul Mason: [inhale] And then — [short pause] here's the clever part — [pause] some developers add a [emphasis] third model for review. [pause] Kimi K2 Thinking, for example. [short pause] It reads through the implementation, critiques the approach, catches edge cases the implementer missed. [exhale] You've got this three-stage pipeline — plan, build, review — [pause] and each stage uses the model that's [emphasis] best at that specific thing, not just the one model for everything.
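The plan, build, review pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions, not anyone's actual tooling: the model names, the prompt wording, and the `call_model` callable (a stand-in for a real provider client) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# (model_id, prompt) -> completion text; a stand-in for a real client.
ModelCall = Callable[[str, str], str]


@dataclass(frozen=True)
class Pipeline:
    planner: str      # expensive reasoning model
    implementer: str  # cheaper or local model that follows the plan
    reviewer: str     # a different model, used only for critique

    def run(self, task: str, call_model: ModelCall) -> dict[str, str]:
        # Stage 1: the planner turns the task into a step-by-step plan.
        plan = call_model(self.planner, f"Write a step-by-step plan for: {task}")
        # Stage 2: the implementer receives the plan, not the original task.
        code = call_model(self.implementer, f"Follow this plan exactly:\n{plan}")
        # Stage 3: the reviewer sees both the plan and the implementation.
        review = call_model(
            self.reviewer,
            f"Critique this implementation against the plan.\nPlan:\n{plan}\nImplementation:\n{code}",
        )
        return {"plan": plan, "code": code, "review": review}


calls: list[tuple[str, str]] = []


def recording_call_model(model_id: str, prompt: str) -> str:
    """Fake client that records each call and returns a labelled placeholder."""
    calls.append((model_id, prompt))
    return f"<{model_id} output>"


pipeline = Pipeline(planner="frontier-model", implementer="cheap-model", reviewer="review-model")
result = pipeline.run("add pagination to the /users endpoint", recording_call_model)
print([model_id for model_id, _ in calls])
```

Passing the plan forward as plain text is what keeps the stages independent: any model can be swapped in at any stage without changing the other two.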
Tim Williams: [emphasis] That's the convergence. [inhale] That three-stage pipeline. [pause] And look at the economics of it. [short pause] A typical complex build — [inhale] maybe four million tokens spent on planning, using Opus or GPT. [pause] Ten million tokens on execution. [short pause] If you run those ten million execution tokens through Haiku instead of Opus — [pause] you save fifty-seven percent right there. [inhale] If you run them through an open source model running locally — [emphasis] you save nearly everything on execution. [short pause] You're just paying for the planning phase.
Tim Williams: [exhale] And that changes the entire calculus of — [pause] is it even worth paying for these services? [inhale] If you're only using the expensive model for ten or twenty percent of your tokens — [short pause] suddenly that two-hundred-dollar monthly plan stretches [emphasis] much further. [pause] The anxiety doesn't go away, but — [short pause] it becomes manageable.
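The arithmetic behind the four-million and ten-million token example can be worked through in a few lines. The prices below are assumptions chosen for illustration: a single blended per-token price for each model, with the cheaper model at one fifth of the frontier price. Under that assumption the fifty-seven percent figure comes out as the saving on the whole build, not on execution alone. The local-model case treats execution as free, which ignores hardware and electricity.

```python
def build_cost(
    planning_tokens: int,
    execution_tokens: int,
    planner_price: float,
    executor_price: float,
) -> float:
    """Total cost in dollars, with prices given per million tokens."""
    total = planning_tokens * planner_price + execution_tokens * executor_price
    return total / 1_000_000


def saving(baseline: float, actual: float) -> float:
    """Fraction of the baseline cost that is saved."""
    return 1 - actual / baseline


PLANNING_TOKENS = 4_000_000    # from the example above
EXECUTION_TOKENS = 10_000_000  # from the example above
FRONTIER_PRICE = 5.0  # assumed dollars per million tokens
CHEAP_PRICE = 1.0     # assumed: one fifth of the frontier price
LOCAL_PRICE = 0.0     # assumed: no per-token charge for a local model

all_frontier = build_cost(PLANNING_TOKENS, EXECUTION_TOKENS, FRONTIER_PRICE, FRONTIER_PRICE)
split = build_cost(PLANNING_TOKENS, EXECUTION_TOKENS, FRONTIER_PRICE, CHEAP_PRICE)
local = build_cost(PLANNING_TOKENS, EXECUTION_TOKENS, FRONTIER_PRICE, LOCAL_PRICE)

print(f"all frontier: ${all_frontier:.2f}")
print(f"frontier plan + cheap execution: ${split:.2f}, saving {saving(all_frontier, split):.0%}")
print(f"frontier plan + local execution: ${local:.2f}, saving {saving(all_frontier, local):.0%}")
```

Note that in this split the planning phase is four of fourteen million tokens, a bit under thirty percent, so the expensive model still accounts for most of the remaining bill.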
Paul Mason: Yeah. [pause] And I think — [inhale] this is where the token anxiety we were talking about earlier — [short pause] it's actually [emphasis] fueling something useful. [pause] Developers are scared of the bill. They're watching that counter tick up. [chuckle] And that fear is driving them to build [emphasis] smarter workflows. [inhale] The plan-with-the-smart-model, build-with-the-cheap-model pattern — [short pause] that's a direct response to watching your token budget evaporate before lunch.
Paul Mason: But — [pause] and there's always a but — [short pause] this isn't solved yet. [inhale] The tooling is still [emphasis] really immature. [pause] Switching models mid-session, managing multiple contexts, keeping the plan consistent across different models — [short pause] it's clunky. [exhale] We're in the — [chuckle] we're in the "it works in my terminal" phase of this, not the "it just works" phase.
Tim Williams: [emphasis] Exactly. [inhale] And that's the honest state of this. [pause] The convergence pattern is [emphasis] real — plan with frontier, implement with open source or cheap, review with something different. [short pause] You can see it emerging across hundreds of developers, across tools, across models. [pause] But the experience is still — [inhale] it's still duct tape and bash scripts for a lot of people. [exhale] The platforms that make this [emphasis] seamless — [short pause] they're coming, but they're not here yet.
Tim Williams: [pause] And I think — [inhale] that brings us back to where this whole conversation started. [short pause] The Fable pull. The token anxiety. The open source catching up. [pause] All of these threads are pushing toward the same conclusion — [inhale] the future of AI-assisted development is not going to be [emphasis] one model to rule them all. [short pause] It's going to be — [pause] the right model for the right job, at the right price. [exhale] And the developers who figure out that pipeline first — [pause] they're the ones who are going to be [emphasis] terrifyingly productive while everyone else is still burning Opus tokens on boilerplate.
Paul Mason: That's — [inhale] that's honestly the moral of this whole episode. [short pause] The Fable situation showed us the proprietary stack is fragile. The token anxiety showed us the economics don't work at scale. [pause] But the convergence — [inhale] plan smart, build cheap, review different — [short pause] that's the pattern that's going to carry us through. [exhale] And it's [emphasis] not fully baked yet, but you can see the shape of it.
Tim Williams: [laughing] That's the episode. [inhale] Fable gets pulled. Tokens get expensive. Open source closes the gap. [pause] And the developers who survive are the ones who learn to put the Formula One car in the garage and take the reliable sedan to the grocery store. [short pause] Thanks for listening, everybody. We'll be back next week. [pause] Here's looking at you.