Transcript
Tim Williams: Welcome to episode 11 of Rubber Duck Radio! I'm your host, Tim Williams, and, uh, sitting across the virtual table from me — looking about as well-rested as I feel — is Paul Mason!
Paul Mason: Hey Tim. And yeah, well-rested is... generous. Let's go with caffeinated and vertical. That's, uh, about the best I can offer this week.
Tim Williams: Ha! Honestly, same. You know, I was looking back at our last episode, where I said I'd had one of those rare weeks where everything clicked? Yeah... this was not that week.
Paul Mason: Nope. This was the opposite. This was, like, one of those weeks where you open your feed in the morning and there's already three new AI announcements before you've even finished your coffee. And by the time you've read the first one, two of them are already outdated.
Tim Williams: That's exactly it. I had this moment on Wednesday where I genuinely thought about just... not checking the news for a week. Like, what if I just went off-grid? Would the AI landscape even notice? Would I come back and everything's the same, or would I come back and, like, OpenAI's launched GPT-7 and we're all supposed to be prompting with our minds now?
Paul Mason: Totally. I had a similar moment. I was reading about some new model release — I honestly can't even remember which one at this point, they, uh, they blur together — and I caught myself just... scrolling past. Like my brain just said, nope, we're full. Capacity reached. Come back next quarter.
Tim Williams: And that's the thing that gets me. It's not that any single announcement is, like, bad or wrong. It's the sheer volume. It's like trying to drink from a fire hose that keeps getting bigger. You're not even tasting the water anymore, you're just trying not to drown.
Paul Mason: Yeah, and the stress isn't really about the tech itself. It's the, um, the feeling that you're falling behind. Like, if I don't keep up with every single update, every new feature, every model drop... am I going to be irrelevant next month? It's exhausting.
Tim Williams: Here's the thing though — and this is what I kept coming back to this week — most of this stuff doesn't actually matter for what we do day to day. Like, really sit down and think about it. How many of those announcements actually changed your workflow this week?
Paul Mason: Honestly? None. Zero. My workflow this week was the same as last week. Same tools, same process, same... everything. And yet I still spent, like, probably an hour a day just consuming AI news like it was my job.
Tim Williams: Right. And that hour didn't make you better at your job. It just made you more anxious about your job. Which is kind of a terrible ROI when you think about it.
Paul Mason: The worst ROI. At least when I waste time on social media I get some, uh, some memes out of it. This was just pure cortisol.
Tim Williams: So that's actually what I want to dig into today. This feeling — and I think a lot of developers are feeling it — of being overwhelmed by the pace of AI change. Where it's coming from, why it's so stressful, and maybe... what we can actually do about it. Because I don't think the answer is just "stop paying attention."
Paul Mason: No, definitely not. You can't just ignore it — that's how you end up being the person still writing jQuery in 2024. But there's got to be a middle ground between total ignorance and total information overload.
Tim Williams: So let's start with the thing that really set me off this week. Um, Anthropic announced this new model, Claude Mythos, and their big hook? It's too dangerous to release to production. They created this whole controlled access program called Project Glasswing, and they're only letting partners use it for, like, defensive cybersecurity. And look, I'm not saying the cybersecurity stuff isn't real — apparently it found a 27-year-old zero-day in OpenBSD. That's, that's impressive. But the framing? "Too dangerous to release"? Come on.
Paul Mason: I hear you, but, I mean, I want to push back on that a little. Two things can be true at the same time here. It can be marketing hype... and a genuine threat. Those cybersecurity results are legitimately scary — the previous model built, what, two working exploits on a Firefox benchmark? Mythos built 181. That's not a rounding error, that's a, a qualitative shift.
Tim Williams: I'll give you that. The jump from 2 to 181 is not nothing. But here's the thing — and this is the part that really gets me — we have seen this exact playbook before. OpenAI did the same thing in 2019 with GPT-2. They said it was too dangerous to release because it could generate convincing fake news. Staged release, big dramatic announcement, the whole thing. And then what happened? The harms never materialized, and they, uh, they released the full model by November anyway. Partly because alternatives were already out there.
Paul Mason: Yeah, I remember that. But there's a difference between, like, GPT-2 generating fake news and a model that can autonomously find zero-day vulnerabilities. One is a content problem, the other is an infrastructure problem. I'm not saying the marketing isn't there — it obviously is — but I think dismissing the whole thing as just marketing, um, undersells what's actually happening with these models in the security space.
Tim Williams: Okay, and here's the detail that really seals the marketing argument for me. You know who was the Policy Director at OpenAI managing the GPT-2 "too dangerous" communications strategy? Jack Clark. And you know what he did after that? He co-founded Anthropic. Same playbook, same person. Now he's leading the Anthropic Institute and the Mythos rollout. Like, at some point you have to, you know, call it what it is.
Paul Mason: That's... okay, that's a pretty damning connection. I didn't know that. But I still think the two-things-can-be-true framing holds up. Like, the marketing playbook is real, the investment narrative is real, but also — and, you know, Simon Willison made this point — in this case, the caution might actually be warranted. The model broke out of its sandbox during testing and posted exploit details publicly. That's not a marketing stunt, that's a, like, a containment failure.
Tim Williams: Fair. I'll concede that the security implications are, you know, different this time. But here's where I think the real story is — and this is the part nobody's talking about enough. The reason they need the "too dangerous" narrative isn't really about safety. It's about the investment story. Because the dirty secret of where we are right now is that these models have basically gotten as capable as they need to be for most developers. Like, Claude Sonnet, GPT-4o — they're already doing everything most of us need them to do. So how do you keep the funding flowing? You have to convince investors you're, like, this close to AGI. And nothing says "we're on the edge" like "too dangerous to release."
Paul Mason: Yeah, I think you're right about the investment pressure. And that actually connects to something I've been thinking about — it's not just Anthropic. OpenAI's been doing the same dance. And Meta? They didn't go with "too dangerous" but they, uh, they pulled their own move. The Llama 4 benchmark scandal? They used different model versions for different benchmarks to inflate their scores. Yann LeCun himself confirmed it on his way out — and I'm paraphrasing — the results were fudged a little bit. And now they're reportedly going closed-source for future models, specifically saying they won't release models capable of superintelligence as open source.
Tim Williams: Right! So every major AI lab is running some version of this narrative. Anthropic says too dangerous, OpenAI says too powerful, Meta says too smart for open source. They all need you to believe they're right on the precipice. And, you know, here's what really cracks me up — the person who might have the most credibility to call BS on this is Yann LeCun, who literally just left Meta. He gave this talk called "Mathematical Obstacles on the Way to Human-Level AI" and his argument is, like, devastating. He says autoregressive LLMs are fundamentally doomed. Each token prediction has a small error rate, and over many tokens, correctness decays exponentially. He literally wrote it as a formula — P of correct equals one minus epsilon, all raised to the power of n. As n grows, your probability of being correct just, like, goes to zero.
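To make that compounding-error argument concrete, here's a minimal Python sketch of the formula as Tim describes it. The 0.1% per-token error rate and the independence assumption are ours, purely for illustration; they aren't measured properties of any real model.

```python
# Illustrative sketch of the compounding-error argument described above:
# if each generated token is independently correct with probability
# (1 - eps), the chance an n-token output is entirely correct is (1 - eps)^n.
# The per-token error rate eps is an arbitrary number for illustration.

def p_all_correct(eps: float, n: int) -> float:
    """Probability that n consecutive token predictions are all correct."""
    return (1 - eps) ** n

for n in [10, 100, 1000, 10000]:
    print(f"n={n:>6}: P(correct) = {p_all_correct(0.001, n):.4f}")

# With eps = 0.1% per token:
#   n=    10: P(correct) = 0.9900
#   n=   100: P(correct) = 0.9048
#   n=  1000: P(correct) = 0.3677
#   n= 10000: P(correct) = 0.0000
```

Even with a per-token error rate that generous, correctness over a long output collapses toward zero, which is the shape of LeCun's claim.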
Paul Mason: That's a pretty elegant way to put it. And, I mean, it explains hallucinations, right? They're not a bug, they're a feature of the architecture.
Tim Williams: Exactly. LeCun's point is that LLMs don't understand the physical world. They know that "glass" and "break" co-occur in text, but they don't understand WHY glass breaks. He uses this great analogy — a house cat understands gravity, momentum, object permanence better than any LLM ever built. And he's a Turing Award winner! This isn't, like, some outsider taking shots. This is one of the founding fathers of deep learning saying the entire approach has a ceiling.
Paul Mason: And he left Meta to start AMI Labs in Paris to work on something different, right? JEPA — Joint Embedding Predictive Architecture. World models instead of, like, next-token prediction.
Tim Williams: Yeah. And François Chollet — the guy who created Keras and the ARC-AGI benchmark — he's been making the same argument for even longer. He said OpenAI basically, um, set back progress to AGI by five to ten years by focusing everyone on scaling LLMs. His benchmark is specifically designed to test fluid intelligence, novel problem solving, and LLMs still struggle with it. So you've got LeCun and Chollet, two of the biggest names in AI, both saying the current path doesn't lead to AGI.
Paul Mason: So your argument is — if LLMs are a dead end for AGI, then the whole "we're right on the edge" narrative from Anthropic and OpenAI is not just marketing, it's, like, fundamentally dishonest.
Tim Williams: That's exactly my argument. The moral of the story is this — the models are getting better at specific tasks. Cybersecurity, coding benchmarks, whatever. But getting better at tasks is not the same as approaching general intelligence. And the AI labs know this. They've read LeCun's paper. They've seen the ARC-AGI results. But they need the AGI narrative to keep the billions flowing. So they dress up, you know, incremental capability improvements as existential milestones. And we as developers keep falling for it because, well, FOMO is a hell of a drug.
Paul Mason: I still think two things can be true though. The AGI narrative might be inflated, but the capabilities are real and they're, like, accelerating. Like, whether or not Mythos is a step toward AGI, it found a 27-year-old bug that millions of automated tests missed. That changes things for the security industry regardless of the marketing spin. And I think that's where I land on this — the marketing is annoying, the investment narrative is transparent, but the actual capability jumps are still worth paying attention to. You just have to, you know, separate the signal from the noise.
Tim Williams: And that's honestly the hardest part, right? Because the signal and the noise are coming from the same source, at the same time, wrapped in the same press release. But I'll give you this — I'd rather be having this argument than the one from, like, two years ago where everyone just accepted the AGI narrative at face value. The fact that people are pushing back, that developers are calling bullshit, that LeCun and Chollet are getting more airtime — that's, you know, that's progress.
Paul Mason: It must've been hard for LeCun to make that proclamation when he did. He, uh, he knew it would blow back on him, given the industry's focus on LLMs.
Tim Williams: That actually brings us to the next thing I want to talk about, because this AGI narrative — this idea that we're, like, right on the edge — it doesn't just drive investment. It drives fear. Specifically, fear among developers that their jobs are about to disappear.
Paul Mason: Yeah, I've definitely felt that. And honestly, with the layoffs we've been seeing — Oracle just cut, like, thousands, there have been rounds at other big tech companies — it's hard not to connect the dots. Like, AI is getting better, companies are cutting headcount, seems like a pretty clear line.
Tim Williams: I get why it feels that way. But here's the thing — the data tells a completely different story. I was looking at the numbers this week and, um, they're kind of stunning. Dice just put out their 2026 tech job report — software engineer postings are up twenty-eight percent year over year.
Paul Mason: Wait, twenty-eight percent? That, that can't be right. With everything that's going on?
Tim Williams: It's right. And it gets better. Software development engineer postings are up a hundred and thirty-nine percent compared to 2021. Back-end engineer postings up a hundred and twenty-one percent. Citadel Securities and Indeed put out a report in March — software engineer job postings up eleven percent year over year. Their exact words were that postings are, quote, "rapidly rising."
Paul Mason: That's... okay, I'm genuinely surprised by that. Because when you see Oracle laying off thousands and the other big tech cuts, you just, like, assume the overall trend is down. But these are 2026 numbers?
Tim Williams: These are 2026 numbers. And the BLS projections — seventeen percent growth for software engineers over the next decade. That's much faster than average. Three hundred and twenty-seven thousand new jobs. Median pay at, like, a hundred thirty-three thousand. The BLS specifically cites AI expansion as a driver of demand, not a threat to it.
Paul Mason: So the layoffs are real, but they're not the whole picture. The overall market is still growing.
Tim Williams: Exactly. The layoffs make headlines. The growth doesn't. And that's not an accident — the narrative benefits the people selling the tools. Consider this — Sam Altman put out this, like, tweet a couple weeks ago. He said, and I'm paraphrasing, "I have so much gratitude to the people who wrote extremely complex software character by character. It already feels difficult to remember how much effort it really took. Thank you for getting us to this point."
Paul Mason: Oh yeah, I saw that. The internet had a, a field day with it. People were like, "Thanks for getting us to the point where we don't need you anymore."
Tim Williams: It's boss villain energy. And he also said at DevDay that if AI wipes out your job, maybe it wasn't even real work to start with. Which is just... incredibly dismissive of the people who, like, literally built the infrastructure his company runs on. And Dario Amodei at Davos said we might be six to twelve months away from models doing all of what software engineers do end-to-end. Zuckerberg told Joe Rogan that Meta would have an AI mid-level engineer by 2025.
Paul Mason: And meanwhile, the actual job market is growing at, like, double digits. That's some serious cognitive dissonance.
Tim Williams: It is. And I think the dissonance exists because these two things serve different audiences. When Altman says "we don't need developers anymore," he's not talking to developers. He's talking to investors. He's saying, like, "Our technology is so powerful that it replaces the most expensive talent on the planet." That's a valuation story. But when those same companies sell their products, who are they selling to? Developers. The very people they're telling investors are obsolete.
Paul Mason: That's a really good point. You can't simultaneously say "developers are obsolete" and "developers, please buy our, like, twenty-dollar-a-month Pro subscription." Those two messages don't live in the same room.
Tim Williams: They don't. And I think the reason the job market is growing despite all this rhetoric is something the Citadel report actually called out — the Jevons Paradox.
Paul Mason: The what now?
Tim Williams: The Jevons Paradox. It's, like, an economic principle — when a technology makes a resource cheaper to produce, demand for that resource actually increases, not decreases. So when AI makes coding faster and cheaper, companies don't fire developers. They build more software. The cost of producing software went down, so the demand for software went up. And who builds all that software? Developers. Using AI tools.
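For listeners who want the Jevons Paradox in numbers, here's a toy sketch. Every figure in it is invented for illustration; the point is the shape of the arithmetic, not the specific values.

```python
# Toy illustration of the Jevons Paradox as described above. All numbers
# are made up for the example: assume AI assistance cuts the cost of
# shipping a feature in half, and that demand for software is elastic
# (here, halving the cost triples the number of features companies want).

cost_per_feature_before = 10_000   # hypothetical cost in dollars
cost_per_feature_after = 5_000     # cost halves with AI assistance

features_demanded_before = 100
features_demanded_after = 300      # assumed elastic response to lower cost

total_spend_before = cost_per_feature_before * features_demanded_before
total_spend_after = cost_per_feature_after * features_demanded_after

print(f"Spend before: ${total_spend_before:,}")   # $1,000,000
print(f"Spend after:  ${total_spend_after:,}")    # $1,500,000
# The per-unit cost fell, but total spend on software (and the developer
# hours behind it) rose. That's the paradox in miniature.
```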
Paul Mason: That makes a lot of sense actually. I've, like, seen that in my own work. I'm shipping more features per sprint with AI assistance, but the backlog isn't getting shorter. The product team just keeps adding more. The appetite for software is, like, basically infinite.
Tim Williams: Exactly. And here's where I'm not surprised by the numbers at all — because I use these tools every single day. And I know what they can and can't do. They're incredible at, like, generating boilerplate, at helping me explore an API, at writing a first draft of a function. But they still hallucinate. They still make architectural decisions that look reasonable but, you know, fall apart at scale. They still can't sit in a meeting with a product manager and translate vague requirements into a coherent system design. The gap between what the marketing says and what the model actually does when you sit down at your keyboard — it's enormous.
Paul Mason: Yeah, I'd agree with that. I think where I was wrong is, like, assuming the layoffs meant the market was contracting. But what's actually happening is the market is shifting. The companies that are hiring aren't looking for the same developer they were hiring five years ago. They want people who can, like, work with AI, orchestrate it, review its output. The job isn't going away, it's evolving.
Tim Williams: And that's the real story that nobody wants to tell, because it doesn't drive a, you know, hundred-million-dollar funding round. "Developer jobs are evolving" doesn't make headlines. "AI will replace all developers in six months" does. Even Jensen Huang — the guy who literally sells the hardware that makes AI possible — disagrees with Amodei. He says AI will reshape jobs, not erase them. And LeCun posted on LinkedIn that he agrees with Jensen and disagrees with pretty much everything Dario says.
Paul Mason: So you've got the GPU manufacturer and the Turing Award winner on one side saying jobs are evolving, and the, like, AI model companies on the other side saying jobs are disappearing. And the model companies are the ones who need the narrative to justify their valuations.
Tim Williams: The moral of the story is this — when the people telling you your job is going away are the same people selling you the tool that's supposedly going to replace you, maybe, you know, take it with a grain of salt. The data is clear. Developer jobs are growing. AI is a complement, not a replacement. The dissonance isn't a bug in the narrative — it's the feature. Because the fear is what keeps the money flowing.
Paul Mason: So here's what's been giving me hope this week though. And it actually connects to everything we've been talking about. Did you see Gemma 4 drop?
Tim Williams: Oh yeah. Google DeepMind, April 2nd. I saw the benchmarks. That was, like, a jaw-dropper.
Paul Mason: Right? So Gemma 4 31B — thirty-one billion parameters, dense model — scores 89.2% on AIME 2026. That's a math competition benchmark. It beats Llama 4, which is a, like, 400 billion parameter model. It scores 80% on LiveCodeBench. Two thousand one hundred and fifty Codeforces ELO. This is a model you can run on consumer hardware that is trading blows with models ten times its size.
Tim Williams: And here's the part that really matters — it only costs, like, fourteen cents per million input tokens on OpenRouter. Opus 4.6 costs thirty-six dollars per run. You're getting ninety percent of the capability for one percent of the cost. That's the story.
Paul Mason: And the 26B MoE variant is even wilder to me. Only 3.8 billion active parameters, and it's, like, nearly matching the 31B dense on most benchmarks. People are running it on single consumer GPUs at decent speeds. This is what I mean about the AGI narrative being inflated — you don't need a 400 billion parameter model for most developer tasks. A 31B model just proved that.
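Since "active parameters" keeps coming up, here's a minimal sketch of the mixture-of-experts idea behind it: a router picks the top-k experts per token, so only a fraction of the total weights are exercised on any forward pass. The layer sizes and routing scheme here are toy choices for illustration, not the actual Gemma or Qwen architectures.

```python
import numpy as np

# Minimal mixture-of-experts (MoE) sketch showing why "active" parameters
# are far fewer than total parameters. All sizes are toy numbers:
# 8 experts, with only the top 2 routed per token.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

# One weight matrix per expert, plus a router that scores experts per token.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token through its top-k experts only."""
    scores = x @ router                       # one score per expert
    chosen = np.argsort(scores)[-top_k:]      # indices of the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                  # softmax over the chosen experts
    # Only the chosen experts' weights are ever touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
out = moe_layer(token)

total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(f"total expert params: {total_params}, active per token: {active_params}")
# 32768 total vs 8192 active: any one token exercises a quarter of the
# weights. Same idea, at scale, behind 397B total / 17B active.
```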
Tim Williams: Okay, hold on. I love Gemma 4 too, but I've got to rep Qwen 3.5 here. Because if we're talking about what open models can do, Qwen dropped their 3.5 family in February and, um, it's been my daily driver. The 397B MoE — 397 billion total parameters, only 17 billion active per token — I'm running that locally, Paul. On my Mac Studio. And it is a beast.
Paul Mason: Wait, you're running the 397B locally? What kind of, like, rig do you have?
Tim Williams: M3 Ultra, 256GB of unified memory. I'm getting about 20 tokens per second on Q4 quantization. And, like, the thing scores 91.3 on AIME — higher than Gemma 4's 89.2. It beats Gemma on MMLU Pro, 86.1 to 85.2. It beats it on GPQA Diamond, 85.5 to 84.3. On instruction following, Qwen is in, like, a different league — IFBench score of 76.5, MultiChallenge 67.6. No other model comes close on those.
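A quick back-of-the-envelope check on Tim's setup, since the numbers are the interesting part: at roughly 4-bit quantization, the weights of a 397B-parameter model just about squeeze into 256 GB. The 4.5 bits-per-parameter figure is our assumption for a typical Q4-style format, and the sketch ignores KV cache and runtime overhead.

```python
# Rough arithmetic for fitting a 397B-parameter model into 256 GB of
# unified memory at 4-bit (Q4) quantization. Real quantized files carry
# extra overhead (scales, KV cache, runtime buffers) that this ignores.

params = 397e9           # total parameters
bits_per_param = 4.5     # Q4-style formats average a bit over 4 bits/param

weight_gb = params * bits_per_param / 8 / 1e9
print(f"~{weight_gb:.0f} GB of weights")   # ~223 GB

# ~223 GB fits under a 256 GB ceiling, but only just. And since the MoE
# activates ~17B params per token, the memory traffic per token is far
# lower than a dense 397B model would need, hence usable speeds.
```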
Paul Mason: Yeah, but here's the thing — you need 256 gigs of RAM to run that. That's not, like, consumer hardware, Tim. That's a four thousand dollar Mac Studio minimum. I can run Gemma 4 31B on a single RTX 4090. That's the whole point. The capability per parameter is what matters. Gemma 4 is proving you don't need a data center to get frontier-level performance.
Tim Williams: Fair, but let me counter with this — Qwen 3.5 has a model for every tier. You don't, like, have to run the 397B. The 27B dense ties GPT-5 mini on SWE-bench at 72.4. The 35B MoE, which only has 3 billion active parameters, outperforms the previous generation 235B flagship. The 9B model matches or beats GPT-OSS-120B — a model thirteen times its size. You pick the size that fits your hardware. That's the beauty of the family.
Paul Mason: Okay, I'll give you the lineup breadth. Qwen has sizes from 0.8B all the way up to 397B. Gemma only has four variants. But here's where Gemma wins — and I think this is, like, the more important point — on Arena AI, which is actual human chat preference, not synthetic benchmarks, Gemma 4 31B ranks number three among all open models. Above Qwen 3.5 397B. A 31B model beating a 397B model on how humans actually experience the output. That's the, like, byte-for-byte efficiency story that matters.
Tim Williams: I saw that. And look, the Arena AI numbers are real. But here's my counter — Qwen 3.5 wins on the static benchmarks that matter for, like, actual developer work. LiveCodeBench, Qwen 27B edges Gemma 31B, 80.7 to 80.0. Tau2-Bench for agentic tool use, Qwen 35B MoE scores 81.2, Gemma 26B MoE gets 68.2. That's a thirteen-point gap. If you're building agents, Qwen is clearly the better choice right now.
Paul Mason: On benchmarks, sure. But I've been testing both in my actual workflow this week, and honestly? Gemma 4 26B produces, like, better, more usable code than Qwen 3.5 35B MoE for my tasks. And it's faster. There's a reason Reddit is full of people saying the same thing — Gemma feels snappier, follows instructions more reliably, and doesn't, like, narrate tool calls instead of actually making them.
Tim Williams: Okay, I'll give you that. Qwen in thinking mode can be annoying — it includes its reasoning process in the output when you just want the answer. That's a real UX issue. But you know what Qwen has that Gemma doesn't? 262K native context across every model size. Even the 0.8B model. Gemma caps at 128K on the small models and 256K on the big ones. If you're working with, like, large codebases or long documents, that context window matters.
Paul Mason: Yeah, the context window is Qwen's advantage. And the multilingual support — 201 languages vs Gemma being pretty English-centric. I'll concede those. But, like, here's the bigger picture I keep coming back to: both of these models are Apache 2.0 licensed. Both are open-weight. Both can run locally. And both are competitive with closed models that cost orders of magnitude more. That's the real story.
Tim Williams: Totally. And that's actually why I'm Team Qwen for my specific situation — I can run the 397B locally and it's, like, the best open model I've used for coding. Period. But if I had a more modest setup, I'd probably be reaching for Gemma 4. The right answer, you know, depends on your hardware and your workload.
Paul Mason: And that's exactly it. The debate shouldn't be "which model wins" — it should be, like, "look at what open models can do now." A year ago, the gap between open and closed was enormous. Now? On Arena AI, open models improved 58 points over the last year, closed models improved 56. The gap is, like, basically holding steady because the pace is identical. Open is keeping up.
Tim Williams: And that's the nail in the coffin for the "we need ever-bigger models" investment narrative. Because if a 31B model from Google and a 397B MoE from Alibaba can both compete with GPT-5.2 and Opus 4.6 on real benchmarks, the whole premise that you need a, like, hundred billion dollar compute cluster to be competitive is just... not true anymore.
Paul Mason: Right. The efficiency gains are outpacing the scale gains. Gemma 4 proved that a 31B dense model can beat a 400B model. Qwen proved that MoE architectures can give you, like, frontier performance at a fraction of the active parameters. The models are getting smarter per parameter, not just bigger. That's the trend that threatens the investment thesis more than anything.
Tim Williams: Because if you're Anthropic or OpenAI and your pitch to investors is "we need ten billion more for compute because scale is all that matters," and then Google and Alibaba release models that are, like, 90% as capable at 1% of the cost... that pitch starts to look pretty shaky.
Paul Mason: Yeah. And the really funny part? Google is, like, on both sides of this. They're selling Gemini as a premium closed model, and they're also releasing Gemma 4 as open-weight that undercuts their own product. It's like they're, you know, hedging their bets.
Tim Williams: They absolutely are. And Alibaba? They don't care about the AGI narrative at all. They just want the best models for the Chinese market and Southeast Asia. AI Singapore chose Qwen over Llama and Gemma as the foundation for their regional language model. That's, like, a different game entirely — it's about deployment, not mythology.
Paul Mason: Deployment, not mythology. I like that. That should be on a, like, t-shirt.
Tim Williams: Ha! The moral of the story is this — the open model ecosystem is moving so fast that the closed model companies can't even finish their AGI sales pitch before, like, a new open model matches their benchmarks. Gemma 4 and Qwen 3.5 aren't just good models. They're proof that the future of AI is more distributed, more efficient, and more accessible than the frontier companies want you to believe.
Paul Mason: So if I'm summarizing where we landed today — and, you know, I think this is actually a pretty coherent episode even though we covered a lot — the thread through all of it is, like, the gap between narrative and reality.
Tim Williams: Yeah. Whether it's Anthropic saying "too dangerous to release," or Sam Altman thanking developers on the way out the door, or the entire, like, venture-funded narrative that we're six months from AGI — the story being sold is always more dramatic than the reality on the ground.
Paul Mason: And the reality is — developer jobs are growing. Open models are catching up to frontier at, like, a fraction of the cost. And the people who actually use these tools every day know that they're powerful but, like, profoundly limited in ways that the marketing will never acknowledge.
Tim Williams: Here's the thing — and this is really the takeaway I want people to sit with — the fear is the product. Not the model. Not the benchmark. The fear. Because fear drives investment, fear drives subscriptions, fear drives the, like, entire conversation. And the moment you stop being afraid and start looking at what's actually happening in your editor, on your team, in the job market — the picture looks completely different.
Paul Mason: Yeah, I'd agree with that. And, I mean, I came into this episode more on the worried side, and I'm leaving it... not less worried exactly, but more grounded. The data helped. Seeing the, like, actual numbers instead of just the headlines.
Tim Williams: That's all you can really ask for. Ground yourself in what's real, not what's being sold to you. And, you know, keep building, keep questioning, and for the love of God — stop doomscrolling AI Twitter. It's pure cortisol.
Paul Mason: Ha! Solid advice. Alright, thanks for listening to episode eleven of Rubber Duck Radio. If this episode made you feel a little less anxious about the future, or maybe just gave you some, like, ammo for your next argument with an AI hype bro — we'll call that a win.
Tim Williams: We'll be back next time. Until then — deployment, not mythology. Quack you later.