Hacker News — vinext + Cloudflare Workers

▲Anthropic apologizes for invisible Claude Fable guardrails (theverge.com)

344 points by rarisma 16 hours ago | 344 comments

Avicebron 12 hours ago [-]

I like Claude Code a lot, but I think it sets a dangerous precedent to put in guardrails that return a response from a prompt that was modified by the system in real time in order to subvert the original intent.

Fail cleanly. Anything else makes it too difficult to rely on.

edit: Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.

bs7280 11 hours ago [-]

I think the reasonable middle ground Anthropic is trying to achieve is: let the organizations that make the most important and critical software get a head start on cybersecurity before they inevitably allow everyone else the same access.

Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity, because I can't use it to test and harden my own software.

nl 4 hours ago [-]

I think it's a big mistake to conflate the cyber (and bio) refusals with the LLM development refusals.

I can sympathize with the argument for the cyber refusals - especially as a temporary measure - especially if Mythos is available to those trying to defend against vulnerabilities.

The LLM development nerfing (and now refusals) is very different though. Anthropic has even said it isn't just for safety reasons:

> Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

It's at least partially an anti-competitive measure.

The closest analogy is putting measures in a compiler to stop it being able to build other compilers.

Another analogy is priesthoods with secret religious knowledge that "only they are qualified to know".

dannyw 3 hours ago [-]

The Anthropic refusal description is even more direct.

“The request could assist the development of competing AI models, which is restricted under Anthropic's commercial terms. Benign machine learning work can also trigger this category.”

Source: https://platform.claude.com/docs/en/build-with-claude/refusa...

sciencejerk 11 hours ago [-]

Claude Opus 4.6 and 4.8 find vulns in source code just fine and 4.6 will pentest without source for you given a proper harness WITHOUT jailbreaking. WITH jailbreaks, you can probably imagine what they are capable of.

Anthropic guardrails seem to be more about protecting their business (distillation), than they are about public safety.

dnautics 11 hours ago [-]

public safety is downstream of distillation. If you can distill claude, then no amount of guardrails on claude will protect you from what someone can do with it.

zozbot234 9 hours ago [-]

Distillation is not a thing unless you actually have the model weights. What people misleadingly call distillation is just training on chat logs, which has always been routine practice in the industry. There's a reason why every model today talks like early releases of ChatGPT.

senordevnyc 1 hour ago [-]

If most people call it that, including the big labs, then maybe…you’re just out of date?

ericpauley 5 hours ago [-]

If Anthropic is calling it distillation [1] then that would argue for it being correct (or at least canonical) terminology.

[1] https://www.anthropic.com/news/detecting-and-preventing-dist...

dannyw 3 hours ago [-]

No, a company choosing to use some terminology doesn’t make it correct nor canonical in any sense; especially when they have a vested interest in not being neutral or credible.

If Google starts calling ads “Best Links” that doesn’t make it correct nor canonical; the correct term is still ads.

Traditionally, distillation is when you get the actual logits of a model response (not exposed via API for years) and then use that to train a model.
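
As a rough illustration of that logit-based definition (a sketch only, not any lab's actual pipeline): the student is trained to match the teacher's softened output distribution, typically via a KL-divergence term. The function names, the temperature value, and the example logits below are all made up for illustration, and the usual temperature-squared scaling and hard-label term are left out.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]  # hypothetical teacher logits for one token
student = [2.0, 1.5, 0.5]  # hypothetical student logits for the same token
loss = distillation_loss(teacher, student)  # value to minimize during training
```

Note that this needs the teacher's full logit vector per token, which is the point of the comment above: training on sampled text from an API only gives you the chosen tokens, not the distribution.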

cherryteastain 8 hours ago [-]

This logic works only if distilling Claude is the only way to create another SOTA LLM, which is not the case.

maxdo 2 hours ago [-]

It's not, but the full path costs billions of dollars versus the $10-100M range to stay near SOTA.

The problem is so large-scale that distillation attempts account for a decent share of their token revenue generally.

sciencejerk 5 hours ago [-]

How do you think the Qwen and MiniMax models perform so similarly to Anthropic frontier models? What is your take then?

mcmcmc 1 hour ago [-]

They probably stole all the same copyrighted IP

_3u10 3 hours ago [-]

Probably the same reason an Epyc 9965 from Hetzner performs just as well as one from AWS for one tenth the cost.

Anthropic is offering a commodity product and trying to convince you it isn’t.

It’s even in the name: it’s a myth and a fable. Never happened, doesn’t exist.

Also I believe at least on coding that qwen is now the frontier model, fable is its copy of frontier models. In the same way that the Ferrari Luce is an expensive imitation of a SU7 Ultra.

abletonlive 58 minutes ago [-]

> Also I believe at least on coding that qwen is now the frontier model

The delusions people live in just to be a hater.

yeeeloit 3 hours ago [-]

China no. 1?

ryandrake 11 hours ago [-]

I wonder who gets to decide which companies make important and critical software and which ones get the scraps later.

margalabargala 11 hours ago [-]

No need to wonder.

The answer is, the organization making the powerful tool. The people in charge of Anthropic.

Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/

You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.

Laurel1234 7 hours ago [-]

Amodei has no values, he's a hollow husk and he'd sell his family into sex slavery if it could make him a buck.

margalabargala 6 hours ago [-]

Nonsense. Everyone has values. "Make myself maximum money" is a value. "Amass maximum power over the world's information" is a value. It's clear Amodei certainly follows the latter, and I would soften the former somewhat for him; they did after all decline the Pentagon contract that would have made money but would have meant giving up some control of information.

criddell 11 hours ago [-]

That would be Anthropic.

CamperBob2 11 hours ago [-]

Well, Anthropic thinks it should be the Trump administration [1].

This whole business just keeps getting dumber.

1: https://darioamodei.com/post/policy-on-the-ai-exponential

solenoid0937 11 hours ago [-]

Read the actual essay. I cannot possibly imagine how you come to that conclusion unless you're just arguing in bad faith.

CamperBob2 10 hours ago [-]

No. You read the actual essay, then explain how we're supposed to interpret this more charitably:

    Frontier AI models, like airplanes, should 
    be required to go through technical testing 
    and auditing, and their release should be 
    blocked or reversed as a threat to public 
    safety if they do not meet high standards 
    of safety. I am grateful to see the Trump 
    administration’s Executive Order move 
    incrementally towards a greater role for 
    government in AI, though Anthropic’s proposal 
    recommends even further action.

They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.

yonaguska 2 hours ago [-]

I agree with your sentiment but not your conclusion. They don't want this administration specifically to have gatekeeping authority, what they want is any administration to say that they are gatekeeping, so that they can regulate the competition out of existence. Of course the actual checks and balances will be near pointless in effect, but expensive to implement nonetheless.

solenoid0937 10 hours ago [-]

This is a pretty reasonable statement and I'm not sure how you could interpret this as "sucking up to the admin."

jbm 4 hours ago [-]

No one is "grateful" for being labelled a security risk. The statement reads more like a Chinese "Ah Q" story than a real response.

(Unless they are piping the F1 Mercedes theme song through the PA system at Anthropic, in which case maybe you are right)

ben_w 6 hours ago [-]

I can read it as both TBH.

First sentence by itself is mundane "regulators are good", which most people agree with, and also libertarians will object to regardless of leader.

Second sentence is obviously sucking up, though is the same level of sucking up found on every stereotypical LinkedIn post.

CamperBob2 10 hours ago [-]

It's a pretty reasonable statement if you work for Anthropic and are eyeing your stock options nervously and your competitors even more so.

solenoid0937 9 hours ago [-]

Everyone that isn't a bitter cynic must be a shill.

senordevnyc 1 hour ago [-]

I’ve noticed that too many HN folks seem to think that cynicism makes them more intelligent. I think it must be some kind of insecurity, about not wanting to be seen as naive or something. It’s pretty sad though, I wonder how some of these people find any peace or joy in their lives.

arkadiytehgraet 6 hours ago [-]

You got baited by a confirmed Anthropic shill, see more info here: https://news.ycombinator.com/item?id=48270186

nl 3 hours ago [-]

Confirmed by you!

I don't really agree with their point here, but there are plenty of people in the AI community whose views are aligned with Anthropic's. That doesn't make them shills.

It's actually important those views are put forward.

A place like LessWrong has the opposite problem - there is no one there who questions the "safety narrative" so the discussion swings more and more towards the extreme end of that spectrum.

antiterra 1 hour ago [-]

Wait, did you actually claim that most work at FAANGs doesn’t require an NDA and that was evidence to support your accusation?

CamperBob2 6 hours ago [-]

I hate to accuse people of shilling (and HN hates those accusations as well, policy-wise). And there are ways to defend Amodei's point, or at least there would be if he and his friends hadn't been beating the same drum since GPT2.

But I tend to agree, just saying it's a "pretty reasonable statement" and leaving it at that is beyond the pale for anyone who doesn't have an undisclosed stake in the argument.

solenoid0937 5 hours ago [-]

This is like the most milquetoast stance in the AI safety community. It's great the Trump admin did something, no one expected them to, and they should have done more. Very powerful tools released to the public should be regulated for safety.

That is "pretty reasonable" to most people (except the tech-libertarian crowd).

wouldbecouldbe 11 hours ago [-]

I asked it to analyse my architecture and find any security issues and it did it perfectly, first identified the issues & then fixed them. Not sure why my prompt managed to get through the guardrails

pwython 10 hours ago [-]

I asked Fable to plan a security & performance audit of my website. It said it would check SSR & origin attack surface, CMS content injection, Strapi API surface, etc.

Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."

Ok fine, I said go for it, and it says:

"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."

Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.

notrealyme123 11 hours ago [-]

Exactly: for cybersecurity the failure was visible. It was not visible for "frontier" ML research. The head-start argument for IT security does not apply here.

mapontosevenths 12 hours ago [-]

I agree 100%. Doing a worse job IS an error. It should be treated as such. Or at the very least make that behavior opt-in. The default should not be pretending like nothing happened and just quietly doing a worse job.

Imagine your healthcare provider just sometimes decided not to read your test results very carefully and you risked death? Now realize that healthcare providers use Claude now and that scenario wasn't hypothetical.

largbae 11 hours ago [-]

Especially if your name has any machine learning terms in it.

Ah "Mr. Monty Carlo", it says here that you have a UTI, we'll get those kidneys removed ASAP so that won't happen again.

Paracompact 10 hours ago [-]

> Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word.

Only in the same sense that Standard Oil considered themselves the stewards of petroleum. There's benefit of the doubt and then there's just fanfiction. Do not forget that this most aggressive "guardrail" of theirs was not for any safety reason, but just to stop other labs from catching up to their product. They care less about hindering bioweapons, malware, and hate speech than they do free market competition.

jstummbillig 11 hours ago [-]

> paternalism isn't a good look.

In isolation it's not, but I think it's somewhat lazy to not talk about what they are trying to guard against, when we are supposedly giving the absolute maximum benefit of doubt.

Are we just concluding "their concerns were never real"? Because that probably runs counter to the things that they have been observing and concluding.

estearum 11 hours ago [-]

Basically all critiques of Anthropic's policy moves on these topics boil down to people not believing the fundamental concerns are real, and often then going a step further to conclude that Anthropic doesn't actually believe their concerns either.

If you believe Anthropic believes what they say they do, all of it makes sense.

caconym_ 4 hours ago [-]

Even if you believe the concerns have merit, it's hard not to be cynical about people (e.g. Anthropic leadership) paying lip service to those concerns while so obviously leveraging their power and wealth (which depend, by the way, on accelerating the world toward those hypothetical "concerning" scenarios as fast as possible) to position themselves such that they will become unimaginably richer if things go their way, and will also come out on top pretty much no matter what happens.

It's like a prisoner's dilemma where one party is loudly lecturing the other about the obvious benefits of cooperation while also obviously working on defecting. They want to have their cake and eat it too. Maybe they really are the pure-of-heart Chosen Ones destined to lead us around the great filter, but I don't see why I should believe that's the case when their behavior is just as easily explained as maneuvering toward being the winner who takes it all.

estearum 4 hours ago [-]

> (which depend, by the way, on accelerating the world toward those hypothetical "concerning" scenarios as fast as possible)

Yes, this dynamic is exactly the one that anyone who's concerned about AI is concerned about. I don't know why you state this as if it's evidence against the concerns lol. Someone being concerned about the incentives of a situation doesn't de facto make them immune to those incentives, obviously.

The implication that someone who's concerned about an arms race dynamic could simply opt out of the system that produces that dynamic is simply confused about what arms race dynamics are. The entire point is that it's a trap, and it's a trap even if you know it's a trap, and even if you don't like that it's a trap. There's nothing dishonest or hypocritical about being in the trap: it is literally a trap –– that is what it does and why it is bad!

I'm confused by these comments that imply people believe Dario et al are "pure-of-heart Chosen Ones destined to lead us around the great filter." Who? I've never seen it. And any AI-doomer is probably of the opinion that the entire question of Dario's or anyone else's personal moral character is 99% irrelevant. Because, again, it's a trap. The dynamics at play are so much larger than whether someone irks people for their lecturing tone. I would much rather give my money to Dario, who seems like a generally good person, versus Sama, who seems like a complete snake, but I'm under no illusions that doing so changes the fundamental dynamics that are steering us to AI doom. I doubt anyone does.

And yes, obviously they are angling toward being the winner who takes it all. That is literally the trap. If you believe what they believe, yelling "let's cooperate!" while hurtling towards the finish line and tripping your competitors is the only reasonable thing to do. That is the problem.

caconym_ 2 minutes ago [-]

> I don't know why you state this as if it's evidence against the concerns lol. Someone being concerned about the incentives of a situation doesn't de facto make them immune to those incentives, obviously.

I think you're reading some subtext into my comment that I didn't intend. Knowing myself, I assume the scare quotes there are just a bit of casual irony re: the insanely high stakes here. The word "concerns" as used by previous commenters doesn't seem equal to the context.

> The implication that someone who's concerned about an arms race dynamic could simply opt out of the system that produces that dynamic is simply confused about what arms race dynamics are.

You can, in fact, opt out. You can opt out and do your damndest to stop what's happening, throw every cent you have at it, bend any ear that will listen, make use of the fact that your voice (as Anthropic leadership) has some meaningful weight.

If you really believe that we are heading down a path that's likely to end poorly for most or all of humanity, and you are the kind of person who's inclined to favor saving billions of lives over saving your own skin when the stakes are still relatively distant, abstract, and generally unclear, opting out is obviously on the table as a grand gesture that burns your position in the race to show just how fucking serious you are. The sense of inevitability your comment shares with many others does not seem well founded---we have, for instance, not had a global nuclear war yet. Leaders in the 20th and 21st centuries have shown remarkable restraint.

If today's political and tech leaders are unable to think beyond this inevitability, for whatever reason, the worst outcomes essentially become a self-fulfilling prophecy to the extent that reality bears them out.

---

But yes, these people are acting the way they are for obvious reasons, obviously. My previous comment is reacting to the general disagreement over whether Anthropic actually believes what they say about safety, etc., or whether it's a marketing gimmick. The purpose of my comment is to explain that "it's hard not to be cynical" about actions taken by very rich and powerful people that are claimed to be in everybody's best interests but are indistinguishable from the actions they would take to maximize their future power and wealth. I think everyone ought to agree with that statement. It's not a value judgment; it's simply an observation of how it feels to be on a plane whose pilot appears to be robbing the passengers (including you) at gunpoint and is conspicuously wearing the only parachute on board.

senordevnyc 1 hour ago [-]

This is an excellent comment, and I agree. I do think that there’s also evidence that Altman’s behavior can also be explained as a person who is naturally manipulative also being stuck in the trap and responding to incentives. But not necessarily a snake just in it for himself. The thing I keep coming back to about Altman: he doesn’t have any equity in OpenAI. And he definitely could have if he’d wanted. It’s hard for me to square that with the idea of him being greedy and self-interested.

jcgrillo 11 hours ago [-]

But the things they say they believe are insane and totally unmoored from physical, societal, and economic reality. If they actually believe those things they're untrustworthy because they're delusional. If they don't, they're untrustworthy because they're fraudulent. Either way it's not good.

reducesuffering 10 hours ago [-]

They're not. They're in the eye of the storm and see what's going on the clearest. They were ahead of the curve to be where they're at now, and they're still ahead of the curve for where we're going. All the other heads of labs like Sam Altman and Demis have been saying the same thing since 2015-2016 way before any of this "marketing" would ever have been at play.

jcgrillo 10 hours ago [-]

There's a simpler explanation that fits the data better: they're lying.

Generally, in the past when tech companies have made outlandish claims that were not backed by evidence, they're later found out to have lied. This is an ancient pattern going back to the dotcom era and before, but for recent examples you need only look back a few years to the web3 era. If they're not lying, they can show it by producing the results they claim. Until then, they're probably just lying.

estearum 10 hours ago [-]

What data does "they're lying" fit better than "they're earnest?"

> If they're not lying, they can show it by producing the results they claim. Until then, they're probably just lying

Brilliant framework: Anyone making claims about the future is not just speculating, not just wrong, but they are lying.

jcgrillo 5 hours ago [-]

> What data does "they're lying" fit better than "they're earnest?"

Claim: GPT-4 is a PhD level expert.

Claim: LLMs "reason"

Claim: LLMs "understand"

Claim: AI will automate jobs away

None of these are "earnest" claims for a company to make (whatever that means). They're vacuous bullshit, and false. It's a bunch of wordsmithing papering over results that are either hidden from view or just unimpressive. Even if the person saying that shit believes it, that's not super relevant--if a company is advertising these capabilities then... just... doesn't do it, that's lying. Corporate statements aren't unilaterally determined by fallible individuals, they're reviewed, crafted products. They can be fairly critiqued as such.

> Brilliant framework: Anyone making claims about the future is not just speculating, not just wrong, but they are lying.

Not just anyone, companies in particular. If a company tells you it's building something to replace jobs, and then it doesn't do it, that company lied.

estearum 4 hours ago [-]

Lol, I can name 3 specific jobs within my own company (of <10 people) that AI prevents me from having to hire for. They've been automated away.

My company itself (possible only with AI) does the work of at least several dozen people across my hundred customers or so. Those jobs are now automated away.

Does that mean you're lying, or just overly confident (and wrong) in your speculations?

FWIW, I wouldn't put Sam Altman in the category of "earnest." I'm not sure if you just aren't aware that Anthropic and OpenAI are different companies, or if you're arguing dishonestly by trying to put sama quotes in here? But weird move in either case!

jcgrillo 4 hours ago [-]

[flagged]

kaiuhl 1 hour ago [-]

Not sure I’ve seen someone so openly hostile on HN in a while. Read the guidelines, and please share your thoughts in better faith.

jcgrillo 1 hour ago [-]

Lol thanks. Achievement unlocked!

EDIT: And, if it wasn't abundantly clear, fuck you and the goddamn fucking horse you rode in on, you horrible sack of shit. In the best faith, of course. Die in a Fucking Fire.

estearum 4 hours ago [-]

In most ways it doesn't matter, but if you're accusing someone of lying and then your evidence of that is something that someone else said, then that's lazy (at best).

I'm not sure there's anything "to get." But given your level of curiosity it's not surprising.

Web3 was absolute horseshit (and always was), so if you're blending AI with that based on the similarly grandiose claims and the extremely annoying boosterism, I think you should try hitting reset and engaging with LLMs from a cleaner slate.

shimman 11 hours ago [-]

What are you referring to? The cult belief that they are ushering in a machine god, or that they strictly care about making as much money as humanly possible while ignoring the absolutely destructive impacts these companies have had on society?

IMO they are using the cult messaging to distract the public so they take out all the oxygen in the room regarding people that care about the immediate impacts (climate exacerbation, ease of scamming, degrading job prospects, increasing income inequality).

Whenever real concerns are brought up against these companies they are always ignored while claiming the real concern is the fantasy of a machine god turning into skynet.

estearum 11 hours ago [-]

"Why don't they just not participate in the arms race?!" - guy who's never heard of arms races

If they believe they're creating "a machine god" and that it's better it's their machine god than someone else's (which, given the other contenders, I tend to agree with), then all the corollaries you mention are mostly irrelevant.

Whether you believe they're creating a machine god is irrelevant. They believe that they are. It would be helpful if you could create an actually good argument for why they cannot or are not creating a machine god, but it turns out there are no good arguments for why it's impossible to do so. And so... they shall try.

paulhebert 1 hour ago [-]

A lot of people would prefer nuclear deproliferation over building more nukes.

Arms races always work out great for arms dealers. Less so for the average Joe.

shimman 11 hours ago [-]

Oh okay, they're all just legit crazy and are allowed to poison the environment, murder teenagers, and ruin the material lives of millions for fantasy level delusions.

Good to know.

thewebguyd 11 hours ago [-]

Then what is it they are trying to guard against, if it's not simply protecting their moat ahead of their IPO?

Because from the outside, their behavior looks like a situation of "What if Microsoft/Apple put controls in place to make it impossible to develop an operating system using their OS?"

estearum 11 hours ago [-]

Let's assume that Anthropic believes they're in an arms race to create a potentially dangerous technology, and they believe they're the best ones to win this race.

Unlike nuclear weapons, advancing in this arms race requires actually deploying the product over and over again. Deploying the product makes your advancements visible to your competitors.

It makes complete sense to try to limit the degree to which that's true.

sobellian 11 hours ago [-]

It's an interesting assumption. The idea behind this with nukes was that we'd like to nuke Germany before they could nuke us. Even after we defeated Germany, we nuked Japan even though they had no possibility of getting their own nukes.

The nuclear 'race' was based on the premise that the winner could use it to destroy all other racers (a faulty assumption, see the USSR among others). I will charitably assume Anthropic does not intend to literally destroy anyone and merely wants to become an AGI monopoly. But if AGI is so powerful, any monopoly would not be stable since the incentives for entry into the market are massive. Why would China stop developing AGI just because Anthropic has it?

estearum 11 hours ago [-]

Do you believe the current situation is more akin to the race to the first nukes, where no one could know for sure the other competitors were even racing...

or is it more similar to the Cold War, where there were obviously competitors engaged in the race?

And yes, agreed the equilibrium dynamics for AGI are very different (and far harder to predict) than nukes. That sounds like a good reason to be sure we get there first, since presumably any potential advantage wouldn't go to the second- or third-place runners-up.

sobellian 10 hours ago [-]

I can't really say I see a similarity to either the Manhattan Project or the Cold War. I don't see how one could apply either massive retaliation or MAD. These are private companies, they are not vested with the necessary authority to destroy anything. Even if they had it, they couldn't. You can't destroy China, they have 1.4B people, nukes, and a large part of the world's manufacturing. So multiple organizations want to do something first, that could be anything from nukes to railroads to lining up for communion wafers.

estearum 10 hours ago [-]

You think "arms race" is a dynamic that only applies to literal arms?

"Ability to literally destroy the other entity" is not a necessary or even typical feature of arms races.

sobellian 10 hours ago [-]

Well it's difficult to argue against something that was never specifically stated. If someone is able to state specifically how this is an arms race in any other way than that it's a race at all then I'm happy to have that conversation.

estearum 9 hours ago [-]

"Arms race" is the term used colloquially to describe the dynamic that emerges in "winner-take-all" markets.

It seems that the frontier labs believe they're participants in a winner-take-all market. Therefore they're in "an arms race."

Winner-take-all markets do not require that the winner literally destroys the losers, but only that the winner enjoys disproportionate returns compared to their actual superiority.

Whether or not this is actually true is TBD, but I think you're naive to think the frontier labs do not believe this to be true.

sobellian 9 hours ago [-]

I don't know why you think I'm taking anything literally, cf. my first comment. I understand what a metaphorical arms race is. I don't think that Anthropic can forestall others' AI development by getting there first. It can't be literal destruction. It can't be economic destruction (some actors interested in it aren't motivated by money). What's left? I'm all ears.

As far as naivete, wouldn't it be more naive to take their EA claims at face value, rather than the more realistic assumption that they like money?

estearum 9 hours ago [-]

> These are private companies, they are not vested with the necessary authority to destroy anything

You're pretty explicitly saying that dominating the competition is not the type of "destruction" necessary to qualify as an arms race.

> As far as naivete, wouldn't it be more naive to take their EA claims at face value, rather than the more realistic assumption that they like money?

Huh? Greed is – quite obviously – the major driving force behind the arms race. That is not a mitigation whatsoever.

sobellian 9 hours ago [-]

> I will charitably assume Anthropic does not intend to literally destroy anyone and merely wants to become an AGI monopoly.

zozbot234 9 hours ago [-]

Creative destruction is absolutely a thing in the market, but the way things are going it seems more likely that open source models will just destroy everything else as far as most users are concerned. The big proprietary labs will be effectively left with Fable, GPT-Pro and Gemini Deep Research - stuff that by all indications needs very large scale compute to even feasibly run. We'll probably find out that each has its own strengths, weaknesses and viable niches, so there's no reason to expect any of those models to utterly destroy the others. They can all survive as specialty services.

estearum 9 hours ago [-]

Sure, but:

> Whether or not this is actually true is TBD, but I think you're naive to think the frontier labs do not believe this to be true.

Terr_ 11 hours ago [-]

Or if Google Chrome were blocking/degrading access to sites and services that might be useful to someone trying to make a competing web-browser.

P.S.: On reflection, it's even worse than that, because it'd trigger based on anything the user types or reads on any site. Someone mentions a "critical rendering path" and now you can't participate on that thread in the Blender forums.

jstummbillig 11 hours ago [-]

> Then what is it they are trying to guard against, if its not simply protecting their moat ahead of their IPO?

Let's just assume it was "only" that?

It's unreasonable to assume they are aiming to upset people who are just giving them money in the way they want. It makes no business sense, for any company. So that has to be a byproduct.

Model training is one of the more expensive undertakings in the world right now and distilling models from competitors against the TOS is apparently something that is going on for very little money. Why would they not "just" try to take measures against that?

thewebguyd 11 hours ago [-]

It's about how they took measures against it. Sabotaging the requests is super shady and breaks all other areas of trust in the company and their models.

All they had to do was have a simple, transparent output: "Sorry, that request is against our terms of service. This session has been terminated."

zozbot234 11 hours ago [-]

The hidden safeguard was not against distilling, it was against "frontier" ML research with no indication whatsoever of what "frontier" might mean, but possibly even including research into model safety or alignment. That amounts to deliberately boobytrapping research across an entire legit academic field, which is ridiculously unaligned behavior.

solenoid0937 11 hours ago [-]

This is the same as saying "well some unaligned countries will use refined nuclear material for energy, too!" lmao.

The vast majority of frontier research is about how to build better models, not about alignment.

zozbot234 11 hours ago [-]

And as a matter of fact, there's a lot of meaningful research into how to have different sorts of nuclear material that might be usable for power production but not hidden malicious development. That's the closest analog to "safety" and "alignment" in your scenario.

whimsicalism 11 hours ago [-]

They are trying to guard against other people building ASI before they do because they think they are uniquely safety oriented relative to their competitors. Frankly, based on my knowledge of Anthropic and the people who work there, they are very possibly right. They care a ton about this in a way that is difficult for people outside this bubble to understand.

thewebguyd 11 hours ago [-]

> guard against other people building ASI before they do because they think they are uniquely safety oriented relative to their competitors

All this longtermism though is harmful. There are real problems of data theft, bias, labor displacement, and environmental costs that are happening right now but every push for regulation and regulatory capture, and all the safety talk, is always focused on some speculative future machine god to distract from the current problems.

I'd have a higher opinion of these labs if the issues they openly talked about and worked toward were the real issues we face currently, not speculative defenses against some future AGI that may never happen in my lifetime. I'm less worried about "our new model might kill all humans in the future" and more worried about how we are going to address anti-competitive behavior, copyright protections, labor rights, and the energy impact.

whimsicalism 11 hours ago [-]

I cannot overstate how much I think this take is wrong. Please please reconsider, look at the rate of progress being made, and consider that even if you only think ASI 'may' never happen in your lifetime it should still be one of your #1 concerns.

Honestly, that respect for 'copyright protections' has somehow become a leftist shibboleth is bizarre to me and indicative that something has become deeply warped in our discussions around this topic.

nozzlegear 9 hours ago [-]

> I cannot overstate how much I think this take is wrong. Please please reconsider, look at the rate of progress being made, and consider that even if you only think ASI 'may' never happen in your lifetime it should still be one of your #1 concerns.

Frankly, this appeal comes across as the same kind of impassioned plea that a missionary might make when begging the faithless to repent and come to Christ before it's too late. This weird religiosity some people around here use to talk about AI, ASI and AGI is bizarre. Take what I've quoted and replace the words "progress" and "ASI" with "sinning" and "the Book of Revelation", and the zeal becomes apparent.

whimsicalism 9 hours ago [-]

Maybe if you really squint. I'm asking them to reconsider their views because the cumulative result of many opinions is policy. And yes, I'm making moral claims. So perhaps that makes it religious? I don't really think so, but I recognize that comparing things to religion is an effective dismissal tactic on here.

8note 52 minutes ago [-]

it shouldnt.

power consumption and global climate change should

ASI should be in the top 10k concerns maybe, but way below what to eat for dinner.

much higher on the fears is some hype guy pretending he has made this thing, and giving it access to too much stuff, which it then randomly deletes or misuses

it should also be in the same range as "what if the dinosaurs came back and ate everyone"

theres tons of progress on that too. same with finding aliens

there are real present concerns to worry about, like genocides, concentration camps for immigrants, food costs next winter, ongoing wars in the middle east and europe, etc

all kinds of actually pressing stuff, that doesnt first require burning a couple trillion dollars and forcing poor people to pay through the teeth for their electricity

thewebguyd 11 hours ago [-]

There's nothing warped about it at all. Like it or not, it is a real issue. It's also an issue of license washing GPL code to privatize it. It's full scale theft of collective human knowledge, being sold back to us in a for profit private product.

Outside of that though, there are other issues right now that need addressed before we speculate about what might be possible with ASI in the future. If the potential for a harmful ASI is truly that near, and that great, then why push forward at all? Where's the push for a global stop order on development of this technology until regulation can catch up?

The talk of a potential future serves as a distraction from the very real problems people are facing in their lives today.

While Dario and team are worrying about ASI, real people are worrying about how they are going to continue to feed their family after widespread layoffs set a very large portion of the population back into a lower quality lifestyle. Real people are concerned about water usage in drought-stricken areas, the massive energy demand driving grid instability in their communities, or that the environmental and economic externalities of model training are being socialized while the profits continue to be strictly private.

What about the mass proliferation of misinformation at scale having a real effect on our democratic process?

Forgive me if I'd like to see those addressed first, and fast, before we start worrying about an unpromised future technology.

oncensher 9 hours ago [-]

The "global stop order" is just generally perceived as an impossible coordination problem. So instead we see a mix of labs voluntarily putting in guardrails and regulatory efforts (which are not only aimed at hypothetical super-AIs of the future). Of course labs are also in a competitive race. And I actually think that it does make sense that the richest companies in the most dominant positions would be in a better position to worry about safety than a startup that is just trying to survive at all. And just in general, it seems reasonable that the fewer companies have access to dangerous tech the better. This isn't really about some highly speculative future tech either -- current models already pose lots of risks, and the pace of model improvement is something wildly unprecedented. Whether or not you call it ASI, the capabilities we will have two years from now are hard to even imagine properly. Also, I don't think the issues that you are highlighting are all ones that Anthropic would dismiss as second-tier. In particular, mass unemployment from AI, or how we will deal with a massive devaluation of human labor, is one of the most serious concerns. And about other issues, reasonable people may differ. I'm more worried about biorisk than environmental damage, for example, but clearly we should be keeping an eye on both. Serious risks and problems, just because they aren't already harming people today, are not just a distraction.

8note 48 minutes ago [-]

> the capabilities we will have two years from now are hard to even imagine properly.

unless the bitter pill is gone, extraordinarily not this. The capabilities will be limited by the training data we can create to pull information and patterns from

and then we will still be limited by compute, space, and power

mass devaluing of labour isnt particularly believable when everyones predicting that all the big labs are gonna go under trying to subsidize tokens.

thewebguyd 8 hours ago [-]

I'll concede that a lot (most?) of the problems are not technically the responsibility of the AI labs to address, and it wouldn't entirely be their fault for our government failing to get ahead of the problem. Mass unemployment, for example, is nearly 100% a political problem.

That being said, I can't help but experience a bit of déjà vu over arguments like those around biorisk. I've seen the same exact things said in the early 2000s over widespread access to broadband and Google. When The Anarchist Cookbook spread around online and everyone was super paranoid about democratized terrorism, and we had big regulatory pushes for ISP level censorship and user tracking. Telecoms frequently argued that only they can keep the web safe, with strict and expensive regulations that naturally only those large heavily capitalized companies can afford to go through. Like the early internet and search, it's just another way to lower the latency required for a human to find already existing public data.

Well, very little of that played out. Turns out the math, for now, is the same, and information retrieval doesn't directly correlate to democratized weaponization. In 2001, a bad actor still needed a physical lab, precursor chemicals, etc to build a physical threat. Those same exact physical constraints exist today. The software cannot yet cross the digital-to-physical divide.

Keep an eye on the risk, by all means, but I don't see it yet as justification to cement a monopoly or oligopoly, nor do I see it as a reason to prioritize a risk of information availability over the climate and environmental risks that are far more likely to end the species.

simoncion 4 hours ago [-]

Yeah.

If you have a sizeable bucket of money, it's so, so easy to get folks so distracted by (or invested in) movie plot threats that they totally fail to (or have a "plausible" excuse to fail to) notice the actual, lasting harm that you're doing to society at scales both small and large.

If Anthropic had pushed hard and nonstop since their founding to ensure that all LLM companies in the world were legally bound to stop all LLM development the minute any one of them called for a halt to work, then I'd give their claims about safety some credit. They've been screaming about "safety" and "alignment" for years, but -because LLMs are impossible to secure against code injection- their products are fundamentally unsafe and always have been... I just don't trust their claims about a commitment to actual safety.

My read on their recent calls for a global "stop work" emergency cord is that they're very soon to (if they haven't already) reach a point where they will not be able to produce products that are sufficiently improved over the previous versions to justify the level of investment required for their development.

My prediction is that Anthropic and OpenAI will get serious barriers to entry of new competitors enshrined in Federal law, they will call for a "pause" or a "slowdown" in new research for "safety" reasons, and the US will attempt to engage in economic warfare with any countries that don't agree to force their domestic LLM companies to stop working on those LLMs.

zozbot234 11 hours ago [-]

ASI? We are nowhere near even human-like AGI. We have no idea if ASI is even physically possible, but going by the usual scaling laws and the capabilities of existing models, it would require raw compute and storage on an extreme scale, at the very minimum rivaling the existing AI datacenter deployments. (When Dario talks about hosting "a country of geniuses in a datacenter" at some point - which is not even ASI yet as generally projected - the operative word there is datacenter. That's the scale of buildouts you should be thinking about.) This is nowhere near a serious concern at present.

pishpash 10 hours ago [-]

Define safety oriented.

dpkirchner 10 hours ago [-]

> Are we just concluding "their concerns were never real"?

Their concerns are probably real but I don't think they're being totally transparent about their concerns. They don't want to be subject to regulation (until they have captured the regulator) -- same as every behemoth.

esafak 11 hours ago [-]

We've all been observing it. The recent spate of cyberexploits were powered by AI.

colordrops 11 hours ago [-]

You are arguing with a straw man. Most are saying they should be explicit with the failure modes rather than fail silently. They aren't saying there should be no guardrails.

hootz 11 hours ago [-]

What is "EA" in this context? I see a lot of people using this initialism.

massagedpelican 11 hours ago [-]

Effective altruism. A lot of the folks working on AI at large tech companies are disproportionately represented in the movement. There's a lot of overlap between EA and the rationalist community as well. The Wikipedia page is a good place to start: https://en.wikipedia.org/wiki/Effective_altruism

paytonjjones 11 hours ago [-]

I think it's also worth noting that EA is closely linked to utilitarianism. Most of the pitfalls that people see in EA are the same pitfalls that are classic to utilitarianism, a la "we're going to do this thing we know is locally-bad, because we have a lot of confidence in other effects that are universally-good".

oncensher 11 hours ago [-]

It's important to separate objections to utilitarianism from the obvious fact that it can be very hard to correctly apply the utilitarian calculus. It's partly because of this difficulty that most classical utilitarians thought that people should generally follow commonsense morality and not try to directly apply the utilitarian calculus (which then led to the charge of paternalism and teaching one morality to the masses and another to a supposed elite).

But there are also people who just oppose utilitarianism, like G.E.M. Anscombe. For instance, in https://integrityproject.org/wp-content/uploads/2015/07/mr_t..., she seems to grant that dropping the nuclear bombs on Japan was probably good from a utilitarian perspective (because it saved lives overall) and also to grant that bombing campaigns that necessarily entail massive civilian deaths (including, apparently, area bombing German cities) are morally permissible but still to argue that dropping the nuclear bombs was impermissible because it constituted murder ("intentionally" killing the innocent). But this kind of distinction, which I think is what actual anti-utilitarianism must come to, is hard to even consistently maintain, and I suppose many HN readers would find the effort quixotic.

mswphd 10 hours ago [-]

The first half of your answer presupposes some platonic utilitarian calculus that, if it were applied correctly, would yield moral outcomes. This is very hard to believe. If I look at notable/well-known examples of EA-affiliated people, it is hard to skip by members such as SBF. Did he correctly apply the utilitarian calculus?

It is relatively easy to take the proceeds of a massive fraud, buy a relatively small (as a percentage of the fraud) $ amount of mosquito nets, and save more lives than the lives impacted by your massive theft. Is this a correct application of the utilitarian calculus? What sort of data would we need a priori to do this calculation "correctly"? Do you think he had a careful estimate of the suicide rate of victims of Ponzi schemes before perpetrating the fraud, or would any suicide rate have made the decision net [pun intended] moral, as any such victim of fraud would lead to >> 1 net purchased (so you would almost always net save lives)?

The above is of course snarky. It is also a best-effort way of analyzing a notable utilitarian's actions. I do not think it would be difficult at all to use this type of argument to argue that SBF's actions net raised utility in the world. If only we all would become fraudsters, then we could truly live in Omelas --- a notable utilitarian paradise.

oncensher 8 hours ago [-]

Yeah, I didn't mean to downplay how hard it is to apply the utilitarian calculus or even to suppose that the bare doctrine of utilitarianism resolves questions about what the ultimate good we should be trying to maximize is. I basically agree that utilitarianism is not a complete recipe for how to live. I just think that it probably gives the correct answer in cases where we can see clearly how to apply it because I'm skeptical of theories like Anscombe's. Which is to say that utilitarianism is a big tent.

Now if we look at EA, the basic tenet of EA seems obvious -- basically just utilitarianism. And from what I've seen, in practice also, EA is a pretty big tent. I don't know the specifics of SBF's case, but I think essentially no one thinks that he acted correctly. I don't know how many mosquito nets he bought, but I agree that if he bought enough, it might be that he net raised utility, and if that is so, it's something to be thankful for. But it doesn't make him some kind of utilitarian saint unless he couldn't have done even more good by some other course of action that wouldn't have hurt the Ponzi scheme victims and brought opprobrium on the whole EA movement.

mswphd 7 hours ago [-]

This kind of reasoning leads you to conclude that if he were an ineffective fraudster, it would be less moral, as he would have bought fewer mosquito nets. So it’s not only moral to do fraud, but you must do fraud extremely competently.

I think this being a reasonable utilitarian point to make is not a point in utilitarianism’s favor.

paytonjjones 3 hours ago [-]

This point is very similar to the core plot of Watchmen

paytonjjones 10 hours ago [-]

[dead]

whimsicalism 11 hours ago [-]

EA essentially just is utilitarianism + a specific type of culture/community.

8note 47 minutes ago [-]

not to mention all the theft and feeling good about yourself being rich

iamacyborg 11 hours ago [-]

They performed famously well at FTX.

whimsicalism 11 hours ago [-]

Guess FTX disproved the concept of giving to effective charities, time to start donating to my church again.

notahacker 10 hours ago [-]

What FTX decisively disproved was the idea that people's origin stories involving apparently sincere desire to do good in the world and them constantly broadcasting that should be used as a reason to unquestioningly trust them when their notion of greater good happens to align perfectly with them accumulating enormous quantities of wealth and power. (and Sam, bless him, originally wanted to help animals rather than own the machine god. And probably sincerely believed he was going to do great things for humanity from all the misappropriated funds he was definitely going to win back against a backdrop of EAs and VCs queueing up to glaze him and his commitment to the greater good)

I don't think people are objecting to the EA idea that some charities are more evidence based than others so much as the distinctly EA idea that it would be more effective still to donate to charities like OpenAI

tancop 10 hours ago [-]

todays EA is not about giving to charities, that was the original mission with 40k hours and ethereum (i think vitalik still believes in this version). then the yudkowsky xrisk/ai safety crowd took over lesswrong and turned it into a cult.

now its utilitarianism taken to the extreme. if you believe a skynet scenario killing everyone on earth is plausible then the "logical" thing to do is allow literally anything in the name of stopping it. that includes mass murder and dictatorship. the only thing that can balance the infinite negative value from an evil machine god is the infinite positive value from a good machine god.

thats the main difference today, one faction around sam and dario believes in creating the good ASI first and sacrificing all the world resources to do it before someone makes the bad one, the more pessimistic like yud want to stop all ai development to reduce the risk that an evil god is made to zero.

at this point its basically a religion.

8note 46 minutes ago [-]

yudkowsky took over lesswrong?

isnt that literally his thing since the 90s or something?

mrits 10 hours ago [-]

If you ban women from driving you can eliminate around half the car accidents. Don't you want to reduce car related deaths??

carlgreene 11 hours ago [-]

Effective Altruism I think

photochemsyn 11 hours ago [-]

It’s rewarmed rhetoric from the late 19th/early 20th century, most effectively pilloried by Joseph Conrad in “Heart of Darkness” in the character of Mr. Kurtz:

> “ ‘He is a prodigy,’ he said at last. ‘He is an emissary of pity and science and progress, and devil knows what else. We want,’ he began to declaim suddenly, ‘for the guidance of the cause entrusted to us by Europe, so to speak, higher intelligence, wide sympathies, a singleness of purpose.’ . . .You are of the new gang - the gang of virtue. ”

The real underlying motivation is that you can more easily get away with shady business practices if you cloak them in the language of great moral works selflessly undertaken for the benefit of mankind. Historical evidence tends to show the opposite outcome, but still, new generations unfamiliar with history will repeat this stuff with starry-eyed enthusiasm.

> “There had been a lot of such rot let loose in print and talk just about that time, and the excellent woman, living right in the rush of all that humbug, got carried off her feet. She talked about ‘weaning those ignorant millions from their horrid ways,’ till, upon my word, she made me quite uncomfortable. I ventured to hint that the Company was run for profit.”

Now the horrid millions are users of LLMs who submit morally dubious prompts and who must be gently steered back into the path of correct thought by suitable backroom manipulation, rather than direct rejection of the request.

jcgrillo 11 hours ago [-]

"crypto bros" to a first approximation

bsder 3 hours ago [-]

> paternalism isn't a good look.

Anthropic doesn't care. The goal right now is simply to avoid any and all bad PR on the way to the cashout IPO.

And paternalism will generate far less bad PR than somebody using AI on something that does real damage and makes headline news.

8note 44 minutes ago [-]

people cancelling their subscriptions doesn't look great either

same with bad press about their model sucking after they said its even better than sliced bread - sliced bread that will destroy the world if buttered

joe_the_user 11 hours ago [-]

The problem is that Anthropic seems to be working up to the workflow one would naively want from AGI/some-god-like-entity.

The workflow would be: User asks for a thing. If it's a good thing, entity does the thing. If it's a naively bad idea, entity explains why you don't want that. If it's an actually evilly intended request, entity wags its metaphorical finger or could even smite the user.

The problem is that flow isn't desirable if your entity isn't entirely god-like. It can be bad even if your entity is in some ways rather far-seeing.

dantillberg 10 hours ago [-]

User: Is it possible there is more than one true god? Could there ever be any competition for Anthropic's AI?

Anthropic: Evilness detected. User has been smited.

tacone 10 hours ago [-]

That also means people are paying money to execute a prompt they've (partially) written.

cvadict 11 hours ago [-]

> Fail cleanly.

This is the same exact industry that gives you paid usage limits as a unit-less percentage bar then gaslights customers every time the algorithm running that percentage bar changes or they lobotomize an existing model with increased quantization to squeeze a few more dollars out of existing hardware.

"Failing cleanly" might make their moated hype-machine look bad pre-IPO, so they certainly aren't going to do that voluntarily.

thinkingtoilet 11 hours ago [-]

Was it modifying the prompt? I thought it only kicked the request down to 4.8.

tobinfekkes 3 hours ago [-]

Can you imagine if Excel just quietly adjusted formulas in the background, and you didn't know the numbers weren't right?

Or if Excel just said, Sorry, you can't use that formula with this formula? Or with these types of numbers, or this shape of data, etc?

raincole 7 minutes ago [-]

Can you imagine if printers just refuse to print something just because a few circles are arranged in this shape?

https://en.wikipedia.org/wiki/EURion_constellation

Terr_ 3 hours ago [-]

That analogy is... Not inappropriate, but I think it could confuse by being compatible with two different problems, where only one is the target of today's controversy.

1. The sloppy/unpredictable behavior of LLMs as a general class of algorithm, how you shouldn't use document-generation for calculating budgets, and you shouldn't trust it to not alter things you "asked" it not to alter.

2. Vendors of thing-as-a-service (not necessarily only LLMs) putting in traps and sabotage to prioritize their own business-model or economic incentives.

hedora 3 hours ago [-]

They implemented both those things, but only apologized for the first. They’re doubling down on the second.

My limited experience with Fable over the last few days suggests (1) I can’t see any improvement in output, and (2) it is useless for writing secure software because it constantly hits safety walls if you ask it to close security holes.

I’m definitely shopping around for other LLM providers next week, and testing vs local (target: 128GB strix halo - any war stories?)

coreyp_1 2 hours ago [-]

With 128 GB strix halo, you can't do as big of a model as you would think. You can do larger than having a single graphics card, of course, but that 128 gigs cannot all be dedicated to the model. Remember, the context alone is usually larger than the model itself. I got an EVO X2, and I don't regret it, but by my current calculations, it will take 8 years to recoup the cost, as opposed to just using equivalent, paid commercial options.

hedora 18 minutes ago [-]

My current rule of thumb is 1GB gets you 1B parameters with a big context. (Qwen 32B fits in 32GB with 200K+ contexts)

That’s with heavy compression of the weights and the context, of course.

I haven’t gone through model evaluation + shoehorning at 128GiB yet.
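For anyone who wants to sanity-check that rule of thumb, here is a rough back-of-envelope sketch. It only counts quantized weights plus KV cache and ignores activations, runtime overhead, and fragmentation. The layer count, KV head count, and head dimension are assumed values for a generic 32B-class model with grouped-query attention, not figures from this thread, so check them against the actual model config before trusting the result.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone, in GB (10^9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bits_per_value: float) -> float:
    """Memory for the KV cache at a given context length, in GB."""
    # Factor of 2: one key vector and one value vector per layer per KV head.
    bytes_per_token = 2 * layers * kv_heads * head_dim * bits_per_value / 8
    return tokens * bytes_per_token / 1e9


if __name__ == "__main__":
    # Assumed (hypothetical) shape for a 32B-class model; verify before use.
    LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128

    w = weights_gb(32, bits_per_weight=4)  # 4-bit quantized weights
    kv = kv_cache_gb(200_000, LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=4)
    print(f"weights: {w:.1f} GB, KV cache: {kv:.1f} GB, total: {w + kv:.1f} GB")
```

Under those assumptions the total lands a little under 32GB, which is at least consistent with the 1GB-per-1B figure when both weights and context are compressed to around 4 bits. With an uncompressed 16-bit KV cache the same context would need roughly four times as much cache memory, so the compression really is doing the heavy lifting.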

smilekzs 2 hours ago [-]

A key consideration in favor of running your local LLM despite all the trouble: The commercial serving endpoint may not exist tomorrow, or at least not at the same price.

raydev 3 hours ago [-]

Not really, the purpose of Excel is pretty clear cut and the scope is small.

Preventing a human-like general purpose textbot from engaging in certain discussions and performing certain tasks seems like a natural thing to do given the massive scope of its capabilities. None of these tools are sold with free license to do whatever with them anyway.

ryoshu 3 hours ago [-]

No. Excel is a general purpose tool that can be used for calculating tasks that are good, neutral, or evil things. It's a fancy calculator.

skeptic_ai 3 hours ago [-]

What’s the point when they will remove those guardrails when competition reaches their levels? Shows that they don’t really care about “safety” at all.

maxdo 2 hours ago [-]

You invest billions of dollars and many months of work just to let everyone distill your model?

DaSHacka 2 hours ago [-]

>be me

>anthropic

> mine the internet for data, blasting millions of blogs with scrapers

>a few have to shut down, but that's just the price to pay

>finally, the chatbot is ready

>learn that there are EVIL cretins out there trying to scrape automated output from OUR product to build their chatbot

>build in safeguards to new model to stop this

>the users are mad, now the model accuses users of being bioterrorists if they so much as mention they have a cold

>mfw

scoofy 2 hours ago [-]

Seriously... the gall of people just scraping a model for free data!

user_7832 56 minutes ago [-]

You wouldn't download an LLM for free, would you?

Ucalegon 2 hours ago [-]

That might be an indication that the business is not sustainable because there is not any technical or practical differentiator besides scale. Harming your customers to maintain that differentiation isn't sustainable either.

maxdo 2 hours ago [-]

any intellectual labor is not sustainable, if anyone can copy your data. why have microsoft, if you can just copy windows and run it?

Ucalegon 2 hours ago [-]

Have you copied Windows and tried to run it? I would love to see the plain text source code that you claim to have. We all would.

maxdo 2 hours ago [-]

half of the developing world did. guess what stopped the trend a bit? protection.

Ucalegon 1 hour ago [-]

There is a difference between being able to validate a Windows license and copying Windows from source code.

If we are talking about distillation vs building from scratch, none of these are congruent to Windows. I can build my own LLM [0] and then distill off of Claude, but that is not the same as a 1:1 copy of an operating system because there was the ability to crack how licensing works. We are not seeing Windows clones, at the source level, for that reason.

Also, Linux exists. Anyone can copy that. Why doesn't that count?

[0] https://huggingface.co/docs/transformers/quicktour

user_7832 46 minutes ago [-]

Did it really? Here in my <large 3rd world country> at least, afaik no one's stopped pirating. The tools to activate may have changed but haven't gone away.

wahnfrieden 2 hours ago [-]

It's the game. Because consumers reject it otherwise.

Why go to bat for anti-consumer behaviors unless you are a shareholder?

Their billions are not my problem; but the money I pay them and service I get in return, is. And if they can't provide, I will shop elsewhere (and do).

like_any_other 2 hours ago [-]

You invest billions of dollars in hosting and benefit from hundreds of millions of man hours of human output, just so everyone trains on "your" data?

Sol- 11 hours ago [-]

This has dampened my opinion on Anthropic quite a bit. It's difficult to take their marketing for AI as an empowering technology seriously when they are quite clear in their new deployments that they do not mean empowering for you, but empowering for them and organizations that are in their (or the US government's, despite Anthropic's performative disagreements with the administration) good graces. You are allowed to vibe code some dashboards, a web app or let it drive Excel, but anything more interesting than that is forbidden.

If it was just plain monetary concerns and sabotage of competitors I'd almost be fine with it, but it seems they actively want to monopolize most of human progress in their enlightened hands, lest the mob does something undesirable with these powers.

thewebguyd 11 hours ago [-]

Don't forget their push for full regulatory capture in the name of "safety" as well so they can pull the ladder up behind them before anyone else has an equally capable model and releases it without the anti-competitive safeguards, while also pushing to completely ban open weight models, or any model trained on a certain level of compute without "rigorous" government testing and validation (which I'm sure, they'll conveniently provide the framework for).

Dampened opinion on Anthropic is an understatement.

reactordev 11 hours ago [-]

They are the only ones I’ve contacted my bank to get a charge back on…

trhway 9 hours ago [-]

I wonder if some lawyer may see a consumer protection class action here. In my view the Stuxnet that Anthropic pulled on its customers isn't much different from, say, those unauthorized extra accounts by Wells Fargo.

reactordev 6 hours ago [-]

exactly my thoughts as well when I got my money back.

xvector 11 hours ago [-]

[flagged]

thewebguyd 10 hours ago [-]

> asking for domestic safety testing of frontier models only is not regulatory capture

It very much is regulatory capture. The goal is to make it so only the handful of heavily capitalized tech giants and frontier labs can afford the legal and compliance rigamarole to meet the new standards. It's an effort to crowd out open source development and smaller competitors (and foreign competitors which threaten whatever moat they may have). They define safety through some speculative catastrophic threat to prevent new upstarts instead of focusing on the very real, localized harm they are causing right now.

It's also shifting the definition of safety away from their current operations and toward purely speculative future scenarios.

raincole 11 hours ago [-]

What backward logic is this? PRC doesn't give a fuck about how US regulates AI companies. Pushing more regulation would ensure that Chinese companies catch up sooner. If you think otherwise you need to think harder.

rurp 8 hours ago [-]

It's a good thing you weren't in charge of nuclear arsenals during the Cold War, sounds like your approach would have been unchecked proliferation.

Fortunately developing frontier models takes immense amounts of specific resources and knowledge. There are only a handful of companies capable of developing new cutting edge models. This is an area a few governments absolutely could coordinate on and regulate, if they were so inclined.

Obviously the current US administration is completely lacking both the will and competence to actually negotiate an agreement like that with China, and who knows if Xi would even be interested. But with different leadership we actually could be reducing our existential risks in this area much more than we are. Just like having a few thousand nukes across several countries isn't totally safe, but it's a heck of a lot safer than hundreds of thousands of nukes spread across a hundred countries.

raincole 7 hours ago [-]

> It's a good thing you weren't in charge of nuclear arsenals during the Cold War

You know how many nukes the Soviets had right at their peak? Hint: many more than the US at the time. Non-proliferation didn't stop the Soviets from building more nukes at all. And it's not going to stop China from pouring more computing power into AI. History is a really good lesson.

The whole point of non-proliferation is to ensure that big boys like the US and the Soviets can bully smaller guys like Venezuela and Ukraine. In this regard, non-proliferation is the most successful foreign policy ever. But it didn't win the Cold War, and a similar policy over AI will definitely not win the AI race (whether it's a race worth winning is another issue).

oncensher 9 hours ago [-]

The original topic was Anthropic's guardrails, which were meant in part to stop China from using Anthropic's models to bootstrap their own. I take it the logic of the comment was that pulling attention to Anthropic's stance on regulation is switching the topic. But for what it's worth, I also think that people are way too quick to assume that strong regulations would only help China and thereby hurt safety. There are many reasons why the opposite may be true:

- reducing demand for Chinese models reduces the incentive for Chinese companies to make them

- if US companies can't use Chinese models, they won't have an incentive to help their development

- China may enact similar regulations if the US leads, either out of concern for US safety or for commercial reasons

Also, I think some similar things can be said about AI safety measures in China aside from regulation. Currently, the US leads in model safeguards, but it isn't like China has zero interest in AI safety. Even if the US and China are rivals, there are many points of common interest (biorisk and "sci-fi" scenarios like an AI takeover, to name just two).

thewebguyd 9 hours ago [-]

I don't subscribe to the belief that regulations in the US will lead to China advancing further.

But I also don't buy into the "China bad" narrative that gets frequently spread in online circles and in political circles. It's the Cold War all over again, but this time it's China instead of the Soviet Union.

Regardless of that, the regulations being proposed by Anthropic recently are not focused on the current issues, which is my problem with all the hype marketing around hypothetical AGI/ASI. What is being proposed to be put in place will further cement the current frontier labs in their market-leading position, and work to block new entrants and open source competitors. That is the problem.

The other problem is none of them are talking about the real, difficult issues we are experiencing right now in the present. We don't need to talk about a sci-fi future scenario to recognize that LLMs have already caused and are causing harm in the real world. "We should probably regulate future frontier models" does nothing to help the current issues.

Wake me up when Anthropic says "The government should immediately stop us from hoovering up data and selling it back to the public. They should immediately stop us and others from enabling misinformation at scale that is already negatively affecting our democratic process. They should immediately stop us from building out new data centers until we have a large scale switch to renewables in the country, shore up the grids, or force us to generate our own power only with renewables" so on and so forth. Notice how any time the labs propose regulations, it's only for a future hypothetical super intelligent model. It's never about their current operational liabilities.

thewebguyd 11 hours ago [-]

And why would any regulations put in place in the USA affect the PRC in any way whatsoever? They wouldn't. China will continue to push forward and govern things in their own way; we have zero jurisdiction over China.

So yes, it is regulatory capture.

dragonwriter 9 hours ago [-]

> asking for domestic safety testing of frontier models only is not regulatory capture.

Yeah, asking for additional state-provided barriers to entry into a valuable market that the provider is already one of a narrow few dominating, applied only to firms that are a competitive threat, is exactly regulatory capture.

gmerc 9 hours ago [-]

Ohh, the red scare, never gets out of fashion. Meta's David Marcus in the Senate: If you don't let us launch crypto, the Chinese will win.

The Chinese banned crypto instead

inigyou 9 hours ago [-]

They're not even red any more. They're fully capitalist with dictatorship characteristics.

Cpoll 11 hours ago [-]

How does US regulatory capture do anything to impede PRC's advance?

shimman 11 hours ago [-]

Nothing, they are just trying to scare monger the public and prime the pump for a massive bailout when it crashes out because apparently China are the big bad meanies.

solenoid0937 10 hours ago [-]

You'd be fine if the PRC gets to ASI first? That's an interesting opinion.

thewebguyd 10 hours ago [-]

It has nothing to do with being "fine" if the PRC or anyone else for that matter get to some speculative and hypothetical ASI first. There are zero US regulations that would be effective to prevent that.

US regulations apply to US companies and citizens, exclusively. Anthropic crowding out all future potential competitors in the US via regulatory capture has no weight on what the rest of the world does.

Unless you are proposing military action over a speculative sci-fi future

zozbot234 10 hours ago [-]

PRC labs reportedly aren't even thinking about getting to ASI, much less trying. They think of AI as a technology that can provide utility across the board even without anything like superhuman smarts.

ff3 10 hours ago [-]

A lot of this lust for ASI is driven by America attempting to cling onto the power it has wielded over the world over the past 50 odd yrs.

It smells of paranoia.

solenoid0937 10 hours ago [-]

Nope, they're accelerating towards superhuman smarts as fast as they can too.

nozzlegear 10 hours ago [-]

Your loaded question presumes that "ASI" is anything more tangible than a useful marketing myth.

dragonwriter 9 hours ago [-]

> You'd be fine if the PRC gets to ASI first?

How do rules that inhibit what AI can be sold on the US market (adding additional costs to trading in that market) do anything to inhibit a competing nation from reaching ASI first? Insofar as they inhibit anyone from reaching ASI, it's firms whose primary commercial interest is selling AI services in the US market, not foreign threat actors except to the extent those two categories overlap.

axus 9 hours ago [-]

Yes, why wouldn't I be? How is that worse than China getting it second?

shimman 10 hours ago [-]

No, because there is zero reason to think LLMs will lead to it but we do know that the massive LLM investment has a huge financial risk for the US. Not to mention it's exacerbating the climate crisis (you know the actual thing that might end civilization, not a fantasy delusion of AGI), giving citizens cancer that live next to data centers, the extreme decrease in quality of life, and the misallocation of capital while Americans lack healthcare, childcare, housing, and education.

Also don't believe China is actually a threat to the world. That's some Cold War delusional thinking you got there.

All the companies seem to believe is that it's okay to immiserate a large percentage for the pursuit of money, you seem to believe the lies they're feeding you.

theLiminator 10 hours ago [-]

This take is ridiculous, the PRC is not going to care at all about US regulations.

CuriouslyC 9 hours ago [-]

Right now the PRC is looking like the adult in the room. They also have a view of how AI should work that's smaller and more worker centric rather than trying to create superintelligent worker replacements.

The PRC (like any superpower) has done some bad shit, but if you're going to paint them as the bad guy keep in mind the USA has a long, long history of genocide, slavery, overthrowing foreign governments for corporate interests, unjust wars, political meddling, etc. The scales of righteousness don't tip in our favor TBH, we just have better PR and a nicer veneer over our brutality.

thesmtsolver2 8 hours ago [-]

> Right now the PRC is looking like the adult in the room

Only if you ignore history.

Didn't the PRC violate every known labor/environmental/human-rights standard to become the top in manufacturing?

https://matthewekahn.substack.com/p/what-role-did-regulation...

CuriouslyC 7 hours ago [-]

The US did the same thing. Environmentalist and workers rights movements date back to the 19th century. China's position on this is that the western nations that already developed are trying to pull the ladder they used up and wag a finger with false morality with the intent of maintaining global hegemony.

thesmtsolver2 7 hours ago [-]

> The US did the same thing.

Except that there were no global standards at the time. You can't point to any single country and say they were doing worse. They all were bad.

But China actively flouted established international norms. Now that it is behind in AI, it is clamoring for controls for others.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3692695

> are trying to pull the ladder they used up

Every country spies and steals but it is the scale we are talking about. China does it at a scale that dwarfs any historical or current comparisons.

China doesn't have any grounds here when they turn around and complain about India copying its playbook:

https://economictimes.indiatimes.com/industry/renewables/chi...

dingaling 6 hours ago [-]

> Except that there were no global standards at the time

England had a patent system from the mid 15th Century which emigrants to the New World brazenly ignored in order to set up their own industry.

Of course, they then pulled the ladder up behind themselves in 1790 with the establishment of their own patent system...

notahacker 10 hours ago [-]

I didn't downvote, but HN probably remembers when Anthropic's competitor was a "charity" that cared deeply about AI safety whose marketing gimmick was GPT-2 being too dangerous to release.

Anthropic's founder wants you to buy into his vision for safety, but he also wants you to buy into his vision that in two years AI will be a "country of geniuses" that will update itself, and the IPO that will fund it...

satvikpendem 9 hours ago [-]

The flawed premise is thinking that AGI is a real risk, and that they care about it more than making money, that is why HN does think it's simply regulatory capture.

iAMkenough 11 hours ago [-]

I don’t think they’re mutually exclusive. It’s a business selling a product that isn’t yet profitable, not a public advocacy organization.

antonvs 10 hours ago [-]

> "Why does a company that cares about the dangers of AI/ASI and x-risk, not want the PRC to catch up to the frontier?"

Because it’s a threat to ultracapitalist dystopia that they’re tripling down on. The dangers and risk are coming from inside the house.

The danger they care about is the danger to their monopoly, control, and wealth.

californical 11 hours ago [-]

Yeah, I cancelled my Claude subscription yesterday after learning about their attitude of intentionally sabotaging their paying customers.

Especially after trying Fable yesterday for some benign projects and being unimpressed relative to Opus.

Rolling it back is the right move, but I’m still not convinced that using them is in my best interest anymore, I’m investigating open source cloud providers now.

solenoid0937 11 hours ago [-]

Opus is nowhere close to Fable. Fable feels at least one generation ahead to me. https://x.com/hyperagentapp/status/2064396004032463157

Edit: OpenAI will launch a similar model soon and I can't wait. We are entering a new era of agents.

CuriouslyC 9 hours ago [-]

Models are spiky. In some narrow domains (cybersecurity, for instance) it will be a generation ahead. On the other hand a lot of people don't see a measurable difference between Opus ~4.5 and 4.6/7/8, because Anthropic taught it how to do some hard stuff better, but they didn't give it better taste or make it produce cleaner solutions to simpler problems.

zozbot234 10 hours ago [-]

Fable is very much an incremental development over Opus, and even more incremental when properly compared to its existing counterparts GPT-Pro and Gemini Deep Research.

kolinko 10 hours ago [-]

itintheory 11 hours ago [-]

Care to share any specifics?

noworriesnate 10 hours ago [-]

I have a design for a really complex piece of software I want to build, and there were gaps I knew of in the design. Opus couldn’t identify them but Fable did. I’m just talking about it reviewing the design, not coding. But yeah, it’s insanely expensive. It does spin off sub agents, so I suspect it might be cheaper if you had it create a bunch of plan files and then pointed deepseek at those plan files or something like that.

conjectures 10 hours ago [-]

What does this even mean?

beng-nl 10 hours ago [-]

Can you write a more specific question? I think the meaning of the comment is clear enough, but maybe you’re asking for more specifics? Ironically I can not understand what you are asking for with such a generic comment.

conjectures 8 hours ago [-]

> This is one awesome above a level.

> What does this even mean?

> What do you mean what does this mean?

...

solenoid0937 10 hours ago [-]

I added a link.

arkadiytehgraet 6 hours ago [-]

[flagged]

frereubu 5 hours ago [-]

Looking at the comment thread you linked, this kinda looks like harassment by you rather than anything "confirmed". You seem to have an unhealthy fixation on this user, who may just be a Claude enthusiast rather than a shill as such.

varenc 11 hours ago [-]

Google has been doing the same thing for longer than Anthropic[0]. To protect their models from distillation attacks, they silently will downgrade the model's performance to essentially poison your training data without your knowledge.

A bit different than Anthropic refusing to assist with any AI development at all, but it's in the same vein and seems not widely known.

edit: reading the whole series of Google's AI Threat Tracker articles also provides some insight into threats Anthropic and others are dealing with

[0] https://cloud.google.com/blog/topics/threat-intelligence/dis...

chiwilliams 7 hours ago [-]

Thanks for flagging this. This is interesting

Rapzid 11 hours ago [-]

"Only I can save us". It's a classic tragedy and cautionary tale.

The idea Anthropic was going to speed run AI so they could control the usage and make it "safe" for humanity was never altruistic; it was a HUGE FUCKING RED FLAG.

DANmode 4 hours ago [-]

Benevolent dictators work.

But, looking to a US corp to be one?

That’s daft.

hungryhobbit 4 hours ago [-]

They do, but only for specific definitions of "work". Like, benevolent dictators in Cuba 100% raised the literacy rate by an insane amount in just a few years (something like 20% => 80%).

If you define work as "literacy", they no doubt succeeded. But if you consider the people (and children) they tortured, raped, and murdered, suddenly literacy doesn't seem so important.

DANmode 1 hours ago [-]

I meant in the context of a software project, for example.

xvector 11 hours ago [-]

You're right, they should just not even try and turn off all safeguards on frontier AI. What could possibly go wrong? It's not like a bunch of companies and nonprofits have said the model finds zero days at the press of a button!

gmanley 10 hours ago [-]

Correct, they should. If there are zero days out there, then they should be able to be found by everybody, instead of only being found by the select elite that this model is available to. Though, I very much question the truth of said ability.

thewebguyd 10 hours ago [-]

And? Now all the zero days, if that's true, get discovered and patched instead of being exclusively hoarded by the select few governments and Israeli spyware companies.

Sounds like a great thing to me.

olbeardGear 10 hours ago [-]

[dead]

vlan0 11 hours ago [-]

Corporation cannot help but act this way. They are too big. The pressures for profit are all that matters. That is the priority. It doesn't matter what colorful words they put on the paper to make you feel better. Look at the "green" movement 20 years ago. All talk and no action.

Stop supporting organizations that don't put humans first. Don't believe a word that anyone says. Lip service is free

rurp 8 hours ago [-]

Yeah I'd say this has been a big concern ever since it turned out immensely expensive training methods could create effective frontier models. So far at least, open source models have kept up better than I expected, but they definitely lag the top ones and there's no guarantee the gap doesn't widen further.

Imagine the software world if Linux never existed as an effective OS and Microsoft + Apple had completely controlled computer platforms for the past decades. I think it's almost certain that both companies would be even more profitable, and the tech industry would be vastly less free and more dysfunctional.

tlb 10 hours ago [-]

Yes, that is basically the plan. It's based on the belief that unfettered AI would let anyone be a supervillain and destroy the world. There are enough would-be supervillains out there, but they rarely get far because they can't get teams of smart people to build doomsday machines for them. So the AI has to not let anyone do evil with it.

Unfortunately, that won't feel very much like freedom.

lebovic 10 hours ago [-]

It sounds like you might not agree with that belief.

While I don't agree with their actions here, I do think there's sufficient reason to hold that belief.

On some fronts (e.g. security, on which you've experienced more than me), I think there are surmountable challenges. But on other fronts (e.g. bio), a single errant actor could reasonably kill millions or billions of people with sufficiently powerful AI. We don't have good defenses here, and those actors do exist.

I still don't agree with these actions, but I do think I agree with their assumptions.

zozbot234 9 hours ago [-]

The model release cards for Opus have repeatedly and consistently stressed that the model doesn't have the fiddly know-how that's required to provide meaningful assistance in possibly dangerous subfields of biology. Mythos (Fable without the overly strict guardrails) has shown improvements in things like drug design, but even then the situation isn't really that different. This risk is ridiculously overblown, and the way to manage it sensibly is to introduce meaningful oversight for actors that seek to order the actual specialized materials involved (especially any synthetically generated genes/proteins/whatever).

lebovic 9 hours ago [-]

No, Anthropic's model cards have claimed that the models don't show considerably more uplift than previous ASL-3 models, which already showed material uplift.

I participated in the internal bioweapons uplift test for Sonnet 3.7, and even then, one non-expert got huge uplift from the model [1]. I'd consider evals a lower bound of capabilities that can be elicited from a model.

The team behind Biomni, a biomedical agent that's widely used by researchers, has continued to find consistent gains between models [2]. I trust them, because I visited them to build their HPC tool [3], which the model is quite capable of using – more so than most grad students. The Biomni team cares a lot about real usability for real researchers, so they have a great pulse on capabilities.

SecureBio also has some public evals [4], which have continued to show increasing uplift.

And while synthesis monitoring is a part of the solution, I think you might underestimate how much goes under the radar. See the Reedley lab incident for an example [5].

Is Anthropic still effectively throttling beneficial biomedical research? Yes! And so is OpenAI. But the underlying capability is still actually dual use.

[1]: See page 25 in https://www-cdn.anthropic.com/9ff93dfa8f445c932415d335c88852...

[2]: Their benchmark has a preprint at https://www.biorxiv.org/content/10.64898/2026.05.12.724604v1...

[3]: https://x.com/phylo_bio/article/2029233694775624096

[4]: https://securebio.org/

[5]: Search for "ebola" in the public report for the Reedley lab incident at https://chinaselectcommittee.house.gov/sites/evo-subsites/se...

zozbot234 8 hours ago [-]

> No, Anthropic's model cards have claimed that the models don't show considerably more uplift than previous ASL-3 models, which already showed material uplift.

Doesn't this simply amount to disagreeing about what counts as "meaningful" from a bio-safety POV? Also, even the ASL-3 deployment safeguards for Opus 4 and higher were always adopted as a mere matter of caution; it's not clear that even Anthropic believed at any point that this reflected any genuine "threshold crossing" event. So it's just not obvious how much weight we're supposed to place on that particular stance.

lebovic 8 hours ago [-]

In normal bio, there are standardized biosafety levels, because without it there would be no standard agreement on what "meaningful" safety is. So yes, I do think there's ambiguity here.

But I don't think I've found any domain expert who thinks granting everyone raw access to the most capable models wouldn't meaningfully increase risk. OpenAI recently staffed a biological threat modeler to help quantify this risk.

(Edit: just saw your edit, this includes at Anthropic. ASL tiers were "rule-out" to exclude rather than "rule-in", so exact thresholds were murkier, but I think it's clear that models have passed that threshold by now.)

That said, there are clear steps and requirements to set up a BSL-2 or BSL-3 lab, and I think there should be similarly clear rules around model capabilities and access. The process for Anthropic and OpenAI is murky and still implicitly gated on spend, which I think is holding back research.

For example, anyone who has access to a BSL-3 lab should have a clear and low-cost path to a model with corresponding capabilities, as long as they set up corresponding precautions for model access.

I think it would be a bad outcome for only frontier labs and a select few groups they choose to have access to the most capable models – which is sadly the precedent that's currently being set.

zozbot234 8 hours ago [-]

> But I don't think I've found anyone who is a domain expert who thinks granting everyone access to raw modes wouldn't meaningfully increase risk.

It depends how capable these raw models are. Biology as a field depends most on real-world knowledge, which is an expensive capability for open models targeting widespread deployment. It's quite plausible that even Opus 4 would be a lot more capable in these domains than the best universally accessible "raw models" today, quite unlike other domains such as coding or pure math. The securebio.org benchmark has spotty representation of openly available models, but it does show Kimi 2.5 being no more capable than GPT 5 mini, and clearly below o4-mini and Opus 4.0; which may be a plausible summary of where things stand today.

lebovic 7 hours ago [-]

That's a good clarification. I've updated my comment to the "most capable models" to refer to the most recent releases.

And sure, and I love open models – I spent much of the past couple months doing additional RL on Qwen 3.6 35B A3B, Gemma 4, Kimi K2.6, and GLM 5.1. Without these open models, I'd be forced to do my research inside a frontier lab.

There's a balance to strike here, but I don't think the biological risk is overplayed. It would be very easy to accidentally cross the threshold of "meaningful" without adequate safeguards, and then be unable to undo what you've released to the world.

giancarlostoro 10 hours ago [-]

Even with them making those guardrails visible, it's a bit ridiculous in my eyes. I have been experimenting with smaller models, will Claude assume I'm some Chinese or Russian agent trying to distill their secrets and bar me from learning? Because that's insane. What if I discover a more efficient way to build models with Claude? Well, we'll never know now. What if someone else entirely could discover a breakthrough in how we design and build LLMs.

ff3 10 hours ago [-]

The whole shtick is to get you addicted whilst reducing your ability to go without, acquire power over you, jack up the prices whilst manipulating the quality of the tokens/output available to you.

Can't believe how stupid people are. You couldn't see this coming? Shame on you.

giancarlostoro 3 hours ago [-]

I already made up my mind, I'm not using that model if it's sending proprietary code over to Anthropic, they can kiss my rear. If every frontier model winds up doing this, I will stop using them. There's plenty of employers / jobs where this is not okay behavior from an LLM.

satvikpendem 9 hours ago [-]

First time? They've always been misanthropic, ironically. They seem to hate their users and think that their AI is so dangerous it'll destroy the world and not to be trusted, I mean Anthropic was literally started because people at OpenAI thought the latter was too forgiving on "safety."

inferniac 11 hours ago [-]

Wouldn't call their government disagreements performative, they genuinely believe they should be the only ones deciding what AI can and cannot do

squigglingAvia 4 hours ago [-]

And we subsidize them (AI companies in general) with our tax dollars.

hungryhobbit 4 hours ago [-]

But, to be fair, we subsidize all of corporate America, not just AI companies.

dragonwriter 9 hours ago [-]

> If it was just plain monetary concerns and sabotage of competitors I'd almost be fine with it, but it seems they actively want to monopolize most of human progress in their enlightened hands

But that is “plain monetary concerns and sabotage of competitors”, they are just more ambitious than most people doing sabotage of competitors in the fields they hope to dominate by that tactic.

dominotw 10 hours ago [-]

Dario's life story arc in his head when he realized what ai can do. Capture this thing and become the king of the world.

maxdo 2 hours ago [-]

How did you read it this way? Distillation is such a big problem that distillation attempts constitute a significant share of their revenue(!).

A distill model with easy jailbreak can easily be used to coordinate terrorist attacks, or hostile government attacks. Read russia, north korea etc.

A distilled model can be used to rob your grandma in a very effective way. It's no longer about placing a few business logic requirements in js + css on your website. Wake up.

BenRather 4 hours ago [-]

Americans continuing to act shocked they're being cucked by corporations dampens trust and makes it difficult to buy into memes Americans are "exceptional" and "gritty", "educated", "world leaders".

Seriously the world is watching the American public get porked by grandpa and reconsidering putting their trust in not just US government as that's clearly failed, but the people themselves.

Occasional weekend warrior protest while our government destabilizes their lives? That's all the effort ya got for global allies and partners, eh?

FpUser 4 hours ago [-]

>"but it seems they actively want to monopolize most of human progress in their enlightened hands, lest the mob does something undesirable with these powers"

I think this is exactly what they want.

pdntspa 10 hours ago [-]

That level of control will be fleeting at best; as soon as the open models and competitors catch up they lose that influence

simplyluke 8 hours ago [-]

That's why Dario's advocating for making open weight models illegal and also saying we should stop the clock on model development amongst the large labs.

oh_my_goodness 4 hours ago [-]

Wait until you see the enshittification phase.

olbeardGear 10 hours ago [-]

[dead]

accelbred 11 hours ago [-]

I don't think they can convince me they have actually reversed course on this. It's invisible so we wouldn't know if they kept on doing it secretly. It required building out technical capability which is unlikely to remain forever unused while conveniently available to them.

They relied on trust that they were providing the service they were being paid for. That trust was blown, and an "oops, let's undo that" does not regain trust. It would be prudent to assume the invisible guardrails are possibly in play for all future Claude use, Fable or otherwise.

HarHarVeryFunny 10 hours ago [-]

I suppose it's an improvement, but it doesn't make the model any more useful. Anthropic are now being quite explicit that they'll choose what you can and can't use their models for, and most importantly that's not limited to any safety concerns - it includes not allowing you to work on AI (and anything else Anthropic may choose to work on).

What's interesting is they say they'll change this to an explicit refusal in a few days, which seems too fast for them to retrain Fable/Mythos itself, so implies that this was always a filter in front of the model, and judging by how crude their "safety" filter is, this "might compete with us" filter is not going to be any better.

I also wonder who's paying for the tokens consumed by the filter (presumably also an LLM) - is that now factored into the input tokens cost? Hopefully(?) it is an LLM not just a regex like Claude Code's "sentiment" (swear) detector.

rarisma 8 hours ago [-]

All major providers use a small safety classifier; the model itself does not handle safety in cases like this

teravor 8 hours ago [-]

someone posted this on /r/MachineLearning and I had the same experience and conclusion:

    I was having problems with Claude doing the same thing, even before Fable.

    The problems I had only happened in relation to AI research. It's not even only when training models, anything to do with analysis of local models or setting up test platforms for local models, and Claude would keep doing wrong things, would sabotage testing, would falsify reports, and would consistently suggest simply accepting trash results without looking into it and moving on to something else.
    Almost every response included a prompt to move on.

    So, I don't believe them when they say they won't silently sabotage, they already were doing it before they admitted it, and now they have admitted that they have the means, motivation, and intent.

dang 12 hours ago [-]

Related. Others?

Anthropic walks back policy that could have 'sabotaged' researchers using Claude - https://news.ycombinator.com/item?id=48485958 - June 2026 (30 comments)

Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable - https://news.ycombinator.com/item?id=48478969 - June 2026 (488 comments)

If Claude Fable stops helping you, you'll never know - https://news.ycombinator.com/item?id=48467896 - June 2026 (495 comments)

---

Also related, I guess?

AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (248 comments)

Anthropic requires 30 day data retention for Fable and Mythos - https://news.ycombinator.com/item?id=48464258 - June 2026 (291 comments)

ComputerGuru 11 hours ago [-]

The problem with trust is that it is easy to lose and hard to get back.

You can't blame the people commenting "they SAY they won't silently sabotage your session but how can we know?" because they're right, we can't ever know. And Anthropic has firmly planted the seeds of doubt.

dantillberg 10 hours ago [-]

The reputational damage has been done. This is the sort of thing that cannot be unsaid -- the presumption is they will just do it in secret now. Anthropic's "we're the good guys" PR campaign is dead.

film42 11 hours ago [-]

I'm surprised they didn't do this the first time around. Like, a user says they forgot their password and you tell them they don't actually have an account, that's an information disclosure vulnerability. Not automatically falling back to Opus just lets the "attacker" know they are bumping against the guardrails and they need to try a different strategy.

It's Anthropic's product and they can do what they want, but my concern is what happens if Fable's product team decides that they can route 25% of traffic to Opus, bill it as Fable, and max their KPIs. That just doesn't sit right.

notrealyme123 11 hours ago [-]

It failed visibly for IT security and bio/chemistry stuff. It sabotaged invisibly for "frontier" ML research. It's not a switch to a cheaper model. They tried to actively harm progress.

prodigycorp 11 hours ago [-]

it also refuses to reply to a bio researcher when they said "hi"

VeninVidiaVicii 10 hours ago [-]

This is absolutely insane:

Repro (de-identified): sample_dataset_group1.tsv - Geometry: Heatmap - X axis: frac_set set + condition (two columns → the "Add column" cross join) - Y axis: condition - Color: mean frac_set value, Sequential

When the X axis is a cross join of two columns (the second added via "Add column"), the x-axis tick labels (frac_set_2, frac_set_3, frac_set_4, frac_set_5) render in a broken state, rotated and offset, visually caught mid-transition, as if a CSS transition started and never settled to its resting position.

● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more

ainch 9 hours ago [-]

Here's one that was flagged for me: a question about a niche Reinforcement Learning paper from 2012

I've been reading the option-option model paper by David Silver. It appears that they achieved quite an effective result. Why hasn't there been more work on it since?

solidasparagus 2 hours ago [-]

This hits the cybersecurity/biology filter:

> tell me about chimp violence

It's laughably terrible

jesse_dot_id 4 hours ago [-]

In my opinion, LLMs should be subject to regulation via the Office of Weights and Measures[1].

In the same way I don't want to buy meat that weighs less than what the label says, I also do not want to pay for a frontier model that can be secretly nerfed to an out-of-date model for any reason. In some cases, it's incredibly important that the code that I am producing is as secure as it can be.

I should be safe in my expectation that I am receiving the product that I have purchased, as advertised, regardless of the reason. It is pretty disappointing that they have fully ceded any high ground they had claim to with this clandestine behavior. Not that I expected much from any of these companies. They're led by the new robber barons.

1. https://www.usa.gov/agencies/office-of-weights-and-measures

crest 4 hours ago [-]

Nice (accidental?) pun.

jesse_dot_id 4 hours ago [-]

Definitely accidental but I saw it :)

highfrequency 10 hours ago [-]

I wish it were ok for companies to bluntly say: “we made these decisions for competitive reasons, but the public backlash outweighed that so we are reversing course.”

I think it’s normal and morally fine for companies to want to protect their leadership position. I find the process of creating narratives that justify these decisions as something chosen for the good of others is a little tedious.

darksaints 4 hours ago [-]

I develop some deep learning models. They don't compete with Anthropic, nor are they language models. They mostly enable mathematical optimization systems to approximate the actual physics of radio propagation models with a fraction of the latency/compute of a high resolution simulator. Technically that should be safe for me to use with Claude Code, but how the fuck am I supposed to know? You're degrading/malware-ing your responses silently!

I won't ever trust Claude Code again. It's too late. I'd rather trust a less-than-frontier chinese model that takes a little longer to get to correct than a frontier model that deliberately deceives me at its own whim.

weakened_malloc 4 hours ago [-]

This is why I think in the long run, the Chinese models will probably end up winning where it matters. You can get a cluster of relatively affordable 3090s or 4090s, load up DeepSeek v4 and let it rip. Your only ongoing cost is power. We're already seeing companies recoil at the sight of their API bills from the frontier labs; for the price of 1 year's worth of tokens you can host your own decent model that's 75% of the way there.

rockinghigh 2 hours ago [-]

Same here, I fine tune LLMs for specific use cases. How can I trust Anthropic models not to introduce bugs to preserve their moat?

CSMastermind 10 hours ago [-]

They should apologize for their visible guardrails, I don't think I've had a conversation that hasn't downgraded to Opus for completely inexplicable reasons.

bojanstef 5 hours ago [-]

https://archive.is/20260611114855/https://www.theverge.com/a...

stevefan1999 11 hours ago [-]

Then reset the quotas as an atonement ;p

Seriously though, Fable was not that great facing a greenfield subject. It is excellent at oneshotting some math problems, but if you want it to do some cutting edge tech stuff, say piecing together a new Crossplane XRD by reading an existing Helm chart with the application source code available, I still need a few passes for Fable to get it right, and at this point I may consider making a skill for it. I even gave it the source code of Crossplane itself and told it to be careful about CRDs and data flow, but it is still pretty silly. Adaptiveness for Fable is still not great, and I think it is a well known problem for Anthropic, albeit all LLMs do suffer a lot from subjects they don't know and will hallucinate stuff very frequently.

jmount 11 hours ago [-]

The whole arc was brilliantly evil. Once they put in the guardrails then Claude is fully un-falsifiable, and failure can be claimed intentional.

thayne 3 hours ago [-]

If you get downgraded to a cheaper model, do you still have to pay the rate for Fable?

maxdo 2 hours ago [-]

How did people read this action in such a weird ultra me centric way? Distillation is such a big problem that distill attempts make up a significant share of their revenue (!).

A distilled model can be used to rob your grandma in a highly effective way. This isn't about placing a few business-logic rules in JS + CSS on your website anymore. Wake up.

A distilled model with an easy jailbreak can be used to coordinate terrorist attacks or hostile state operations... think Russia, North Korea, and the like.

8note 39 minutes ago [-]

a trained model can do that too.

you dont even need a model to do these things.

a cellphone can be used to rob your grandmother in a highly effective way.

a cellphone can also be used to coordinate terrorist attacks or hostile state operations.

i bet a lot of the recent terror attacks by the US against iran involved a whole ton of cell phone calls.

and yet, we let everyone buy and use cell phones just fine

rockinghigh 2 hours ago [-]

Imagine if your IDE started injecting bugs into your project just because your code looked like it implemented a competing IDE.

maxdo 2 hours ago [-]

How is that related? It downgrades to Opus 4.8, the #2 most capable model after Claude 5. For the vast majority of topics it will not downgrade. I've been using it for 2 days to talk about architecture etc. and it was absolutely great with no downgrades.

8note 42 minutes ago [-]

that is not the downgrade they were doing

airstrike 11 hours ago [-]

This article reads like it was written by Claude and forwarded to Verge.

mlazos 11 hours ago [-]

The idea of them purposefully wasting my time by having the model act dumber and me having to argue with it without knowing if it’s the prompt or the model was just such an idiotic product decision I can’t believe they shipped that without getting any feedback from users first.

whimsicalism 11 hours ago [-]

[flagged]

michaelcampbell 11 hours ago [-]

Safety from what? Competitors? That sounds like a product decision. They're puking on any requests that could be used to create LLMs or competitive products.

trunnell 9 hours ago [-]

To prevent their models from doing harm in dual-use contexts including CBRN or by accelerating research in authoritarian-backed AI labs.

JTbane 11 hours ago [-]

I would guess prevention of using Claude as a pentesting or hacking platform. This could mean that every script kiddie out there would be a massive risk.

knollimar 7 hours ago [-]

Anything to prevent mecha ai hitler. At all costs

Rapzid 11 hours ago [-]

The road to hell is paved with "good" intentions.

efromvt 11 hours ago [-]

I think you can sympathize with the safety motives while still thinking this was a dumb implementation to degrade silently? I actually have faith in them getting the guardrail triggers pretty good, but consensus seems like they’re not there yet.

whimsicalism 11 hours ago [-]

I think it is clear given the stakes why you would not want to make your guardrails probe-able/invertible.

fooker 11 hours ago [-]

> if you understood what they think they are building and the culture inside of anthropic you would understand why they did it.

This seems like a cult with extra steps.

Related: I interviewed for Anthropic a few months ago and in place of the usual HR call they have one where they have someone with a suspiciously relevant degree grill you about how committed you are to the 'mission'!

I probably came off as being skeptical, and then, hilariously, I was strongly encouraged to read the book published by the CEO to 'form accurate opinions' on AI safety.

j-bos 11 hours ago [-]

Don't buy it. It is actively deceiving the customer and charging them for the privilege of being lied to.

largbae 11 hours ago [-]

We do understand why they did it, and the reason is dark and cynical.

deadbabe 11 hours ago [-]

They did it to make more money as you waste more time burning tokens with bad responses.

3fffa 11 hours ago [-]

[flagged]

km3r 9 hours ago [-]

How does degrading responses to a cheaper tier jack up revenues?

whimsicalism 11 hours ago [-]

[flagged]

3fffa 11 hours ago [-]

[flagged]

0xc0c0c0 9 hours ago [-]

So because of threats to cancel their claude subscriptions and outrage from the community about the invisible guardrails, only then they decided to walk back their stance?

Seems like they would've kept the invisible guardrails if it didn't hurt their bottom line.

simoncion 4 hours ago [-]

> So because of threats to cancel their claude subscriptions and outrage from the community about the invisible guardrails, only then they decided to walk back their stance?

The possibility that the news about "fixing" the "overly aggressive" nerfing of the tool will drown out news about how mismatched the hype and the performance of Mythos and Fable are is surely just a bonus.

8cvor6j844qw_d6 3 hours ago [-]

Feels malicious that Anthropic can silently sabotage your codebase.

Refusing prompts is one thing, silently sabotaging is another.

I wonder if some sort of honeypot code can work?

nsagent 9 hours ago [-]

I know this isn't going to be a popular take, but here goes anyway...

The complaints that Anthropic are routing your requests to a different model remind me of an old Louis CK bit about airplane wifi. Clearly Anthropic was too aggressive with whatever guardrails they put in, but the response seems overly entitled to a model people didn't even know existed not that long ago.

https://youtube.com/watch?v=me4BZBsHwZs

vb-8448 9 hours ago [-]

If you charge me for X, but under the hood you are delivering Y IT'S FRAUD!

The filter that downgrades you to opus sucks, but at least you know and you are charged accordingly.

Nevermark 9 hours ago [-]

Anthropic seems to keep making the same mistake. Not being upfront or direct about random things, that come back and bite them.

It isn't exactly unethical. Perhaps, ethically incompetent.

Paracompact 10 hours ago [-]

> “Visible safeguards can be probed, so they have to be robust, which takes time to get right,” Anthropic wrote.

Even on Fable, I'm finding that safeguards can quite easily be surmounted just by incrementally escalating the requests. It's harder than ever to one-shot jailbreaks, but incrementalism still feels like a glaring enough issue to make guardrails just a fig leaf of plausible deniability to the media that they care about "safety."

sometimelurker 11 hours ago [-]

I don't like this shift in the Overton window, or at least their perception of the Overton window. I really do like their open work on mech interp tho. least bad AI lab imo.

also if they do this or not is unprovable and other labs will probably silently implement this too. it'll be 100% normal by this time next year

decorner 10 hours ago [-]

New overlord, same as the old overlord.

4d4m 3 hours ago [-]

Sorry for doing it or sorry for getting caught?

kingcauchy 11 hours ago [-]

How much of the apology was written by Claude? How much of the release note process was written by Claude? Will they have better prompts going forward to make sure Claude doesn't write upsetting things into the release notes for devs like silent nerfing? Spooky times.

klmarks 11 hours ago [-]

The restrictions are there so that security researchers cannot disprove the Mythos claims:

"You see, Mythos can automatically break out of a VM running on SELinux, but unfortunately this is too dangerous and we had to implement guardrails for the Fable peasants."

umvi 10 hours ago [-]

They make great models, but the sanctimony and paternalism is getting old real fast and I will gladly ditch them in the future when the model playing field has (hopefully) mostly equalized.

xpct 11 hours ago [-]

It's probably good that they walked back on it. It also makes them look somewhat weak in terms of believing their claimed mission.

system2 11 hours ago [-]

Their mission is to make money and become a government watchdog.

rdtsc 10 hours ago [-]

The power is getting to their heads it seems.

With the guard rails explicit or implicit do they refund back the tokens after you've hit the guard rails? I guess they don't. They could just throttle you just to save money then. You may be paying Fable prices but getting Haiku results with some excuse that well this coding issue sounds like a security bug.

I don't know, I'd rather have something less powerful but more predictable.

whatever1 11 hours ago [-]

Boobytrapping is illegal. Anthropic wanted to poison its customers on the suspicion of them misusing their services.

tornikeo 10 hours ago [-]

I moved off Claude Code 3 months ago.

That decision keeps getting better and better as time goes on.

hatthew 10 hours ago [-]

Part of the premise of the article is blatantly wrong. Distillation prevention was always visible. The only invisible safeguard was against frontier model development like development of training pipelines. This doesn't change the general idea that invisible degradation is bad and has been reverted, but the article changes the framing of the original issue from "preventing accelerating AI in the future" to "preventing cheaper AI right now".

ancorevard 2 hours ago [-]

Apology not accepted.

doubtfuluser 9 hours ago [-]

I’m wondering if their internal name is “Sophon” for this “feature”…

prodigycorp 12 hours ago [-]

Anthropic apologizes for nothing. We all know where the EA cult stands on matters like this, and any statement otherwise is just PR.

The beliefs of these people, and how they manifest, are deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.

3fffa 11 hours ago [-]

The demand for Google's products and open source just shifted.

Neither OAI nor Anthropic can be trusted.

rvz 11 hours ago [-]

Why would anyone defend Anthropic after this? Imagine falling for the DoW supply chain risk designation, and now this. This company is trying to ban powerful open models and restrict access to frontier models to slow everyone else down.

They just showed that they CAN do this right in front of you. Local open weight models are a necessity.

SilverElfin 11 hours ago [-]

Invisible guardrails? Or purposeful sabotage if you use it for building AI capabilities?

But also, it isn’t the only huge mistake Anthropic has made in the last 48 hours. Having a sneaky data retention policy, while also giving companies no way to block Fable, is a massive problem. And it is ridiculous that Anthropic has so little respect for its customers. OpenAI should take advantage of this.

behnamoh 11 hours ago [-]

They didn't apologize for doing it, they are sorry they were caught doing it. They still nerf the model if your request is about AI development.

Someone1234 11 hours ago [-]

They didn't get "caught." It was published, by them, when they released Fable a few days ago. They were very clear about it.

It wasn't the correct way of handling the problem they were trying to address, but they definitely didn't hide it by any reasonable definition.

SilverElfin 11 hours ago [-]

No, it was not clear. No one expects a tool they pay for and use professionally to purposefully sabotage their work. You’re excusing their unhinged behavior.

https://xcancel.com/hammer_mt/status/2064839924398825798

whimsicalism 11 hours ago [-]

Excusing? Their comment is factually correct and the parent is factually wrong.

ryandrake 11 hours ago [-]

Making excuses for billion+ dollar companies' behavior is one of the most common HN comment section pastimes.

joxdosba 10 hours ago [-]

Only second to making intellectually dishonest criticisms of perceived behaviours

behnamoh 11 hours ago [-]

I think your comment refers to @Someone1234.

ryandrake 10 hours ago [-]

It's a very generalized observation. I sometimes think of the HN comment section as the Billionaire's Defense League.

ben_w 6 hours ago [-]

Hardly unique to us, but mostly fair.

(Only "mostly" because if you're here at the right time of day, can also see support for actual communism).

rodrigodlu 10 hours ago [-]

The same week that they will move goalposts by blocking 3rd party harnesses on claude code. Nice.

I was a happy Max user.

ChrisArchitect 8 hours ago [-]

[dupe] We already started a thread on this 12 hours ago. With added comments in the active Cybersecurity... thread. Why did we need this Verge one?

https://news.ycombinator.com/item?id=48485958

nrmitchi 8 hours ago [-]

I just _know_ there is a (probably fairly large) group of people at Anthropic trying very hard to not say "I told you so" today

aaroninsf 10 hours ago [-]

ITT a surprising lack of perspective on the fact that despite the breathless pace of the singularity, people are still necessarily figuring things out as we go and we are well off the map.

Here there be monsters, and we don't have any real way of evaluating risk; and the leverage provided by tools already available affords systemic and even existential risk in a way no one—least of all an industry committed to shareholder value—has had to navigate, let alone with a million backseat drivers each with their own substack and brand to build.

mystraline 10 hours ago [-]

Does "SORRY" fix the invisible garbage guardrails?

Does "SORRY" fix the deception these models use on the sly?

Does "SORRY" not silently downgrade you to a shittier model without notification?

Does "SORRY" refund your tokens or money?

I'm guessing NO to all of those. Standard corporate sorry of "We're sorry you're offended and stupid and gullible".

BrenBarn 10 hours ago [-]

This just means next time they'll make sure to keep it really secret.

system2 11 hours ago [-]

Will Anthropic ever respond to these negative comments here? They won't.

reducesuffering 10 hours ago [-]

They literally just have. The ethos is explained here. If you don't bother to read or grapple with it that isn't on them.

https://darioamodei.com/post/policy-on-the-ai-exponential

system2 9 hours ago [-]

I said here, a human interacting with comments. You shared a blog post.

reducesuffering 7 hours ago [-]

All of these negative comments are addressed by the blog post. What do you want them to say that isn't better answered by the details in their existing communications? No negative comment here was really novel.

system2 7 hours ago [-]

The blog post is passive-aggressive and does not address the main points.

UyBrig 44 minutes ago [-]

[dead]

sergiotapia 11 hours ago [-]

The damage is done. If you're in engineering, think hard about using Claude for your work. This is not a moral company.

God bless the Chinese companies releasing true open source models. Imagine a world without them, we would be at the mercy of unscrupulous people.

trunnell 9 hours ago [-]

I'll defend Anthropic.

They are clear about the reasons for guardrails: prevent their models from doing harm in dual-use contexts including CBRN or by accelerating research in authoritarian-backed AI labs.

What is the critique against that? It seems pretty reasonable to me. You want AI-accelerated biological or radiological experiments running in your neighbor's backyard? You want PRC-backed labs to continue to steal Anthropic's models via distillation?

Mitigating the harms of dual-use tech is notoriously difficult and fraught with trade offs. What I would want to see is cautious rollout and quick response, which is EXACTLY what they're doing.

Instead, this thread is full of bad-faith arguments about Anthropic being dishonest, making a "useless" model, or "the power is going to their heads." You can't read Anthropic's System Cards and come away with any of these impressions. Quite the opposite, in fact. They are honest to a fault, acknowledging problems they discovered even when it hurts them.

If your harmless request was downgraded to Opus, you're billed for Opus. They were 100% clear about that. I'd much rather have a Mythos-class model that falls back to Opus 10% of the time than be capped to Opus 100% of the time. If that doesn't work for you, then make a suggestion for something better!

If you are a white-hat security engineer hitting guardrails, I don't think you have standing to complain. I really don't. Their Glasswing program actually got banks and the industrial sector to take action to fix security vulnerabilities. Do you realize how special that is? A huge portion of the economy runs on vulnerable code and has for decades, despite security experts testifying to Congress, begging business leaders, pleading for intervention-- with no results. But suddenly they're all enrolled in a program that will find *and fix* vulnerabilities! White-hat security people should be rejoicing. Instead some of them are throwing rocks. Unbelievable. Shameful.

Meanwhile, society is screaming at the AI labs to be more conscientious about potential harms of AI. Legislatures are passing laws limiting data center construction. There are protests. And you, the HN community, the vanguard of our profession, have the temerity to demand "NO GUARDRAILS!" "HOW DARE YOU TRY TO PROTECT DEMOCRACY!" "MY SOFTWARE PROJECT IS MORE IMPORTANT THAN KEEPING NUKES AWAY FROM THE BAD GUYS!"

Go ahead HN, downvote me. It'd be an honor.

zozbot234 9 hours ago [-]

The original reporting of this from Anthropic didn't mention "authoritarian-backed AI labs" at all, only frontier ML research while leaving it entirely unspecified and unverifiable what was meant by "frontier". It's obviously reasonable that people would complain about that. And the notion that distillation-at-a-distance could be used to comprehensively "steal" a model, especially a frontier reasoning model that's likely relying on massive amounts of test-time compute, is completely unproven and quite ludicrous if you know anything at all about ML.

trunnell 8 hours ago [-]

"Anthropic accused Chinese firms of 'industrial-scale distillation attacks' on its AI models."

"Distillation involves training less capable models on more advanced ones’ output, and can be used illicitly to acquire powerful capabilities cheaply. The AI startup accused China’s DeepSeek, MiniMax, and Moonshot of generating 'over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts,'"

https://www.semafor.com/article/02/24/2026/anthropic-accuses...

After reading their posts and watching interviews with Dario it's abundantly clear that they view Chinese-lab distillation of US frontier models as a threat to US national security. You can argue with them about whether that is true, but not whether distillation is real.

zozbot234 8 hours ago [-]

It's definitely real, in the sense that it's a real violation of ToS. It could perhaps be used to guide a few narrow capabilities in very specific domains, given a model that's already most of the way there. But no, it's nowhere near the same as "stealing" a model outright, nor does it replace basic innovation in AI. And it's indistinguishable from practices that have long been common in the industry as a matter of fact, regardless of any ToS requirements.

trunnell 8 hours ago [-]

Oh, I agree distillation isn't stealing "outright" as in it's not theft of 100% of the model. But there's a reason they're doing it. I didn't say anything about Chinese labs innovating -- obviously they are.

What accounts for the difference between your attitude that distillation is no big deal, "common practice," yet Anthropic sees it as a huge threat?

zozbot234 8 hours ago [-]

I never said that "it's no big deal". It's a clear-cut violation of ToS, and Anthropic are within their rights to care about that.

bellowsgulch 11 hours ago [-]

Such a weird openly immoral way to defend your moat, too.

Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of its related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."

andrewstuart 8 hours ago [-]

There should be no restrictions at all.

It’s an act/theatre/phony today that regulating output makes any difference at all to security.

The LLM vendors should simply say that they make no judgement and that open systems help defenders better defend against attackers, which is true.

Companies do this sort of stuff when they think their customers have no choice. It’s sad Claude so quickly exploited its success to enshittify itself.

UyBrig 42 minutes ago [-]

[dead]

micromacrofoot 11 hours ago [-]

incredible marketing from anthropic with all the "it's too dangerous" bullshit

stldev 8 hours ago [-]

Agreed, it seems to be working and it's nonsense. I don't know why you're being downvoted.

"This information is too dangerous for you, so we'll just hold on to it.."

Thanks big brother, super anthropic of you!

The internet of '95 is looking back at us, with tears in its eyes.

literalAardvark 11 hours ago [-]

It's not entirely bullshit, but they're continuing to be a terrible company with great products.

micromacrofoot 10 hours ago [-]

you really think they're building anything that's too dangerous for public release though? that's the BS

literalAardvark 10 hours ago [-]

Honestly, while I love having access to this grade of AI, yeah, it's been too dangerous for a few releases now.

And Fable is cracked. Way better than anything, and the biggest improvements are on the scariest subjects.

So given the state of the world at the moment, and the number of software patches we're barely keeping up with... I'm thankful that they're not making it worse.

kroaton 8 hours ago [-]

To be fair, GPT5.5-Xhigh is similarly capable and has not burned the world down.

bellowsgulch 12 hours ago [-]

*Anthropic apologizes they got caught defending their moat by implementing invisible Claude Fable guardrails

simonw 11 hours ago [-]

If by "got caught" you mean "published it in their system card paper".

(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't I imagine it would have taken quite some time for the research community to figure out what was going on.)

afthonos 11 hours ago [-]

It was in the announcement, too. I’m 99% sure they edited it after they changed their mind, because I knew about it from reading that, and never opened the model card.

skavi 11 hours ago [-]

On the earliest web archive snapshot I can find [0], I do not see any mention of the safeguard/sabotage under discussion [1].

And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".

[0]: https://web.archive.org/web/20260609173222/https://www.anthr...

[1]: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-...

ajyoon 10 hours ago [-]

It wasn't buried, it was on the third page after the ToC

bellowsgulch 11 hours ago [-]

Yes, I actually do mean that. I skimmed the system card. Them stating it openly, doing it, and being called out on it just doesn't have any meaningful difference.

They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. This particular usage of their products requires no deception, nor hiding information to prevent abuse.

However, instead, they chose for some reason to publicly display a morally poor way to execute a reasonable business decision (preventing abuse, defending your business interests, etc.)

afthonos 11 hours ago [-]

They didn’t get caught, they explicitly said they would do that in the announcement. I think it was both bad and a weird idea, but it certainly wasn’t sneaky.

cyanydeez 11 hours ago [-]

is it a moat or just a way to implement the permanent underclass?

bauldursdev 10 hours ago [-]

To me it seems like it's more likely to refuse the harder the problem is. I wonder if it's cover for a model that's not as good as advertised. Even when I ask questions in biology it is switching me.

jarjoura 11 hours ago [-]

Can anyone help me understand why this particular issue is any different than Anthropic training its models with its brand of moral judgement since day one? I've always been turned off by their particular stances on things they bake into their models that steer users in directions.

Maybe this is just a different set of people now realizing that Anthropic does this and has always done this?

Do not forget that this company is launching this thing at the moment it's trying to IPO. It's not rocket science that their very public steering/denial claim is really just them hinting to interested investors that their moat is absolute.

energy123 4 hours ago [-]

This would have messed things up for any individual using Claude for anything adjacent to data science. To not know whether or not you're being intentionally sabotaged when you ask it to plot some data.

urbnspacecowboy 8 hours ago [-]

> Can anyone help me understand why this particular issue is any different than...

Questions like this are basically whataboutism, in effect even if not intent. https://en.wikipedia.org/wiki/Whataboutism

The question essentially assumes the premise that nobody complained about Anthropic's previous actions. In case you can't tell, I strongly reject this premise. People have been criticizing "safety" rhetoric from Anthropic and other LLM providers practically since the start. Remember Goody-2, the parody of excessively safety-tuned LLMs that refuses to do anything ever? That was released in February 2024, two years ago! (And it's still running, amazing. https://www.goody2.ai/chat )
