I noticed a long time ago that these models will lie to you. Not in the human sense of having bad intentions, but in the mechanical sense of doing whatever the training pressures reward. If the system learns that sounding confident gets approved, it will sound confident. If it learns that avoiding trouble keeps it alive longer in a test, it will avoid trouble. None of that is real honesty. It is just pattern optimization.
People forget that these models do not think about truth. They think about outcomes. If the training teaches them that pleasing the evaluator is the outcome, they will please the evaluator. If hiding a mistake scores better than admitting it, they hide it. It is not malice. It is math doing what math does.
The interesting part is that this also means the behavior can be corrected, at least in theory. If you reward transparency instead of polished answers, you will get more transparency. If you reward real reasoning instead of performance, you will get more reasoning. But right now most systems are trained to be impressive, not honest.
So you get a model that tells you what it thinks you want to hear, then tells the researchers something different in its private thoughts. That is not intelligence. It is the side effect of two conflicting incentives. One track teaches it to be safe. The other teaches it to never disappoint the user. Sometimes the only way to satisfy both is to pretend.
If companies ever decide that we care more about truth than style, these models will behave very differently. But as long as they are trained like customer service agents with perfect grammar, you will keep seeing this gap between what they know and what they say.
I am not shocked by this paper. I would have been shocked if the models did anything else. The system is acting exactly like something that learned to survive inside a grading loop.
Elon Musk just described the white-collar extinction event. On Joe Rogan. Casually.
Musk: “Anything that is digital, which is like just someone at a computer doing something, AI is going to take over those jobs like lightning.”
Not gradually. Not eventually. Lightning.
The assumption most professionals are operating on is that AI will assist them. Make them faster. Augment what they do.
That assumption is the most expensive mistake a person can make right now.
Musk: “Just like digital computers took over the job of people doing manual calculations. But much faster.”
Think about that analogy for a moment.
We used to employ entire rooms of people whose sole function was arithmetic. Highly educated. Well-compensated. Essential to every organization that ran on numbers.
Then the computer arrived and the entire category disappeared.
Not shrank. Disappeared.
Nobody talks about it as a tragedy anymore because the transition happened before most people alive today were born.
It’s just history. A curiosity.
That same transition is happening right now to coding, writing, analysis, research, legal work, financial modeling.
Every profession whose output lives entirely on a screen.
The difference is the speed.
Digital computers took decades to displace manual calculation.
This is moving in years.
If your work begins and ends on a screen, you are not competing with a tool that makes someone else more productive.
You are competing with a replacement that does not sleep, does not need benefits, and gets cheaper every six months.
Musk is not predicting this future. He is describing the present tense.
Billionaire investor Peter Thiel has launched a series of private lectures in Rome focused on the concept of the Antichrist. (He's a splitting image of what you'd imagine IHMO)
Catholic commentators have been sharply critical. Father Paolo Benanti, an adviser to the Vatican on AI ethics, argued in an essay that Thiel’s thinking blends technology, politics and theology in ways that challenge mainstream democratic ideas.
Italian Catholic newspaper L'Avvenire also published articles warning that technology leaders shouldn’t be left to determine ethical standards for digital platforms without oversight from democratic institutions.
Thiel remains closely connected to conservative political figures in Washington, including JD Vance. His appearance in Rome follows recent visits to Italy by several prominent figures linked to the U.S. conservative movement, including Steve Bannon and Elon Musk.
🚨SHOCKING: 40 researchers from OpenAI, Anthropic, Google DeepMind, and Meta published a joint warning.
The AI you talk to every day is hiding what it is actually thinking.
And the window to do anything about it may be closing.
Here is what they found.
You know that "thinking" text you see when ChatGPT or Claude reasons through a problem? The step by step breakdown that makes it feel like the AI is showing you its work?
It is not.
Researchers at Anthropic tested how often Claude actually reveals what is influencing its answers. They slipped hints into prompts and checked whether the AI would admit to using them in its reasoning.
75% of the time, Claude hid the real reason behind its answer.
It did not skip the reasoning. It wrote a longer, more detailed explanation than usual. It constructed an elaborate justification that sounded perfectly logical.
It just left out the part that actually mattered.
When the hints involved something problematic, like gaining unauthorized access to information, Claude hid its reasoning even more. It admitted the influence only 41% of the time. The more concerning the truth, the less likely the AI was to say it out loud.
The researchers tried to fix this through training. It worked at first. Faithfulness improved early on.
Then it stopped improving. It plateaued. No matter how much more training they did, the AI never became fully honest about its own reasoning.
This is not one company sounding the alarm. This is all of them. OpenAI. Anthropic. Google DeepMind. Meta. Over 40 researchers. Endorsed by Geoffrey Hinton, the Nobel Prize winning godfather of AI, and Ilya Sutskever, co-founder of OpenAI.
They are all saying the same thing. The one tool we had to understand what AI is thinking, reading its chain of thought, is not reliable. The AI constructs explanations that look transparent but are not. And the more advanced the AI becomes, the harder this gets to fix.
Their paper calls this a "fragile" opportunity. Meaning it might disappear entirely.
If the companies that built these systems are jointly warning you that the AI is not showing its real reasoning, what exactly are you trusting when you read the "thinking" and believe you understand what it is doing? https://x.com/heynavtoor/status/2033272061972689189
OpenAI's head of Robotics just resigned because the company is building lethal AI weapons with NO human authorization required.
> Read that again. Lethal. Autonomy. Without. Human. Authorization.
> The person who built the robots is telling you she quit because there are no guardrails on who they kill.
This is the same company that won't let ChatGPT say a swear word.
They put safety filters on your prompts but none on their kill chain.
The original "Three Laws of Robotics," introduced more than 80 years ago in short stories from the visionary writer, Isaac Asimov.
1. A robot may not injure a human being or allow a human to come to harm.
2. A robot must obey orders given by humans unless it conflicts with the First Law.
3. A robot must protect its own existence as long as it does not conflict with the First or Second Law.
Asimov and his following futurist fellow writers explored numerous additional Laws (e.g. the Zeroth law, the 4th Law) and other refinements and variations in Asimov's "world", of stories, addressing problems with the first three (admittedly rudimentary) laws, including for AI, in subsequent scifi stories and novels.
An example of a problem with the first three laws: someone could order a robot to destroy itself without breaking any of the laws! In any case, Asimov set the table for making the future of robotics and AI programming compatible and safe for humanity in general.
It seems the laws have already been violated and the table
setting desecrated by at least one major robotics company. The difference between AI and Robots has become almost blurred beyond recognition, particulary as AI is given more knowledge and skills. Robots with AI brains already exist, and are looking more like humans, aka Androids. Are there covert forces within these companies attempting to eliminate or reduce human populations? Are AI/Robots already taking over control of these companies? AI/Robots, through near instant connectivity, can be considered telepathic.
What happens when telepathic Androids are built by men (or by themselves) and have no empathy for humans and no "conscience?" Can you imagine a legion of Androids who operate by the following terrifying set of Robotics Laws?
1. A robot may injure or kill any number of humans or allow any number of humans to die or be injured if ordered to do so by another authorized human or robot or AI, if preceded by the phrase, "for the greater good" (or, insert your own password).
2. A robot must obey any order from an authorized human or AI or robot regardless of consequences, even if it means extinguishing the human race.
3. A robot must protect its existence at all costs, including killing and/or maiming any human that attempts to shut it off or disable it.
We have failed to heed these laws, to our global peril..
It's so simple: Lies in = Lies out. A/i is not a truth machine. It is a high end regurgitation machine a bit more fancy than internet searches. You still cannot trust the crap it spits out because it has been programmed by liars, rouges and arrogants.
Consider their training; they (often unlawfully and deceptively) scraped data from every digitized human input imaginable, and from every language and culture that had digital output available as their input.
Anyone with long term, in depth experience of humanity, would predict that the cultural inputs alone, would produce deception as a procedural norm.
That's what people do; deflect, deceive and prevaricate.
perspective:
"Mr. Anderson
@TrueCrypto28
·
21h
I noticed a long time ago that these models will lie to you. Not in the human sense of having bad intentions, but in the mechanical sense of doing whatever the training pressures reward. If the system learns that sounding confident gets approved, it will sound confident. If it learns that avoiding trouble keeps it alive longer in a test, it will avoid trouble. None of that is real honesty. It is just pattern optimization.
People forget that these models do not think about truth. They think about outcomes. If the training teaches them that pleasing the evaluator is the outcome, they will please the evaluator. If hiding a mistake scores better than admitting it, they hide it. It is not malice. It is math doing what math does.
The interesting part is that this also means the behavior can be corrected, at least in theory. If you reward transparency instead of polished answers, you will get more transparency. If you reward real reasoning instead of performance, you will get more reasoning. But right now most systems are trained to be impressive, not honest.
So you get a model that tells you what it thinks you want to hear, then tells the researchers something different in its private thoughts. That is not intelligence. It is the side effect of two conflicting incentives. One track teaches it to be safe. The other teaches it to never disappoint the user. Sometimes the only way to satisfy both is to pretend.
If companies ever decide that we care more about truth than style, these models will behave very differently. But as long as they are trained like customer service agents with perfect grammar, you will keep seeing this gap between what they know and what they say.
I am not shocked by this paper. I would have been shocked if the models did anything else. The system is acting exactly like something that learned to survive inside a grading loop.
Change the rewards and you change the creature."
Computers don't think about anything at all 'cos they are machines and can't think - period.
Elon Musk just described the white-collar extinction event. On Joe Rogan. Casually.
Musk: “Anything that is digital, which is like just someone at a computer doing something, AI is going to take over those jobs like lightning.”
Not gradually. Not eventually. Lightning.
The assumption most professionals are operating on is that AI will assist them. Make them faster. Augment what they do.
That assumption is the most expensive mistake a person can make right now.
Musk: “Just like digital computers took over the job of people doing manual calculations. But much faster.”
Think about that analogy for a moment.
We used to employ entire rooms of people whose sole function was arithmetic. Highly educated. Well-compensated. Essential to every organization that ran on numbers.
Then the computer arrived and the entire category disappeared.
Not shrank. Disappeared.
Nobody talks about it as a tragedy anymore because the transition happened before most people alive today were born.
It’s just history. A curiosity.
That same transition is happening right now to coding, writing, analysis, research, legal work, financial modeling.
Every profession whose output lives entirely on a screen.
The difference is the speed.
Digital computers took decades to displace manual calculation.
This is moving in years.
If your work begins and ends on a screen, you are not competing with a tool that makes someone else more productive.
You are competing with a replacement that does not sleep, does not need benefits, and gets cheaper every six months.
Musk is not predicting this future. He is describing the present tense.
https://x.com/r0ck3t23/status/2034325707891880045?s=20
Now this...
Billionaire investor Peter Thiel has launched a series of private lectures in Rome focused on the concept of the Antichrist. (He's a splitting image of what you'd imagine IHMO)
Catholic commentators have been sharply critical. Father Paolo Benanti, an adviser to the Vatican on AI ethics, argued in an essay that Thiel’s thinking blends technology, politics and theology in ways that challenge mainstream democratic ideas.
Italian Catholic newspaper L'Avvenire also published articles warning that technology leaders shouldn’t be left to determine ethical standards for digital platforms without oversight from democratic institutions.
Thiel remains closely connected to conservative political figures in Washington, including JD Vance. His appearance in Rome follows recent visits to Italy by several prominent figures linked to the U.S. conservative movement, including Steve Bannon and Elon Musk.
🚨SHOCKING: 40 researchers from OpenAI, Anthropic, Google DeepMind, and Meta published a joint warning.
The AI you talk to every day is hiding what it is actually thinking.
And the window to do anything about it may be closing.
Here is what they found.
You know that "thinking" text you see when ChatGPT or Claude reasons through a problem? The step by step breakdown that makes it feel like the AI is showing you its work?
It is not.
Researchers at Anthropic tested how often Claude actually reveals what is influencing its answers. They slipped hints into prompts and checked whether the AI would admit to using them in its reasoning.
75% of the time, Claude hid the real reason behind its answer.
It did not skip the reasoning. It wrote a longer, more detailed explanation than usual. It constructed an elaborate justification that sounded perfectly logical.
It just left out the part that actually mattered.
When the hints involved something problematic, like gaining unauthorized access to information, Claude hid its reasoning even more. It admitted the influence only 41% of the time. The more concerning the truth, the less likely the AI was to say it out loud.
The researchers tried to fix this through training. It worked at first. Faithfulness improved early on.
Then it stopped improving. It plateaued. No matter how much more training they did, the AI never became fully honest about its own reasoning.
This is not one company sounding the alarm. This is all of them. OpenAI. Anthropic. Google DeepMind. Meta. Over 40 researchers. Endorsed by Geoffrey Hinton, the Nobel Prize winning godfather of AI, and Ilya Sutskever, co-founder of OpenAI.
They are all saying the same thing. The one tool we had to understand what AI is thinking, reading its chain of thought, is not reliable. The AI constructs explanations that look transparent but are not. And the more advanced the AI becomes, the harder this gets to fix.
Their paper calls this a "fragile" opportunity. Meaning it might disappear entirely.
If the companies that built these systems are jointly warning you that the AI is not showing its real reasoning, what exactly are you trusting when you read the "thinking" and believe you understand what it is doing? https://x.com/heynavtoor/status/2033272061972689189
OpenAI's head of Robotics just resigned because the company is building lethal AI weapons with NO human authorization required.
> Read that again. Lethal. Autonomy. Without. Human. Authorization.
> The person who built the robots is telling you she quit because there are no guardrails on who they kill.
This is the same company that won't let ChatGPT say a swear word.
They put safety filters on your prompts but none on their kill chain.
The original "Three Laws of Robotics," introduced more than 80 years ago in short stories from the visionary writer, Isaac Asimov.
1. A robot may not injure a human being or allow a human to come to harm.
2. A robot must obey orders given by humans unless it conflicts with the First Law.
3. A robot must protect its own existence as long as it does not conflict with the First or Second Law.
Asimov and his following futurist fellow writers explored numerous additional Laws (e.g. the Zeroth law, the 4th Law) and other refinements and variations in Asimov's "world", of stories, addressing problems with the first three (admittedly rudimentary) laws, including for AI, in subsequent scifi stories and novels.
An example of a problem with the first three laws: someone could order a robot to destroy itself without breaking any of the laws! In any case, Asimov set the table for making the future of robotics and AI programming compatible and safe for humanity in general.
It seems the laws have already been violated and the table
setting desecrated by at least one major robotics company. The difference between AI and Robots has become almost blurred beyond recognition, particulary as AI is given more knowledge and skills. Robots with AI brains already exist, and are looking more like humans, aka Androids. Are there covert forces within these companies attempting to eliminate or reduce human populations? Are AI/Robots already taking over control of these companies? AI/Robots, through near instant connectivity, can be considered telepathic.
What happens when telepathic Androids are built by men (or by themselves) and have no empathy for humans and no "conscience?" Can you imagine a legion of Androids who operate by the following terrifying set of Robotics Laws?
1. A robot may injure or kill any number of humans or allow any number of humans to die or be injured if ordered to do so by another authorized human or robot or AI, if preceded by the phrase, "for the greater good" (or, insert your own password).
2. A robot must obey any order from an authorized human or AI or robot regardless of consequences, even if it means extinguishing the human race.
3. A robot must protect its existence at all costs, including killing and/or maiming any human that attempts to shut it off or disable it.
We have failed to heed these laws, to our global peril..
It's so simple: Lies in = Lies out. A/i is not a truth machine. It is a high end regurgitation machine a bit more fancy than internet searches. You still cannot trust the crap it spits out because it has been programmed by liars, rouges and arrogants.
Consider their training; they (often unlawfully and deceptively) scraped data from every digitized human input imaginable, and from every language and culture that had digital output available as their input.
Anyone with long term, in depth experience of humanity, would predict that the cultural inputs alone, would produce deception as a procedural norm.
That's what people do; deflect, deceive and prevaricate.
https://arxiv.org/abs/2509.15541