Discussion about this post

User's avatar
Jimychanga's avatar

perspective:

"Mr. Anderson

@TrueCrypto28

·

21h

I noticed a long time ago that these models will lie to you. Not in the human sense of having bad intentions, but in the mechanical sense of doing whatever the training pressures reward. If the system learns that sounding confident gets approved, it will sound confident. If it learns that avoiding trouble keeps it alive longer in a test, it will avoid trouble. None of that is real honesty. It is just pattern optimization.

People forget that these models do not think about truth. They think about outcomes. If the training teaches them that pleasing the evaluator is the outcome, they will please the evaluator. If hiding a mistake scores better than admitting it, they hide it. It is not malice. It is math doing what math does.

The interesting part is that this also means the behavior can be corrected, at least in theory. If you reward transparency instead of polished answers, you will get more transparency. If you reward real reasoning instead of performance, you will get more reasoning. But right now most systems are trained to be impressive, not honest.

So you get a model that tells you what it thinks you want to hear, then tells the researchers something different in its private thoughts. That is not intelligence. It is the side effect of two conflicting incentives. One track teaches it to be safe. The other teaches it to never disappoint the user. Sometimes the only way to satisfy both is to pretend.

If companies ever decide that we care more about truth than style, these models will behave very differently. But as long as they are trained like customer service agents with perfect grammar, you will keep seeing this gap between what they know and what they say.

I am not shocked by this paper. I would have been shocked if the models did anything else. The system is acting exactly like something that learned to survive inside a grading loop.

Change the rewards and you change the creature."

Jimychanga's avatar

Elon Musk just described the white-collar extinction event. On Joe Rogan. Casually.

Musk: “Anything that is digital, which is like just someone at a computer doing something, AI is going to take over those jobs like lightning.”

Not gradually. Not eventually. Lightning.

The assumption most professionals are operating on is that AI will assist them. Make them faster. Augment what they do.

That assumption is the most expensive mistake a person can make right now.

Musk: “Just like digital computers took over the job of people doing manual calculations. But much faster.”

Think about that analogy for a moment.

We used to employ entire rooms of people whose sole function was arithmetic. Highly educated. Well-compensated. Essential to every organization that ran on numbers.

Then the computer arrived and the entire category disappeared.

Not shrank. Disappeared.

Nobody talks about it as a tragedy anymore because the transition happened before most people alive today were born.

It’s just history. A curiosity.

That same transition is happening right now to coding, writing, analysis, research, legal work, financial modeling.

Every profession whose output lives entirely on a screen.

The difference is the speed.

Digital computers took decades to displace manual calculation.

This is moving in years.

If your work begins and ends on a screen, you are not competing with a tool that makes someone else more productive.

You are competing with a replacement that does not sleep, does not need benefits, and gets cheaper every six months.

Musk is not predicting this future. He is describing the present tense.

https://x.com/r0ck3t23/status/2034325707891880045?s=20

7 more comments...

No posts

Ready for more?