On Mon, Jan 13, 2025, 10:22 PM John Rose <johnr...@polyplexic.com> wrote:

> On Monday, December 23, 2024, at 11:17 AM, James Bowery wrote:
>
> *We may be talking at cross purposes here. I am referring to the theorem
> that no intelligence can model itself*
>
>
> I keep thinking there is a way around Wolpert's theorem and agents fully
> modelling each other, if you go cross-Planck hologram or the universe
> modelling itself, because it is itself, it is its own prediction. Unless
> agents need to be a subset of the universe to be considered an agent.
>

An agent modeling itself is the special case of two agents modeling each
other where both have the same software and initial state. Wolpert's theorem
says the two agents cannot both predict each other, even if each knows the
other's software and initial state and could therefore run a copy of the
other agent.

Proof: Suppose these two agents (not necessarily identical) play rock,
paper, scissors. Agent 1 runs a copy of agent 2, predicts it will play
rock, and so plays paper. Agent 2 runs a copy of agent 1, predicts it will
play paper, and so plays scissors. Who wins? At least one of the two
predictions has to be wrong.
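
To make the circularity concrete, here is a minimal Python sketch (my own
illustration, not part of Wolpert's proof) in which each agent tries to
predict the other by running a copy of it and then playing the counter-move.
The mutual simulation never bottoms out, so neither prediction can complete:

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def agent1(depth=0):
    # Run a copy of agent 2, then play whatever beats its predicted move.
    if depth > 20:  # cap the regress so the demo halts instead of recursing forever
        raise RecursionError("mutual simulation never bottoms out")
    return BEATS[agent2(depth + 1)]

def agent2(depth=0):
    # Run a copy of agent 1, then play whatever beats its predicted move.
    if depth > 20:
        raise RecursionError("mutual simulation never bottoms out")
    return BEATS[agent1(depth + 1)]

try:
    print("agent 1 plays", agent1())
except RecursionError as err:
    print("agent 1 cannot finish its prediction:", err)

Cutting the regress off at some finite depth only means each agent is
predicting a truncated copy of the other, which is exactly the losing
position in the example above.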

Wolpert's theorem is fundamental to the alignment problem. Either you can
predict what an AI will do, or it can predict you, but not both. We use
prediction accuracy to measure intelligence. If you want to build an AI
that is smarter than you, then you can't predict what it will do. That
means you can't control it either, because control would require you to
predict the effects of your actions on it.

One argument against this is that intelligence is not a point on a line,
so each agent can predict the other sometimes but not always. That is
today's reality. If by intelligence you mean a broad set of capabilities,
then the practical effect is that as AI gains capabilities, it gets better
at predicting you and you get worse at predicting it.

A second argument is that the goal of AI is not intelligence but modeling
human behavior. That is an argument against a singularity, not an argument
against the loss of control. It means that instead of being controlled by
an autonomous, self-improving AI, you are controlled by billionaires
predicting the best way to enrich themselves at your expense.

I must repeat. The risk of AI is getting everything you want. You are
controlled by positive reinforcement. When you train a dog with treats, it
thinks it is controlling you.
