Your paper makes two main claims that I think should be two separate papers,
each with its claim in the title. Each paper should then have a proof or some
experimental data supporting the claim. Otherwise I could simply disagree and
the question would remain unresolved.

The first claim is that a feedback architecture would solve the
hallucination problem. This really should be called the bullshit problem.
LLMs model human behavior because they are trained on what humans write.
When humans don't know the answer, they tend to make up something plausible
because they don't want to look stupid. I realize that LLMs don't have
human emotions, but this does not stop them from expressing emotions that a
human would feel given the same inputs. They are simply predicting how a
human would act.

The second claim is that transformers could be made more efficient by
encoding logical rules. Again, I disagree in the absence of supporting
evidence. LLMs model language learning in humans, starting with phonemes,
then the rules for tokenization, then vocabulary, then semantics, then
grammar. Math and logic are advanced grammars, learned later. Symbolic logic
has low computational complexity, which is why we tried to encode the rules
directly. The problem that killed Cyc was that we didn't know how many rules
would have to be encoded manually. Cyc had about 10^6 rules, but an LLM
encodes parameters equivalent to about 10^10 rules. What I think happens both
in the brain and in an LLM is that it learns to apply the rules before
knowing how to express them. We learn to use nouns and verbs correctly in
sentences before we know the difference between a noun and a verb.
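
As a back-of-envelope version of the 10^6 versus 10^10 comparison above (the
conversion factor of ten parameters per rule is my own assumption, purely for
illustration, not anything from your paper):

cyc_rules = 1e6            # roughly the size of Cyc's hand-coded rule base
llm_params = 1e11          # assumed GPT-scale parameter count
params_per_rule = 10       # assumed conversion factor, illustrative only
rule_equivalents = llm_params / params_per_rule
print(f"{rule_equivalents:.0e} rule equivalents vs {cyc_rules:.0e} Cyc rules")
# prints: 1e+10 rule equivalents vs 1e+06 Cyc rules

On those assumptions an LLM holds about four orders of magnitude more
rule-equivalents than Cyc ever accumulated by hand.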

I did find interesting your reference claiming that the attention mechanism
in transformers is equivalent to a Hopfield net. A Hopfield net is an
associative memory that stores about 0.15n n-bit vectors and recalls them
when supplied with part of a vector. It is a fully connected network of n
neurons in which the two weights between each pair of neurons, one in each
direction, are constrained to be equal. Neuron activations are clamped to -1
or +1. The training rule is to adjust each weight in the direction of the
product of the two neurons' activations. This maintains the weight symmetry,
as Hebb's rule does, so it is biologically plausible. The attention mechanism
in a transformer is a winner-take-all network where neurons are mutually
inhibiting. One way to make this more efficient would be to exploit the
symmetry and store only n^2/2 weights, which would cut the computation in
half.
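
To make that concrete, here is a minimal sketch of a Hopfield net in
Python/numpy (my own illustration, not from either paper): Hebbian
outer-product training with symmetric weights, bipolar activations, and
recall from a corrupted cue, with the pattern count set near the classical
0.15n capacity estimate.

import numpy as np

def train(patterns):
    # Hebbian outer-product rule; patterns has shape (m, n), entries in {-1, +1}
    n = patterns.shape[1]
    W = patterns.T @ patterns / n      # symmetric: W[i, j] == W[j, i]
    np.fill_diagonal(W, 0)             # no self-connections
    return W

def recall(W, cue, steps=20):
    # recall by repeatedly clamping activations to -1 or +1
    # (synchronous updates for simplicity; the classic net updates asynchronously)
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

rng = np.random.default_rng(0)
n = 100                                  # neurons
m = int(0.15 * n)                        # near the classical capacity limit
patterns = rng.choice([-1, 1], size=(m, n))
W = train(patterns)

cue = patterns[0].copy()
cue[: n // 4] = rng.choice([-1, 1], n // 4)   # corrupt a quarter of the bits
recovered = recall(W, cue)
print("fraction of bits recovered:", np.mean(recovered == patterns[0]))

Since W is symmetric, only the n(n-1)/2 weights above the diagonal actually
need to be stored, which is the factor-of-two saving mentioned above.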

-- Matt Mahoney, mattmahone...@gmail.com

On Tue, May 13, 2025, 9:08 AM YKY (Yan King Yin, 甄景贤) <
generic.intellige...@gmail.com> wrote:

> Sorry there was a mistake in my paper, I corrected it after the submission
> deadline, not sure if they'll accept it or not... anyway, here is the
> updated version.  The comparison of symbolic rewriting and neural rewriting
> is indeed possible, just that the left-hand-side of rewrite rule must match
> the entire input graph 😅
