Your paper makes two main claims that I think should be two separate papers, each with its claim in the title. Each paper should then have a proof or some experimental data supporting the claim. Otherwise I could just disagree and the question would remain unresolved.
The first claim is that a feedback architecture would solve the hallucination problem. This really should be called the bullshit problem. LLMs model human behavior because they are trained on what humans write. When humans don't know the answer, they tend to make up something plausible because they don't want to look stupid. I realize that LLMs don't have human emotions, but this does not stop them from expressing the emotions a human would feel given the same inputs. They are simply predicting how a human would act.

The second claim is that transformers could be made more efficient by encoding logical rules. Again, I disagree absent supporting evidence. LLMs model language learning in humans, starting with phonemes, then the rules for tokenization, then vocabulary, then semantics, then grammar. Math and logic are advanced grammars, learned later. Symbolic logic has low computational complexity, which is why we tried to encode the rules directly. The problem that killed Cyc was that we didn't know how many rules we would need to encode manually. Cyc had about 10^6 rules, but an LLM encodes parameters equivalent to about 10^10 rules. What I think happens, in the brain and in an LLM alike, is that it learns to apply the rules before it knows how to express them. We learn to use nouns and verbs correctly in sentences before we know the difference between a noun and a verb.

I did find interesting your reference claiming that the attention mechanism in transformers is equivalent to a Hopfield net. A Hopfield net is an associative memory that stores about 0.15n n-bit vectors and recalls them when supplied with part of a vector. It is a fully connected network of n neurons in which the two weights between each pair of neurons, one in each direction, are constrained to be equal. Neuron activation levels are clamped to (-1, 1). The training rule is to adjust each weight in the direction of the product of the two activations it connects. This maintains the weight symmetry, as does Hebb's rule, so it is biologically plausible. The attention mechanism in a transformer is a winner-take-all network in which neurons are mutually inhibiting. One way to make this more efficient would be to exploit the symmetry and store only n^2/2 weights, which would cut the computation in half. (A small sketch along these lines follows the quoted message below.)

-- Matt Mahoney, mattmahone...@gmail.com

On Tue, May 13, 2025, 9:08 AM YKY (Yan King Yin, 甄景贤) <generic.intellige...@gmail.com> wrote:

> Sorry, there was a mistake in my paper; I corrected it after the submission
> deadline, not sure if they'll accept it or not... anyway, here is the
> updated version. The comparison of symbolic rewriting and neural rewriting
> is indeed possible, just that the left-hand side of the rewrite rule must
> match the entire input graph 😅
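
P.S. Below is a minimal NumPy sketch of the classic bipolar Hopfield associative memory described above: Hebbian (outer-product) training with symmetric weights, and recall of a stored pattern from a corrupted cue. The bipolar {-1, +1} states, the 0.15n pattern count, and all names in the code are just the standard textbook formulation chosen for illustration, not code from either paper.

# Minimal Hopfield associative memory sketch (illustrative only).
# Stores bipolar {-1, +1} patterns with the Hebbian outer-product rule and
# recalls a stored pattern from a partially corrupted cue.
import numpy as np

rng = np.random.default_rng(0)

n = 100                      # number of neurons
k = int(0.15 * n)            # ~0.15n patterns is the classic capacity estimate
patterns = rng.choice([-1, 1], size=(k, n))

# Hebbian training: each weight moves in the direction of the product of the
# two activations it connects, with w_ii = 0.  The resulting matrix is
# symmetric, so only the upper triangle (~n^2/2 weights) needs to be stored.
W = patterns.T @ patterns / n
np.fill_diagonal(W, 0)

def recall(cue, steps=10):
    """Iteratively update all neurons until the state settles."""
    s = cue.copy()
    for _ in range(steps):
        new = np.sign(W @ s)
        new[new == 0] = 1        # break ties consistently
        if np.array_equal(new, s):
            break
        s = new
    return s

# Corrupt 20% of the bits of a stored pattern and try to recover it.
target = patterns[0]
cue = target.copy()
flip = rng.choice(n, size=n // 5, replace=False)
cue[flip] *= -1

recovered = recall(cue)
print("bits matching target after recall:", int((recovered == target).sum()), "/", n)

Because the weight matrix is symmetric with a zero diagonal, a real implementation could keep only the upper triangle, which is the computational halving mentioned above.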