Spit-balling a speculative answer to your question -- so don't take this _too_ seriously:
What transformers need in order to reason is a recurrent version of the attention mechanism -- not unlike what Hecht-Nielsen did by wrapping his cogent confabulation <https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.9224&rep=rep1&type=pdf> (really *multi*-confabulation) in what he called a "swirling" architecture, illustrated on page 21 of the thesis of Soren Solari <https://media.proquest.com/media/hms/ORIG/2/zoDxI?_s=8BGOpMLiGOebjhDy1rLAsDoErpE%3D>, one of Hecht-Nielsen's students. This treats reasoning as an attractor network that converges on a maximum-likelihood sentence after a finite number of swirls, or cycles.

So, rather than parading around the aphorism "Attention Is All You Need" so as to poke fun at Schmidhuber, try parading around something more like "LSTM Wrapping Attention Is Better Than Either".

On Sun, Dec 12, 2021 at 10:21 AM <[email protected]> wrote:
> Ah. Then what would a dynamic Transformer architecture be? Do you have a
> good idea for improving it? Can you explain it?
>
> Do you mean something like the Pathways model Google announced, which would
> be more sparse and therefore more intuitive about how it gets its answers
> (mostly the how)?

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T22ce813ce07d9b1a-M2746d682322512ac19e00f30
Delivery options: https://agi.topicbox.com/groups/agi/subscription
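For concreteness, here is a minimal sketch of the "swirling" idea -- my own illustration in NumPy, not Hecht-Nielsen's actual confabulation architecture: treat a single self-attention layer as a state-update map and apply it recurrently, watching how much the token representations move on each cycle as they (hopefully) settle toward an attractor.

```python
import numpy as np

# Hypothetical illustration only: one self-attention head used as a
# recurrent state-update map. All weights and sizes here are arbitrary
# choices for the sketch, not anything from the cited papers.

rng = np.random.default_rng(0)
n, d = 5, 8                      # sequence length, embedding width

# Fixed random projections for one attention head; the small scale
# keeps the iterated map well-behaved.
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def swirl_step(x):
    """One attention pass with a residual update, renormalized so the
    state stays on the unit sphere (bounding the recurrence)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    a = softmax(q @ k.T / np.sqrt(d))        # (n, n) attention weights
    y = x + a @ v                            # residual update
    return y / np.linalg.norm(y, axis=-1, keepdims=True)

x = rng.normal(size=(n, d))
x /= np.linalg.norm(x, axis=-1, keepdims=True)

# "Swirling": cycle the same layer a fixed number of times, recording
# how far the state moves per cycle.
deltas = []
for cycle in range(50):
    x_next = swirl_step(x)
    deltas.append(np.linalg.norm(x_next - x))
    x = x_next

print(f"movement on first cycle: {deltas[0]:.4f}, on last cycle: {deltas[-1]:.6f}")
```

A real version would wrap a trained transformer block (or, per the slogan above, an LSTM around the attention stack) and stop when the per-cycle movement drops below a threshold, i.e. when the attractor is reached.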
