And an RNN can be approximated by simply stacking an arbitrary number of identical feedforward layers.
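Here's a minimal numpy sketch of that equivalence (toy sizes and untrained random weights, just to show the wiring): running one RNN cell over a length-5 sequence is literally the same computation as 5 stacked identical feedforward layers with tied weights. For a fixed sequence length the unrolling is exact; the approximation only enters once you cap the depth or untie the weights.

import numpy as np

rng = np.random.default_rng(0)

# One shared "layer": a plain tanh RNN cell (toy, untrained weights).
W_h = 0.1 * rng.normal(size=(8, 8))   # hidden-to-hidden
W_x = 0.1 * rng.normal(size=(8, 4))   # input-to-hidden

def cell(h, x):
    return np.tanh(W_h @ h + W_x @ x)

xs = rng.normal(size=(5, 4))          # a length-5 input sequence

# View 1: a recurrent net, one cell looped over time.
h = np.zeros(8)
for x in xs:
    h = cell(h, x)

# View 2: the same computation as 5 stacked identical
# feedforward layers, one per time step (the unrolled net).
layers = [cell] * len(xs)             # identical layers, tied weights
h2 = np.zeros(8)
for layer, x in zip(layers, xs):
    h2 = layer(h2, x)

assert np.allclose(h, h2)             # identical by construction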
It helps to make the layers non-identical, but the problem is, in principle, the same. Check this out:
https://iopscience.iop.org/article/10.1088/1757-899X/1042/1/012030/pdf

On Sun, Dec 12, 2021 at 3:57 PM <[email protected]> wrote:
> I could be wrong, but I thought Transformers have layers of
> self-attention, so e.g. one layer clarifies what 'it' is (the cat), and
> then in the next pass/layer it uses that to figure out what 'the thing I
> just mentioned' refers to (it (the cat))?
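On the quoted question: that is roughly the usual intuition. Here's a toy single-head self-attention sketch (numpy, random untrained weights, so it only shows the information flow, not learned coreference): each layer lets every position mix in content from every other position, so a later layer can read whatever an earlier layer deposited at the 'it' position.

import numpy as np

rng = np.random.default_rng(0)

def attention_layer(X, Wq, Wk, Wv):
    # Single-head self-attention: every position attends to
    # every position and mixes in what it finds.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # row softmax (fine for toy values)
    return X + weights @ V                          # residual update

# Toy embeddings for the tokens "the cat sat it" (hypothetical).
d = 6
tokens = ["the", "cat", "sat", "it"]
X = rng.normal(size=(len(tokens), d))

Wq, Wk, Wv = (0.3 * rng.normal(size=(d, d)) for _ in range(3))

# Two stacked layers: layer 1 can move "cat" information into
# the "it" position; layer 2 can then read it from there.
X1 = attention_layer(X, Wq, Wk, Wv)
X2 = attention_layer(X1, Wq, Wk, Wv)
print(X2.shape)   # (4, 6): same positions, progressively mixed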
