Transformer Attention does seem to be more than just those two fundamental
points.
I do not want to spend a lot of time working with NNs (other than on my TinyML
projects), but I do want to get a better understanding of how these things
work and then apply some of the ideas to slightly more discrete methods.
For example, by studying how CNNs work with visual data I was able to see the
first input steps, which can be used in a more discrete analysis of visual data.
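To make that concrete, here is a minimal sketch of what I mean by the first input step: the discrete 2-D convolution a CNN applies to raw pixels, done here by hand with a fixed Sobel-style kernel instead of a learned one. The image, kernel, and function names are my own illustration, not anything from the thread.

```python
# Sketch: the first input step of a CNN -- a discrete 2-D convolution
# (technically cross-correlation, as CNNs actually compute it) -- done by
# hand with a fixed Sobel-style edge kernel. No learning involved; a real
# CNN would learn its kernels.

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution over nested lists of numbers."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A tiny image with a vertical edge (dark left half, bright right half).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Sobel-style kernel that responds to vertical edges.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

edges = convolve2d(image, sobel_x)
```

The output is just an array of edge responses, i.e. a perfectly discrete, inspectable object, which is what makes this step a natural starting point for a more discrete analysis.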
I got into a completely unexpected discussion with someone who insisted that
NNs are not linear, and I, perhaps unwisely, happened to say that they use
linear approximations. After that brouhaha I started wondering whether I could
create an Artificial Artificial Neural Network (AANN) using an array/field of
trial-and-error linear approximations. It makes sense, it should be feasible,
and it should be a simple experiment to try out. By getting an idea of how
attention works in DL, I can begin to translate it into my own ideas (some of
which may be crackpot and some of which may turn out to be interesting). I
can't quite understand how Transformer Attention increases parallelization,
except, perhaps, that because it keeps track of interesting previous states,
there is more time for each node to spend on processing? Or, since each node
does not use a traditional NN, the epochs are spread out rather than recycled
in iterations? That does not sound quite right, but it does suggest directions
for my trial-and-error approximation AANN, which should hopefully open the way
to some more discrete AI methods.
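The AANN idea above can at least be shown to be feasible with a toy sketch, under my own assumptions about what "trial and error" means: approximate a nonlinear function with an array of linear pieces, each tuned purely by random search with no gradients. All names and parameters here are hypothetical choices of mine.

```python
# Sketch of the "AANN" idea: approximate a nonlinear function with an
# array of linear pieces (one per segment of the input range), each tuned
# purely by trial and error -- random perturbations kept only when they
# reduce the error. No gradients anywhere.
import random

def target(x):
    return x * x  # the nonlinear function to approximate

SEGMENTS = 4                               # number of linear pieces on [0, 1]
xs = [i / 100 for i in range(101)]         # evaluation grid

def segment_of(x):
    return min(int(x * SEGMENTS), SEGMENTS - 1)

def error(params):
    """Sum of squared errors of the piecewise-linear model y = a*x + b."""
    return sum((params[segment_of(x)][0] * x + params[segment_of(x)][1]
                - target(x)) ** 2 for x in xs)

random.seed(0)
params = [(0.0, 0.0)] * SEGMENTS           # one (slope, intercept) per piece
best = error(params)                       # starting error, roughly 20
for _ in range(20000):                     # pure trial and error
    i = random.randrange(SEGMENTS)         # pick one piece to perturb
    trial = list(params)
    a, b = trial[i]
    trial[i] = (a + random.uniform(-0.1, 0.1), b + random.uniform(-0.1, 0.1))
    e = error(trial)
    if e < best:                           # keep the trial only if it helps
        params, best = trial, e
```

Because the error is convex in each piece's two numbers, this blind hill-climb has no local minima to get stuck in, and the final error ends up far below where it started; it is a crude but working instance of "an array of trial-and-error linear approximations".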
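For what it's worth, my reading of the parallelization question above (my own understanding, not something from this thread): an RNN must walk the sequence step by step because each hidden state depends on the previous one, whereas attention computes every position's output from the whole sequence at once, with no cross-position dependency. The toy scaled dot-product attention below makes that visible: the outer loop's iterations are independent, so they could all run in parallel. Everything here is an illustrative simplification (queries, keys, and values are all just the input vectors, no learned projections).

```python
# Toy scaled dot-product attention over a whole sequence. The outer loop
# over query positions has no dependency between iterations -- unlike an
# RNN's recurrence -- which is the source of the parallelism.
import math

def attention(seq):
    """seq: list of equal-length vectors. Returns one output per position."""
    d = len(seq[0])
    outputs = []
    for q in seq:                      # independent per position -> parallel
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        # softmax over the scores (numerically stabilized)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # output = attention-weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, seq))
                        for j in range(d)])
    return outputs

out = attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In a real Transformer the loop is a single matrix multiplication on a GPU, which is why training does not need the sequential epochs-per-timestep pattern of a recurrent net.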
------------------------------------------
Artificial General Intelligence List: AGI
Permalink:
https://agi.topicbox.com/groups/agi/Tefaeb8e790a54cec-Mcbfd35f08a044d9599663a68
Delivery options: https://agi.topicbox.com/groups/agi/subscription