Transformer Attention does seem to be more than just those two fundamental 
points. 
I do not want to spend a lot of time working with NNs (other than on my TinyML 
projects), but I do want to get a better understanding of how these things 
work and then apply some of the ideas to slightly more discrete methods. 
For example, by studying how CNNs handle visual data I was able to see the 
first input steps, which can be reused in a more discrete analysis of visual data. 
I got into a completely unexpected disagreement with someone who insisted that 
NNs are not linear, and I, perhaps unwisely, happened to say that they use linear 
approximations. So after that brouhaha I started wondering if I could create an 
Artificial Artificial Neural Network using an array/field of trial-and-error 
linear approximations. It makes sense, it should be feasible, and it should be 
a simple experiment to try out. So by getting an idea of how attention 
works in DL I can begin to translate it into my own ideas (some of which may be 
crackpot and some of which may turn out to be interesting). I can't quite 
understand how Transformer Attention increases parallelization, except, perhaps, 
that because it keeps track of interesting previous states there is more time 
for each node to spend on processing? Or, since each node does not use a 
traditional NN, the epochs are spread out rather than recycled in iterations? That 
does not sound quite right, but it does open possibilities for my trial-and-error 
approximation AANN, which should hopefully open the way to developing 
some more discrete AI methods. 
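On the parallelization question, the explanation usually given is simpler than either guess: self-attention has no step-to-step dependency. An RNN must compute hidden state t from hidden state t-1, so the sequence is processed serially, while attention compares every position with every other position in a handful of matrix multiplications that cover the whole sequence at once. A minimal NumPy sketch of the contrast (all function and variable names here are my own, not from any library):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X has shape (seq_len, d). Every position goes through the same
    matrix multiplies together, so nothing forces serial processing.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (seq_len, seq_len): all pairs at once
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                   # weighted mix of every position

def rnn_step_by_step(X, Wh, Wx):
    """A vanilla RNN: h[t] needs h[t-1], so this loop cannot be parallelized."""
    h = np.zeros(Wh.shape[0])
    hs = []
    for x in X:                          # inherently sequential
        h = np.tanh(Wh @ h + Wx @ x)
        hs.append(h)
    return np.array(hs)

rng = np.random.default_rng(0)
seq_len, d = 6, 4
X = rng.normal(size=(seq_len, d))
out = self_attention(X, rng.normal(size=(d, d)),
                        rng.normal(size=(d, d)),
                        rng.normal(size=(d, d)))
print(out.shape)  # (6, 4)
```

The point is structural: the attention path is three matmuls and a softmax over the whole sequence, while the RNN path is a Python-level loop whose iterations depend on each other.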
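As for the trial-and-error AANN idea: it is worth noting that a ReLU network is itself exactly piecewise linear, so an array of local linear pieces improved by pure trial and error (no gradients) is a reasonable toy version of the same thing. Here is a hypothetical sketch under my own assumptions — fixed segment boundaries, random-nudge search, sin(x) as the target:

```python
import numpy as np

rng = np.random.default_rng(1)

def target(x):
    return np.sin(x)  # the nonlinear function to approximate

# An "array/field" of local linear pieces: slope a and intercept b per segment.
n_pieces = 8
edges = np.linspace(0, 2 * np.pi, n_pieces + 1)
params = np.zeros((n_pieces, 2))  # [a, b] for each piece, all starting flat

xs = np.linspace(0, 2 * np.pi, 400)
ys = target(xs)
# Which piece each sample point belongs to:
piece_of = np.clip(np.searchsorted(edges, xs, side="right") - 1, 0, n_pieces - 1)

def error(p):
    preds = p[piece_of, 0] * xs + p[piece_of, 1]
    return np.mean((preds - ys) ** 2)

# Trial and error: nudge one random piece; keep the nudge only if error drops.
best = error(params)
for _ in range(20000):
    i = rng.integers(n_pieces)
    trial = params.copy()
    trial[i] += rng.normal(scale=0.05, size=2)
    e = error(trial)
    if e < best:
        params, best = trial, e

print(best)  # final mean-squared error, should be small
```

This is the "simple experiment" version: no backprop, no gradients, just accept-if-better. It should drive the error well below the flat-line starting point, which at least shows the array-of-linear-approximations idea is workable before trying anything attention-like on top of it.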
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tefaeb8e790a54cec-Mcbfd35f08a044d9599663a68