To me, this "Transformer Attention" comes down to the two things I explain:
1) Recently seen letters/words/etc. are made more probable: if cat>runs occurs
more often than cat>sleeps, you predict cat>RUNS with higher probability; but
if you just saw "sleeps" three times recently, you temporarily predict "sleeps"
much more than "runs". And 2) translation: you treat cat like dog, e.g.
cat>barked, so you borrow dog's predictions. But this can be ambiguous, as in
my [stick]>them... tree branch? or move? So you attend to the past context to
see which translation it supports...
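Here is a toy sketch of those two mechanisms in code. To be clear, this is my own illustration, not how transformers are actually implemented: the bigram counts, the hand-made similarity table (standing in for learned embeddings), and the boost/weight factors are all made-up assumptions.

```python
from collections import Counter

def predict_next(word, bigram_counts, recent_words, similar=None,
                 recency_boost=2.0, similarity_weight=0.5):
    """Score candidate next words for `word` using the two mechanisms:
    1) a temporary boost for anything seen recently, and
    2) borrowing predictions from similar words ("translation")."""
    scores = Counter()
    # Base: long-term bigram statistics (cat>runs more often than cat>sleeps).
    for nxt, c in bigram_counts.get(word, {}).items():
        scores[nxt] += c
    # Mechanism 2: borrow predictions from similar words (cat behaves like dog).
    if similar:
        for other in similar.get(word, []):
            for nxt, c in bigram_counts.get(other, {}).items():
                scores[nxt] += similarity_weight * c
    # Mechanism 1: temporarily boost anything seen recently
    # (each recent occurrence multiplies the score again).
    for w in recent_words:
        if w in scores:
            scores[w] *= recency_boost
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total else {}

# Made-up example data: "runs" wins normally, but after seeing "sleeps"
# three times recently, "sleeps" temporarily wins; and via the similarity
# table, cat inherits dog's "barked" prediction.
bigrams = {"cat": {"runs": 5, "sleeps": 2}, "dog": {"barked": 4}}
normal = predict_next("cat", bigrams, recent_words=[])
primed = predict_next("cat", bigrams, recent_words=["sleeps"] * 3)
translated = predict_next("cat", bigrams, recent_words=[],
                          similar={"cat": ["dog"]})
```

The third call shows the ambiguity point too: "barked" only enters cat's predictions through the similarity table, so deciding how much to trust it is exactly where attending to past context would come in.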
------------------------------------------
Artificial General Intelligence List: AGI
Permalink:
https://agi.topicbox.com/groups/agi/Tefaeb8e790a54cec-M260dae39fabbebf60268f3d8