To me, this "Transformer Attention" is just the two things I explain: 1) Recent 
letters/words/etc. are made more probable. If cat>runs occurs more than 
cat>sleeps, you predict cat>runs more often, or with higher probability; but if 
you recently saw "sleeps" three times, you temporarily predict "sleeps" much 
more than "runs". And 2) translation: you can treat "cat" like "dog" (e.g. 
cat>barked) and borrow dog's predictions. But this can be ambiguous: my 
[stick]>them....tree? or move? So you attend to the past to see which 
translation the context supports....
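A minimal sketch of those two effects in Python, with entirely hypothetical toy counts (the words, numbers, and the `similar` lookup are my own illustration, not anything from a real Transformer):

```python
from collections import Counter

# Hypothetical long-term bigram counts: "cat" is followed by
# "runs" more often than "sleeps".
base_counts = {"cat": Counter({"runs": 6, "sleeps": 2})}

# Hypothetical analogy table: treat "dog" like "cat".
similar = {"dog": "cat"}

def predict(prev_word, recent_words, recency_weight=2.0):
    """Effect 1: blend long-term counts with a temporary recency boost."""
    scores = Counter(base_counts.get(prev_word, Counter()))
    # Each recent sighting of a candidate word adds a boost, so seeing
    # "sleeps" three times recently can temporarily overtake "runs".
    for w in recent_words:
        if w in scores:
            scores[w] += recency_weight
    total = sum(scores.values())
    return {w: c / total for w, c in scores.items()}

def predict_with_analogy(prev_word, recent_words):
    """Effect 2: an unseen word borrows a similar word's predictions."""
    if prev_word not in base_counts and prev_word in similar:
        prev_word = similar[prev_word]
    return predict(prev_word, recent_words)

# Without recent context, "runs" dominates after "cat".
p = predict("cat", recent_words=[])

# After "sleeps" appears three times recently, it is boosted past "runs".
p2 = predict("cat", recent_words=["sleeps", "sleeps", "sleeps"])

# "dog" has no counts of its own, so it borrows cat's predictions.
p3 = predict_with_analogy("dog", recent_words=[])
```

The ambiguity point maps onto the recency boost: when a word has several plausible "translations", the recent context decides which borrowed prediction table actually wins.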
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tefaeb8e790a54cec-M260dae39fabbebf60268f3d8