karl3@writeme.com wrote:
> > it totally worked it completed "Once upon a" -> " time" with only a top_k
> > of 6
>
> > actually it didn't do that i think i prompted it with that token
> > it made totally wrong tokens
> > the next text was "interconnected.goBack_SECURE"
> > more what i expected :/
> > well i changed models to athena which i've logged so i could compare the
> > math was correct
> and it turns out i was making the tensors contiguous incorrectly and my
> numbers were all wrong
> i added an extra loop to separate out all the strides, but it then becomes so
> slow to simply iterate the indices that i haven't seen a single loop complete
> (although i was doing all 8192 indices in athena to test).
>
> i was thinking that i would step away because a better algorithm informed by
> pagesize could likely need less index iteration, as well as simply using the
> tensor strides as opposed to calling a wrapped indexing function for every
> scalar in the matrix
> additionally merging the fetch regions would reduce the extensive incomplete
> loop to a single fetch in this case
>
> another idea is to try a yet smaller model, i've got llama 1B logged for tiny
> tests
>
> but i spent all day and got so close. it was really cool to forward llama
> 405b streaming over the internet in just a few seconds. but it wasn't
> selecting the correct data due to misinterpretation of sparse strides.
>
> an upshot is that it's quite possible those faulty results would look much
> better with my indexing error fixed
> this might be a situation where preprocessing the data to change its storage
> layout (which could also include extrema of the rows and columns of each
> matrix which would make top_k more effective) would make some sense
>
> man it's so close
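
a rough sketch of the stride and merged-fetch idea quoted above, not the actual code from this project. it assumes strides and offset are counted in elements (the way torch reports them), that the tensor has at least one dimension, and that itemsize is 2 bytes (e.g. bfloat16). the names strided_byte_ranges, merge_ranges and page_size are made up for illustration. it walks the strides directly instead of calling an indexing function per scalar, then rounds the byte ranges out to page boundaries and merges any that touch:

```python
from itertools import product

def strided_byte_ranges(shape, strides, offset=0, itemsize=2):
    """Yield (start, end) byte ranges touched by a strided view.

    shape, strides and offset are in elements (an assumption of this sketch).
    """
    *outer_shape, inner_len = shape
    *outer_strides, inner_stride = strides
    for idx in product(*(range(n) for n in outer_shape)):
        base = offset + sum(i * s for i, s in zip(idx, outer_strides))
        if inner_stride == 1:
            # dense innermost dimension: one range covers the whole run
            yield base * itemsize, (base + inner_len) * itemsize
        else:
            # sparse innermost dimension: one range per element
            for j in range(inner_len):
                start = (base + j * inner_stride) * itemsize
                yield start, start + itemsize

def merge_ranges(ranges, page_size=1):
    """Round ranges out to page boundaries, then merge touching or overlapping ones."""
    merged = []
    for start, end in sorted(ranges):
        start -= start % page_size   # round start down to a page boundary
        end += -end % page_size      # round end up to a page boundary
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]
```

for a fully covered matrix, e.g. merge_ranges(strided_byte_ranges((8, 4), (1, 8))) for a transposed 4x8 view, everything collapses to the single range (0, 64), which is the "single fetch" case. with a larger page_size, scattered rows that land on the same pages also merge, at the cost of fetching some bytes that are not needed.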

i'm slow coding because [my experience left a part of me that rewires me to fight me], and i can have harsh eye-jerking issues and amnesia associated with things like changing lines in an editor that i have to continuously navigate :s
coding was my biggest skill though (long ago).
i wish i had somebody to talk to about it who could kind of confirm that they understood and help me learn and navigate triggers
