> > it totally worked it completed "Once upon a" -> " time" with only a top_k 
> > of 6

> actually it didn't do that i think i prompted it with that token
> it made totally wrong tokens
> the next text was "interconnected.goBack_SECURE"
> more what i expected :/

well i switched models to athena, which i've logged, so i could check the math 
was correct
and it turns out i was making the tensors contiguous incorrectly and my numbers 
were all wrong
i added an extra loop to separate out all the strides, but it then becomes so 
slow to simply iterate the indices that i haven't seen a single loop complete 
(although i was doing all 8192 indices in athena to test).
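
for reference, a minimal sketch of what i mean by reading a strided tensor into contiguous order. this is pure python, not my actual code: `element_offsets` is a made-up name, and i'm assuming strides counted in elements (as torch reports them) rather than bytes.

```python
from itertools import product

def element_offsets(shape, strides, storage_offset=0):
    """Yield the storage offset (in elements) of every element of a strided
    view, visiting the view in logical row-major order."""
    for index in product(*(range(n) for n in shape)):
        yield storage_offset + sum(i * s for i, s in zip(index, strides))

# flat storage of a contiguous 3x2 matrix: rows [0, 1], [2, 3], [4, 5]
storage = [0, 1, 2, 3, 4, 5]

# its transpose is a 2x3 view over the same storage with strides (1, 2)
shape, strides = (2, 3), (1, 2)

# gathering through the offsets gives the contiguous form of the view
contiguous = [storage[o] for o in element_offsets(shape, strides)]
```

this still visits every index, so it only shows the offset arithmetic, it does not fix the speed problem.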

i was thinking that i would step away because a better algorithm informed by 
pagesize would likely need less index iteration, as would simply using the 
tensor strides as opposed to calling a wrapped indexing function for every 
scalar in the matrix
additionally merging the fetch regions would reduce the extensive incomplete 
loop to a single fetch in this case
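
a rough sketch of the region merging, again pure python and not what i have running. `merge_regions` and `max_gap` are names i'm inventing here; `max_gap` stands in for something like the pagesize, on the assumption that fetching a small gap is cheaper than issuing a second request.

```python
def merge_regions(offsets, itemsize, max_gap=0):
    """Coalesce element offsets into (start_byte, stop_byte) ranges.

    Two neighbouring elements share a range when the gap between them
    is at most max_gap bytes."""
    regions = []
    for off in sorted(set(offsets)):
        start, stop = off * itemsize, (off + 1) * itemsize
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = stop            # extend the current range
        else:
            regions.append([start, stop])    # start a new range
    return [tuple(r) for r in regions]

# 8192 adjacent 2-byte elements collapse into one fetch
single = merge_regions(range(8192), itemsize=2)

# every other element: separate fetches, unless small gaps are tolerated
sparse = merge_regions([0, 2, 4], itemsize=2)
padded = merge_regions([0, 2, 4], itemsize=2, max_gap=2)
```

if the 8192 indices really are adjacent in storage, the whole loop becomes the one range in `single`.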

another idea is to try a yet smaller model, i've got llama 1B logged for tiny 
tests

but i spent all day and got so close. it was really cool to forward llama 405b 
streaming over the internet in just a few seconds. but it wasn't selecting the 
correct data due to misinterpretation of sparse strides.

an upshot is that it's quite possible those faulty results would look much 
better with my indexing error fixed
this might be a situation where preprocessing the data to change its storage 
layout (which could also include extrema of the rows and columns of each matrix 
which would make top_k more effective) would make some sense
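
one way stored extrema could help top_k, sketched in pure python. this is only my guess at a scheme, not something i've built: `topk_rows_with_bounds` and `row_absmax` are hypothetical, and it ranks rows by |row . x|. it relies on the bound |row . x| <= max|row| * sum|x|, so rows whose bound can't beat the current k-th best never need to be fetched.

```python
def topk_rows_with_bounds(rows, row_absmax, x, k):
    """Return (indices of the k largest |row . x|, number of rows fetched).

    row_absmax[i] is the precomputed max absolute value of rows[i]."""
    x_l1 = sum(abs(v) for v in x)
    # visit rows from the largest upper bound to the smallest
    order = sorted(range(len(rows)), key=lambda i: -row_absmax[i])
    best = []       # (score, index) pairs, sorted descending, at most k long
    fetched = 0
    for i in order:
        bound = row_absmax[i] * x_l1
        if len(best) == k and bound <= best[-1][0]:
            break   # all remaining rows have an even smaller bound
        score = abs(sum(w * v for w, v in zip(rows[i], x)))  # the "fetch"
        fetched += 1
        best.append((score, i))
        best.sort(reverse=True)
        del best[k:]
    return [i for _, i in best], fetched

rows = [[10, 10], [0.1, 0.1], [5, -5], [0.2, 0.0]]
row_absmax = [10, 0.1, 5, 0.2]      # would come from the preprocessing pass
top, fetched = topk_rows_with_bounds(rows, row_absmax, x=[1, 1], k=1)
```

in this toy case only one of the four rows gets read. how much it saves on real weights depends on how spread out the row extrema are.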

man it's so close
