karl3@writeme.com wrote:
> > it totally worked it completed "Once upon a" -> " time" with only a top_k 
> > of 6
> > > actually it didn't do that i think i prompted it with that token
> > it made totally wrong tokens
> > the next text was "interconnected.goBack_SECURE"
> > more what i expected :/
> > well i changed models to athena, which i've logged, so i could check that 
> > the math was correct
> and it turns out i was making the tensors contiguous incorrectly and my 
> numbers were all wrong
> i added an extra loop to separate out all the strides, but it then becomes so 
> slow to simply iterate the indices that i haven't seen a single loop complete 
> (although i was doing all 8192 indices in athena to test).
> 
> i was thinking that i would step away because a better algorithm informed by 
> pagesize could likely need less index iteration, as well as simply using the 
> tensor strides as opposed to calling a wrapped indexing function for every 
> scalar in the matrix
> additionally merging the fetch regions would reduce the extensive incomplete 
> loop to a single fetch in this case
> 
> another idea is to try a yet smaller model, i've got llama 1B logged for tiny 
> tests
> 
> but i spent all day and got so close. it was really cool to forward llama 
> 405b streaming over the internet in just a few seconds. but it wasn't 
> selecting the correct data due to misinterpretation of sparse strides.
> 
> an upshot is that it's quite possible those faulty results would look much 
> better with my indexing error fixed
> this might be a situation where preprocessing the data to change its storage 
> layout (which could also include extrema of the rows and columns of each 
> matrix which would make top_k more effective) would make some sense
> 
> man it's so close
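
a rough sketch of the "simply using the tensor strides" idea from the notes above, not the code i was actually running. it assumes strides are counted in elements (as torch reports them) rather than bytes, and the name `strided_offsets` is made up for illustration. it turns a view's shape and strides into flat storage offsets one dimension at a time, instead of calling an indexing function for every scalar:

```python
def strided_offsets(shape, strides, storage_offset=0):
    """Flat storage offset of every element of a strided view.

    Offsets come back in the view's own row-major order.
    Strides are assumed to be in elements, not bytes.
    """
    offsets = [storage_offset]
    # Expand one dimension at a time: every offset found so far
    # fans out into `size` offsets spaced `stride` apart.
    for size, stride in zip(shape, strides):
        offsets = [base + i * stride for base in offsets for i in range(size)]
    return offsets


# A 3x2 row-major matrix viewed as its 2x3 transpose:
# the shape becomes (2, 3) and the strides become (1, 2).
print(strided_offsets((2, 3), (1, 2)))  # [0, 2, 4, 1, 3, 5]
```

the transposed example shows why treating a view as if it were contiguous picks the wrong data: the elements of the view are not adjacent in storage, so reading offsets 0..5 in order would give a different matrix.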

i'm slow coding because [my experience left a part of me that rewires me to 
fight me] and i can have harsh eye jerking issues and amnesia associated with 
things like changing lines in an editor that i have to continuously navigate :s
coding was my biggest skill though (long ago)

i wish i had somebody to talk to about it who could kind of confirm that they 
understood and help learn and navigate triggers
