karl3@writeme.com wrote:
> karl3@writeme.com wrote:
> > there's some interest in 'downloading only top k items'. this involves 
> > looking at the layer algebra and coming up with ways to identify 
> > low-contributing values.
> > we have solved this before, possibly/optionally including preprocessing to 
> > categorize things.
> > top k is more fun! it seems niftier.
> 
> so we've got some input logits. these are probably getting multiplied by a 
> _huge_ matrix.
> 
> we could technically take a naive-ish approach: discard the parts that are 
> multiplied by values near zero. (more precisely, each dot product has large 
> terms and small terms, and we could skip every term smaller than some 
> percentage of the largest.)
> - this works much better if we find a way to clump the mask based on locality 
> :/ since HTTP servers like to send contiguous byte ranges, not sparse masks
> - this is really cool if we make a bayesian or error-labeled datatype, so 
> instead of 3.4 it's more like 3.4 ± 0.31; this would give much more useful 
> information at the end
> 
> but yeah it seems interesting to just try the mask! involves some simple 
> torch kernel algebra

there's a small space here where one can get the _same exact output_ by 
predicting that some products would be smaller than the precision of the sum. 
this might at least need information on the magnitude of the weights, unsure 
... but there are likely heuristics one could apply here that would be 
accurate, given the rote nature of the training process, and given the lack of 
useful+accurate information one would expect from an overtiny number 
multiplied by an overlarge one.

that's kind of more in line with the intent of httptransformer and llm_logits, 
to be able to work on things like that on your cellphone, but i didn't make 
llm_logits for this model

ummm i guess i'll look a little at matmul
