karl3@writeme.com wrote:
> there's some interest in 'downloading only top k items' [this involves 
> looking at the layer algebra [and coming up with ways to identify 
> low-contributing values.
> 
> [[we have solved this before possibly/optionally including preprocessing to 
> categorize things

top k is more fun! it seems like the niftier thing to make.

so we've got some input values (activations or logits). these are probably
getting multiplied by a _huge_ weight matrix.

we could technically take a naive-ish approach and discard the parts of the
matrix that get multiplied by input values near zero. (more precisely: each dot
product is a sum of large terms and small terms, and we could skip every term
whose magnitude is below some percentage of the largest ones.)
- this works much better if we find a way to clump the mask by locality :/
since http servers like to send contiguous byte ranges, not sparse masks
- this gets really cool if we make something like a bayesian or error-labeled
datatype, so instead of 3.4 a value is more like 3.4 +- 0.31. that would give
much more useful information at the end
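
a rough sketch of the error-labeled idea from the second bullet, in plain
python. everything here is an assumption for illustration: the names `Approx`
and `dot` are made up, errors are treated as independent one-sigma values and
propagated to first order, and a skipped weight is modelled as "0 with an error
about the size of a typical weight". a properly bayesian version would carry a
whole distribution rather than a single error bar.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Approx:
    """A value with a one-sigma error estimate, e.g. 3.4 +- 0.31."""
    value: float
    err: float = 0.0

    def __add__(self, other: "Approx") -> "Approx":
        # Independent errors add in quadrature.
        return Approx(self.value + other.value,
                      math.hypot(self.err, other.err))

    def __mul__(self, other: "Approx") -> "Approx":
        # First-order error propagation for a product of independent values.
        value = self.value * other.value
        err = math.hypot(self.value * other.err, other.value * self.err)
        return Approx(value, err)

    def __str__(self) -> str:
        return f"{self.value:.3g}+-{self.err:.2g}"


def dot(xs, ws):
    """Dot product of two sequences of Approx, carrying the error along."""
    total = Approx(0.0)
    for x, w in zip(xs, ws):
        total = total + x * w
    return total


# The second weight was never downloaded, so it is "0 +- 0.5" (assumed scale).
xs = [Approx(2.0), Approx(0.01)]
ws = [Approx(1.7), Approx(0.0, 0.5)]
result = dot(xs, ws)
```

the point is that the skipped term still shows up in `result.err`, so at the
end you can see how much the mask might have cost instead of just trusting the
number.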

but yeah it seems interesting to just try the mask! involves some simple torch 
kernel algebra
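
here is a minimal sketch of what trying the mask could look like, written in
plain python lists so the logic is visible (in torch the same thing should be a
handful of tensor ops). assumptions, none of which come from a real model file:
the weights are stored transposed (one contiguous row of `row_bytes` bytes per
input index), values are 4-byte floats, and the names `keep_mask`,
`byte_ranges`, `masked_matvec`, `frac` and `max_gap` are made up. the ranges
are inclusive, the same way http Range headers count bytes.

```python
def keep_mask(x, frac=0.1):
    """True where |x[j]| is at least `frac` of the largest |x[j]|."""
    biggest = max((abs(v) for v in x), default=0.0)
    return [abs(v) >= frac * biggest for v in x]


def byte_ranges(mask, row_bytes, max_gap=0):
    """Turn a per-row keep mask into inclusive (start, end) byte ranges.

    Kept rows separated by at most `max_gap` skipped rows are merged into
    one range, trading a few wasted bytes for fewer, larger requests.
    """
    ranges = []
    last_kept = None
    for j, keep in enumerate(mask):
        if not keep:
            continue
        start = j * row_bytes
        end = (j + 1) * row_bytes - 1
        if last_kept is not None and j - last_kept - 1 <= max_gap:
            # Close enough to the previous kept row: extend that range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
        last_kept = j
    return ranges


def masked_matvec(x, wt, mask):
    """Approximate y = W @ x using only kept rows of wt (W transposed)."""
    n_out = len(wt[0])
    y = [0.0] * n_out
    for j, keep in enumerate(mask):
        if keep:
            for i in range(n_out):
                y[i] += x[j] * wt[j][i]
    return y


x = [5.0, 0.01, 4.0, 0.02, 0.0, 3.0]
wt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0],
      [7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
mask = keep_mask(x, frac=0.1)
sparse = byte_ranges(mask, row_bytes=2 * 4)
clumped = byte_ranges(mask, row_bytes=2 * 4, max_gap=1)
y = masked_matvec(x, wt, mask)
```

with `max_gap=1` the first two kept rows get merged into one request even
though a skipped row sits between them, which is the "clump by locality"
tradeoff: a few wasted bytes for fewer ranges. whether a given server honours
several ranges in one request is something to check, not assume.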
