karl3@writeme.com wrote:
> (Pdb) p input.shape
> torch.Size([1, 6, 16384])
> (Pdb) p weight[0:16].T.shape
> torch.Size([16384, 16])
>
> input @ weight
> rows @ cols
>
> so one row of input is [0,0,:]
> then one col of weight.T is [:,0]
> these are dotted.
> now, weight.T is dense on the first dimension. so ideally we'd make stripes
> across the other dimension. uhhhhhhhhhhhhhhhhhhhhhhhhh
>
> [_,_] @ [_,_]
> [_,_] @ [_,_]
> rows x cols
> rows of left times cols of right
> so the output is just a concatenation differently on each side. the rows of
> the left can be treated independently. the cols of the right can be treated
> independently.
>
> [_,_] @ [_,_]
> [_,_] @ [_,_]
> rows x cols
>
> the dense portion of the second operand is the dimension that is broadcast.
> so, since it's rows @ cols, the dense portion of the second operand would
> be the columns. the columns are dense.
>
> well that's frustrating. this would work better if they were stored sideways.
> if i get all of one col and none of the next it doesn't really help me. it's
> not summed inside the dot product. this may still be an appropriate thing to
> do, but it would involve math elsewise in the operator graph rather than
> right here
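the row-times-column structure above can be checked with a small sketch (NumPy standing in for torch, and shrunken dims standing in for 16384 and 16). it also shows why a partial column along the dense axis is only a partial sum:

```python
import numpy as np

# toy analogues of the pdb shapes: (1, 6, 16384) @ (16384, 16) -> (1, 6, 16)
rng = np.random.default_rng(0)
inp = rng.standard_normal((1, 6, 8))   # input, with 16384 shrunk to 8
wT = rng.standard_normal((8, 4))       # weight[0:16].T analogue, 16 shrunk to 4

out = inp @ wT  # shape (1, 6, 4)

# each output element is one row of the left dotted with one col of the right
assert np.allclose(out[0, 2, 3], inp[0, 2, :] @ wT[:, 3])

# the reduction runs over wT's first (dense) axis, so fetching only a stripe
# of a column yields a partial sum that is useless without the rest of it
partial = inp[0, 2, :4] @ wT[:4, 3]
rest = inp[0, 2, 4:] @ wT[4:, 3]
assert np.allclose(partial + rest, out[0, 2, 3])
```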
looks like http does actually support sparse requests: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Range_requests#multipart_ranges i wonder what a server would do if i sent a huge sparsity document for a range header. there must be some kind of maximum (there might not be) ... likely other issues too
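a minimal sketch of what sending a sparsity pattern as a Range header might look like (`ranges_to_header` is a hypothetical helper; per RFC 9110 a server may ignore the header, coalesce ranges, or answer 200 with the full body instead of 206 multipart/byteranges, so nothing here is guaranteed server-side):

```python
# build a multi-range Range header value from a list of byte spans
def ranges_to_header(ranges):
    """ranges: list of (start, end) byte offsets, end inclusive."""
    return "bytes=" + ", ".join(f"{s}-{e}" for s, e in ranges)

# e.g. request only the stripes of a remote weight file that are needed
hdr = ranges_to_header([(0, 1023), (65536, 66559), (131072, 132095)])
print(hdr)  # bytes=0-1023, 65536-66559, 131072-132095
```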
