karl3@writeme.com wrote:
> (Pdb) p input.shape
> torch.Size([1, 6, 16384])
> (Pdb) p weight[0:16].T.shape
> torch.Size([16384, 16])
> 
> input @ weight[0:16].T
> rows @ cols
> 
> so one row of input is [0,0,:]
> then one col of weight.T is [:,0]
> these are dotted.
> now, weight.T is dense on the first dimension. so ideally we'd make stripes 
> across the other dimension. uhhhhhhhhhhhhhhhhhhhhhhhhh
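
a quick sanity check of that row/col picture, with the shapes from the pdb
session above (the tensors here are random stand-ins, in double precision so
the comparison is robust):

import torch

x = torch.randn(1, 6, 16384, dtype=torch.double)       # stands in for input
w_slice = torch.randn(16, 16384, dtype=torch.double)   # stands in for weight[0:16]

wT = w_slice.T                  # [16384, 16]
out = x @ wT                    # [1, 6, 16]

# one output element is the dot of one input row with one column of weight.T
row = x[0, 0, :]                # [16384]
col = wT[:, 0]                  # [16384]
assert torch.allclose(out[0, 0, 0], row @ col)
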
> 
> [_,_] @ [_,_]
> [_,_] @ [_,_]
> rows x cols
> rows of left times cols of right
> so the output is just a concatenation, along a different axis for each side: 
> the rows of the left can be treated independently, and the cols of the right 
> can be treated independently.
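
that's just the block view of matmul; a tiny sketch with made-up shapes:

import torch

A = torch.randn(4, 8, dtype=torch.double)
B = torch.randn(8, 6, dtype=torch.double)

# rows of the left are independent
top, bottom = A[:2] @ B, A[2:] @ B
assert torch.allclose(torch.cat([top, bottom], dim=0), A @ B)

# cols of the right are independent
left, right = A @ B[:, :3], A @ B[:, 3:]
assert torch.allclose(torch.cat([left, right], dim=1), A @ B)
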
> 
> [_,_] @ [_,_]
> [_,_] @ [_,_]
> rows x cols
> 
> the dense portion of the second operand runs along the dimension that gets 
> contracted. so, since it's rows @ cols, the dense chunks of the second operand 
> are whole columns. the columns are dense.
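
concretely: pulling only some columns of weight.T just selects output features,
it doesn't shorten any dot product (the column subset here is arbitrary):

import torch

x  = torch.randn(1, 6, 16384, dtype=torch.double)
wT = torch.randn(16384, 16, dtype=torch.double)   # random stand-in for weight[0:16].T

cols = [0, 3, 7]                 # arbitrary subset of output features
partial = x @ wT[:, cols]        # [1, 6, 3] -- same-length dot products, just fewer of them

full = x @ wT
assert torch.allclose(partial, full[..., cols])
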
> 
> well that's frustrating. this would work better if the weights were stored 
> sideways. if i get all of one col and none of the next it doesn't really help 
> me: the saving isn't inside the dot-product sum, it just drops whole output 
> features. this may still be an appropriate thing to do, but it would involve 
> math elsewhere in the operator graph rather than right here
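
fwiw, the "math elsewhere in the operator graph" version would be partial sums
over stripes of the contraction dimension, something like this (stripe
boundaries are made up):

import torch

x  = torch.randn(1, 6, 16384, dtype=torch.double)
wT = torch.randn(16384, 16, dtype=torch.double)

# stripes along the 16384 (contraction) dimension; with all of them present
# the partial sums add up to the full product
stripes = [(0, 4096), (4096, 12288), (12288, 16384)]

acc = torch.zeros(1, 6, 16, dtype=torch.double)
for lo, hi in stripes:
    acc += x[..., lo:hi] @ wT[lo:hi, :]   # partial product per stripe, summed later

assert torch.allclose(acc, x @ wT)
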

looks like http does actually support sparse requests: 
https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Range_requests#multipart_ranges
i wonder what a server would do if i sent a huge list of ranges in a Range 
header. there must be some kind of maximum (or there might not be) ... likely 
other issues too
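
a multi-range request is just a lot of byte ranges in one Range header;
something like this would probe how a server reacts (the url and ranges are
made up, and a server is free to ignore the header and send the whole file):

import requests

# many small ranges in one request
ranges = ",".join(f"{off}-{off + 63}" for off in range(0, 1 << 20, 4096))
resp = requests.get(
    "https://example.com/model.safetensors",   # made-up url
    headers={"Range": f"bytes={ranges}"},
)
print(resp.status_code)                        # 206 partial, 200 ignored, 416 unsatisfiable
print(resp.headers.get("Content-Type"))        # multipart/byteranges; boundary=... if honored
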
