karl3@writeme.com wrote:
> karl3@writeme.com wrote:
> > (
>(
>(
> > (Pdb) p input.shape
> > torch.Size([1, 6, 16384])
> > (Pdb) p weight[0:16].T.shape
> > torch.Size([16384, 16])
> > input @ weight
> > rows @ cols
> > so one row of input is [0,0,:]
> > then one col of weight.T is [:,0]
> > these are dotted.
> > now, weight.T is dense on the first dimension. so ideally we'd make stripes
> > across the other dimension. uhhhhhhhhhhhhhhhhhhhhhhhhh
> > [_,_] @ [_,_]
> > [_,_] @ [_,_]
> > rows x cols
> > rows of left times cols of right
> > so the output is just a concatenation differently on each side. the rows of
> > the left can be treated independently. the cols of the right can be treated
> > independently.
> > [_,_] @ [_,_]
> > [_,_] @ [_,_]
> > rows x cols
> > the dense portion of the second operand is the dimension that is broadcast.
> > so, since it's rows @ cols, the dense portion of the second operation would
> > be the columns. the columns are dense.
> > well that's frustrating. this would work better if they were stored
> > sideways. if i get all of one col and none of the next it doesn't really
> > help me. it's not summed inside the dot product. this may still be an
> > appropriate thing to do, but it would involve math elsewise in the operator
> > graph rather than right here
> > looks like http does actually support sparse requests:
> > https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Range_requests#mult...
> i wonder what a server would do if i sent a huge sparsity document for a
> range header. there must be some kind of maximum (there might not be) ...
> likely other issues too
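a minimal sketch of what the value of a multi-range Range header could look like, assuming the wanted pieces are half-open (start, stop) byte ranges. `merge_ranges`, `range_header` and `max_gap` are hypothetical names for illustration, not anything from the code under discussion. merging nearby ranges is one way to keep the header from growing huge, at the cost of fetching some unwanted bytes.

```python
def merge_ranges(ranges, max_gap=0):
    """Coalesce half-open (start, stop) byte ranges.

    Ranges that overlap, touch, or sit within max_gap bytes of each
    other are merged into one. max_gap is a hypothetical tuning knob.
    """
    merged = []
    for start, stop in sorted(ranges):
        if start >= stop:
            continue  # skip empty ranges
        if merged and start - merged[-1][1] <= max_gap:
            # close enough to the previous range: extend it
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [(a, b) for a, b in merged]


def range_header(ranges, max_gap=0):
    """Format ranges as a Range header value.

    HTTP byte ranges have an inclusive end, so stop - 1 is emitted.
    """
    parts = [f"{a}-{b - 1}" for a, b in merge_ranges(ranges, max_gap)]
    if not parts:
        raise ValueError("no non-empty ranges to request")
    return "bytes=" + ", ".join(parts)


print(range_header([(0, 16), (16, 32), (100, 116)]))
print(range_header([(0, 16), (16, 32), (100, 116)], max_gap=100))
```

a server is allowed to ignore the Range header and answer with the whole body (200), so the client would still need to check for a 206 status. limits on header size vary by server.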
i think the underlying store operates in units of mmap pages. so ideally that would propagate back somehow to the slicing operator, and i could also use the surrounding data that is in the same page. maybe a separate function to fetch the smallest encompassing slice--[someday
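a rough sketch of the "smallest encompassing slice" idea, assuming byte offsets and a fixed page size. `encompassing_slice` and `PAGE_SIZE` are hypothetical names, and 4096 is only a common default (`mmap.PAGESIZE` reports the local value). it rounds the start down and the stop up to page boundaries.

```python
PAGE_SIZE = 4096  # assumption; mmap.PAGESIZE gives the value on this machine


def encompassing_slice(start, stop, page_size=PAGE_SIZE):
    """Return the smallest page-aligned half-open byte range
    that contains the half-open range [start, stop)."""
    if not 0 <= start <= stop:
        raise ValueError("need 0 <= start <= stop")
    aligned_start = (start // page_size) * page_size   # round down
    aligned_stop = -(-stop // page_size) * page_size   # round up
    return aligned_start, aligned_stop


print(encompassing_slice(5000, 9000))
```

the caller would then slice what it actually asked for out of the larger span and could keep the rest of the page data around for later requests.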
