>( >( >(

(Pdb) p input.shape
torch.Size([1, 6, 16384])
(Pdb) p weight[0:16].T.shape
torch.Size([16384, 16])
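
a quick sketch of those shapes (the weight's full leading dimension isn't
visible above, so 4096 here is a made-up stand-in):

    import torch

    input = torch.randn(1, 6, 16384)
    weight = torch.randn(4096, 16384)  # assuming nn.Linear layout: [out_features, in_features]

    out = input @ weight.T             # torch.Size([1, 6, 4096])
    sliced = input @ weight[0:16].T    # torch.Size([1, 6, 16])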

input @ weight.T
rows @ cols

so one row of input is [0,0,:]
then one col of weight.T is [:,0]
these are dotted.
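
continuing the sketch, that single element is:

    row = input[0, 0, :]            # [16384]
    col = weight[0:16].T[:, 0]      # [16384], which is just weight[0]
    elem = row @ col                # scalar: one element of the output
    # loose tolerance: the fused matmul may accumulate in a different order
    assert torch.allclose(elem, (input @ weight[0:16].T)[0, 0, 0], rtol=1e-4)
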
now, weight.T is dense on the first dimension. so ideally we'd make stripes 
across the other dimension. uhhhhhhhhhhhhhhhhhhhhhhhhh

[_,_] @ [_,_]
[_,_] @ [_,_]
rows x cols
rows of left times cols of right
so the output is just a concatenation, along a different dimension on each
side: the rows of the left can be treated independently, and the cols of the
right can be treated independently.
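
e.g., both splits reproduce the full product (still using the made-up shapes
from the sketch above):

    full = input @ weight.T

    # split the rows of the left operand, concat along the row dim
    top = input[:, 0:3, :] @ weight.T
    bottom = input[:, 3:6, :] @ weight.T
    assert torch.allclose(torch.cat([top, bottom], dim=1), full, rtol=1e-4)

    # split the cols of the right operand, concat along the col dim
    a = input @ weight[0:2048, :].T
    b = input @ weight[2048:, :].T
    assert torch.allclose(torch.cat([a, b], dim=-1), full, rtol=1e-4)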

the dense portion of the second operand is the dimension that is broadcast.
so, since it's rows @ cols, the dense portion of the second operand would be
the columns. the columns are dense.
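
concretely, the slice from the pdb session is 16 dense columns of weight.T,
i.e. 16 complete output features rather than partial sums (continuing the
sketch):

    chunk = weight[0:16].T   # [16384, 16], each column fully dense
    assert torch.allclose(input @ chunk, (input @ weight.T)[..., 0:16], rtol=1e-4)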

well, that's frustrating. this would work better if they were stored
sideways. if i get all of one col and none of the next, it doesn't really
help me, since it's not summed inside the dot product. this may still be an
appropriate thing to do, but it would involve math elsewhere in the operator
graph rather than right here.
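
for contrast: stripes across the reduction dimension would give partials that
sum right here inside the matmul, whereas the column split above hands a
concat to some other node downstream (continuing the sketch):

    # reduction-dim stripes: the combine is a sum, local to this op
    p1 = input[..., :8192] @ weight.T[:8192, :]
    p2 = input[..., 8192:] @ weight.T[8192:, :]
    assert torch.allclose(p1 + p2, input @ weight.T, rtol=1e-4)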
