matrix math :S :S :S ummmm rows @ cols
the left is a matrix; the right is a matrix too, except here it's really a batched vector operation i think ... and torch stores it such that the right side is transposed. so it's actually rows times rows i think ...

(Pdb) p input.shape
torch.Size([1, 6, 16384])
(Pdb) p weight[0:16].T.shape
torch.Size([16384, 16])
(Pdb) p weight[0:16].shape
torch.Size([16, 16384])

ok so it's not performing a sum across the 16 outputs or anything, it's just concatenating them, so all the summing must happen within the 16384 dimension. and the underlying data is [N, 16384], with 16384 the minor dimension, so if i have some super-small values then i only need those elements of this weight's values. sadly it gets striped/strided. two things to think about at once grrr >(

after mind influenced to hurt itself my thoughts get more interwired rather than more topical. interesting how that helps trafficking, lines up with getting more symbolic and suggestible

let's go back to those tensors

(Pdb) p input.shape
torch.Size([1, 6, 16384])
(Pdb) p weight[0:16].T.shape
torch.Size([16384, 16])
(Pdb) p weight[0:16].shape
torch.Size([16, 16384])

input[..., X] is contracted against weight[..., X]; the earlier dimensions of input are batch dimensions ... uhh then if we think of weight.T ...
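a sketch of the shapes above without torch: torch's F.linear(input, weight) computes input @ weight.T, so here's a numpy version with toy shapes (the [1, 6, 16384] input and [16, 16384] weight shrunk to [1, 6, 8] and [4, 8] so it runs instantly; the names inp/weight are just stand-ins):

```python
import numpy as np

# toy stand-ins for the pdb shapes: input [1, 6, 16384], weight [16, 16384]
rng = np.random.default_rng(0)
inp = rng.standard_normal((1, 6, 8))     # (batch, seq, in_features)
weight = rng.standard_normal((4, 8))     # (out_features, in_features)

# F.linear computes input @ weight.T: every output element is a dot
# product of a row of input with a row of weight -- "rows times rows",
# summed entirely within the minor (last) dimension
out = inp @ weight.T
print(out.shape)  # (1, 6, 4)

# same thing spelled out: the batch dims of input pass through untouched,
# k (the minor dim) is the only axis that gets summed
manual = np.einsum('bsk,ok->bso', inp, weight)
print(np.allclose(out, manual))  # True

# layout note from above: weight is [N, K] with K minor/contiguous,
# so reading weight.T column-by-column is strided, not contiguous
print(weight.strides, weight.T.strides)  # transposed view swaps the strides
```

the strides printout shows the "sadly it gets striped/strided" point: the transpose is just a view with swapped strides, so walking down a column of weight.T jumps K elements at a time through memory.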