> > It'd be great if you could format this patch into a patch set with several
> > little ones. :-)
> > Also, the kernel checkpatch is very helpful.
> > Good coding style and patch organization make it easy for in-depth reviews.
> >
> Combination of scalar and vector (32/64/128) was done to get optimal
> performance numbers. If there is enough interest in this I can work on it and
> provide an updated patch set.

That'll be very helpful! Looking forward to your patch :)
BTW, have you tested the performance of a real example with your patch?