On Thu, May 16, 2019 at 11:14:35AM +0800, Zhangshaokun wrote:
> On 2019/5/15 17:47, Will Deacon wrote:
> > On Mon, Apr 15, 2019 at 07:18:22PM +0100, Robin Murphy wrote:
> >> On 12/04/2019 10:52, Will Deacon wrote:
> >>> I'm waiting for Robin to come back with numbers for a C implementation.
> >>>
> >>> Robin -- did you get anywhere with that?
> >>
> >> Still not what I would call finished, but where I've got so far (besides an
> >> increasingly elaborate test rig) is as below - it still wants some unrolling
> >> in the middle to really fly (and actual testing on BE), but the worst-case
> >> performance already equals or just beats this asm version on Cortex-A53 with
> >> GCC 7 (by virtue of being alignment-insensitive and branchless except for
> >> the loop). Unfortunately, the advantage of C code being instrumentable does
> >> also come around to bite me...
> >
> > Is there any interest from anybody in spinning a proper patch out of this?
> > Shaokun?
>
> HiSilicon's Kunpeng920(Hi1620) benefits from do_csum optimization, if Ard and
> Robin are ok, Lingyan or I can try to do it.
> Of course, if any guy posts the patch, we are happy to test it.
> Any will be ok.

I don't mind who posts it, but Robin is super busy with SMMU stuff at the
moment so it probably makes more sense for you or Lingyan to do it.

Will