Re: [PATCH] arm64: do_csum: implement accelerated scalar version

Will Deacon Thu, 15 Aug 2019 09:47:13 -0700

On Thu, May 16, 2019 at 11:14:35AM +0800, Zhangshaokun wrote:
> On 2019/5/15 17:47, Will Deacon wrote:
> > On Mon, Apr 15, 2019 at 07:18:22PM +0100, Robin Murphy wrote:
> >> On 12/04/2019 10:52, Will Deacon wrote:
> >>> I'm waiting for Robin to come back with numbers for a C implementation.
> >>>
> >>> Robin -- did you get anywhere with that?
> >>
> >> Still not what I would call finished, but where I've got so far (besides an
> >> increasingly elaborate test rig) is as below - it still wants some 
> >> unrolling
> >> in the middle to really fly (and actual testing on BE), but the worst-case
> >> performance already equals or just beats this asm version on Cortex-A53 
> >> with
> >> GCC 7 (by virtue of being alignment-insensitive and branchless except for
> >> the loop). Unfortunately, the advantage of C code being instrumentable does
> >> also come around to bite me...
> > 
> > Is there any interest from anybody in spinning a proper patch out of this?
> > Shaokun?
> 
> HiSilicon's Kunpeng920(Hi1620) benefits from do_csum optimization, if Ard and
> Robin are ok, Lingyan or I can try to do it.
> Of course, if any guy posts the patch, we are happy to test it.
> Any will be ok.


I don't mind who posts it, but Robin is super busy with SMMU stuff at the
moment so it probably makes more sense for you or Lingyan to do it.

Will

Re: [PATCH] arm64: do_csum: implement accelerated scalar version

Reply via email to