On January 11, 2017 5:16:43 PM GMT+01:00, Robin Dapp <rd...@linux.vnet.ibm.com> 
wrote:
>Hi,
>
>When examining the performance of some test cases on s390 I realized
>that we could do better for constructs like 2-byte memcpys or
>2-byte/4-byte memsets. Due to some s390-specific architectural
>properties, we could be faster by e.g. avoiding excessive unrolling and
>using dedicated memory instructions (or similar).

Not sure why you mention memcpy, how does that depend on 'element size'?

>For 1-byte memset/memcpy the builtin functions provide a straightforward
>way to achieve this. At first sight it seemed possible to extend
>tree-loop-distribution.c to include the additional variants we need.
>However, multibyte memsets/memcpys are not covered by the C standard and
>I'm therefore unsure if such an approach is preferable or if there are
>more idiomatic ways or places where to add the functionality.

Yes, for memset with a larger element size we could add an optab plus
internal function combination and use that when the target wants.  Or always
use such an IFN and fall back to loopy expansion.
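
For illustration, the kind of store loop in question might look like the
sketch below. The name memset16 is hypothetical (it is not an existing GCC
builtin or libc function); the idea would be that loop distribution
recognizes such a loop and replaces it with an internal function call, which
is then either expanded through a target optab or turned back into the plain
loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only: store the 16-bit value
   VAL into each of the N elements starting at DST.  This is the
   "loopy" form a multibyte memset would fall back to.  */
static void
memset16 (uint16_t *dst, uint16_t val, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = val;
}
```

Note that N counts elements here, not bytes, which is one of the interface
choices such an IFN would have to settle.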

>The same question goes for 2-byte strlen. I didn't see a recognition
>pattern for strlen (apart from optimizations due to known string length
>in tree-ssa-strlen.c). Would it make sense to include strlen recognition
>and subsequently handling for 2-byte strlen? The situation might of

I'd say a multibyte memchr might make sense, but strlen specifically?  Not sure.

Likewise multibyte memcmp.
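
To make the distinction concrete, here is a rough sketch of the two loop
shapes, assuming a fixed-length 2-byte encoding as in your snippet. Both
function names are made up for this mail and exist in neither GCC nor libc.
The memchr variant has an explicit bound, while the strlen variant runs
until it sees a zero element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: return a pointer to the first of the N elements at S
   that equals C, or NULL if there is none.  */
static const uint16_t *
memchr16 (const uint16_t *s, uint16_t c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (s[i] == c)
      return &s[i];
  return NULL;
}

/* Hypothetical: count the 16-bit elements before the first zero
   element.  Assumes S is terminated by a zero element.  */
static size_t
strlen16 (const uint16_t *s)
{
  size_t len = 0;
  while (s[len] != 0)
    len++;
  return len;
}
```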

Richard.

>course be more complicated than memset because of encodings etc. My
>snippet in question used a fixed-length encoding of 2 bytes, however.
>
>Another simple idea to tackle this would be a peephole optimization but
>I'm not sure if this is really feasible for something like memset.
>Wouldn't the peephole have to be recursive then?
>
>Regards
> Robin
