On 01/11/2017 09:16 AM, Robin Dapp wrote:
Hi,
When examining the performance of some test cases on s390 I realized
that we could do better for constructs like 2-byte memcpys or
2-byte/4-byte memsets. Due to some s390-specific architectural
properties, we could be faster by e.g. avoiding excessiv
On Thu, Jan 12, 2017 at 9:26 AM, Robin Dapp wrote:
>> Yes, for memset with larger element we could add an optab plus
>> internal function combination and use that when the target wants. Or
>> always use such IFN and fall back to loopy expansion.
>
> So, adding additional patterns in tree-loop-dis
> Yes, for memset with larger element we could add an optab plus
> internal function combination and use that when the target wants. Or
> always use such IFN and fall back to loopy expansion.
So, adding additional patterns in tree-loop-distribute.c (and mapping
them to dedicated optabs) is fine?
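
As a rough sketch of the kind of loop this is about, assuming a "memset with larger element" means storing the same 2-byte value into every slot: the function names below are hypothetical and appear nowhere in GCC. A byte-wise memset can only express such a loop when both bytes of the value are equal, which is presumably why a separate pattern and optab would be needed for the general case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: fill N 2-byte slots with the same value.
   This is the loop shape a "memset with larger element" pattern
   would have to recognize.  */
void
fill16 (uint16_t *dst, uint16_t value, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = value;
}

/* Hypothetical helper: returns 1 if both bytes of VALUE are equal,
   i.e. the loop above could be expressed as a plain byte memset.  */
int
is_byte_splat16 (uint16_t value)
{
  return (value >> 8) == (value & 0xff);
}
```

For a value like 0x4141 the loop is equivalent to memset (dst, 0x41, 2 * n); for 0x1234 it is not, and that is the case a wider-element pattern would cover.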
On Wed, 2017-01-11 at 17:16 +0100, Robin Dapp wrote:
> Hi,
Hi Robin,
I thought I'd share some of what I've run into while doing similar
things for the rs6000 target.
First off, be aware that glibc uses macro expansion to try to handle
1-, 2- and 3-byte string operations in some cases.
Sec
On January 11, 2017 5:16:43 PM GMT+01:00, Robin Dapp
wrote:
>Hi,
>
>When examining the performance of some test cases on s390 I realized
>that we could do better for constructs like 2-byte memcpys or
>2-byte/4-byte memsets. Due to some s390-specific architectural
>properties, we could be faster b