On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis
<charles.bay...@linaro.org> wrote:
> On 5 June 2014 07:27, Ramana Radhakrishnan <ramana....@googlemail.com> wrote:
>> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
>> <charles.bay...@linaro.org> wrote:
>>> This patch adds support for post-indexed addressing for NEON structure
>>> memory accesses.
>>>
>>> For example VLD1.8 {d0}, [r0], r1
>>>
>>>
>>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>>
>>> Ok for trunk?
>>
>> This looks like a reasonable start but this work doesn't look complete
>> to me yet.
>>
>> Can you also look at the impact on performance of a range of
>> benchmarks especially a popular embedded one to see how this behaves
>> unless you have already done so ?
>
> I ran a popular suite of embedded benchmarks, and there is no impact
> at all on Chromebook (including with the additional attached patch)

Thanks for the due diligence.

>
> The patch was developed to address a performance issue with a new
> version of libvpx which uses intrinsics instead of NEON assembler. The
> patch results in a 3% improvement for VP8 decode.

Good - 3% is not to be sneezed at.

>
>> POST_INC, POST_MODIFY usually have a funny way of biting you with
>> either ivopts or the way in which address costs work. I think there
>> may be further tweaks needed but for a first step I'd like to know what
>> the performance impact is.
>
>> I would also suggest running this through clyon's neon intrinsics
>> testsuite to see if that catches any issues especially with the large
>> vector modes. Thanks.
>
> No issues found in clyon's tests.

Please keep an eye out for any regressions.

>
> Your mention of larger vector modes prompted me to check that the
> patch has the desired result with them. In fact, the costs are
> estimated incorrectly which means the post_modify pattern is not used.
> The attached patch fixes that. (used in combination with my original
> patch)
>
>
> 2014-06-15  Charles Baylis  <charles.ba...@linaro.org>
>
> 	* config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
> 	embedded side effects.

I'm not too thrilled with putting in more special cases that are not
table driven in there. Can you file a PR with some testcases that show
this so that we don't forget, and CC me on it, please?

Ramana
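
For readers less familiar with the addressing mode under discussion: the post-indexed form quoted in the patch description, VLD1.8 {d0}, [r0], r1, loads eight bytes from the address in r0 into d0 and then writes r0 + r1 back to r0, so the pointer update needs no separate ADD. The C sketch below is only a hypothetical plain-C model of those semantics. `dreg_u8x8`, `vld1_u8_post_index` and `sum_block_8xn` are illustrative names, not GCC internals and not the arm_neon.h API, and the strided row walk is an assumed stand-in for the kind of intrinsics loop the patch targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 64-bit NEON D register viewed as 8 x u8 lanes. */
typedef struct {
    uint8_t lane[8];
} dreg_u8x8;

/* Hypothetical plain-C model of VLD1.8 {d0}, [r0], r1:
 * load 8 bytes from the current base, then write back base + stride. */
static dreg_u8x8 vld1_u8_post_index(const uint8_t **r0, ptrdiff_t r1)
{
    dreg_u8x8 d0;
    assert(r0 != NULL && *r0 != NULL);
    memcpy(d0.lane, *r0, sizeof d0.lane); /* access uses the old base */
    *r0 += r1;                            /* base register written back */
    return d0;
}

/* Illustrative strided walk: sum an 8-byte-wide block of `rows` rows
 * spaced `stride` bytes apart, using one post-indexed load per row. */
static unsigned sum_block_8xn(const uint8_t *src, ptrdiff_t stride, int rows)
{
    unsigned sum = 0;
    for (int y = 0; y < rows; y++) {
        dreg_u8x8 row = vld1_u8_post_index(&src, stride);
        for (int i = 0; i < 8; i++)
            sum += row.lane[i];
    }
    return sum;
}
```

In real intrinsics code the per-row load would typically be written as `vld1_u8(p)` followed by `p += stride`; as described in the thread, the intent of the patch is to let GCC fold that pointer increment into the load's writeback rather than emitting it as a separate instruction.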