Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-31 Thread Toon Moene
On 10/31/2011 03:23 PM, Jakub Jelinek wrote:
> On Sat, Oct 29, 2011 at 03:53:37PM +0200, Toon Moene wrote:
> > I wonder whether it will work with the attached Fortran routine - it
> > sure would mean a boost to the 18%+ heaviest CPU user in our code.
>
> Would be nice to cut down slightly this testcase

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-31 Thread Jakub Jelinek
On Mon, Oct 31, 2011 at 03:23:32PM +0100, Jakub Jelinek wrote:
> Would be nice to cut down slightly this testcase into just one or two loops
> that are vectorized and turn it into a runtime testcase which verifies
> the vectorization was correct.

Here is one such testcase (though, in your case the

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-31 Thread Jakub Jelinek
On Sat, Oct 29, 2011 at 03:53:37PM +0200, Toon Moene wrote:
> I wonder whether it will work with the attached Fortran routine - it
> sure would mean a boost to the 18%+ heaviest CPU user in our code.

It didn't do anything, but only because I used a bad approach in vect_check_gather. I have been u

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-29 Thread Toon Moene
On 10/26/2011 11:56 PM, Jakub Jelinek wrote:
> Hi! This patch implements gather vectorization with -mavx2, if dr_may_alias
> (which apparently doesn't use tbaa :(( ) can figure out there is no overlap
> with stores in the loop (if any). The testcases show what is possible to get
> vectorized.

Hmmm,

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-29 Thread Toon Moene
On 10/26/2011 11:56 PM, Jakub Jelinek wrote: Hi! This patch implements gather vectorization with -mavx2, if dr_may_alias (which apparently doesn't use tbaa :(( ) can figure out there is no overlap with stores in the loop (if any). The testcases show what is possible to get vectorized. I chose t

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Richard Guenther
On Fri, 28 Oct 2011, Richard Guenther wrote:
> On Fri, 28 Oct 2011, Jakub Jelinek wrote:
>
> > On Fri, Oct 28, 2011 at 12:59:48PM +0200, Richard Guenther wrote:
> > > It is also because of re-use of memory via memcpy (yes, some dubious
> > > TBAA case from C, but essentially we don't want to brea

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Richard Guenther
On Fri, 28 Oct 2011, Jakub Jelinek wrote:
> On Fri, Oct 28, 2011 at 12:59:48PM +0200, Richard Guenther wrote:
> > It is also because of re-use of memory via memcpy (yes, some dubious
> > TBAA case from C, but essentially we don't want to break that). Thus
> > we can't use TBAA on anonymous memory

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Jakub Jelinek
On Fri, Oct 28, 2011 at 12:59:48PM +0200, Richard Guenther wrote:
> It is also because of re-use of memory via memcpy (yes, some dubious
> TBAA case from C, but essentially we don't want to break that). Thus
> we can't use TBAA on anonymous memory.

No, IMHO we always use a ref_all mem access in t

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Richard Guenther
On Fri, 28 Oct 2011, Jakub Jelinek wrote:
> On Fri, Oct 28, 2011 at 02:01:36PM +0400, Kirill Yukhin wrote:
> > this looks really cool. I have a little question, since I do not
> > understand the vectorizer that well.
> >
> > Say, we have a snippet:
> > int *p;
> > int idx[N];
> > int arr[M];
> > for (.

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Jakub Jelinek
On Fri, Oct 28, 2011 at 02:01:36PM +0400, Kirill Yukhin wrote:
> this looks really cool. I have a little question, since I do not
> understand the vectorizer that well.
>
> Say, we have a snippet:
> int *p;
> int idx[N];
> int arr[M];
> for (...)
> {
> p[i%4] += arr[idx[i]];
> }
> As far as I understa

Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

2011-10-28 Thread Kirill Yukhin
Hi Jacob,

this looks really cool. I have a little question, since I do not understand the vectorizer that well.

Say, we have a snippet:
int *p;
int idx[N];
int arr[M];
for (...)
{
p[i%4] += arr[idx[i]];
}
As far as I understand, we cannot do a gather here, since p may point to somewhere in arr, and, idx ma
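
For readers following the aliasing question, here is a minimal C sketch of the loop from the snippet above, written as a standalone function. The function name gather_accumulate and the use of restrict are assumptions made for illustration; they are not taken from the patch or from any message in this thread. The restrict qualifiers are one way a programmer can promise that p, idx and arr do not overlap, which is the property the vectorizer otherwise has to establish itself before turning the arr[idx[i]] loads into a gather. Whether a given compiler actually vectorizes this loop is not claimed here.

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): the loop from the snippet,
   with restrict stating that p, idx and arr do not overlap.
   The loads arr[idx[i]] are the indexed (gather-style) accesses. */
void gather_accumulate(int *restrict p, const int *restrict idx,
                       const int *restrict arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i % 4] += arr[idx[i]];
}
```

If p could point into arr, a store to p[i % 4] might change a value that a later iteration loads through arr[idx[i]], so loading several elements ahead of those stores would no longer match the scalar order of operations.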