On Tue, Oct 27, 2015 at 6:38 PM, Alan Lawrence <alan.lawre...@arm.com> wrote: > On 26/10/15 15:04, Richard Biener wrote: >> >> >> apart from the fact that you'll post a new version you need to adjust >> GROUP_GAP. >> You also seem to somewhat "confuse" "first I stmts" and "a group of >> size I", those >> are not the same when the group has haps. I'd say "a group of size i" >> makes the >> most sense here thus I suggest to adjust the function comment accordingly. > > > Ok, thanks for pointing this out. My objective had been to only split the > store groups - which in BB vectorization, always seem to have gap 0 1 1 .... > 1. I didn't come up with a good scheme for how to split load groups, but it > seemed that I didn't need to do anything there if I restricted to BB > vectorization only. For example, consider (ignoring that we could multiply > the first four elements by 1 and add 0 to the last four): > > a[0] = b[I] + 1; > a[1] = b[J] + 2; > a[2] = b[K] + 3; > a[3] = b[L] + 4; > a[4] = b[M] * 3; > a[5] = b[N] * 4; > a[6] = b[O] * 5; > a[7] = b[P] * 7; > > > with constants I,J,K,L,M,N,O,P. Even with those being a sequence 2 0 1 1 3 0 > 2 1 with overlaps and repetitions, this works fine for BB SLP (two subgroups > of stores, *sharing* a load group but with different permutations). Likewise > 0 1 2 3 0 2 4 6. > > For loop SLP, yes it looks like the load group needs to be split. So how; > and what constraints to impose on those constants? (There is no single right > answer!) > > A fairly-strict scheme could be that (I,J,K,L) must be within a contiguous > block of memory, that does not overlap with the contiguous block containing > (M,N,O,P). Then, splitting the load group on the boundary seems reasonable, > and updating the gaps as you suggest. However, when you say "the group first > elements GROUP_GAP is the gap at the _end_ of the whole group" - the gap at > the end is the gap that comes after the last element and up to....what? > > Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we > split the load group, we would want subgroups with gaps 0 1 1 1 and 0 1 1 1? > (IIUC, you suggest 1111 and 0111?)
As said on IRC it should be 4 1 1 1 and 4 1 1 1. > If they are disjoint sets, but overlapping blocks of memory, say 0 2 4 6 1 3 > 5 7...then do we create two load groups, with gap 0 2 2 2 and 0 2 2 2 again? > Does something record that the load groups access overlapping areas, and > record the offset against each other? No, I don't think we can split load groups that way. So I think if splitting store groups works well (with having larger load groups) then that's the way to go (even for loop vect). > If there are repeated elements (as in the BB SLP case mentioned above), I'm > not clear how we can split this effectively...so may have to rule out that > case. (Moreover, if we are considering hybrid SLP, it may not be clear what > the loop accesses are, we may be presented only with the SLP accesses. Do we > necessarily want to pull those out of a load group?) > > So I expect I may resolve some of these issues as I progress, but I'm > curious as to whether (and why) the patch was really broken (wrt gaps) as it > stood... Yes, the gaps were clearly bogously constructed in general. If you have an existing group you can only split it into non-overlapping groups. Thus for two load SLP nodes loading from 0 2 4 6 and from 1 3 5 7 you will have a single "group" (0 1 2 3 4 5 6 7) and you can at most split it as 0 1 2 3, 4 5 6 7 which won't help in this case (but would be actually worse). So I think restricting the splitting to the stores should work fine. Richard. > Thanks, > Alan >