On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge <tho...@codesourcery.com> wrote: > > Hi! > > On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > When solving 0 = _15 & 1, we calculate _15 as: > > > > [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe > > > > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with > > the above, yielding: > > > > [0, 1] NONZERO 0x0 > > > > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0. > > > > This is problematic because here we have a bool which is zero, but > > returns false for irange::zero_p, since the latter does not look at > > nonzero bits. This causes logical_combine to assume the range is > > not-zero, and all hell breaks loose. > > > > I think we should just normalize a nonzero mask of 0 to [0, 0] at > > creation, thus avoiding all this. > > 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 > "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx > offloading test case: > > UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 > PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 > (test for excess errors) > PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 > execution test > [-PASS:-]{+FAIL:+} > libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 > scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* > [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}" > > Same for C++. > > I'll later send a patch (for the test case!) to fix that up. > > 2. Looking into this, I found that this > commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 > "[PR107195] Set range to zero when nonzero mask is 0" actually enables a > code transformation/optimization that GCC apparently has not been doing > before! I've tried to capture that in the attached > "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".
Nice. > > Will you please verify that one? In its current '#if 1' configuration, > it's all-PASS after commit > r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 > "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we > get two calls to 'foo', because GCC apparently didnn't understand the > relation (optimization opportunity) between 'r *= 2;' and the subsequent > 'if (r & 1)'. Yeah, that looks correct. We keep better track of nonzero masks. > > I've left in the other '#if' variants in case you'd like to experiment > with these, but would otherwise clean that up before pushing. > > Where does one put such a test case? > > Should the file be named 'pr107195' or something else? The aforementioned patch already has: * gcc.dg/tree-ssa/pr107195-1.c: New test. * gcc.dg/tree-ssa/pr107195-2.c: New test. So I would just add a pr107195-3.c test. > > Do we scan 'optimized', or an earlier dump? > > At '-O1', the actual code transformation is visible already in the 'dom2' > dump: > > <bb 3> [local count: 536870913]: > gimple_assign <mult_expr, r_7, r_6(D), 2, NULL> > + gimple_assign <bit_and_expr, _11, r_7, 1, NULL> > + goto <bb 6>; [100.00%] > > - <bb 4> [local count: 1073741824]: > - # gimple_phi <r_4, r_6(D)(2), r_7(3)> > + <bb 4> [local count: 536870912]: > + # gimple_phi <r_4, r_6(D)(2)> > gimple_assign <bit_and_expr, _2, r_4, 1, NULL> > gimple_cond <ne_expr, _2, 0, NULL, NULL> > - goto <bb 5>; [50.00%] > + goto <bb 5>; [100.00%] > else > - goto <bb 6>; [50.00%] > + goto <bb 6>; [0.00%] > > <bb 5> [local count: 536870913]: > gimple_call <foo, _3, r_4> > gimple_assign <plus_expr, r_8, _3, r_4, NULL> > > <bb 6> [local count: 1073741824]: > - # gimple_phi <r_5, r_4(4), r_8(5)> > + # gimple_phi <r_5, r_4(4), r_8(5), r_7(3)> > gimple_return <r_5> > > And, the actual "avoid second call 'foo'" optimization is visiable > starting 'dom3': > > <bb 3> [local count: 536870913]: > gimple_assign <mult_expr, r_7, r_6(D), 2, NULL> > + goto <bb 6>; [100.00%] > > - <bb 4> [local count: 1073741824]: > - # gimple_phi <r_4, r_6(D)(2), r_7(3)> > - gimple_assign <bit_and_expr, _2, r_4, 1, NULL> > + <bb 4> [local count: 536870912]: > + gimple_assign <bit_and_expr, _2, r_6(D), 1, NULL> > gimple_cond <ne_expr, _2, 0, NULL, NULL> > - goto <bb 5>; [50.00%] > + goto <bb 5>; [100.00%] > else > - goto <bb 6>; [50.00%] > + goto <bb 6>; [0.00%] > > <bb 5> [local count: 536870913]: > - gimple_call <foo, _3, r_4> > - gimple_assign <plus_expr, r_8, _3, r_4, NULL> > + gimple_assign <integer_cst, _3, 0, NULL, NULL> > + gimple_assign <ssa_name, r_8, r_6(D), NULL, NULL> > > <bb 6> [local count: 1073741824]: > - # gimple_phi <r_5, r_4(4), r_8(5)> > + # gimple_phi <r_5, r_6(D)(4), r_6(D)(5), r_7(3)> > gimple_return <r_5> > > ..., but I don't know if either of those would be stable/appropriate to > scan instead of 'optimized'? IMO, either dom3 or optimized is fine. Thanks. Aldy