On Wed, 24 Nov 2021, Jakub Jelinek wrote:

> On Mon, Nov 22, 2021 at 08:39:42AM -0000, Roger Sayle wrote:
> > This patch implements PR tree-optimization/103345 to merge adjacent
> > loads when combined with addition or bitwise xor.  The current code
> > in gimple-ssa-store-merging.c's find_bswap_or_nop already handles ior,
> > so all that's required is to treat PLUS_EXPR and BIT_XOR_EXPR in the
> > same way as BIT_IOR_EXPR.
>
> Unfortunately they aren't exactly the same.  They work the same only if
> at least one operand (or the corresponding byte in it) is always known
> to be 0, since 0 | 0 = 0 ^ 0 = 0 + 0 = 0.  But for | we also have
> x | x = x for any other x, so perform_symbolic_merge has been accepting
> either that at least one of the bytes is 0 or that both are the same;
> the latter is wrong for ^ and +.
>
> The following patch fixes that by passing through the code of the binary
> operation and allowing non-zero masked1 == masked2 through only
> for BIT_IOR_EXPR.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
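To spell out the per-byte rule being discussed, here is a tiny standalone
model of the check; it is an illustration only, not the GCC sources, and
merge_ok, op_code and the example marker values are made-up names:

/* Tiny model of the per-byte check; NOT the GCC code.  Each byte of a
   "symbolic number" is a marker saying which source byte ends up there,
   with 0 meaning the byte is known to be zero.  */
#include <stdint.h>
#include <stdio.h>

enum op_code { OP_IOR, OP_XOR, OP_PLUS };	/* made-up stand-ins */

/* Mirror of the new condition: merging is safe when, for every byte,
   at least one marker is 0, or (for | only) both markers are equal.  */
static int
merge_ok (uint64_t n1, uint64_t n2, enum op_code op)
{
  for (int i = 0; i < 8; i++)
    {
      uint64_t mask = (uint64_t) 0xff << (i * 8);
      uint64_t masked1 = n1 & mask;
      uint64_t masked2 = n2 & mask;
      /* x | x == x, but x ^ x == 0 and x + x changes the value, so the
         masked1 == masked2 exception only holds for OP_IOR.  */
      if (masked1 && masked2 && (op != OP_IOR || masked1 != masked2))
        return 0;
    }
  return 1;
}

int
main (void)
{
  /* Disjoint byte markers: fine for |, ^ and +.  */
  printf ("%d %d %d\n",
          merge_ok (0x0201, 0x04030000, OP_IOR),
          merge_ok (0x0201, 0x04030000, OP_XOR),
          merge_ok (0x0201, 0x04030000, OP_PLUS));	/* prints 1 1 1 */
  /* Identical non-zero markers: only | keeps the value unchanged.  */
  printf ("%d %d %d\n",
          merge_ok (0x0201, 0x0201, OP_IOR),
          merge_ok (0x0201, 0x0201, OP_XOR),
          merge_ok (0x0201, 0x0201, OP_PLUS));	/* prints 1 0 0 */
  return 0;
}

With disjoint markers all three operations merge fine; with identical
non-zero markers only | keeps the value, which is exactly the
masked1 == masked2 exception the patch restricts to BIT_IOR_EXPR.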
OK.

> Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
> We could allow the masked1 == masked2 case for it, but we would need to
> do something different than the
> 	n->n = n1->n | n2->n;
> we do on all the bytes together.
> In particular, for masked1 == masked2 with masked1 != 0 (well, for 0
> both variants are the same) and masked1 != 0xff, we would need to
> clear the corresponding n->n byte instead of setting it to the input,
> as x ^ x = 0 (while for x ^ y with unknown x and y the result is also
> unknown).  Now, for plus it is much harder, because not only do we not
> know what the result is for non-zero operands, the carry can modify
> upper bytes as well.  So perhaps, when the current byte has
> masked1 && masked2, set the resulting byte to 0xff (unknown) only if
> the byte above it is 0 in both operands, and set that byte above to
> 0xff too, since the carry can spill into it.
> Also, even for |, instead of returning NULL we could just set the
> resulting byte to 0xff if the two markers differ; perhaps it will be
> masked off later on.
> Ok to handle that incrementally?

Not sure if it is worth the trouble - the XOR handling sounds
straightforward at least.  But sure, the merging routine could
simply be conservatively correct here.

Thanks,
Richard.

> 2021-11-24  Jakub Jelinek  <ja...@redhat.com>
>
> 	PR tree-optimization/103376
> 	* gimple-ssa-store-merging.c (perform_symbolic_merge): Add CODE
> 	argument.  If CODE is not BIT_IOR_EXPR, ensure that one of masked1
> 	or masked2 is 0.
> 	(find_bswap_or_nop_1, find_bswap_or_nop,
> 	imm_store_chain_info::try_coalesce_bswap): Adjust
> 	perform_symbolic_merge callers.
>
> 	* gcc.c-torture/execute/pr103376.c: New test.
>
> --- gcc/gimple-ssa-store-merging.c.jj	2021-11-23 10:26:30.000000000 +0100
> +++ gcc/gimple-ssa-store-merging.c	2021-11-23 11:49:33.806168782 +0100
> @@ -434,14 +434,14 @@ find_bswap_or_nop_load (gimple *stmt, tr
>      return true;
>  }
>  
> -/* Compute the symbolic number N representing the result of a bitwise OR on 2
> -   symbolic number N1 and N2 whose source statements are respectively
> -   SOURCE_STMT1 and SOURCE_STMT2.  */
> +/* Compute the symbolic number N representing the result of a bitwise OR,
> +   bitwise XOR or plus on 2 symbolic number N1 and N2 whose source statements
> +   are respectively SOURCE_STMT1 and SOURCE_STMT2.  CODE is the operation.  */
>  
>  gimple *
>  perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
>  			gimple *source_stmt2, struct symbolic_number *n2,
> -			struct symbolic_number *n)
> +			struct symbolic_number *n, enum tree_code code)
>  {
>    int i, size;
>    uint64_t mask;
> @@ -563,7 +563,9 @@ perform_symbolic_merge (gimple *source_s
>  
>        masked1 = n1->n & mask;
>        masked2 = n2->n & mask;
> -      if (masked1 && masked2 && masked1 != masked2)
> +      /* For BIT_XOR_EXPR or PLUS_EXPR, at least one of masked1 and masked2
> +	 has to be 0, for BIT_IOR_EXPR x | x is still x.  */
> +      if (masked1 && masked2 && (code != BIT_IOR_EXPR || masked1 != masked2))
>  	return NULL;
>      }
>    n->n = n1->n | n2->n;
> @@ -769,7 +771,8 @@ find_bswap_or_nop_1 (gimple *stmt, struc
>  	return NULL;
>  
>        source_stmt
> -	= perform_symbolic_merge (source_stmt1, &n1, source_stmt2, &n2, n);
> +	= perform_symbolic_merge (source_stmt1, &n1, source_stmt2, &n2, n,
> +				  code);
>  
>        if (!source_stmt)
>  	return NULL;
> @@ -943,7 +946,8 @@ find_bswap_or_nop (gimple *stmt, struct
>        else if (!do_shift_rotate (LSHIFT_EXPR, &n0, eltsz))
>  	return NULL;
>        ins_stmt
> -	= perform_symbolic_merge (ins_stmt, &n0, source_stmt, &n1, n);
> +	= perform_symbolic_merge (ins_stmt, &n0, source_stmt, &n1, n,
> +				  BIT_IOR_EXPR);
>  
>        if (!ins_stmt)
>  	return NULL;
> @@ -2881,7 +2885,7 @@ imm_store_chain_info::try_coalesce_bswap
>  	  end = MAX (end, info->bitpos + info->bitsize);
>  
>  	  ins_stmt = perform_symbolic_merge (ins_stmt, &n, info->ins_stmt,
> -					     &this_n, &n);
> +					     &this_n, &n, BIT_IOR_EXPR);
>  	  if (ins_stmt == NULL)
>  	    return false;
>  	}
> --- gcc/testsuite/gcc.c-torture/execute/pr103376.c.jj	2021-11-23 12:03:38.339948150 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr103376.c	2021-11-23 12:02:44.668723595 +0100
> @@ -0,0 +1,29 @@
> +/* PR tree-optimization/103376 */
> +
> +long long a = 0x123456789abcdef0LL, f;
> +int b, c, *d;
> +
> +__attribute__((noipa)) void
> +foo (int x)
> +{
> +  asm volatile ("" : : "r" (x));
> +}
> +
> +int
> +main ()
> +{
> +  long long e;
> +  e = a;
> +  if (b)
> +    {
> +      foo (c);
> +      d = (int *) 0;
> +      while (*d)
> +	;
> +    }
> +  f = a ^ e;
> +  asm volatile ("" : "+m" (f));
> +  if (f != 0)
> +    __builtin_abort ();
> +  return 0;
> +}
>
>
> 	Jakub
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
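As a footnote to the incremental idea quoted above, a conservative merge
for | and ^ could look roughly like the following sketch.  It is an
illustration of the direction discussed, not code from the patch; it
assumes the marker encoding used by the bswap pass (byte value 0 = known
zero, 0xff = unknown byte), merge_markers, BYTE_UNKNOWN and op_code are
invented names, and PLUS_EXPR is left out because the carry needs the
extra check on the byte above.

/* Sketch of the conservative merge discussed above; illustration only.  */
#include <stdint.h>
#include <stdio.h>

#define BYTE_UNKNOWN 0xff	/* made-up name for the 0xff marker */

enum op_code { OP_IOR, OP_XOR };	/* PLUS omitted: carries */

static uint64_t
merge_markers (uint64_t n1, uint64_t n2, enum op_code op)
{
  uint64_t res = 0;
  for (int i = 0; i < 8; i++)
    {
      int shift = i * 8;
      uint64_t b1 = (n1 >> shift) & 0xff;
      uint64_t b2 = (n2 >> shift) & 0xff;
      uint64_t b;
      if (b1 == 0 || b2 == 0)
        b = b1 | b2;		/* 0 op x == x for both | and ^ */
      else if (b1 == b2 && b1 != BYTE_UNKNOWN)
        b = (op == OP_IOR) ? b1 : 0;	/* x | x == x, x ^ x == 0 */
      else
        /* Different (or already unknown) source bytes: mark the result
           byte unknown instead of giving up; a later mask may still
           throw it away.  */
        b = BYTE_UNKNOWN;
      res |= b << shift;
    }
  return res;
}

int
main (void)
{
  /* x ^ x on identical markers yields "known zero" bytes rather than
     failing the merge.  */
  printf ("%#llx\n",
          (unsigned long long) merge_markers (0x0201, 0x0201, OP_XOR)); /* 0 */
  /* Differing byte markers under | become "unknown" (0xff) instead of
     returning NULL.  */
  printf ("%#llx\n",
          (unsigned long long) merge_markers (0x0301, 0x0201, OP_IOR)); /* 0xff01 */
  return 0;
}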