Richard, Thanks for the review ... comments below. On Tue, 2016-11-08 at 13:36 +0100, Richard Biener wrote: > On Tue, Nov 1, 2016 at 11:29 PM, Aaron Sawdey > <acsaw...@linux.vnet.ibm.com> wrote: > > > > This patch adds code to expand_builtin_strncmp so it also attempts > > expansion via cmpstrnsi in the case where c_strlen() returns NULL > > for > > both string arguments, meaning that neither one is a constant. > > @@ -3929,61 +3929,85 @@ > unsigned int arg1_align = get_pointer_alignment (arg1) / > BITS_PER_UNIT; > unsigned int arg2_align = get_pointer_alignment (arg2) / > BITS_PER_UNIT; > > + /* If we don't have POINTER_TYPE, call the function. */ > + if (arg1_align == 0 || arg2_align == 0) > + return NULL_RTX; > + > > hm? we cann validate_arglist at the beginning... > > + if (!len1 && !len2) > + { > + /* If neither arg1 nor arg2 are constant strings. */ > + /* Stabilize the arguments in case gen_cmpstrnsi fails. */ > + arg1 = builtin_save_expr (arg1); > + arg2 = builtin_save_expr (arg2); > > we no longer need the stabilization dance since we expand from > GIMPLE. > > + /* Use save_expr here because cmpstrnsi may modify arg3 > + and builtin_save_expr() doesn't force the save to > happen. */ > + len = save_expr (arg3); > + len = fold_convert_loc (loc, sizetype, len); > > cmpstrnsi may certainly not modify trees in-place. If it does it > needs to be fixed.
I needed this to get past a bootstrap failure on i386 where the cmpstrnsi pattern was modifying cx which also contained a length used elsewhere. The pattern does this: count = operands[3]; countreg = ix86_zero_extend_to_Pmode (count); but does not clobber operand 3 even though it uses it as the count for the repz cmpsb. I wonder if this is the source of that problem? > > + /* If both arguments have side effects, we cannot > optimize. */ > + if (TREE_SIDE_EFFECTS (len)) > + return NULL_RTX; > > this can't happen anymore > > btw, due to the re-indention a context diff would be _much_ easier to > review. I assume you mean a diff that ignores whitespace -- I'll make sure to do that. > > So the real "meat" of your change is > > /* If both arguments have side effects, we cannot optimize. */ > - if (!len || TREE_SIDE_EFFECTS (len)) > + if (TREE_SIDE_EFFECTS (len)) > return NULL_RTX; > > and falling back to arg3 as len if !len1 && !len2. > > Plus > > + /* Set MEM_SIZE as appropriate. This will only happen in > + the case where incoming arg3 is constant and arg1/arg2 are > not. */ > + if (CONST_INT_P (arg3_rtx)) > + { > + set_mem_size (arg1_rtx, INTVAL (arg3_rtx)); > + set_mem_size (arg2_rtx, INTVAL (arg3_rtx)); > + } > > where I don't really see why you need it or how it is even correct > (arg1 might > terminate with a '\0' before arg3). > > It would be nice to simplify the patch to simply do > > if (!len1 && !len2) > len = arg3; > else if (!len1) > ... I'll see if I can simplify it to do just this and also remove the unnecessary stuff you've flagged. Thanks, Aaron -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain