On November 18, 2017 11:20:42 AM GMT+01:00, Marc Glisse <marc.gli...@inria.fr> wrote: >On Fri, 17 Nov 2017, Richard Biener wrote: > >> On Sat, Nov 11, 2017 at 12:44 AM, Marc Glisse <marc.gli...@inria.fr> >wrote: >> >>> The exact undefined-behavior status should probably be clarified >more. >>> First, I'd like to say that POINTER_DIFF_EXPR may only take 2 >pointers into >>> the same "object" (like in C/C++), so they differ by at most half >the size >>> of the address space, and a-b is not allowed to be the minimum >negative >>> value (so I can safely use b-a), and if we compute a-b and b-c, then >a and c >>> are in the same object so I can safely compute a-c, etc. Second, we >probably >>> need modes without undefined behavior, wrapping with either -fwrapv >or a new >>> -fwrapp, or sanitized. For the sanitized version, we could keep >using >>> POINTER_DIFF_EXPR and check TYPE_OVERFLOW_SANITIZED on the pointer >type as >>> we currently do for integers. For wrapping, either we say that >>> POINTER_DIFF_EXPR has wrapping semantics when that option is in >effect, or >>> we do not use POINTER_DIFF_EXPR and instead cast to unsigned before >applying >>> a MINUS_EXPR (my preference). >> >> CCing C/C++ FE folks as well. >> >> + /* It is better if the caller provides the return type. */ >> + if (code == POINTER_DIFF_EXPR) >> + { >> + offset_int res = wi::sub (wi::to_offset (arg1), >> + wi::to_offset (arg2)); >> + return force_fit_type (signed_type_for (TREE_TYPE (arg1)), >res, 1, >> + TREE_OVERFLOW (arg1) | TREE_OVERFLOW >(arg2)); >> + } >> + >> >> there's an overload that provides the type... (we should probably go >> over all callers >> and use that). There is already a folding that is only done when the >type is >> available. So I'd simply remove the above from the non-type overload >handling. >> What breaks then? > >I haven't checked yet. On the other hand, now that I require the same >precision for pointers and their difference, using signed_type_for >should >be safe enough (only issue is if a front-end is unhappy that we changed > >the exact type for another 'equivalent' one). But I'll try removing the > >version with signed_type_for. > >> With the patch it's of course somewhat ugly in that we need to deal >with both >> MINUS_EXPR on pointers and POINTER_DIFF_EXPR - at least that's what >> >> + case POINTER_DIFF_EXPR: >> case MINUS_EXPR: >> + /* Fold &a[i] - &a[j] to i-j. */ >> + if (TREE_CODE (arg0) == ADDR_EXPR >> + && TREE_CODE (TREE_OPERAND (arg0, 0)) == ARRAY_REF >> + && TREE_CODE (arg1) == ADDR_EXPR >> + && TREE_CODE (TREE_OPERAND (arg1, 0)) == ARRAY_REF) >> + { >> + tree tem = fold_addr_of_array_ref_difference (loc, type, >> + TREE_OPERAND >(arg0, 0), >> + TREE_OPERAND >(arg1, 0), >> + code >> + == >POINTER_DIFF_EXPR); >> >> suggests. But is that really so? The FEs today should never build >> a MINUS_EXPR with pointer operands and your patch also doesn't. >> I suppose we only arrive above because STRIP_NOPS strips the >> pointer to integer conversion? > >I think so. And since I only patched 2 front-ends, others probably >still >generate casts to integers and MINUS_EXPR and can benefit from this >transformation. Also, if we implement -fwrapp by casting to unsigned >and >doing MINUS_EXPR, that will remain useful. > >> + case POINTER_DIFF_EXPR: >> + if (!POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (t, 0))) >> + || !POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (t, 1)))) >> + { >> + error ("invalid operand to pointer diff, operand is not a >pointer"); >> + return t; >> + } >> + CHECK_OP (0, "invalid operand to pointer diff"); >> + CHECK_OP (1, "invalid operand to pointer diff"); >> + break; >> >> can you add >> || TREE_CODE (TREE_TYPE (t)) != INTEGER_TYPE >> || TYPE_UNSIGNED (TREE_TYPE (t)) > >I wasn't sure how much to duplicate between here and below. Added. > >> ? Note that if the precision of the pointer type arguments are not >equal to the >> precision of the return value then foldings like >> >> (simplify >> + (plus:c (pointer_diff @0 @1) (pointer_diff @2 @0)) >> + (if (TYPE_OVERFLOW_UNDEFINED (type) >> + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))) >> + (pointer_diff @2 @1))) >> >> might not be correct as we drop a possible truncation here. Shall we > >I simplified the transformations after deciding to require equal >precision. > >> require (and document) that the result of the pointer difference in >> a POINTER_DIFF_EXPR has to be representable in the type of the >> expression, otherwise >> the behavior is undefined? Shall we allow an unsigned return type >for >> differences known to be representable? Probably not... Does the >behavior >> depend on TYPE_OVERFLOW_UNDEFINED? It would be nice to amend >> the generic.texi docs somewhat here. > >I was more precise in tree.def, copying details over to generic.texi >now. >But the overflow behavior is something I was asking about above (see >citation at the beginning of this email). I am tempted to say that >overflow >is always undefined for POINTER_DIFF_EXPR and we should use MINUS_EXPR >on >unsigned integers when we want wrapping, but if people prefer to use >POINTER_DIFF_EXPR in all cases and change its behavior depending on >TYPE_OVERFLOW_UNDEFINED that would be ok with me. I probably wasn't >consistent in match.pd and will need to tweak a few details once this >is >decided.
Ah. Indeed using unsigned MINUS_EXPR sounds like a good idea - also for canonicalization. Likewise for POUNTER_PLUS_EXPR vs. PLUS_EXPR. >> Ah, the gimple checking adds some more bits here that clarify things: >> >> + case POINTER_DIFF_EXPR: >> + { >> + if (!POINTER_TYPE_P (rhs1_type) >> + || !POINTER_TYPE_P (rhs2_type) >> + || !types_compatible_p (rhs1_type, rhs2_type) >> + || TREE_CODE (lhs_type) != INTEGER_TYPE >> + || TYPE_UNSIGNED (lhs_type) >> + || TYPE_PRECISION (lhs_type) != TYPE_PRECISION >(rhs1_type)) >> + { >> + error ("type mismatch in pointer diff expression"); >> + debug_generic_stmt (lhs_type); >> + debug_generic_stmt (rhs1_type); >> + debug_generic_stmt (rhs2_type); >> + return true; >> + } >> >> >> I think the patch is ok for trunk, the FE bits should get approval >> from their maintainers. >> Joseph may have an idea about the address-space issue. >> >> With this in place we should be able to fix some of the issues Jakub >ran into >> with pointer sanitizing? > >Some of them should be fixed without doing anything: pointer >differences >are not using MINUS_EXPR anymore in C/C++, so they are not sanitized >and >don't produce spurious errors when crossing the middle of the address >space. On the other hand, we may want to sanitize POINTER_DIFF_EXPR >specifically. > >> Also with this in place it would be nice to do the corresponding >adjustment to >> POINTER_PLUS_EXPR, that is, require a _signed_ offset type that >matches >> the precision of the other argument. > >Maybe next stage1... Too late for this year. Agreed. Bin had patches to do this IIRC. Richard.