> On Jan 9, 2025, at 03:17, Sam James <s...@gentoo.org> wrote:
> 
> Richard Biener <richard.guent...@gmail.com> writes:
> 
>> On Wed, Jan 8, 2025 at 5:34 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On Jan 7, 2025, at 07:29, Richard Biener <richard.guent...@gmail.com> 
>>>> wrote:
>>>> 
>>>> On Mon, Jan 6, 2025 at 5:40 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jan 6, 2025, at 11:01, Richard Biener <richard.guent...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> On Mon, Jan 6, 2025 at 3:43 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jan 6, 2025, at 09:21, Jeff Law <jeffreya...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 1/6/25 7:11 AM, Qing Zhao wrote:
>>>>>>>>>> 
>>>>>>>>>> Given it doesn't cause user visible UB, we could insert the
>>>>>>>>>> trap *before* the UB inducing statement.  That would then
>>>>>>>>>> make the statement unreachable and it'd get removed avoiding
>>>>>>>>>> the false positive diagnostic.
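
To make the idea concrete, here is a minimal hand-written C sketch of it
(mine, not from the thread): once the compiler has proven that a statement
invokes UB whenever it is reached, placing a trap in front of it makes the
statement unreachable, so it can be removed and no longer triggers a
diagnostic.

  extern int a[4];

  int f (void)
  {
    __builtin_trap ();  /* trap inserted before the provably-UB access below */
    return a[10];       /* now unreachable; DCE can delete it, so no
                           -Warray-bounds warning needs to be issued */
  }
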
>>>>>>>>> Yes, that’s a good idea.
>>>>>>>>> However, in order to distinguish user-visible UB from UB in the IL
>>>>>>>>> that is introduced purely by the compiler, we might need some new
>>>>>>>>> marking in the IR?
>>>>>>>> I don't think we've ever really tackled that question; the
>>>>>>>> closest I can think of would be things like integer overflow
>>>>>>>> which we try to avoid allowing the compiler to introduce.  If
>>>>>>>> we take the integer overflow as the model, then that would say
>>>>>>>> we should be tackling this during loop unrolling.
>>>>>>> 
>>>>>>> UB that is introduced by compiler transformations is one important
>>>>>>> cause of false-positive warnings.
>>>>>>>
>>>>>>> From my understanding, there are two approaches to tackle this problem:
>>>>>>>
>>>>>>> 1. Avoid generating such UB in the first place, i.e., for every compiler
>>>>>>> transformation that might introduce such UB, add a check to avoid
>>>>>>> generating it.
>>>>>>>
>>>>>>> 2. Mark the IR portions that were generated by compiler transformations,
>>>>>>> then check whether the UB is compiler-generated when issuing
>>>>>>> static-checker warnings.
>>>>>>> 
>>>>>>> Are there other approaches?
>>>>>> 
>>>>>> Note unrolling doesn't introduce UB - it makes conditional UB
>>>>>> “obvious”.
>>>>> 
>>>>> So, you mean this is the same issue as PR109071 (and PR85788,
>>>>> PR88771, etc.), i.e., the compiler optimization makes the
>>>>> conditional UB that’s originally in the source code “obvious”
>>>>> after code duplication?
>>>>> 
>>>>> (I need to study the test case in PR92539 more carefully to make sure
>>>>> this is the case...)
>>>>> 
>>>>> If so, then the claimed false-positive warning in PR92539
>>>>> is actually a real bug in the original source code, and my patch
>>>>> that introduced the new option “-fdiagnostics-details” should
>>>>> also cover loop unrolling, to provide more details on the
>>>>> warnings it exposes.
>>>>> 
>>>>> 
>>>>>> Note -Warray-bounds wants to
>>>>>> diagnose UB, so doing path isolation and removing the UB would make
>>>>>> -Warray-bounds useless.
>>>>>> 
>>>>>> So unless the condition guarding the UB that unrolling exposes is
>>>>>> visibly false to the compiler but we fail to exploit that (a missed
>>>>>> optimization), there's not much that we can do.  I think "folding"
>>>>>> away the UB like what Jeff proposes trades false negatives for the
>>>>>> false-positive diagnostics.
>>>>>> 
>>>>>> Note the unroller knows UB that effectively bounds the number of
>>>>>> iterations, even on conditional paths, and it uses this to limit the
>>>>>> number of copies _and_ to prune unreachable paths (exploiting UB,
>>>>>> avoiding diagnostics).  But one of the limitations is that it only
>>>>>> prunes paths in the last unrolled copy, which can be insufficient
>>>>>> (ISTR some PR where I noticed this).
>>>>>> 
>>>>>> That said - I think for these unroller-exposed cases of apparent false
>>>>>> positives we should improve the path pruning in the unroller itself.
>>>>>> For the other cases the path diagnostic might help clarify that the UB
>>>>>> happens on the 'n-th' iteration of the loop when some additional
>>>>>> condition is true/false.
>>>>> 
>>>>> So, the “other cases” refer to situations similar to PR109071, i.e.,
>>>>> where “conditional UB” in the original source code is made obvious
>>>>> after loop unrolling?
>>>>> Yes, for such cases, the new option I have been trying to add,
>>>>> “-fdiagnostics-details”, should be able to track and provide more
>>>>> details on the conditions that lead to the UB.
>>>>> Is this understanding correct?
>>>> 
>>>> I think so, but I didn't look into the testcase of the referenced PR.
>>> 
>>> I studied the test case of PR92539 in detail yesterday.  The
>>> following is a brief summary:
>>> 
>>> 1. The pass that caused the issue is: cunrolli.
>>>     Adding -fdisable-tree-cunrolli eliminates the false-positive warnings.
>>> 
>>> 2. The IR Before cunrolli:
>>> 
>>> const char *local_iterator = beginning address of string "aa";
>>> const char *last = last address of string "aa";
>>> 
>>> for (int i = 0; i < 3; ++i)
>>>  if (local_iterator != last)           // pointer comparison 1
>>>    {
>>>      local_iterator++;
>>>      if (local_iterator != last)       // pointer comparison 2
>>>        local_iterator++;
>>>    }
>>> 
>>> I think that the IR has NO UB at this point;
>>> (Pointer comparison 2 in the 2nd iteration of the “i” loop and
>>> both pointer comparisons in the 3rd iteration of the “i” loop will
>>> NOT execute at all.)
>>> 
>>> ** After cunrolli (full unrolling of the above i loop):
>>> 
>>> const char *local_iterator = beginning address of string "aa";
>>> const char *last = last address of string "aa";
>>> int i = 0;
>>> 
>>> if (local_iterator != last)                     // pointer comparison 1
>>>  {
>>>    local_iterator++;
>>>    if (local_iterator != last)         // pointer comparison 2
>>>      {
>>>        local_iterator++;
>>>        i++;
>>>        if (local_iterator != last)             // pointer comparison 3
>>>          {
>>>            local_iterator++;
>>>            if (local_iterator != last) // pointer comparison 4
>>>              {
>>>                local_iterator++;
>>>                i++;
>>>                if (local_iterator != last)     // pointer comparison 5
>>>                  {
>>>                    local_iterator++;
>>>                    if (local_iterator != last) // pointer comparison 6
>>>                      {
>>>                        local_iterator++;
>>>                        i++;
>>>                      }
>>>                  }
>>>              }
>>>          }
>>>      }
>>>  }
>>> 
>>> Also, I think that the IR has NO UB at this point either.
>>> (Pointer comparisons 4 and later will NOT execute at all.)
>>> 
>>> ** The false-positive warnings are reported for pointer comparisons 4, 5,
>>> and 6, assuming that these comparisons will be executed at runtime, but
>>> actually they will not be executed at all.
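
For reference, here is a reduced C-level sketch of the shape involved (my own
illustration, not the exact PR92539 testcase) showing roughly where the
pseudo-IR above comes from:

  void walk_aa (void)
  {
    const char *s = "aa";
    const char *it = s;
    const char *last = s;
    while (*last)            /* advances 'last' to the terminating NUL, s + 2 */
      ++last;

    for (int i = 0; i < 3; ++i)
      if (it != last)        /* pointer comparison 1 */
        {
          ++it;
          if (it != last)    /* pointer comparison 2 */
            ++it;
        }
  }
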
>>> 
>>> 
>>> So, based on the above study, my impression is:
>>> 
>>> A.  The unrolling transformation didn’t do anything wrong.
>>> B.  Only after constant propagation does the compiler know that the pointer
>>>     “local_iterator” will point to an out-of-bounds position of string
>>>     “aa”; after that, pointer comparisons 4, 5, and 6 are invalid.
>>> 
>>> So, I guess that Jeff’s patch to identify the invalid pointer address at 
>>> “vrp1” is reasonable.
>>> 
>>> What’s your opinion on this?
>> 
>> Your analysis is correct - the long-standing design issue is that
>> -Warray-bounds does not distinguish conditional UB from unconditional UB
>> (but that concept is somewhat wishful thinking given a function isn't
>> guaranteed to be invoked either, unless it is main()).  The other thing is
>> that while -Warray-bounds without unrolling likely sees that the value-range
>> of i_1 for &a[i_1] includes out-of-bound values, it chooses to only diagnose
>> cases where the whole value-range is out-of-bound (correctly so, to avoid
>> false positives just because of conservative ranges).
>> 
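
To make that distinction concrete, a small hand-written example (mine, not
from the PR): -Warray-bounds typically fires only when the whole computed
range of the index is out of bounds, not when just part of it is.

  extern int a[6];

  int whole_range_oob (void)
  {
    return a[10];       /* the index range is exactly {10}, entirely out of
                           bounds, so this is diagnosed */
  }

  int partial_range_oob (int i)
  {
    if (i > 3)
      return a[i];      /* the range is [4, INT_MAX]: partly in bounds,
                           partly out, so no warning is issued just because
                           the range is conservative */
    return 0;
  }
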
>> I think Jeff's patch is not reasonable since it boils down to not diagnosing
>> the -Warray-bounds issues but instead removing those stmts.
>> 
>> The problem in this particular testcase is that we fail to optimize the
>> compares - not because we don't exploit the visible UB but because
>> we fail to compute the final value of 'last' from
>> 
>>    auto last = "aa";
>>    while (*last)
>>        ++last;
>> 
>> One way to attack this is to recognize this pattern in niter analysis
>> and record niter as strlen (last) (but it would be a "first" to have
>> a niter pattern with a memory operation); another would be to handle
>> this in loop_niter_by_eval and allow us to take advantage of the
>> constant &'aa' initial value (that would only scale so much).
>> 
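
As a hand-written sketch of what the missed fold amounts to (not actual GCC
output): if niter analysis recognized that the loop below runs exactly
strlen ("aa") == 2 times, final value replacement could constant-fold 'last'
to "aa" + 2, and the later pointer compares against it would fold away.

  static const char *end_of_aa (void)
  {
    const char *last = "aa";
    while (*last)          /* trip count is strlen ("aa") == 2 */
      ++last;
    return last;           /* ideally folded to "aa" + 2, the terminating NUL */
  }
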
>>> 
>>>> Note the unroller explicitly tries to
>>>> prune those "bugs" given it happily exploits UB when computing the
>>>> number of iterations of a loop,
>>>> like for
>>>> 
>>>> int a[6];
>>>> for (int i = 0; i < n; ++i)
>>>>   a[i] = 1;
>>>> 
>>>> it knows the loop can at most iterate 5 times because a[i] = 1 with i
>>>> == 6 would invoke UB.
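
A hand-written sketch (mine, not compiler output) of what exploiting that
bound means for full unrolling: since a store to a[6] would be UB, the
unroller never needs to emit a copy for it.

  extern int a[6];

  void fill (int n)
  {
    if (n <= 0) return;
    a[0] = 1; if (n == 1) return;
    a[1] = 1; if (n == 2) return;
    a[2] = 1; if (n == 3) return;
    a[3] = 1; if (n == 4) return;
    a[4] = 1; if (n == 5) return;
    a[5] = 1;             /* i == 6 would be UB, so unrolling stops here */
  }
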
>>> 
>>> So, the unroller has some heuristic to limit the potential UB invoked by
>>> unrolling? Could you point me to the corresponding routine for such a
>>> heuristic, if convenient?
>> 
>> It cuts off those via
>> tree-ssa-loop-ivcanon.cc:remove_exits_and_undefined_stmts, but this is
>> invoked only for the last copy, and it only handles stmts that we recorded
>> bounds for via infer_loop_bounds_from_undefined.
>> 
>> It's a bit difficult to extend this to non-last copies: we'd have to copy
>> incrementally, and we'd possibly disrupt the batching of transforms,
>> causing compile-time issues.  But I haven't had time to investigate.
>> 
>> To summarize, the following things should be done to improve things in
>> this and similar situations:
>>  a) improve niter analysis / final value replacement to constant-fold
>>     'last' in such cases
>>  b) improve unrolling to cut out UB stmts in iterations earlier than the
>>     last (size estimates also do not consider this, so it's also a missed
>>     optimization)
>>  c) better explain diagnostics
> 
> Maybe even with -fdiagnostics-details in 15 as a late Christmas present?
> :)
> 
> The only issues I've found when using Qing's patch are cases I need to
> reduce where we may be able to diagnose more,
Are there any examples here? Are there other compiler optimizations that
are not covered currently?  (For example, loop unrolling?)

thanks.

Qing


> but nothing else. Kees is
> using it for kernel issues already too.

