> On 01.03.2025 at 15:24, Martin Uecker <uec...@tugraz.at> wrote:
>
> On Saturday, 01.03.2025 at 16:52 +0300, Alexander Monakov wrote:
>>> On Sat, 1 Mar 2025, Martin Uecker via Gcc wrote:
>>>
>>> Sorry for being a bit slow. This is still not clear to me.
>>>
>>> In vect/pr65206.c the following loop can be vectorized
>>> with -fallow-store-data-races.
>>>
>>> #define N 1024
>>>
>>> double a[N], b[N];
>>>
>>> void foo ()
>>> {
>>>   for (int i = 0; i < N; ++i)
>>>     if (b[i] < 3.)
>>>       a[i] += b[i];
>>> }
>>>
>>> But even though this invents stores these would still seem
>>> to be allowed by C11 because they are not detectable to
>>> another reader because they write back the same value as
>>> read.
>>>
>>> Is this understanding correct?
>>
>> No. Imagine another thread running a similar loop, but with
>> the opposite condition in the 'if' (b[i] > 3). Then there are
>> no data races in the abstract machine, but vectorized code
>> may end up with a wrong result.
>>
>> (suppose all odd elements in 'b' are less than three, all even
>> are more than three, and the two threads proceed through the
>> loop in lockstep; then half of the updates to 'a' are lost)
>
> Ah, right. Thank you! So this example would violate C11.
>
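To make the lost-update scenario concrete, here is a minimal single-threaded sketch that replays the lockstep interleaving described above. It is only a model under stated assumptions: the function names are made up for illustration, the "vectorized" loop is written as a scalar load/select/unconditional-store sequence, and this is not a claim about the exact code GCC emits.

```c
#include <assert.h>

#define N 8

/* Abstract-machine result: for each i at most one of the two threads
   writes a[i], so there is no data race and no update is lost. */
static void reference(double *a, const double *b)
{
  for (int i = 0; i < N; ++i)
    {
      if (b[i] < 3.) a[i] += b[i];   /* thread 1 */
      if (b[i] > 3.) a[i] += b[i];   /* thread 2, opposite condition */
    }
}

/* Model of the if-converted loops running in lockstep: both threads
   load a[i], both compute a select, both store unconditionally. */
static void lockstep_if_converted(double *a, const double *b)
{
  for (int i = 0; i < N; ++i)
    {
      double t1 = a[i];                          /* thread 1 load */
      double t2 = a[i];                          /* thread 2 load */
      double r1 = b[i] < 3. ? t1 + b[i] : t1;
      double r2 = b[i] > 3. ? t2 + b[i] : t2;
      a[i] = r1;                                 /* thread 1 store */
      a[i] = r2;                                 /* thread 2 store, overwrites */
    }
}

/* Counts elements where the modelled interleaving differs from the
   abstract-machine result. */
int count_lost_updates(void)
{
  double b[N], ref[N] = {0}, sim[N] = {0};
  for (int i = 0; i < N; ++i)
    b[i] = (i % 2) ? 1.0 : 5.0;   /* odd: less than 3, even: more than 3 */
  reference(ref, b);
  lockstep_if_converted(sim, b);
  int lost = 0;
  for (int i = 0; i < N; ++i)
    if (ref[i] != sim[i])
      ++lost;
  return lost;
}

void demo(void)
{
  /* With thread 2 storing last, every odd element loses its update. */
  assert(count_lost_updates() == N / 2);
}
```

With this particular store order the odd elements lose their update; swapping the two stores would lose the even ones instead. Either way half of the updates are gone.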
>>
>>> Does -fallow-store-data-races also create other data races
>>> which would not be allowed by C11?
>>
>> Yes, for instance it allows an unsafe form of store motion where
>> a conditional store is moved out of the loop and made unconditional:
>>
>> int position;
>>
>> void f(int n, float *a)
>> {
>>   for (int i = 0; i < n; i++)
>>     if (a[i] < 0)
>>       position = i;
>> }
>
> Now I wonder why this is not allowed in C11. This
> is transformed to:
>
> void f(int n, float *a)
> {
>   int tmp = position;
>   for (int i = 0; i < n; i++)
>     if (a[i] < 0)
>       tmp = i;
>   position = tmp;
> }
>
> So if there is a write, no other thread can read or write
> position. Thus, the interesting case is when there should
> be no write at all, i.e.
>
> void f(int n, float *a)
> {
>   int tmp = position;
>   position = tmp;
> }
>
> and this could undo the store from another thread, so violates
> the C11 memory model.
>
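A small sketch of that "undo" case, again as a deterministic single-threaded model rather than a real race. The `between` hook and the helper names are hypothetical; the hook simply stands in for a store that another thread performs after our load of `position` and before our write-back.

```c
#include <assert.h>
#include <stddef.h>

int position;

/* Source form: stores to 'position' only if a negative element exists. */
static void f_source(int n, const float *a)
{
  for (int i = 0; i < n; i++)
    if (a[i] < 0)
      position = i;
}

/* Hoisted form with an unconditional write-back.  'between' is a
   hypothetical hook modelling another thread's activity between the
   initial load and the final store. */
static void f_hoisted(int n, const float *a, void (*between)(void))
{
  int tmp = position;
  for (int i = 0; i < n; i++)
    if (a[i] < 0)
      tmp = i;
  if (between)
    between ();
  position = tmp;
}

/* The "other thread": a plain store to position. */
static void other_thread_store(void)
{
  position = 42;
}

int run_source(void)
{
  position = 7;
  f_source (0, NULL);        /* no negative element: no store at all */
  other_thread_store ();
  return position;           /* the other thread's 42 survives */
}

int run_hoisted(void)
{
  position = 7;
  f_hoisted (0, NULL, other_thread_store);
  return position;           /* the stale 7 is written back over 42 */
}

void demo(void)
{
  assert(run_source() == 42);
  assert(run_hoisted() == 7);
}
```

In the source form the function never touches `position` when no element is negative, so the other thread's store cannot race with it. The hoisted form invents a store on that path, and that store is what overwrites the 42.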
>>
>>> What was the reason to disallow those optimizations that
>>> could be allowed by default?
>>
>> -fallow-store-data-races guards transforms that we know to be incorrect
>> for source code that may become a part of a multithreaded program.
>> Are you asking about something else?
>
> I am looking for an example where stores are invented but this
> is allowed by C11. So something such as
>
> if (x != 1)
>   x = 1;
>
> being transformed to
>
> x = 1;
GCC also does this, but in two steps.
> this should not harm another reader for x == 1 and if
> there is a writer it is UB anyway, so this should be ok. Does GCC
> do such things? And is it then also guarded by the
> -fallow-store-data-races flag or not (because it is allowed in C11)?
And it is guarded the same way.
>
> Martin
>