On Thu, Mar 5, 2015 at 4:05 PM, Bernd Edlinger <bernd.edlin...@hotmail.de> wrote:
> Hi,
>
> On Thu, 5 Mar 2015 12:24:56, Richard Biener wrote:
>>
>> On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
>> <bernd.edlin...@hotmail.de> wrote:
>>> Hi,
>>>
>>> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>>>
>>>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>>> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>>>
>>> Maybe it should check that MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (fieldmode),
>>> because the strict volatile bitfields handling will inevitably try to use
>>> the fieldmode to access the memory.
>>>
>>> Or would it be better to say MEM_ALIGN (op0) >= GET_MODE_BITSIZE (fieldmode),
>>> because it is easier to explain when someone asks when we guarantee the
>>> semantics of strict volatile bitfields?
>>
>> But on non-strict-align targets you can access an SImode field directly
>> even for 1-byte-aligned MEMs.  So the old code looks correct to me
>> here and the fix needs to be done somewhere else.
>>
>
> But this SImode access is split up into several QImode or HImode accesses
> in the processor's execution pipeline; finally, on an external bus like
> AXI, all memory transactions are aligned.  The difference is just that
> some processors can split up the unaligned accesses and some need help
> from the compiler, but even on an x86 unaligned accesses have different
> semantics: they are no longer atomic, while an aligned access is always
> executed as an atomic transaction on an x86 processor.
>
>>> Probably there is already something in the logic in expr.c that prevents
>>> these cases, because otherwise it would be way too easy to find an
>>> example for unaligned accesses to unaligned memory on STRICT_ALIGNMENT
>>> targets.
>>>
>>>
>>> OK, what would you think about this variant?
>>>
>>> --- expmed.c.jj 2015-01-16 11:20:40.000000000 +0100
>>> +++ expmed.c    2015-03-05 11:50:09.400444416 +0100
>>> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>>>      return false;
>>>
>>>    /* Check for cases of unaligned fields that must be split.  */
>>> -  if (bitnum % BITS_PER_UNIT + bitsize > modesize
>>> -      || (STRICT_ALIGNMENT
>>> -          && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
>>> +  if (bitnum % modesize + bitsize > modesize)
>>> +    return false;
>>> +
>>> +  /* Check that the memory is sufficiently aligned.  */
>>> +  if (MEM_ALIGN (op0) < modesize)
>>
>> I think that only applies to strict-align targets, and then only for
>> GET_MODE_ALIGNMENT (modesize).  And of course what matters is the
>> alignment at bitnum - even though op0 may not be sufficiently aligned,
>> it may have known misalignment so that op0 + bitnum is sufficiently
>> aligned.
>>
>> Testcases would need to annotate structs with packed/aligned attributes
>> to get at these cases.
>>
>> For the testcase included in the patch, what does the patch end up doing?
>> Not going the strict-volatile-bitfield expansion path?  That looks
>> unnecessary on !strict-alignment targets but reasonable on strict-align
>> targets where the access would need to be split.  So, why does it end up
>> being split on !strict-align targets?
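For illustration, here is a minimal sketch of the kind of packed/aligned-annotated
struct Richard mentions above, where the accessed object itself is under-aligned
but the bitfield ends up at a 4-byte-aligned offset.  This example is not from
the thread; the type and variable names are invented.

/* "inner" is packed, so its alignment is 1 and its bitfield y occupies
   bits 16..46, i.e. bytes 2..5.  Placed at byte offset 2 of "outer",
   which is 4-byte aligned, y therefore starts at byte offset 4 of the
   object, so a single aligned SImode access at &o + 4 covers it, even
   though "inner" itself starts at a misaligned address.  A check that
   only looks at the alignment of the start of the accessed object would
   reject this case, although the access at op0 + bitnum is sufficiently
   aligned.  */

struct inner
{
  char pad[2];                    /* bits 0..15 */
  volatile unsigned int y : 31;   /* bits 16..46 */
} __attribute__((packed));        /* alignment 1, sizeof == 6 */

struct outer
{
  short s;                        /* bytes 0..1 */
  struct inner i;                 /* bytes 2..7, no padding needed */
} __attribute__((aligned (4)));

struct outer o;                   /* o.i starts 2 bytes past a 4-byte
                                     boundary, o.i.y exactly on one */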
> gcc -fstrict-volatile-bitfields -S 20150304-1.c
>
> Without the patch we find this in 20150304-1.s:
>
> main:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         movzbl  global+1(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+1(%rip)
>         movzbl  global+2(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+2(%rip)
>         movzbl  global+3(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+3(%rip)
>         movzbl  global+4(%rip), %eax
>         orl     $127, %eax
>         movb    %al, global+4(%rip)
>         movl    global(%rip), %eax
>         shrl    $8, %eax
>         andl    $2147483647, %eax
>         cmpl    $2147483647, %eax
>         je      .L2
>         call    abort
> .L2:
>         movl    $0, %eax
>         popq    %rbp
>         .cfi_def_cfa 7, 8
>         ret
>         .cfi_endproc
>
> So the write path is OK, because strict_volatile_bitfield_p returns false
> due to this check:
>
>   /* Check for cases where the C++ memory model applies.  */
>   if (bitregion_end != 0
>       && (bitnum - bitnum % modesize < bitregion_start
>           || bitnum - bitnum % modesize + modesize - 1 > bitregion_end))
>     return false;
>
> with
>
>   bitregion_start = 8, bitregion_end = 39
>   bitnum - bitnum % modesize = 0
>   bitnum - bitnum % modesize + modesize - 1 = 31
>
> This does not happen in the read path, and the problem is that the access
> here does not work:
>
>   str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
>                                   &bitnum);
>   /* Explicitly override the C/C++ memory model; ignore the
>      bit range so that we can do the access in the mode mandated
>      by -fstrict-volatile-bitfields instead.  */
>   store_fixed_bit_field_1 (str_rtx, bitsize, bitnum, value);
>
> str_rtx is unchanged, bitnum = 8, bitsize = 31, but store_fixed_bit_field_1
> cannot handle that.
>
> BTW: I can make the write code path fail as well if I change the test case
> and add ":8" to char x:
>
> struct s
> {
>   char x:8;
>   unsigned int y:31;
> } __attribute__((packed));
>
> Now the bit region is from 0..39, and we get this code:
>
> main:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         movl    global(%rip), %eax
>         orl     $-256, %eax
>         movl    %eax, global(%rip)
>         movl    global(%rip), %eax
>         shrl    $8, %eax
>         andl    $2147483647, %eax
>         cmpl    $2147483647, %eax
>         je      .L2
>         call    abort
> .L2:
>         movl    $0, %eax
>         popq    %rbp
>         .cfi_def_cfa 7, 8
>         ret
>         .cfi_endproc
>
> The patch (though I would still prefer my initial proposal) fixes this by
> skipping the strict-alignment code path.  That looks OK to me, because the
> structure is always packed when that happens, and it can't be used to
> access "device registers", which are by definition always naturally
> aligned, at least to the machine word size.
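The exact contents of 20150304-1.c are not quoted in this thread, but from the
struct and the generated assembly the testcase presumably looks roughly like the
following reconstruction.  Whether the volatile qualifier sits on "global" or on
the bitfield itself is an assumption; some volatile qualification is needed for
-fstrict-volatile-bitfields to apply.

extern void abort (void);

struct s
{
  char x;               /* the variant discussed above uses "char x:8;" */
  unsigned int y:31;
} __attribute__((packed));

volatile struct s global;

int
main (void)
{
  global.y = 0x7fffffff;        /* the byte stores to global+1..global+4 */
  if (global.y != 0x7fffffff)   /* the read of 31 bits at bit offset 8 */
    abort ();
  return 0;
}

The constant 0x7fffffff matches the 2147483647 compared against in the listings
above, compiled with "gcc -fstrict-volatile-bitfields -S 20150304-1.c".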
>
> With the patch we get:
>
> main:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         movzbl  global+1(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+1(%rip)
>         movzbl  global+2(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+2(%rip)
>         movzbl  global+3(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+3(%rip)
>         movzbl  global+4(%rip), %eax
>         orl     $127, %eax
>         movb    %al, global+4(%rip)
>         movzbl  global+1(%rip), %eax
>         movzbl  %al, %eax
>         movzbl  global+2(%rip), %edx
>         movzbl  %dl, %edx
>         salq    $8, %rdx
>         orq     %rax, %rdx
>         movzbl  global+3(%rip), %eax
>         movzbl  %al, %eax
>         salq    $16, %rax
>         orq     %rax, %rdx
>         movzbl  global+4(%rip), %eax
>         movzbl  %al, %eax
>         andl    $127, %eax
>         salq    $24, %rax
>         orq     %rdx, %rax
>         cmpl    $2147483647, %eax
>         je      .L2
>         call    abort
> .L2:
>         movl    $0, %eax
>         popq    %rbp
>         .cfi_def_cfa 7, 8
>         ret
>         .cfi_endproc
>
> Every access is split into 4 QImode accesses, but that is as expected,
> because the structure is only byte-aligned.
No, it is not expected, because the CPU can handle unaligned SImode
reads/writes just fine (even if not as an atomic operation).  The C++
memory model allows an SImode read of s.y (-fstrict-volatile-bitfields
would as well, but the CPU doesn't guarantee an atomic operation here).

Richard.

>
> Thanks
> Bernd.
>