On Thu, Mar 5, 2015 at 4:05 PM, Bernd Edlinger <bernd.edlin...@hotmail.de> wrote:
> Hi,
>
> On Thu, 5 Mar 2015 12:24:56, Richard Biener wrote:
>>
>> On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
>> <bernd.edlin...@hotmail.de> wrote:
>>> Hi,
>>>
>>> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>>>
>>>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>>> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>>>
>>> Maybe it should check that MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (fieldmode),
>>> because the strict volatile bitfields handling will inevitably try to use
>>> the fieldmode to access the memory.
>>>
>>> Or would it be better to say MEM_ALIGN (op0) >= GET_MODE_BITSIZE (fieldmode),
>>> because it is easier to explain when someone asks when we guarantee the
>>> semantics of strict volatile bitfields?
>>
>> But on non-strict-align targets you can access an SImode field directly
>> even for 1-byte-aligned MEMs.  So the old code looks correct to me
>> here and the fix needs to be done somewhere else.
>>
>
> But this SImode access is split up into several QImode or HImode accesses
> in the processor's execution pipeline; finally, on an external bus like
> AXI, all memory transactions are aligned.  The difference is just that
> some processors can split up the unaligned accesses and some need help
> from the compiler, but even on an x86 unaligned accesses have different
> semantics: they are no longer atomic, while an aligned access is always
> executed as an atomic transaction on an x86 processor.
>
>>> Probably there is already something in the logic in expr.c that prevents
>>> these cases, because otherwise it would be way too easy to find an
>>> example for unaligned accesses to unaligned memory on STRICT_ALIGNMENT
>>> targets.
>>>
>>>
>>> OK, what would you think about this variant?
>>>
>>> --- expmed.c.jj 2015-01-16 11:20:40.000000000 +0100
>>> +++ expmed.c    2015-03-05 11:50:09.400444416 +0100
>>> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>>>      return false;
>>>
>>>    /* Check for cases of unaligned fields that must be split.  */
>>> -  if (bitnum % BITS_PER_UNIT + bitsize > modesize
>>> -      || (STRICT_ALIGNMENT
>>> -          && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
>>> +  if (bitnum % modesize + bitsize > modesize)
>>> +    return false;
>>> +
>>> +  /* Check that the memory is sufficiently aligned.  */
>>> +  if (MEM_ALIGN (op0) < modesize)
>>
>> I think that only applies to strict-align targets, and then only for
>> GET_MODE_ALIGNMENT (modesize).  And of course what matters is the
>> alignment at bitnum - even though op0 may not be sufficiently aligned,
>> it may have known misalignment so that op0 + bitnum is sufficiently
>> aligned.
>>
>> Testcases would need to annotate structs with packed/aligned attributes
>> to get at these cases.
>>
>> For the testcase included in the patch, what does the patch end up doing?
>> Not going the strict-volatile-bitfield expansion path?  That looks
>> unnecessary on !strict-alignment targets but reasonable on strict-align
>> targets where the access would need to be split.  So, why does it end up
>> being split on !strict-align targets?
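For illustration, here is a minimal sketch of the kind of packed/aligned-annotated
struct Richard mentions above, where the accessed object itself is under-aligned
but the bitfield ends up at a 4-byte-aligned offset.  This example is not from
the thread; the type and variable names are invented.

/* "inner" is packed, so its alignment is 1 and its bitfield y occupies
   bits 16..46, i.e. bytes 2..5.  Placed at byte offset 2 of "outer",
   which is 4-byte aligned, y therefore starts at byte offset 4 of the
   object, so a single aligned SImode access at &o + 4 covers it, even
   though "inner" itself starts at a misaligned address.  A check that
   only looks at the alignment of the start of the accessed object would
   reject this case, although the access at op0 + bitnum is sufficiently
   aligned.  */

struct inner
{
  char pad[2];                    /* bits 0..15 */
  volatile unsigned int y : 31;   /* bits 16..46 */
} __attribute__((packed));        /* alignment 1, sizeof == 6 */

struct outer
{
  short s;                        /* bytes 0..1 */
  struct inner i;                 /* bytes 2..7, no padding needed */
} __attribute__((aligned (4)));

struct outer o;                   /* o.i starts 2 bytes past a 4-byte
                                     boundary, o.i.y exactly on one */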
> gcc -fstrict-volatile-bitfields -S 20150304-1.c
>
> Without the patch we find this in 20150304-1.s:
>
> main:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         movzbl  global+1(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+1(%rip)
>         movzbl  global+2(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+2(%rip)
>         movzbl  global+3(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+3(%rip)
>         movzbl  global+4(%rip), %eax
>         orl     $127, %eax
>         movb    %al, global+4(%rip)
>         movl    global(%rip), %eax
>         shrl    $8, %eax
>         andl    $2147483647, %eax
>         cmpl    $2147483647, %eax
>         je      .L2
>         call    abort
> .L2:
>         movl    $0, %eax
>         popq    %rbp
>         .cfi_def_cfa 7, 8
>         ret
>         .cfi_endproc
>
> So the write path is OK, because strict_volatile_bitfield_p returns false
> due to this check:
>
>   /* Check for cases where the C++ memory model applies.  */
>   if (bitregion_end != 0
>       && (bitnum - bitnum % modesize < bitregion_start
>           || bitnum - bitnum % modesize + modesize - 1 > bitregion_end))
>     return false;
>
> with
>
>   bitregion_start = 8, bitregion_end = 39
>   bitnum - bitnum % modesize = 0
>   bitnum - bitnum % modesize + modesize - 1 = 31
>
> This does not happen in the read path, and the problem is that the access
> here does not work:
>
>   str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
>                                   &bitnum);
>   /* Explicitly override the C/C++ memory model; ignore the
>      bit range so that we can do the access in the mode mandated
>      by -fstrict-volatile-bitfields instead.  */
>   store_fixed_bit_field_1 (str_rtx, bitsize, bitnum, value);
>
> str_rtx is unchanged, bitnum = 8, bitsize = 31, but store_fixed_bit_field_1
> cannot handle that.
>
> BTW: I can make the write code path fail as well if I change the test case
> and add ":8" to char x:
>
> struct s
> {
>   char x:8;
>   unsigned int y:31;
> } __attribute__((packed));
>
> Now the bit region is from 0..39, and we get this code:
>
> main:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         movl    global(%rip), %eax
>         orl     $-256, %eax
>         movl    %eax, global(%rip)
>         movl    global(%rip), %eax
>         shrl    $8, %eax
>         andl    $2147483647, %eax
>         cmpl    $2147483647, %eax
>         je      .L2
>         call    abort
> .L2:
>         movl    $0, %eax
>         popq    %rbp
>         .cfi_def_cfa 7, 8
>         ret
>         .cfi_endproc
>
> The patch (though I would still prefer my initial proposal) fixes this by
> skipping the strict-alignment code path.  That looks OK to me, because the
> structure is always packed when that happens, and it can't be used to
> access "device registers", which are by definition always naturally
> aligned, at least to the machine word size.
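The exact contents of 20150304-1.c are not quoted in this thread, but from the
struct and the generated assembly the testcase presumably looks roughly like the
following reconstruction.  Whether the volatile qualifier sits on "global" or on
the bitfield itself is an assumption; some volatile qualification is needed for
-fstrict-volatile-bitfields to apply.

extern void abort (void);

struct s
{
  char x;               /* the variant discussed above uses "char x:8;" */
  unsigned int y:31;
} __attribute__((packed));

volatile struct s global;

int
main (void)
{
  global.y = 0x7fffffff;        /* the byte stores to global+1..global+4 */
  if (global.y != 0x7fffffff)   /* the read of 31 bits at bit offset 8 */
    abort ();
  return 0;
}

The constant 0x7fffffff matches the 2147483647 compared against in the listings
above, compiled with "gcc -fstrict-volatile-bitfields -S 20150304-1.c".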
>
> With the patch we get:
>
> main:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         movzbl  global+1(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+1(%rip)
>         movzbl  global+2(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+2(%rip)
>         movzbl  global+3(%rip), %eax
>         orl     $-1, %eax
>         movb    %al, global+3(%rip)
>         movzbl  global+4(%rip), %eax
>         orl     $127, %eax
>         movb    %al, global+4(%rip)
>         movzbl  global+1(%rip), %eax
>         movzbl  %al, %eax
>         movzbl  global+2(%rip), %edx
>         movzbl  %dl, %edx
>         salq    $8, %rdx
>         orq     %rax, %rdx
>         movzbl  global+3(%rip), %eax
>         movzbl  %al, %eax
>         salq    $16, %rax
>         orq     %rax, %rdx
>         movzbl  global+4(%rip), %eax
>         movzbl  %al, %eax
>         andl    $127, %eax
>         salq    $24, %rax
>         orq     %rdx, %rax
>         cmpl    $2147483647, %eax
>         je      .L2
>         call    abort
> .L2:
>         movl    $0, %eax
>         popq    %rbp
>         .cfi_def_cfa 7, 8
>         ret
>         .cfi_endproc
>
> Every access is split into 4 QImode accesses, but that is as expected,
> because the structure is only byte-aligned.
No, it is not expected, because the CPU can handle unaligned SImode
reads/writes just fine (even if not as an atomic operation).  The C++
memory model allows an SImode read of s.y (-fstrict-volatile-bitfields
would as well, but the CPU doesn't guarantee an atomic operation here).

Richard.

>
> Thanks
> Bernd.
>