Hi,

On Thu, 5 Mar 2015 12:24:56, Richard Biener wrote:
>
> On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
> <bernd.edlin...@hotmail.de> wrote:
>> Hi,
>>
>> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>>
>>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>>
>> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>>
>> Maybe it should check that MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (fieldmode),
>> because the strict volatile bitfields handling will inevitably try to use
>> the fieldmode to access the memory.
>>
>> Or would it be better to say MEM_ALIGN (op0) >= GET_MODE_BITSIZE (fieldmode),
>> because that is easier to explain when someone asks when we guarantee the
>> semantics of strict volatile bitfields?
>
> But on non-strict-align targets you can even for 1-byte aligned MEMs
> access an SImode field directly.  So the old code looks correct to me
> here and the fix needs to be done somewhere else.
>
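(To make the "1-byte aligned MEM with an SImode field" case concrete, such a layout
would come e.g. from the following; this is a made-up illustration, not one of the
existing testcases:)

/* Made-up example: because of the packed attribute the volatile 32-bit
   field sits at byte offset 1, so MEM_ALIGN of the access is only 8 bits,
   while GET_MODE_ALIGNMENT (SImode) is 32 on typical targets.
   -fstrict-volatile-bitfields would still want a single SImode access.  */
struct dev
{
  unsigned char pad;
  volatile unsigned int reg : 32;
} __attribute__ ((packed));

extern struct dev d;

unsigned int
get_reg (void)
{
  return d.reg;   /* one SImode load at a byte-aligned address */
}

On a !strict-align target this can indeed be done as a single unaligned SImode
load; on a strict-align target the access would have to be split.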
But this SImode access is split up into several QImode or HImode accesses in the
processor's execution pipeline; on an external bus like AXI, all memory
transactions are ultimately aligned.  The difference is just that some processors
can split up the unaligned accesses themselves and some need help from the
compiler.  Even on x86 unaligned accesses have different semantics: they are no
longer atomic, while an aligned access is always executed as an atomic
transaction on an x86 processor.

>> Probably there is already something in the logic in expr.c that prevents
>> these cases, because otherwise it would be way too easy to find an example
>> for unaligned accesses to unaligned memory on STRICT_ALIGNMENT targets.
>>
>> Ok, what would you think about this variant?
>>
>> --- expmed.c.jj 2015-01-16 11:20:40.000000000 +0100
>> +++ expmed.c    2015-03-05 11:50:09.400444416 +0100
>> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>>       return false;
>>
>>     /* Check for cases of unaligned fields that must be split.  */
>> -   if (bitnum % BITS_PER_UNIT + bitsize > modesize
>> -       || (STRICT_ALIGNMENT
>> -           && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
>> +   if (bitnum % modesize + bitsize > modesize)
>> +     return false;
>> +
>> +   /* Check that the memory is sufficiently aligned.  */
>> +   if (MEM_ALIGN (op0) < modesize)
>
> I think that only applies to strict-align targets and then only for
> GET_MODE_ALIGNMENT (modesize).  And of course what matters
> is the alignment at bitnum - even though op0 may be not sufficiently
> aligned it may have known misalignment so that op0 + bitnum is
> sufficiently aligned.
>
> Testcases would need to annotate structs with packed/aligned attributes
> to get at these cases.
>
> For the testcase included in the patch, what does the patch end up doing?
> Not going the strict-volatile bitfield expansion path?  That looks unnecessary
> on !strict-alignment targets but resonable on strict-align targets where the
> access would need to be splitted.  So, why does it end up being splitted
> on !strict-align targets?
>

With

  gcc -fstrict-volatile-bitfields -S 20150304-1.c

and without the patch we find this in 20150304-1.s:

main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movzbl  global+1(%rip), %eax
        orl     $-1, %eax
        movb    %al, global+1(%rip)
        movzbl  global+2(%rip), %eax
        orl     $-1, %eax
        movb    %al, global+2(%rip)
        movzbl  global+3(%rip), %eax
        orl     $-1, %eax
        movb    %al, global+3(%rip)
        movzbl  global+4(%rip), %eax
        orl     $127, %eax
        movb    %al, global+4(%rip)
        movl    global(%rip), %eax
        shrl    $8, %eax
        andl    $2147483647, %eax
        cmpl    $2147483647, %eax
        je      .L2
        call    abort
.L2:
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

So the write path is OK, because strict_volatile_bitfield_p returns false due to
this check:

  /* Check for cases where the C++ memory model applies.  */
  if (bitregion_end != 0
      && (bitnum - bitnum % modesize < bitregion_start
          || bitnum - bitnum % modesize + modesize - 1 > bitregion_end))
    return false;

Here bitregion_start = 8, bitregion_end = 39,
bitnum - bitnum % modesize = 0, and
bitnum - bitnum % modesize + modesize - 1 = 31,
so the SImode access would start below the bit region and the check rejects it.
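(Just to spell out the arithmetic of that check, here is a throwaway snippet with
those numbers hard-coded; purely illustrative, nothing GCC-specific:)

#include <stdbool.h>
#include <stdio.h>

/* Illustration only: the C++ memory model check above, with the values
   from the write path of the testcase plugged in.  */
static bool
covered_by_bitregion (unsigned int bitnum, unsigned int modesize,
                      unsigned int bitregion_start, unsigned int bitregion_end)
{
  unsigned int first = bitnum - bitnum % modesize;  /* first bit the SImode access touches */
  unsigned int last = first + modesize - 1;         /* last bit it touches */
  return first >= bitregion_start && last <= bitregion_end;
}

int
main (void)
{
  /* bitnum = 8, modesize = 32, bit region = [8, 39]: the access would touch
     bits 0..31, i.e. bits below the region, so strict_volatile_bitfield_p
     has to return false.  */
  printf ("%d\n", covered_by_bitregion (8, 32, 8, 39));   /* prints 0 */
  return 0;
}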
This does not happen in the read path, and the problem is that the access here
does not work:

      str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
                                      &bitnum);
      /* Explicitly override the C/C++ memory model; ignore the
         bit range so that we can do the access in the mode mandated
         by -fstrict-volatile-bitfields instead.  */
      store_fixed_bit_field_1 (str_rtx, bitsize, bitnum, value);

Here str_rtx is unchanged, bitnum = 8 and bitsize = 31, but
store_fixed_bit_field_1 cannot handle that.

BTW: I can make the write code path fail as well if I change this bit in the
test case and add ":8" to char x:

  struct s { char x:8; unsigned int y:31; } __attribute__((packed));

Now the bit region is from 0..39, and we get this code:

main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    global(%rip), %eax
        orl     $-256, %eax
        movl    %eax, global(%rip)
        movl    global(%rip), %eax
        shrl    $8, %eax
        andl    $2147483647, %eax
        cmpl    $2147483647, %eax
        je      .L2
        call    abort
.L2:
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

The patch (though I think I would still prefer my initial proposal) fixes this
by skipping the strict alignment code path.  That looks OK to me, because the
structure is always packed when that happens, and it cannot be used to access
"device registers", which are by definition always naturally aligned, at least
to the machine word size.

With the patch we get:

main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movzbl  global+1(%rip), %eax
        orl     $-1, %eax
        movb    %al, global+1(%rip)
        movzbl  global+2(%rip), %eax
        orl     $-1, %eax
        movb    %al, global+2(%rip)
        movzbl  global+3(%rip), %eax
        orl     $-1, %eax
        movb    %al, global+3(%rip)
        movzbl  global+4(%rip), %eax
        orl     $127, %eax
        movb    %al, global+4(%rip)
        movzbl  global+1(%rip), %eax
        movzbl  %al, %eax
        movzbl  global+2(%rip), %edx
        movzbl  %dl, %edx
        salq    $8, %rdx
        orq     %rax, %rdx
        movzbl  global+3(%rip), %eax
        movzbl  %al, %eax
        salq    $16, %rax
        orq     %rax, %rdx
        movzbl  global+4(%rip), %eax
        movzbl  %al, %eax
        andl    $127, %eax
        salq    $24, %rax
        orq     %rdx, %rax
        cmpl    $2147483647, %eax
        je      .L2
        call    abort
.L2:
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

Every access is split into 4 QImode accesses, but that is as expected, because
the structure is byte-aligned.

Thanks
Bernd.
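P.S.: In case it helps, the four QImode accesses of the read path above amount to
roughly the following C.  This is only an illustration of what the generated code
does (the pointer p and the function name are made up), assuming the little-endian
x86_64 layout:

/* Illustration only: the byte-wise read of the 31-bit field y that the
   generated code above performs, written out in C.  p points at the start
   of the packed struct.  */
unsigned int
read_y_bytewise (const volatile unsigned char *p)
{
  unsigned int b1 = p[1];          /* global+1 */
  unsigned int b2 = p[2];          /* global+2 */
  unsigned int b3 = p[3];          /* global+3 */
  unsigned int b4 = p[4] & 0x7f;   /* global+4: only 7 bits belong to y */
  return b1 | (b2 << 8) | (b3 << 16) | (b4 << 24);
}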