On 1/6/25 6:03 AM, Maciej W. Rozycki wrote:
Similarly to the data races with 8-bit byte or 16-bit word quantity
memory writes on non-BWX Alpha implementations, we have the same problem
even on BWX implementations with the partial memory writes produced for
unaligned stores, as well as for block memory move and clear operations.
This happens at the boundaries of the area written, where we produce
unprotected RMW sequences, such as for example:

        ldbu $1,0($3)
        stw $31,8($3)
        stq $1,0($3)

to zero a 9-byte member at byte offset 1 of a quadword-aligned struct,
happily clobbering a 1-byte member at the beginning of said struct if a
concurrent write happens while executing on the same CPU, such as in a
signal handler, or a parallel write happens while executing on another
CPU, such as in another thread or via a shared memory segment.

To guard against these data races with partial memory write accesses,
introduce the `-msafe-partial' command-line option that instructs the
compiler to protect boundaries of the data quantity accessed by instead
using a longer code sequence composed of narrower memory writes where
suitable machine instructions are available (i.e. with BWX targets) or
atomic RMW access sequences where byte and word memory access machine
instructions are not available (i.e. with non-BWX targets).

Owing to the desire to avoid branches, redundant overlapping writes are
issued in unaligned cases where STQ_U operations are used in the middle
of a block, so as to make sure no part of the data to be written is
lost regardless of run-time alignment.  For the non-BWX case this means
that for blocks whose size is not a multiple of 8, additional atomic
RMW sequences are issued towards the end of the block, in addition to
the always-required pair enclosing the block at each end.

Only one such additional atomic RMW sequence is actually required, but
the code currently issues two for the sake of simplicity.  An
improvement might be made to `alpha_expand_unaligned_store_words_safe_partial'
in the future, by folding in `alpha_expand_unaligned_store_safe_partial'
code for handling multi-word blocks whose size is not a multiple of 8
(i.e. with a trailing partial-word part).  It would improve performance
a bit, but the current code is correct regardless.

Update test cases with `-mno-safe-partial' where required and add new
ones accordingly.

There are notable regressions between a plain `-mno-bwx' configuration
and a `-mno-bwx -msafe-partial' one:

FAIL: gm2/iso/run/pass/strcons.mod execution,  -g
FAIL: gm2/iso/run/pass/strcons.mod execution,  -O
FAIL: gm2/iso/run/pass/strcons.mod execution,  -O -g
FAIL: gm2/iso/run/pass/strcons.mod execution,  -Os
FAIL: gm2/iso/run/pass/strcons.mod execution,  -O3 -fomit-frame-pointer
FAIL: gm2/iso/run/pass/strcons.mod execution,  -O3 -fomit-frame-pointer -finline-functions
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -g
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -O
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -O -g
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -Os
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -O3 -fomit-frame-pointer
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -O3 -fomit-frame-pointer -finline-functions

Just as with the `-msafe-bwa' regressions, these come from the fact that
the test cases end up calling code that expects a reference to aligned
data but is handed one to unaligned data, causing an alignment exception
with LDL_L or LDQ_L, which is eventually fixed up by Linux.

In some cases GCC chooses to open-code block memory write operations, so
with non-BWX targets `-msafe-partial' will usually have to be used
together with `-msafe-bwa'.

Credit to Magnus Lindholm <linm...@gmail.com> for sharing hardware for
the purpose of verifying the BWX side of this change.

        gcc/
        PR target/117759
        * config/alpha/alpha-protos.h
        (alpha_expand_unaligned_store_safe_partial): New prototype.
        * config/alpha/alpha.cc (alpha_expand_movmisalign)
        (alpha_expand_block_move, alpha_expand_block_clear): Handle
        TARGET_SAFE_PARTIAL.
        (alpha_expand_unaligned_store_safe_partial)
        (alpha_expand_unaligned_store_words_safe_partial)
        (alpha_expand_clear_safe_partial_nobwx): New functions.
        * config/alpha/alpha.md (insvmisaligndi): Handle
        TARGET_SAFE_PARTIAL.
        * config/alpha/alpha.opt (msafe-partial): New option.
        * config/alpha/alpha.opt.urls: Regenerate.
        * doc/invoke.texi (Option Summary, DEC Alpha Options): Document
        the new option.

        gcc/testsuite/
        PR target/117759
        * gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add
        `-mno-safe-partial'.
        * gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file.
        * gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
        * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New
        file.
        * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c:
        New file.
        * gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
        * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New
        file.
        * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c:
        New file.
        * gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'.
        * gcc.target/alpha/stlx0-safe-partial.c: New file.
        * gcc.target/alpha/stlx0-safe-partial-bwx.c: New file.
        * gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'.
        * gcc.target/alpha/stqx0-safe-partial.c: New file.
        * gcc.target/alpha/stqx0-safe-partial-bwx.c: New file.
        * gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'.
        * gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'.  Refer
        to stwx0.c rather than copying its code and also verify no LDQ_U
        or STQ_U instructions have been produced.
        * gcc.target/alpha/stwx0-safe-partial.c: New file.
        * gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.
FWIW, there's certainly been a trend towards branch avoidance by issuing potentially redundant overlapping stores, so I'm not concerned about that.


  NB from my limited experience with Modula 2 decades ago I thought the
language was strongly-typed, so an alignment mismatch I guess shouldn't
happen.  But perhaps I've been wrong; corrections are welcome.
Sounds like we're roughly in a similar position.  My experience was decades ago, before I started working on compilers and started caring about such things, so I can't provide any meaningful insight on this.



  NB2 as expected the atomic RMW sequences have a noticeable influence
on the system's performance.  Regression testing completes in ~19h30m
for `-mno-bwx' and ~23h15m for `-mno-bwx -msafe-bwa -msafe-partial'.
But correctness has to take priority over performance.
I suspect the number of folks impacted by this is quite small; alpha is a dead architecture after all, and non-bwx narrows the set of folks who care even more so.



+
+   For non-BWX targets we need to load data from memory, mask it such as
+   to keep any part outside the area written, insert data to be stored,
+   and write the result back atomically.  For sizes that are not a power
+   of 2 there are no byte mask or insert machine instructions available
+   so the mask required has to be built by hand, however ZAP and ZAPNOT
+   instructions can then be used to apply the mask.  Since LL/SC loops
+   are used, the high and low parts have to be disentangled from each
+   other and handled sequentially except for size 1 where there is only
+   the low part to be written.  */
So doesn't this mean that we're doing partial updates and thus have partial update visibility problems? Granted, it's still an improvement over the current state of the world. Just want to make sure I understand the basics here.


OK for the trunk.
jeff
