Similarly to the data races affecting 8-bit byte and 16-bit word memory
writes on non-BWX Alpha implementations, we have the same problem even
on BWX implementations with the partial memory writes produced for
unaligned stores as well as for block memory move and clear operations.
It happens at the boundaries of the area written, where we produce
unprotected RMW sequences, for example:

        ldbu $1,0($3)
        stw $31,8($3)
        stq $1,0($3)

to zero a 9-byte member at byte offset 1 of a quadword-aligned struct,
happily clobbering a 1-byte member at the beginning of said struct if a
concurrent write happens while executing on the same CPU, such as in a
signal handler, or a parallel write happens while executing on another
CPU, such as in another thread or via a shared memory segment.
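
For illustration, source along these lines can produce such an
expansion; the struct layout and names below are made up for this
description rather than taken from the testsuite:

        /* Hypothetical layout: a quadword-aligned struct with a 1-byte
           member followed by a 9-byte member at offset 1.  */
        struct s
        {
          unsigned char flag;      /* Silently rewritten by the RMW.  */
          unsigned char data[9];   /* The member being cleared.  */
        } __attribute__ ((aligned (8)));

        void
        clear_data (struct s *p)
        {
          __builtin_memset (p->data, 0, sizeof (p->data));
        }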

To guard against these data races with partial memory write accesses,
introduce the `-msafe-partial' command-line option that instructs the
compiler to protect the boundaries of the data quantity accessed, by
instead using a longer code sequence composed of narrower memory writes
where suitable machine instructions are available (i.e. with BWX
targets), or atomic RMW access sequences where byte and word memory
access machine instructions are not available (i.e. with non-BWX
targets).
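
With BWX and `-msafe-partial' the intent is that a clear like the one
above decomposes into narrow stores only, matching the shape the new
memclr-a2-o1-c9-ptr-safe-partial.c test below expects for a 9-byte
clear at offset 1 (with $16 the incoming pointer argument):

        stb $31,1($16)
        stw $31,2($16)
        stw $31,4($16)
        stw $31,6($16)
        stw $31,8($16)

With non-BWX targets the boundaries are instead covered by
LDQ_L/.../STQ_C loops, as shown by the expectations in the new
stwx0-safe-partial.c, stlx0-safe-partial.c, and stqx0-safe-partial.c
tests.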

Owing to the desire to avoid branches, there are redundant overlapping
writes in unaligned cases where STQ_U operations are used in the middle
of a block, so as to make sure no part of the data to be written is
lost regardless of run-time alignment.  For the non-BWX case it means
that with blocks whose size is not a multiple of 8 there are additional
atomic RMW sequences issued towards the end of the block, in addition
to the always required pair enclosing the block at each end.

Only one such additional atomic RMW sequence is actually required, but
the code currently issues two for the sake of simplicity.  An
improvement might be made to
`alpha_expand_unaligned_store_words_safe_partial' in the future, by
folding in `alpha_expand_unaligned_store_safe_partial' code for
handling multi-word blocks whose size is not a multiple of 8 (i.e. with
a trailing partial-word part).  It would improve performance a bit, but
the current code is correct regardless.

Update test cases with `-mno-safe-partial' where required and add new 
ones accordingly.

There are notable regressions between a plain `-mno-bwx' configuration
and a `-mno-bwx -msafe-partial' one:

FAIL: gm2/iso/run/pass/strcons.mod execution,  -g
FAIL: gm2/iso/run/pass/strcons.mod execution,  -O
FAIL: gm2/iso/run/pass/strcons.mod execution,  -O -g
FAIL: gm2/iso/run/pass/strcons.mod execution,  -Os
FAIL: gm2/iso/run/pass/strcons.mod execution,  -O3 -fomit-frame-pointer
FAIL: gm2/iso/run/pass/strcons.mod execution,  -O3 -fomit-frame-pointer -finline-functions
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -g
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -O
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -O -g
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -Os
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -O3 -fomit-frame-pointer
FAIL: gm2/iso/run/pass/strcons4.mod execution,  -O3 -fomit-frame-pointer -finline-functions

Just as with the `-msafe-bwa' regressions, they come from the fact that
these test cases end up calling code that expects a reference to
aligned data but is handed one to unaligned data, causing an alignment
exception with LDL_L or LDQ_L, which will eventually be fixed up by
Linux.

In some cases GCC chooses to open-code block memory write operations,
so with non-BWX targets `-msafe-partial' will usually have to be used
together with `-msafe-bwa'.

Credit to Magnus Lindholm <linm...@gmail.com> for sharing hardware for 
the purpose of verifying the BWX side of this change.

        gcc/
        PR target/117759
        * config/alpha/alpha-protos.h 
        (alpha_expand_unaligned_store_safe_partial): New prototype.
        * config/alpha/alpha.cc (alpha_expand_movmisalign)
        (alpha_expand_block_move, alpha_expand_block_clear): Handle 
        TARGET_SAFE_PARTIAL.
        (alpha_expand_unaligned_store_safe_partial)
        (alpha_expand_unaligned_store_words_safe_partial)
        (alpha_expand_clear_safe_partial_nobwx): New functions.
        * config/alpha/alpha.md (insvmisaligndi): Handle 
        TARGET_SAFE_PARTIAL.
        * config/alpha/alpha.opt (msafe-partial): New option.
        * config/alpha/alpha.opt.urls: Regenerate.
        * doc/invoke.texi (Option Summary, DEC Alpha Options): Document
        the new option.

        gcc/testsuite/
        PR target/117759
        * gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add 
        `-mno-safe-partial'.
        * gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file.
        * gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
        * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New 
        file.
        * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c: 
        New file.
        * gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
        * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New 
        file.
        * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c: 
        New file.
        * gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'.
        * gcc.target/alpha/stlx0-safe-partial.c: New file.
        * gcc.target/alpha/stlx0-safe-partial-bwx.c: New file.
        * gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'.
        * gcc.target/alpha/stqx0-safe-partial.c: New file.
        * gcc.target/alpha/stqx0-safe-partial-bwx.c: New file.
        * gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'.
        * gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'.  Refer
        to stwx0.c rather than copying its code and also verify no LDQ_U 
        or STQ_U instructions have been produced.
        * gcc.target/alpha/stwx0-safe-partial.c: New file.
        * gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.
---
 Verifying with the `alphaev56-linux-gnu' target revealed a bunch of 
regressions with test cases where I forgot to add `-mno-safe-partial'.  
I took the opportunity to add complementary tests to cover the 
`-msafe-partial' case too.

 NB, from my limited experience with Modula 2 decades ago I thought the 
language was strongly typed, so I guess an alignment mismatch shouldn't 
happen.  But perhaps I've been wrong; corrections are welcome.

 NB2, as expected the atomic RMW sequences have a noticeable influence 
on the system's performance.  Regression testing completes in ~19h30m 
for `-mno-bwx' and ~23h15m for `-mno-bwx -msafe-bwa -msafe-partial'.  
But correctness has to take priority over performance.

Changes from v1:

- Add a reference to PR target/117759.

- Add `-mno-safe-partial' to memclr-a2-o1-c9-ptr.c, stlx0.c, stwx0.c, and 
  stwx0-bwx.c tests.

- Make stwx0-bwx.c a bit stricter and also verify no LDQ_U or STQ_U 
  instructions have been produced and include stwx0.c rather than copying 
  its code.

- Add memclr-a2-o1-c9-ptr-safe-partial.c, stlx0-safe-partial.c, 
  stlx0-safe-partial-bwx.c, stqx0-safe-partial.c, 
  stqx0-safe-partial-bwx.c, stwx0-safe-partial.c, and 
  stwx0-safe-partial-bwx.c tests.

- Update the change description accordingly.
---
 gcc/config/alpha/alpha-protos.h                                           |    3 
 gcc/config/alpha/alpha.cc                                                 |  616 +++++++++-
 gcc/config/alpha/alpha.md                                                 |   12 
 gcc/config/alpha/alpha.opt                                                |    4 
 gcc/config/alpha/alpha.opt.urls                                           |    3 
 gcc/doc/invoke.texi                                                       |   12 
 gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c         |   22 
 gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr.c                      |    2 
 gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c |   13 
 gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c     |   12 
 gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst.c                  |    2 
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c |   13 
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c     |   12 
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c                  |    2 
 gcc/testsuite/gcc.target/alpha/stlx0-safe-partial-bwx.c                   |   17 
 gcc/testsuite/gcc.target/alpha/stlx0-safe-partial.c                       |   29 
 gcc/testsuite/gcc.target/alpha/stlx0.c                                    |    2 
 gcc/testsuite/gcc.target/alpha/stqx0-safe-partial-bwx.c                   |   21 
 gcc/testsuite/gcc.target/alpha/stqx0-safe-partial.c                       |   29 
 gcc/testsuite/gcc.target/alpha/stqx0.c                                    |    2 
 gcc/testsuite/gcc.target/alpha/stwx0-bwx.c                                |   14 
 gcc/testsuite/gcc.target/alpha/stwx0-safe-partial-bwx.c                   |   15 
 gcc/testsuite/gcc.target/alpha/stwx0-safe-partial.c                       |   29 
 gcc/testsuite/gcc.target/alpha/stwx0.c                                    |    2 
 24 files changed, 837 insertions(+), 51 deletions(-)

gcc-alpha-safe-partial.diff
Index: gcc/gcc/config/alpha/alpha-protos.h
===================================================================
--- gcc.orig/gcc/config/alpha/alpha-protos.h
+++ gcc/gcc/config/alpha/alpha-protos.h
@@ -54,6 +54,9 @@ extern void alpha_expand_unaligned_load
                                         HOST_WIDE_INT, int);
 extern void alpha_expand_unaligned_store (rtx, rtx, HOST_WIDE_INT,
                                          HOST_WIDE_INT);
+extern void alpha_expand_unaligned_store_safe_partial (rtx, rtx, HOST_WIDE_INT,
+                                                      HOST_WIDE_INT,
+                                                      HOST_WIDE_INT);
 extern int alpha_expand_block_move (rtx []);
 extern int alpha_expand_block_clear (rtx []);
 extern rtx alpha_expand_zap_mask (HOST_WIDE_INT);
Index: gcc/gcc/config/alpha/alpha.cc
===================================================================
--- gcc.orig/gcc/config/alpha/alpha.cc
+++ gcc/gcc/config/alpha/alpha.cc
@@ -2481,7 +2481,11 @@ alpha_expand_movmisalign (machine_mode m
     {
       if (!reg_or_0_operand (operands[1], mode))
        operands[1] = force_reg (mode, operands[1]);
-      alpha_expand_unaligned_store (operands[0], operands[1], 8, 0);
+      if (TARGET_SAFE_PARTIAL)
+       alpha_expand_unaligned_store_safe_partial (operands[0], operands[1],
+                                                  8, 0, BITS_PER_UNIT);
+      else
+       alpha_expand_unaligned_store (operands[0], operands[1], 8, 0);
     }
   else
     gcc_unreachable ();
@@ -3673,6 +3677,310 @@ alpha_expand_unaligned_store (rtx dst, r
   emit_move_insn (meml, dstl);
 }
 
+/* Store data SRC of size SIZE using unaligned methods to location
+   referred by base DST plus offset OFS and of alignment ALIGN.  This is
+   a multi-thread and async-signal safe implementation for all sizes from
+   8 down to 1.
+
+   For BWX targets it is straightforward, we just write data piecemeal,
+   taking any advantage of the alignment known and observing that we
+   shouldn't have been called for alignments of 32 or above in the first
+   place (though adding support for that would be easy).
+
+   For non-BWX targets we need to load data from memory, mask it such as
+   to keep any part outside the area written, insert data to be stored,
+   and write the result back atomically.  For sizes that are not a power
+   of 2 there are no byte mask or insert machine instructions available
+   so the mask required has to be built by hand, however ZAP and ZAPNOT
+   instructions can then be used to apply the mask.  Since LL/SC loops
+   are used, the high and low parts have to be disentangled from each
+   other and handled sequentially except for size 1 where there is only
+   the low part to be written.  */
+
+void
+alpha_expand_unaligned_store_safe_partial (rtx dst, rtx src,
+                                          HOST_WIDE_INT size,
+                                          HOST_WIDE_INT ofs,
+                                          HOST_WIDE_INT align)
+{
+  if (TARGET_BWX)
+    {
+      machine_mode mode = align >= 2 * BITS_PER_UNIT ? HImode : QImode;
+      HOST_WIDE_INT step = mode == HImode ? 2 : 1;
+
+      while (1)
+       {
+         rtx dstl = src == const0_rtx ? const0_rtx : gen_lowpart (mode, src);
+         rtx meml = adjust_address (dst, mode, ofs);
+         emit_move_insn (meml, dstl);
+
+         ofs += step;
+         size -= step;
+         if (size == 0)
+           return;
+
+         if (size < step)
+           {
+             mode = QImode;
+             step = 1;
+           }
+
+         if (src != const0_rtx)
+           src = expand_simple_binop (DImode, LSHIFTRT, src,
+                                      GEN_INT (step * BITS_PER_UNIT),
+                                      NULL, 1, OPTAB_WIDEN);
+       }
+    }
+
+  rtx dsta = XEXP (dst, 0);
+  if (GET_CODE (dsta) == LO_SUM)
+    dsta = force_reg (Pmode, dsta);
+
+  rtx addr = copy_addr_to_reg (plus_constant (Pmode, dsta, ofs));
+
+  rtx byte_mask = NULL_RTX;
+  switch (size)
+    {
+    case 3:
+    case 5:
+    case 6:
+    case 7:
+      /* If size is not a power of 2 we need to build the byte mask from
+        size by hand.  This is SIZE consecutive bits starting from bit 0.  */
+      byte_mask = force_reg (DImode, GEN_INT (~(HOST_WIDE_INT_M1U << size)));
+
+      /* Unlike with machine INSxx and MSKxx operations there is no
+        implicit mask applied to addr with corresponding operations
+        made by hand, so extract the byte index now.  */
+      emit_insn (gen_rtx_SET (addr,
+                             gen_rtx_AND (DImode, addr, GEN_INT (~-8))));
+    }
+
+  /* Must handle high before low for degenerate case of aligned.  */
+  if (size != 1)
+    {
+      rtx addrh = gen_reg_rtx (DImode);
+      rtx aligned_addrh = gen_reg_rtx (DImode);
+      emit_insn (gen_rtx_SET (addrh,
+                             plus_constant (DImode, dsta, ofs + size - 1)));
+      emit_insn (gen_rtx_SET (aligned_addrh,
+                             gen_rtx_AND (DImode, addrh, GEN_INT (-8))));
+
+      /* AND addresses cannot be in any alias set, since they may implicitly
+        alias surrounding code.  Ideally we'd have some alias set that
+        covered all types except those with alignment 8 or higher.  */
+      rtx memh = change_address (dst, DImode, aligned_addrh);
+      set_mem_alias_set (memh, 0);
+
+      rtx insh = gen_reg_rtx (DImode);
+      rtx maskh = NULL_RTX;
+      switch (size)
+       {
+       case 1:
+       case 2:
+       case 4:
+       case 8:
+         if (src != CONST0_RTX (GET_MODE (src)))
+           emit_insn (gen_insxh (insh, gen_lowpart (DImode, src),
+                                 GEN_INT (size * 8), addr));
+         break;
+       case 3:
+       case 5:
+       case 6:
+       case 7:
+         {
+           /* For the high part we shift the byte mask right by 8 minus
+              the byte index in addr, so we need an extra calculation.  */
+           rtx shamt = gen_reg_rtx (DImode);
+           emit_insn (gen_rtx_SET (shamt,
+                                   gen_rtx_MINUS (DImode,
+                                                  force_reg (DImode,
+                                                             GEN_INT (8)),
+                                                  addr)));
+
+           maskh = gen_reg_rtx (DImode);
+           rtx shift = gen_rtx_LSHIFTRT (DImode, byte_mask, shamt);
+           emit_insn (gen_rtx_SET (maskh, shift));
+
+           /* Insert any bytes required by hand, by doing a byte-wise
+              shift on SRC right by the same number and then zap the
+              bytes outside the byte mask.  */
+           if (src != CONST0_RTX (GET_MODE (src)))
+             {
+               rtx byte_loc = gen_reg_rtx (DImode);
+               emit_insn (gen_rtx_SET (byte_loc,
+                                       gen_rtx_ASHIFT (DImode,
+                                                       shamt, GEN_INT (3))));
+               rtx bytes = gen_reg_rtx (DImode);
+               emit_insn (gen_rtx_SET (bytes,
+                                       gen_rtx_LSHIFTRT (DImode,
+                                                         gen_lowpart (DImode,
+                                                                      src),
+                                                         byte_loc)));
+
+               rtx zapmask = gen_rtx_NOT (QImode,
+                                          gen_rtx_SUBREG (QImode, maskh, 0));
+               rtx zap = gen_rtx_UNSPEC (DImode, gen_rtvec (1, zapmask),
+                                         UNSPEC_ZAP);
+               emit_insn (gen_rtx_SET (insh,
+                                       gen_rtx_AND (DImode, zap, bytes)));
+             }
+         }
+         break;
+       default:
+         gcc_unreachable ();
+       }
+
+      rtx labelh = gen_rtx_LABEL_REF (DImode, gen_label_rtx ());
+      emit_label (XEXP (labelh, 0));
+
+      rtx dsth = gen_reg_rtx (DImode);
+      emit_insn (gen_load_locked (DImode, dsth, memh));
+
+      switch (size)
+       {
+       case 1:
+       case 2:
+       case 4:
+       case 8:
+         emit_insn (gen_mskxh (dsth, dsth, GEN_INT (size * 8), addr));
+         break;
+       case 3:
+       case 5:
+       case 6:
+       case 7:
+         {
+           rtx zapmask = gen_rtx_SUBREG (QImode, maskh, 0);
+           rtx zap = gen_rtx_UNSPEC (DImode, gen_rtvec (1, zapmask),
+                                     UNSPEC_ZAP);
+           emit_insn (gen_rtx_SET (dsth, gen_rtx_AND (DImode, zap, dsth)));
+         }
+         break;
+       default:
+         gcc_unreachable ();
+       }
+
+      if (src != CONST0_RTX (GET_MODE (src)))
+       dsth = expand_simple_binop (DImode, IOR, insh, dsth, dsth, 0,
+                                   OPTAB_WIDEN);
+
+      emit_insn (gen_store_conditional (DImode, dsth, memh, dsth));
+
+      alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, dsth, const0_rtx), labelh);
+    }
+
+  /* Now handle low.  */
+  rtx addrl = gen_reg_rtx (DImode);
+  rtx aligned_addrl = gen_reg_rtx (DImode);
+  emit_insn (gen_rtx_SET (addrl, plus_constant (DImode, dsta, ofs)));
+  emit_insn (gen_rtx_SET (aligned_addrl,
+                         gen_rtx_AND (DImode, addrl, GEN_INT (-8))));
+
+  /* AND addresses cannot be in any alias set, since they may implicitly
+     alias surrounding code.  Ideally we'd have some alias set that
+     covered all types except those with alignment 8 or higher.  */
+  rtx meml = change_address (dst, DImode, aligned_addrl);
+  set_mem_alias_set (meml, 0);
+
+  rtx insl = gen_reg_rtx (DImode);
+  rtx maskl;
+  switch (size)
+    {
+    case 1:
+      if (src != CONST0_RTX (GET_MODE (src)))
+       emit_insn (gen_insbl (insl, gen_lowpart (QImode, src), addr));
+      break;
+    case 2:
+      if (src != CONST0_RTX (GET_MODE (src)))
+       emit_insn (gen_inswl (insl, gen_lowpart (HImode, src), addr));
+      break;
+    case 4:
+      if (src != CONST0_RTX (GET_MODE (src)))
+       emit_insn (gen_insll (insl, gen_lowpart (SImode, src), addr));
+      break;
+    case 8:
+      if (src != CONST0_RTX (GET_MODE (src)))
+       emit_insn (gen_insql (insl, gen_lowpart (DImode, src), addr));
+      break;
+    case 3:
+    case 5:
+    case 6:
+    case 7:
+      /* For the low part we shift the byte mask left by the byte index,
+        which is already in ADDR.  */
+      maskl = gen_reg_rtx (DImode);
+      emit_insn (gen_rtx_SET (maskl,
+                             gen_rtx_ASHIFT (DImode, byte_mask, addr)));
+
+      /* Insert any bytes required by hand, by doing a byte-wise shift
+        on SRC left by the same number and then zap the bytes outside
+        the byte mask.  */
+      if (src != CONST0_RTX (GET_MODE (src)))
+       {
+         rtx byte_loc = gen_reg_rtx (DImode);
+         emit_insn (gen_rtx_SET (byte_loc,
+                                 gen_rtx_ASHIFT (DImode,
+                                                 force_reg (DImode, addr),
+                                                 GEN_INT (3))));
+         rtx bytes = gen_reg_rtx (DImode);
+         emit_insn (gen_rtx_SET (bytes,
+                                 gen_rtx_ASHIFT (DImode,
+                                                 gen_lowpart (DImode, src),
+                                                 byte_loc)));
+
+         rtx zapmask = gen_rtx_NOT (QImode,
+                                    gen_rtx_SUBREG (QImode, maskl, 0));
+         rtx zap = gen_rtx_UNSPEC (DImode, gen_rtvec (1, zapmask),
+                                   UNSPEC_ZAP);
+         emit_insn (gen_rtx_SET (insl, gen_rtx_AND (DImode, zap, bytes)));
+       }
+      break;
+      default:
+       gcc_unreachable ();
+      }
+
+  rtx labell = gen_rtx_LABEL_REF (DImode, gen_label_rtx ());
+  emit_label (XEXP (labell, 0));
+
+  rtx dstl = gen_reg_rtx (DImode);
+  emit_insn (gen_load_locked (DImode, dstl, meml));
+
+  switch (size)
+    {
+    case 1:
+      emit_insn (gen_mskbl (dstl, dstl, addr));
+      break;
+    case 2:
+      emit_insn (gen_mskwl (dstl, dstl, addr));
+      break;
+    case 4:
+      emit_insn (gen_mskll (dstl, dstl, addr));
+      break;
+    case 8:
+      emit_insn (gen_mskql (dstl, dstl, addr));
+      break;
+    case 3:
+    case 5:
+    case 6:
+    case 7:
+      {
+       rtx zapmask = gen_rtx_SUBREG (QImode, maskl, 0);
+       rtx zap = gen_rtx_UNSPEC (DImode, gen_rtvec (1, zapmask), UNSPEC_ZAP);
+       emit_insn (gen_rtx_SET (dstl, gen_rtx_AND (DImode, zap, dstl)));
+      }
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (src != CONST0_RTX (GET_MODE (src)))
+    dstl = expand_simple_binop (DImode, IOR, insl, dstl, dstl, 0, OPTAB_WIDEN);
+
+  emit_insn (gen_store_conditional (DImode, dstl, meml, dstl));
+
+  alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, dstl, const0_rtx), labell);
+}
+
 /* The block move code tries to maximize speed by separating loads and
    stores at the expense of register pressure: we load all of the data
    before we store it back out.  There are two secondary effects worth
@@ -3838,6 +4146,117 @@ alpha_expand_unaligned_store_words (rtx
   emit_move_insn (st_addr_1, st_tmp_1);
 }
 
+/* Store an integral number of consecutive unaligned quadwords.  DATA_REGS
+   may be NULL to store zeros.  This is a multi-thread and async-signal
+   safe implementation.  */
+
+static void
+alpha_expand_unaligned_store_words_safe_partial (rtx *data_regs, rtx dmem,
+                                               HOST_WIDE_INT words,
+                                               HOST_WIDE_INT ofs,
+                                               HOST_WIDE_INT align)
+{
+  rtx const im8 = GEN_INT (-8);
+  rtx ins_tmps[MAX_MOVE_WORDS];
+  HOST_WIDE_INT i;
+
+  /* Generate all the tmp registers we need.  */
+  for (i = 0; i < words; i++)
+    ins_tmps[i] = data_regs != NULL ? gen_reg_rtx (DImode) : const0_rtx;
+
+  if (ofs != 0)
+    dmem = adjust_address (dmem, GET_MODE (dmem), ofs);
+
+  /* For BWX store the ends before we start fiddling with data registers
+     to fill the middle.  Also if we have no more than two quadwords,
+     then obviously we're done.  */
+  if (TARGET_BWX)
+    {
+      rtx datan = data_regs ? data_regs[words - 1] : const0_rtx;
+      rtx data0 = data_regs ? data_regs[0] : const0_rtx;
+      HOST_WIDE_INT e = (words - 1) * 8;
+
+      alpha_expand_unaligned_store_safe_partial (dmem, data0, 8, 0, align);
+      alpha_expand_unaligned_store_safe_partial (dmem, datan, 8, e, align);
+      if (words <= 2)
+       return;
+    }
+
+  rtx dmema = XEXP (dmem, 0);
+  if (GET_CODE (dmema) == LO_SUM)
+    dmema = force_reg (Pmode, dmema);
+
+  /* Shift the input data into place.  */
+  rtx dreg = copy_addr_to_reg (dmema);
+  if (data_regs != NULL)
+    {
+      for (i = words - 1; i >= 0; i--)
+       {
+         emit_insn (gen_insqh (ins_tmps[i], data_regs[i], dreg));
+         emit_insn (gen_insql (data_regs[i], data_regs[i], dreg));
+       }
+      for (i = words - 1; i > 0; i--)
+       ins_tmps[i - 1] = expand_simple_binop (DImode, IOR, data_regs[i],
+                                              ins_tmps[i - 1],
+                                              ins_tmps[i - 1],
+                                              1, OPTAB_DIRECT);
+    }
+
+  if (!TARGET_BWX)
+    {
+      rtx temp = gen_reg_rtx (DImode);
+      rtx mem = gen_rtx_MEM (DImode,
+                            expand_simple_binop (Pmode, AND, dreg, im8,
+                                                 NULL_RTX, 1, OPTAB_DIRECT));
+
+      rtx label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
+      emit_label (XEXP (label, 0));
+
+      emit_insn (gen_load_locked (DImode, temp, mem));
+      emit_insn (gen_mskql (temp, temp, dreg));
+      if (data_regs != NULL)
+       temp = expand_simple_binop (DImode, IOR, temp, data_regs[0],
+                                   temp, 1, OPTAB_DIRECT);
+      emit_insn (gen_store_conditional (DImode, temp, mem, temp));
+
+      alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, temp, const0_rtx), label);
+    }
+
+  for (i = words - 1; i > 0; --i)
+    {
+      rtx temp = change_address (dmem, Pmode,
+                                gen_rtx_AND (Pmode,
+                                             plus_constant (Pmode,
+                                                            dmema, i * 8),
+                                             im8));
+      set_mem_alias_set (temp, 0);
+      emit_move_insn (temp, ins_tmps[i - 1]);
+    }
+
+  if (!TARGET_BWX)
+    {
+      rtx temp = gen_reg_rtx (DImode);
+      rtx addr = expand_simple_binop (Pmode, PLUS, dreg,
+                                     GEN_INT (words * 8 - 1),
+                                     NULL_RTX, 1, OPTAB_DIRECT);
+      rtx mem = gen_rtx_MEM (DImode,
+                            expand_simple_binop (Pmode, AND, addr, im8,
+                                                 NULL_RTX, 1, OPTAB_DIRECT));
+
+      rtx label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
+      emit_label (XEXP (label, 0));
+
+      emit_insn (gen_load_locked (DImode, temp, mem));
+      emit_insn (gen_mskqh (temp, temp, dreg));
+      if (data_regs != NULL)
+       temp = expand_simple_binop (DImode, IOR, temp, ins_tmps[words - 1],
+                                   temp, 1, OPTAB_DIRECT);
+      emit_insn (gen_store_conditional (DImode, temp, mem, temp));
+
+      alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, temp, const0_rtx), label);
+    }
+}
+
 /* Get the base alignment and offset of EXPR in A and O respectively.
    Check for any pseudo register pointer alignment and for any tree
    node information and return the largest alignment determined and
@@ -4147,26 +4566,74 @@ alpha_expand_block_move (rtx operands[])
        if (GET_MODE (data_regs[i + words]) != DImode)
          break;
 
-      if (words == 1)
-       alpha_expand_unaligned_store (orig_dst, data_regs[i], 8, ofs);
+      if (TARGET_SAFE_PARTIAL)
+       {
+         if (words == 1)
+           alpha_expand_unaligned_store_safe_partial (orig_dst, data_regs[i],
+                                                      8, ofs, dst_align);
+         else
+           alpha_expand_unaligned_store_words_safe_partial (data_regs + i,
+                                                            orig_dst, words,
+                                                            ofs, dst_align);
+       }
       else
-        alpha_expand_unaligned_store_words (data_regs + i, orig_dst,
-                                           words, ofs);
-
+       {
+         if (words == 1)
+           alpha_expand_unaligned_store (orig_dst, data_regs[i], 8, ofs);
+         else
+           alpha_expand_unaligned_store_words (data_regs + i, orig_dst,
+                                               words, ofs);
+       }
       i += words;
       ofs += words * 8;
     }
 
-  /* Due to the above, this won't be aligned.  */
+  /* If we are in the partial memory access safety mode with a non-BWX
+     target, then coalesce data loaded of different widths so as to
+     minimize the number of safe partial stores as they are expensive.  */
+  if (!TARGET_BWX && TARGET_SAFE_PARTIAL)
+    {
+      HOST_WIDE_INT size = 0;
+      unsigned int n;
+
+      for (n = i; i < nregs; i++)
+       {
+         if (i != n)
+           {
+             /* Don't widen SImode data where obtained by extraction.  */
+             rtx data = data_regs[n];
+             if (GET_MODE (data) == SImode && src_align < 32)
+               data = gen_rtx_SUBREG (DImode, data, 0);
+             rtx field = expand_simple_binop (DImode, ASHIFT, data_regs[i],
+                                              GEN_INT (size * BITS_PER_UNIT),
+                                              NULL_RTX, 1, OPTAB_DIRECT);
+             data_regs[n] = expand_simple_binop (DImode, IOR, data, field,
+                                                 data, 1, OPTAB_WIDEN);
+           }
+         size += GET_MODE_SIZE (GET_MODE (data_regs[i]));
+         gcc_assert (size < 8);
+       }
+      if (size > 0)
+       alpha_expand_unaligned_store_safe_partial (orig_dst, data_regs[n],
+                                                  size, ofs, dst_align);
+      ofs += size;
+    }
+
+  /* We've done aligned stores above, this won't be aligned.  */
   while (i < nregs && GET_MODE (data_regs[i]) == SImode)
     {
-      alpha_expand_unaligned_store (orig_dst, data_regs[i], 4, ofs);
+      gcc_assert (TARGET_BWX || !TARGET_SAFE_PARTIAL);
+      if (TARGET_SAFE_PARTIAL)
+       alpha_expand_unaligned_store_safe_partial (orig_dst, data_regs[i],
+                                                  4, ofs, dst_align);
+      else
+       alpha_expand_unaligned_store (orig_dst, data_regs[i], 4, ofs);
       ofs += 4;
       i++;
       gcc_assert (i == nregs || GET_MODE (data_regs[i]) != SImode);
     }
 
-  if (dst_align >= 16)
+  if (TARGET_BWX && dst_align >= 16)
     while (i < nregs && GET_MODE (data_regs[i]) == HImode)
       {
        emit_move_insn (adjust_address (orig_dst, HImode, ofs), data_regs[i]);
@@ -4176,7 +4643,12 @@ alpha_expand_block_move (rtx operands[])
   else
     while (i < nregs && GET_MODE (data_regs[i]) == HImode)
       {
-       alpha_expand_unaligned_store (orig_dst, data_regs[i], 2, ofs);
+       gcc_assert (TARGET_BWX || !TARGET_SAFE_PARTIAL);
+       if (TARGET_SAFE_PARTIAL)
+         alpha_expand_unaligned_store_safe_partial (orig_dst, data_regs[i],
+                                                    2, ofs, dst_align);
+       else
+         alpha_expand_unaligned_store (orig_dst, data_regs[i], 2, ofs);
        i++;
        ofs += 2;
       }
@@ -4185,6 +4657,7 @@ alpha_expand_block_move (rtx operands[])
   while (i < nregs)
     {
       gcc_assert (GET_MODE (data_regs[i]) == QImode);
+      gcc_assert (TARGET_BWX || !TARGET_SAFE_PARTIAL);
       emit_move_insn (adjust_address (orig_dst, QImode, ofs), data_regs[i]);
       i++;
       ofs += 1;
@@ -4193,6 +4666,27 @@ alpha_expand_block_move (rtx operands[])
   return 1;
 }
 
+/* Expand a multi-thread and async-signal safe partial clear of a longword
+   or a quadword quantity indicated by MODE at aligned memory location MEM
+   according to MASK.  */
+
+static void
+alpha_expand_clear_safe_partial_nobwx (rtx mem, machine_mode mode,
+                                      HOST_WIDE_INT mask)
+{
+  rtx label = gen_rtx_LABEL_REF (DImode, gen_label_rtx ());
+  emit_label (XEXP (label, 0));
+
+  rtx temp = gen_reg_rtx (mode);
+  rtx status = mode == DImode ? temp : gen_rtx_SUBREG (DImode, temp, 0);
+
+  emit_insn (gen_load_locked (mode, temp, mem));
+  emit_insn (gen_rtx_SET (temp, gen_rtx_AND (mode, temp, GEN_INT (mask))));
+  emit_insn (gen_store_conditional (mode, status, mem, temp));
+
+  alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, status, const0_rtx), label);
+}
+
 int
 alpha_expand_block_clear (rtx operands[])
 {
@@ -4237,8 +4731,9 @@ alpha_expand_block_clear (rtx operands[]
     {
       /* Given that alignofs is bounded by align, the only time BWX could
         generate three stores is for a 7 byte fill.  Prefer two individual
-        stores over a load/mask/store sequence.  */
-      if ((!TARGET_BWX || alignofs == 7)
+        stores over a load/mask/store sequence.  In the partial safety
+        mode always do individual stores regardless of their count.  */
+      if ((!TARGET_BWX || (!TARGET_SAFE_PARTIAL && alignofs == 7))
               && align >= 32
               && !(alignofs == 4 && bytes >= 4))
        {
@@ -4264,10 +4759,15 @@ alpha_expand_block_clear (rtx operands[]
            }
          alignofs = 0;
 
-         tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask),
-                             NULL_RTX, 1, OPTAB_WIDEN);
+         if (TARGET_SAFE_PARTIAL)
+           alpha_expand_clear_safe_partial_nobwx (mem, mode, mask);
+         else
+           {
+             tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask),
+                                 NULL_RTX, 1, OPTAB_WIDEN);
 
-         emit_move_insn (mem, tmp);
+             emit_move_insn (mem, tmp);
+           }
        }
 
       if (TARGET_BWX && (alignofs & 1) && bytes >= 1)
@@ -4372,7 +4872,11 @@ alpha_expand_block_clear (rtx operands[]
     {
       words = bytes / 8;
 
-      alpha_expand_unaligned_store_words (NULL, orig_dst, words, ofs);
+      if (TARGET_SAFE_PARTIAL)
+       alpha_expand_unaligned_store_words_safe_partial (NULL, orig_dst,
+                                                        words, ofs, align);
+      else
+       alpha_expand_unaligned_store_words (NULL, orig_dst, words, ofs);
 
       bytes -= words * 8;
       ofs += words * 8;
@@ -4389,7 +4893,7 @@ alpha_expand_block_clear (rtx operands[]
 
   /* If we have appropriate alignment (and it wouldn't take too many
      instructions otherwise), mask out the bytes we need.  */
-  if ((TARGET_BWX ? words > 2 : bytes > 0)
+  if ((TARGET_BWX ? !TARGET_SAFE_PARTIAL && words > 2 : bytes > 0)
       && (align >= 64 || (align >= 32 && bytes < 4)))
     {
       machine_mode mode = (align >= 64 ? DImode : SImode);
@@ -4401,18 +4905,46 @@ alpha_expand_block_clear (rtx operands[]
 
       mask = HOST_WIDE_INT_M1U << (bytes * 8);
 
-      tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask),
-                         NULL_RTX, 1, OPTAB_WIDEN);
+      if (TARGET_SAFE_PARTIAL)
+       alpha_expand_clear_safe_partial_nobwx (mem, mode, mask);
+      else
+       {
+         tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask),
+                             NULL_RTX, 1, OPTAB_WIDEN);
 
-      emit_move_insn (mem, tmp);
+         emit_move_insn (mem, tmp);
+       }
       return 1;
     }
 
-  if (!TARGET_BWX && bytes >= 4)
+  if (bytes >= 4)
     {
-      alpha_expand_unaligned_store (orig_dst, const0_rtx, 4, ofs);
-      bytes -= 4;
-      ofs += 4;
+      if (align >= 32)
+       do
+         {
+           emit_move_insn (adjust_address (orig_dst, SImode, ofs),
+                           const0_rtx);
+           bytes -= 4;
+           ofs += 4;
+         }
+       while (bytes >= 4);
+      else if (!TARGET_BWX)
+       {
+         gcc_assert (bytes < 8);
+         if (TARGET_SAFE_PARTIAL)
+           {
+             alpha_expand_unaligned_store_safe_partial (orig_dst, const0_rtx,
+                                                        bytes, ofs, align);
+             ofs += bytes;
+             bytes = 0;
+           }
+         else
+           {
+             alpha_expand_unaligned_store (orig_dst, const0_rtx, 4, ofs);
+             bytes -= 4;
+             ofs += 4;
+           }
+       }
     }
 
   if (bytes >= 2)
@@ -4428,18 +4960,38 @@ alpha_expand_block_clear (rtx operands[]
        }
       else if (! TARGET_BWX)
        {
-         alpha_expand_unaligned_store (orig_dst, const0_rtx, 2, ofs);
-         bytes -= 2;
-         ofs += 2;
+         gcc_assert (bytes < 4);
+         if (TARGET_SAFE_PARTIAL)
+           {
+             alpha_expand_unaligned_store_safe_partial (orig_dst, const0_rtx,
+                                                        bytes, ofs, align);
+             ofs += bytes;
+             bytes = 0;
+           }
+         else
+           {
+             alpha_expand_unaligned_store (orig_dst, const0_rtx, 2, ofs);
+             bytes -= 2;
+             ofs += 2;
+           }
        }
     }
 
   while (bytes > 0)
-    {
-      emit_move_insn (adjust_address (orig_dst, QImode, ofs), const0_rtx);
-      bytes -= 1;
-      ofs += 1;
-    }
+    if (TARGET_BWX || !TARGET_SAFE_PARTIAL)
+      {
+       emit_move_insn (adjust_address (orig_dst, QImode, ofs), const0_rtx);
+       bytes -= 1;
+       ofs += 1;
+      }
+    else
+      {
+       gcc_assert (bytes < 2);
+       alpha_expand_unaligned_store_safe_partial (orig_dst, const0_rtx,
+                                                  bytes, ofs, align);
+       ofs += bytes;
+       bytes = 0;
+      }
 
   return 1;
 }
Index: gcc/gcc/config/alpha/alpha.md
===================================================================
--- gcc.orig/gcc/config/alpha/alpha.md
+++ gcc/gcc/config/alpha/alpha.md
@@ -4781,9 +4781,15 @@
          && INTVAL (operands[1]) != 64))
     FAIL;
 
-  alpha_expand_unaligned_store (operands[0], operands[3],
-                               INTVAL (operands[1]) / 8,
-                               INTVAL (operands[2]) / 8);
+  if (TARGET_SAFE_PARTIAL)
+    alpha_expand_unaligned_store_safe_partial (operands[0], operands[3],
+                                              INTVAL (operands[1]) / 8,
+                                              INTVAL (operands[2]) / 8,
+                                              BITS_PER_UNIT);
+  else
+    alpha_expand_unaligned_store (operands[0], operands[3],
+                                 INTVAL (operands[1]) / 8,
+                                 INTVAL (operands[2]) / 8);
   DONE;
 })
 
Index: gcc/gcc/config/alpha/alpha.opt
===================================================================
--- gcc.orig/gcc/config/alpha/alpha.opt
+++ gcc/gcc/config/alpha/alpha.opt
@@ -73,6 +73,10 @@ msafe-bwa
 Target Mask(SAFE_BWA)
 Emit multi-thread and async-signal safe code for byte and word memory accesses.
 
+msafe-partial
+Target Mask(SAFE_PARTIAL)
+Emit multi-thread and async-signal safe code for partial memory accesses.
+
 mexplicit-relocs
 Target Mask(EXPLICIT_RELOCS)
 Emit code using explicit relocation directives.
Index: gcc/gcc/config/alpha/alpha.opt.urls
===================================================================
--- gcc.orig/gcc/config/alpha/alpha.opt.urls
+++ gcc/gcc/config/alpha/alpha.opt.urls
@@ -38,6 +38,9 @@ UrlSuffix(gcc/DEC-Alpha-Options.html#ind
 msafe-bwa
 UrlSuffix(gcc/DEC-Alpha-Options.html#index-msafe-bwa)
 
+msafe-partial
+UrlSuffix(gcc/DEC-Alpha-Options.html#index-msafe-partial)
+
 mexplicit-relocs
 UrlSuffix(gcc/DEC-Alpha-Options.html#index-mexplicit-relocs)
 
Index: gcc/gcc/doc/invoke.texi
===================================================================
--- gcc.orig/gcc/doc/invoke.texi
+++ gcc/gcc/doc/invoke.texi
@@ -976,7 +976,7 @@ Objective-C and Objective-C++ Dialects}.
 -mtrap-precision=@var{mode}  -mbuild-constants
 -mcpu=@var{cpu-type}  -mtune=@var{cpu-type}
 -mbwx  -mmax  -mfix  -mcix
--msafe-bwa
+-msafe-bwa  -msafe-partial
 -mfloat-vax  -mfloat-ieee
 -mexplicit-relocs  -msmall-data  -mlarge-data
 -msmall-text  -mlarge-text
@@ -25700,6 +25700,16 @@ Indicate whether in the absence of the o
 GCC should generate multi-thread and async-signal safe code for byte
 and aligned word memory accesses.
 
+@opindex msafe-partial
+@opindex mno-safe-partial
+@item -msafe-partial
+@itemx -mno-safe-partial
+Indicate whether GCC should generate multi-thread and async-signal
+safe code for partial memory accesses, including piecemeal accesses
+to unaligned data as well as block accesses to leading and trailing
+parts of aggregate types or other objects in memory that do not
+respectively start and end on an aligned 64-bit data boundary.
+
 @opindex mfloat-vax
 @opindex mfloat-ieee
 @item -mfloat-vax
Index: gcc/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mbwx -msafe-partial" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "memclr-a2-o1-c9-ptr.c"
+
+/* Expect assembly such as:
+
+       stb $31,1($16)
+       stw $31,2($16)
+       stw $31,4($16)
+       stw $31,6($16)
+       stw $31,8($16)
+
+   that is with a byte store at offset 1, followed by word stores at
+   offsets 2, 4, 6, and 8.  */
+
+/* { dg-final { scan-assembler-times "\\sstb\\s\\\$31,1\\\(\\\$16\\\)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstw\\s\\\$31,2\\\(\\\$16\\\)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstw\\s\\\$31,4\\\(\\\$16\\\)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstw\\s\\\$31,6\\\(\\\$16\\\)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstw\\s\\\$31,8\\\(\\\$16\\\)\\s" 1 } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr.c
+++ gcc/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mbwx" } */
+/* { dg-options "-mbwx -mno-safe-partial" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
 typedef unsigned int __attribute__ ((mode (QI))) int08_t;
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-msafe-partial -mbwx" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "memcpy-di-unaligned-dst.c"
+
+/* { dg-final { scan-assembler-times "\\sldq\\s" 7 } } */
+/* { dg-final { scan-assembler-times "\\sstb\\s" 16 } } */
+/* { dg-final { scan-assembler-times "\\sstq_u\\s" 6 } } */
+/* { dg-final { scan-assembler-not "\\sldq_l\\s" } } */
+/* { dg-final { scan-assembler-not "\\sldq_u\\s" } } */
+/* { dg-final { scan-assembler-not "\\sstq\\s" } } */
+/* { dg-final { scan-assembler-not "\\sstq_c\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msafe-partial -mno-bwx" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "memcpy-di-unaligned-dst.c"
+
+/* { dg-final { scan-assembler-times "\\sldq\\s" 7 } } */
+/* { dg-final { scan-assembler-times "\\sldq_l\\s" 2 } } */
+/* { dg-final { scan-assembler-times "\\sstq_c\\s" 2 } } */
+/* { dg-final { scan-assembler-times "\\sstq_u\\s" 6 } } */
+/* { dg-final { scan-assembler-not "\\sldq_u\\s" } } */
+/* { dg-final { scan-assembler-not "\\sstq\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst.c
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "" } */
+/* { dg-options "-mno-safe-partial" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
 unsigned long unaligned_src_di[9] = { [0 ... 8] = 0xfefdfcfbfaf9f8f7 };
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-msafe-partial -mbwx" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "memcpy-si-unaligned-dst.c"
+
+/* { dg-final { scan-assembler-times "\\sldl\\s" 15 } } */
+/* { dg-final { scan-assembler-times "\\sstb\\s" 20 } } */
+/* { dg-final { scan-assembler-times "\\sstq_u\\s" 6 } } */
+/* { dg-final { scan-assembler-not "\\sldq_l\\s" } } */
+/* { dg-final { scan-assembler-not "\\sldq_u\\s" } } */
+/* { dg-final { scan-assembler-not "\\sstl\\s" } } */
+/* { dg-final { scan-assembler-not "\\sstq_c\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msafe-partial -mno-bwx" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "memcpy-si-unaligned-dst.c"
+
+/* { dg-final { scan-assembler-times "\\sldl\\s" 15 } } */
+/* { dg-final { scan-assembler-times "\\sldq_l\\s" 4 } } */
+/* { dg-final { scan-assembler-times "\\sstq_c\\s" 4 } } */
+/* { dg-final { scan-assembler-times "\\sstq_u\\s" 6 } } */
+/* { dg-final { scan-assembler-not "\\sldq_u\\s" } } */
+/* { dg-final { scan-assembler-not "\\sstl\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "" } */
+/* { dg-options "-mno-safe-partial" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
 unsigned int unaligned_src_si[17] = { [0 ... 16] = 0xfefdfcfb };
Index: gcc/gcc/testsuite/gcc.target/alpha/stlx0-safe-partial-bwx.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/stlx0-safe-partial-bwx.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mbwx -msafe-partial" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "stlx0.c"
+
+/* Expect assembly such as:
+
+       stb $31,0($16)
+       stb $31,1($16)
+       stb $31,2($16)
+       stb $31,3($16)
+
+   without any LDQ_U or STQ_U instructions.  */
+
+/* { dg-final { scan-assembler-times "\\sstb\\s" 4 } } */
+/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/stlx0-safe-partial.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/stlx0-safe-partial.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-bwx -msafe-partial" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "stlx0.c"
+
+/* Expect assembly such as:
+
+       lda $2,3($16)
+       bic $2,7,$2
+$L2:
+       ldq_l $1,0($2)
+       msklh $1,$16,$1
+       stq_c $1,0($2)
+       beq $1,$L2
+       bic $16,7,$2
+$L3:
+       ldq_l $1,0($2)
+       mskll $1,$16,$1
+       stq_c $1,0($2)
+       beq $1,$L3
+
+   without any INSLH, INSLL, BIS, LDQ_U, or STQ_U instructions.  */
+
+/* { dg-final { scan-assembler-times "\\sldq_l\\s" 2 } } */
+/* { dg-final { scan-assembler-times "\\smsklh\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\smskll\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstq_c\\s" 2 } } */
+/* { dg-final { scan-assembler-not "\\s(?:bis|inslh|insll|ldq_u|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/stlx0.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/alpha/stlx0.c
+++ gcc/gcc/testsuite/gcc.target/alpha/stlx0.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "" } */
+/* { dg-options "-mno-safe-partial" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
 typedef struct { int v __attribute__ ((packed)); } intx;
Index: gcc/gcc/testsuite/gcc.target/alpha/stqx0-safe-partial-bwx.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/stqx0-safe-partial-bwx.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mbwx -msafe-partial" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "stqx0.c"
+
+/* Expect assembly such as:
+
+       stb $31,0($16)
+       stb $31,1($16)
+       stb $31,2($16)
+       stb $31,3($16)
+       stb $31,4($16)
+       stb $31,5($16)
+       stb $31,6($16)
+       stb $31,7($16)
+
+   without any LDQ_U or STQ_U instructions.  */
+
+/* { dg-final { scan-assembler-times "\\sstb\\s" 8 } } */
+/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/stqx0-safe-partial.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/stqx0-safe-partial.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-bwx -msafe-partial" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "stqx0.c"
+
+/* Expect assembly such as:
+
+       lda $2,7($16)
+       bic $2,7,$2
+$L2:
+       ldq_l $1,0($2)
+       mskqh $1,$16,$1
+       stq_c $1,0($2)
+       beq $1,$L2
+       bic $16,7,$2
+$L3:
+       ldq_l $1,0($2)
+       mskql $1,$16,$1
+       stq_c $1,0($2)
+       beq $1,$L3
+
+   without any INSLH, INSLL, BIS, LDQ_U, or STQ_U instructions.  */
+
+/* { dg-final { scan-assembler-times "\\sldq_l\\s" 2 } } */
+/* { dg-final { scan-assembler-times "\\smskqh\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\smskql\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstq_c\\s" 2 } } */
+/* { dg-final { scan-assembler-not "\\s(?:bis|insqh|insql|ldq_u|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/stqx0.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/alpha/stqx0.c
+++ gcc/gcc/testsuite/gcc.target/alpha/stqx0.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "" } */
+/* { dg-options "-mno-safe-partial" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
 typedef struct { long v __attribute__ ((packed)); } longx;
Index: gcc/gcc/testsuite/gcc.target/alpha/stwx0-bwx.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/alpha/stwx0-bwx.c
+++ gcc/gcc/testsuite/gcc.target/alpha/stwx0-bwx.c
@@ -1,19 +1,15 @@
 /* { dg-do compile } */
-/* { dg-options "-mbwx" } */
+/* { dg-options "-mbwx -mno-safe-partial" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
-typedef struct { short v __attribute__ ((packed)); } shortx;
-
-void
-stwx0 (shortx *p)
-{
-  p->v = 0;
-}
+#include "stwx0.c"
 
 /* Expect assembly such as:
 
         stb $31,0($16)
         stb $31,1($16)
- */
+
+   without any LDQ_U or STQ_U instructions.  */
 
 /* { dg-final { scan-assembler-times "\\sstb\\s\\\$31," 2 } } */
+/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/stwx0-safe-partial-bwx.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/stwx0-safe-partial-bwx.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mbwx -msafe-partial" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "stwx0.c"
+
+/* Expect assembly such as:
+
+        stb $31,0($16)
+        stb $31,1($16)
+
+   without any LDQ_U or STQ_U instructions.  */
+
+/* { dg-final { scan-assembler-times "\\sstb\\s\\\$31," 2 } } */
+/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/stwx0-safe-partial.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/stwx0-safe-partial.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-bwx -msafe-partial" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "stwx0.c"
+
+/* Expect assembly such as:
+
+       lda $2,1($16)
+       bic $2,7,$2
+$L2:
+       ldq_l $1,0($2)
+       mskwh $1,$16,$1
+       stq_c $1,0($2)
+       beq $1,$L2
+       bic $16,7,$2
+$L3:
+       ldq_l $1,0($2)
+       mskwl $1,$16,$1
+       stq_c $1,0($2)
+       beq $1,$L3
+
+   without any INSWH, INSWL, BIS, LDQ_U, or STQ_U instructions.  */
+
+/* { dg-final { scan-assembler-times "\\sldq_l\\s" 2 } } */
+/* { dg-final { scan-assembler-times "\\smskwh\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\smskwl\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstq_c\\s" 2 } } */
+/* { dg-final { scan-assembler-not "\\s(?:bis|inswh|inswl|ldq_u|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/stwx0.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/alpha/stwx0.c
+++ gcc/gcc/testsuite/gcc.target/alpha/stwx0.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mno-bwx" } */
+/* { dg-options "-mno-bwx -mno-safe-partial" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
 typedef struct { short v __attribute__ ((packed)); } shortx;
