Hi,

This is the first of two patches to add unaligned-access support to the
ARM backend. It takes a somewhat different approach from Jie Zhang's
earlier patch:

  http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01890.html

The difference is that with Jie's patch, *any* pointer dereference would be allowed to
access unaligned data. This has the undesirable side-effect of
disallowing instructions which don't support unaligned accesses (LDRD,
LDM etc.) when unaligned accesses are enabled.

Instead, this patch enables only packed-structure accesses to use
ldr/str/ldrh/strh, by taking a hint from the MIPS ldl/ldr
implementation. I figured the unaligned-access ARM case is kind of
similar to those, except that normal loads/stores are used, and the
shifting/merging happens in hardware.
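
For concreteness, the sort of access this is aimed at looks something
like the following (struct and field names invented for illustration):

  struct __attribute__((packed)) pkt
  {
    unsigned char tag;
    unsigned short len;    /* Halfword at offset 1: misaligned.  */
    unsigned int payload;  /* Word at offset 3: misaligned.  */
  };

  unsigned int
  get_payload (struct pkt *p)
  {
    /* With unaligned-access support enabled, this should end up as a
       single ldr marked "@ unaligned", rather than a byte-by-byte
       sequence.  */
    return p->payload;
  }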

The standard names extv/extzv/insv can take a memory operand for the
source/destination of the extract/insert operation, so we simply expand
to unspec'ed versions of the load and store operations when
unaligned-access support is enabled. The benefit of doing that, rather
than, say, expanding via the regular movsi pattern, is that we bypass
any smartness in the compiler which might replace operations that work
for unaligned accesses (ldr/str/ldrh/strh) with operations that don't
(ldrd/strd/ldm/stm/vldr/...). The downside is that we may miss some
optimization opportunities, since these accesses no longer look like
plain memory accesses.
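
As an illustration of the hazard being avoided, consider something like
the following (again with invented names). If these accesses were
expanded as ordinary SImode moves, the compiler could later pair the
two adjacent word loads into an ldrd or ldm, which would fault on the
unaligned addresses; keeping them wrapped in unspecs prevents that:

  struct __attribute__((packed)) pair
  {
    unsigned char pad;
    unsigned int a;  /* Word at offset 1.  */
    unsigned int b;  /* Word at offset 5.  */
  };

  unsigned int
  sum_fields (struct pair *p)
  {
    /* Two adjacent unaligned word loads: these must remain two
       separate ldr instructions, never an ldrd or ldm.  */
    return p->a + p->b;
  }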

Doing things this way allows us to leave the settings for
STRICT_ALIGNMENT/SLOW_BYTE_ACCESS alone, avoiding the disruption that
changing them might cause.

The most awkward change in the patch is to generic code (expmed.c,
{store,extract}_bit_field_1): in big-endian mode, the existing behaviour
(when inserting/extracting a bitfield to/from a memory location) is
definitely bogus: "unit" is set to BITS_PER_UNIT for memory locations,
and if bitsize (the size of the field to insert/extract) is greater than
BITS_PER_UNIT (which isn't unusual at all), xbitpos becomes negative.
That can't possibly be intentional; I can only assume that this code
path is not exercised for machines which have memory alternatives for
bitfield insert/extract, and BITS_BIG_ENDIAN of 0 in BYTES_BIG_ENDIAN
mode.
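For example, inserting a 16-bit field at bit offset 0 of a memory
location gives unit == 8, bitsize == 16 and xbitpos == 0, so the
"xbitpos = unit - bitsize - xbitpos" adjustment yields -8.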

The logic for choosing when to enable the unaligned-access support (and
the name of the option to override the default behaviour) is lifted from
Jie's patch.
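So, with this patch, compiling for (say) ARMv7-A gets the new unaligned
accesses by default, whereas for ARMv5TE they are only used if
-munaligned-access is given explicitly; -mno-unaligned-access turns
them off again in either case.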

Tested with a cross compiler to ARM Linux, and (on a branch) with
crosses to ARM EABI in both little- and big-endian mode, with no
regressions. OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_option_override): Add unaligned_access
    support.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned-access): Add option.
    * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
    memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
    (extract_bit_field_1): Likewise.
commit e76508ff702406fd63bc59465d9c7ab70dcb3266
Author: Julian Brown <jul...@henry7.codesourcery.com>
Date:   Wed May 4 10:06:25 2011 -0700

    Permit regular ldr/str/ldrh/strh for packed-structure accesses etc.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4f9c2aa..a18aea6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1833,6 +1833,22 @@ arm_option_override (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 40ebf35..7d37445 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -104,6 +104,10 @@
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2393,7 +2397,7 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
+  [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "")
                          (match_operand:SI 1 "general_operand" "")
                          (match_operand:SI 2 "general_operand" ""))
         (match_operand:SI 3 "reg_or_int_operand" ""))]
@@ -2407,35 +2411,66 @@
 
     if (arm_arch_thumb2)
       {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+        if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
-
-	    if (val == 0)
+	    rtx base_addr;
+	    
+	    if (width == 32)
 	      {
-		emit_insn (gen_insv_zero (operands[0], operands[1],
-					  operands[2]));
-		DONE;
+	        base_addr = adjust_address (operands[0], SImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
 	      }
+	    else
+	      {
+	        rtx tmp = gen_reg_rtx (HImode);
 
-	    /* See if the set can be done with a single orr instruction.  */
-	    if (val == mask && const_ok_for_arm (val << start_bit))
-	      use_bfi = FALSE;
+	        base_addr = adjust_address (operands[0], HImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+		emit_insn (gen_unaligned_storehi (base_addr, tmp));
+	      }
+	    DONE;
 	  }
-	  
-	if (use_bfi)
+	else if (s_register_operand (operands[0], GET_MODE (operands[0])))
 	  {
-	    if (GET_CODE (operands[3]) != REG)
-	      operands[3] = force_reg (SImode, operands[3]);
+	    bool use_bfi = TRUE;
 
-	    emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-				    operands[3]));
-	    DONE;
+	    if (GET_CODE (operands[3]) == CONST_INT)
+	      {
+		HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+		if (val == 0)
+		  {
+		    emit_insn (gen_insv_zero (operands[0], operands[1],
+					      operands[2]));
+		    DONE;
+		  }
+
+		/* See if the set can be done with a single orr instruction.  */
+		if (val == mask && const_ok_for_arm (val << start_bit))
+		  use_bfi = FALSE;
+	      }
+
+	    if (use_bfi)
+	      {
+		if (GET_CODE (operands[3]) != REG)
+		  operands[3] = force_reg (SImode, operands[3]);
+
+		emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+					operands[3]));
+		DONE;
+	      }
 	  }
+	else
+	  FAIL;
       }
 
+    if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+      FAIL;
+
     target = copy_rtx (operands[0]);
     /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical 
        subreg as the final target.  */
@@ -3628,7 +3663,7 @@
 
 (define_expand "extzv"
   [(set (match_dup 4)
-	(ashift:SI (match_operand:SI   1 "register_operand" "")
+	(ashift:SI (match_operand:SI   1 "nonimmediate_operand" "")
 		   (match_operand:SI   2 "const_int_operand" "")))
    (set (match_operand:SI              0 "register_operand" "")
 	(lshiftrt:SI (match_dup 4)
@@ -3641,10 +3676,53 @@
     
     if (arm_arch_thumb2)
       {
-	emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-				 operands[3]));
-	DONE;
+	HOST_WIDE_INT width = INTVAL (operands[2]);
+	HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+	if (unaligned_access && MEM_P (operands[1])
+	    && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+	  {
+	    rtx base_addr;
+
+	    if (width == 32)
+              {
+		base_addr = adjust_address (operands[1], SImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+              }
+	    else
+              {
+		rtx dest = operands[0];
+		rtx tmp = gen_reg_rtx (SImode);
+
+		/* We may get a paradoxical subreg here.  Strip it off.  */
+		if (GET_CODE (dest) == SUBREG
+		    && GET_MODE (dest) == SImode
+		    && GET_MODE (SUBREG_REG (dest)) == HImode)
+		  dest = SUBREG_REG (dest);
+
+		if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+		  FAIL;
+
+		base_addr = adjust_address (operands[1], HImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+		emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	      }
+	    DONE;
+	  }
+	else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+	  {
+	    emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+				     operands[3]));
+	    DONE;
+	  }
+	else
+	  FAIL;
       }
+    
+    if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+      FAIL;
 
     operands[3] = GEN_INT (rshift);
     
@@ -3659,7 +3737,103 @@
   }"
 )
 
-(define_insn "extv"
+(define_expand "extv"
+  [(set (match_operand 0 "s_register_operand" "")
+	(sign_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT) == 0)
+    {
+      rtx base_addr;
+      
+      if (width == 32)
+        {
+	  base_addr = adjust_address (operands[1], SImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+	  rtx dest = operands[0];
+	  rtx tmp = gen_reg_rtx (SImode);
+	  
+	  /* We may get a paradoxical subreg here.  Strip it off.  */
+	  if (GET_CODE (dest) == SUBREG
+	      && GET_MODE (dest) == SImode
+	      && GET_MODE (SUBREG_REG (dest)) == HImode)
+	    dest = SUBREG_REG (dest);
+	  
+	  if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+	    FAIL;
+	  
+	  base_addr = adjust_address (operands[1], HImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+	  emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	}
+
+      DONE;
+    }
+  else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+    DONE;
+  else
+    FAIL;
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(sign_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")]
+				   UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(zero_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")]
+				   UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 7d2d84c..3ed9016 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -170,3 +170,7 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 7482747..a525184 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -648,7 +648,7 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
 
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* We have been counting XBITPOS within UNIT.
@@ -1456,7 +1456,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */
