On 8 November 2014 00:32, Szikra István <steven.sp...@gmail.com> wrote:
> Hi everyone!
>
> My problem in sort: I’m getting
>         in      r24, 0x18
>         ldi     r25, 0x00
>         andi    r24, 0xEF
>         out     0x18, r24
> instead of
>         cbi     0x18, 4
> .
>
> I’m trying to write efficient modern C/C++ code for multiple platforms
> including AVR 8 bit controllers.
>
> Unfortunately GCC support for AVR (among other things) is not always
> flawless. And it changes from versions to version (and not always for the
> better).
> Since I’m a long time AVR developer I have a lot of compiler versions
> installed (WinAVR 2006-2010, and Atmel Studio 6.2 with GCC 4.8.1), but I
> could test my code with only one (the latest).
>
> I run into some trouble with clearing port bits not translating from C into
> cbi in assembler. It is caused by my bits template, but I do not know why.
> It seems to me like a bug in GCC. Maybe someone here can shed some light on
> the reason, or suggest a fix.

The transformation would seem prima facia best suited for the combine pass.
When we see the store, we know it's only 8 bit wide, thus we can perform the
arithmetic in 8 bit.  However, the store is to a volatile (actually,
I/O) address.
Now, gcc is not very good at optimizing code involving volatile (the
target-independent
code would likely already have simplified the arithmetic if these
weren't in the way -
buy you'd really need a completely different test case without
volatile to keep the
computation relevant).
These memory / I/O accesses require extra ordering constraints, so
large parts of the optimizers just punt when they encounter a volatile
reference.
There is some scope to tweak this - I've attached at proof of concept
patch to optimize
your code (based on gcc 5.0).  However, this opens the possibility
that it'll break something with respect to volatile ordering now - or
even later down the line.

ISTR we have some more detailed dependency checks at least in some places. in
fact, if there weren't, we should already see the existing cbi pattern
misbehaving.
consider:
typedef unsigned char uint8_t;

main ()
{
int i = (*(volatile uint8_t *)((0x18) + 0x20)) ;
(*(volatile uint8_t *)((0x17) + 0x20)) = 0x0f;
(*(volatile uint8_t *)((0x18) + 0x20))  = i & ~4;
}
if the complier was oblivious to the intervening write it could
generate a cbi here -
but it doesn't - well, at least not at -O2 ...

Another possible approach would be to use a peephole2 pattern or a
target-specific
optimization pass to do the 16->8 bit arithmetic transformation.
However, if you work after combine (as is definitely the case with peephole2),
you need to
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index cc85b79..6e7e353 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -679,8 +679,8 @@ (define_expand "mov<mode>"
 ;; "movqi_insn"
 ;; "movqq_insn" "movuqq_insn"
 (define_insn "mov<mode>_insn"
-  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r    ,d    ,Qm   ,r 
,q,r,*r")
-        (match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
Y00,Qm,r,q,i"))]
+  [(set (match_operand:ALL1 0 "movdst_operand"     "=r    ,d    ,Qm   ,r 
,q,r,*r")
+        (match_operand:ALL1 1 "nox_general_operand" "r Y00,n Ynn,r 
Y00,Qm,r,q,i"))]
   "(register_operand (operands[0], <MODE>mode)
    || reg_or_0_operand (operands[1], <MODE>mode)) &&
    /* skip if operands are out of lds/sts memory access range(0x40..0xbf)
@@ -3086,6 +3086,20 @@ (define_peephole2 ; andi
     operands[1] = GEN_INT (INTVAL (operands[1]) & INTVAL (operands[2]));
   })
 
+; standard promotion can get us a 16 bit int where only an 8 bit value
+; is required.  combine-split extend-binop-store to binop-store.
+
+(define_split
+  [(set (match_operand:QI 0 "any_qi_mem_operand")
+       (match_operator 3 "avr_binary_operator"
+        [(match_operand 1 "nonmemory_operand")
+         (match_operand 2 "nonmemory_operand")]))
+   (clobber (match_operand:QI 4 "register_operand"))]
+  ""
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (match_dup 4))]
+  "");
+
 ;;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 ;; ior
 
diff --git a/gcc/config/avr/predicates.md b/gcc/config/avr/predicates.md
index 4b456a5..ea8e095 100644
--- a/gcc/config/avr/predicates.md
+++ b/gcc/config/avr/predicates.md
@@ -69,6 +69,24 @@ (define_predicate "nop_general_operand"
   (and (match_operand 0 "general_operand")
        (match_test "!avr_mem_flash_p (op)")))
 
+; allow a QImode memory operand, including volatile / I/O.
+(define_predicate "any_qi_mem_operand"
+  (and (match_code "mem")
+       (match_test "address_operand (XEXP (op, 0), Pmode)")
+       (match_test "mode == QImode")))
+
+;; We also allow volatile memory here, so that the combiner can work with
+;; stores for I/O accesses as combiner-splitter results.
+;; As this is for a destination location, that should generally be safe
+;; for combine, as we should he handling the last insn then, which stays
+;; in place.
+;; ??? Could this ever match for a two-destination combination?
+;; ??? Should we make sure this predicate can't match I/O for transformations
+;; with a wider scope, like store sinking?
+(define_predicate "movdst_operand"
+  (ior (match_operand 0 "nonimmediate_operand")
+       (match_operand 0 "any_qi_mem_operand")))
+
 ;; Return 1 if OP is an "ordinary" general operand, i.e. a general
 ;; operand whose load is not handled by a libgcc call or ELPM.
 (define_predicate "nox_general_operand"
@@ -180,6 +198,10 @@ (define_predicate "simple_comparison_operator"
   (and (match_operand 0 "comparison_operator")
        (not (match_code "gt,gtu,le,leu"))))
 
+; A binary operator that the avr can perform in a single insn.
+(define_predicate "avr_binary_operator"
+  (match_code "and,ior,xor,plus,minus"))
+
 ;; Return true if OP is a valid call operand.
 (define_predicate "call_insn_operand"
   (and (match_code "mem")
_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list

Reply via email to