Some time ago I had a private conversation with Sean on porting his avr-gcc fixed-point support to mainline avr-gcc 4.7.
During the conversation there were several draft versions of the work against 4.7, but suddenly the conversation stopped and the fixed-point support ended up nowhere... Hope that nothing serious happened. Because I think it's a pity if all that work is lost, I decided to post the latest version of the patch from 2012-12-13 together with the according email. Johann On Wed, Dec 07, 2011 at 06:08:19PM +0100, Georg-Johann Lay wrote:
There are some missing, I attached a patch atop of your patch that
I applied it to my changes, but I could not accept a few minor changes to avr-fixed.md @@ -43,25 +42,29 @@ (define_insn "fract<FIXED2:mode><FIXED1:mode>2" [(set (match_operand:FIXED1 0 "register_operand" "=r") (fract_convert:FIXED1 (match_operand:FIXED2 1 "register_operand" "r")))] - "" + "<FIXED1:MODE>mode != <FIXED2:MODE>mode + && (!SCALAR_INT_MODE_P (<FIXED1:MODE>mode) + || !SCALAR_INT_MODE_P (<FIXED2:MODE>mode))" { What is the purpose of this logic? I think it is redundant within gcc already. @@ -267,8 +270,8 @@ ;; "*mulsa3_call" "*mulusa3_call" (define_insn "*mul<mode>3_call" [(set (reg:ALLSA 14) (mult:ALLSA (reg:ALLSA 18) (reg:ALLSA 24))) - (clobber (reg:ALLSA 18)) - (clobber (reg:ALLSA 24)) + (clobber (reg:SI 18)) + (clobber (reg:SI 24)) (clobber (reg:HI 22))] "!AVR_HAVE_MUL" "%~call __mul<mode>3" This change caused a new internal compiler error with my test code, so I left it as ALLSA.
Have a look at the new %T as documented at top of avr.md. I tried to rewrite avr_fract_out and got lost in the function. That function need definitely cleanup, to be disentangled andbetter documented!
Take a look at the new latest patch. I made it from newest svn gcc with hopefully all the new updates. I got rid of all the sprintf and did some cleanup in fract_out.
It is a potential source of errors and hard to understand and maintain in its current style. For example, with blen[i] = GET_MODE_BITSIZE (GET_MODE (operands[i])): xop[2] = GEN_INT (blen[0] - 1); xop[3] = GEN_INT (blen[1] - 1); /* Store the sign bit if the destination is a signed fract and the source has a sign in the integer part. */ if (sbit[0] && !ilen[0] && sbit[1] && ilen[1]) { /* To avoid using BST and BLD if the source and destination registers overlap we can use a single LSL since we don't care about preserving the source register. */ if (rdest < rsource + tlen[1] && rdest + tlen[0] > rsource) { avr_asm_len ("lsl %T1%t3", xop, plen, 1); sign_in_Carry = true; } else { avr_asm_len ("bst %T1%T3", xop, plen, 1); sign_in_T = true; } } But I got lost in the remainder of the routine and gave up, so it's in a mess with my patch... Document what the routine does and which part handles what bytes, what is already finished and what is still to be done. Moreover, it appears you do sprintf to buf[] but never actually output it in some places? It's just remark on my personal preference. One more thing: Don't use magic comments like ;; -*- Mode: Scheme -*-
I forgot about that!
in avr-fixed.md! If you like emacs, you can bind extension(s) to major-mode like so (provided you have support for md-mode): (setq auto-mode-alist (cons '("\\.md$" . md-mode) auto-mode-alist)) with the additional benefit that *any* md-file you open will be in the right mode. More:
Ok, perfect I think I get the same effect as my magic comment.
(setq auto-mode-alist (cons '("[a-z\-]+\\.def$" . c-mode) auto-mode-alist)) (setq auto-mode-alist (cons '("Makefile\\.def$" . autoconf-mode) auto-mode-alist)) (setq auto-mode-alist (cons '("config.gcc$" . sh-mode) auto-mode-alist)) (setq auto-mode-alist (cons '("\\.opt$" . md-mode) auto-mode-alist)) (setq auto-mode-alist (cons '(".*\\bt\\-[a-zA-Z0-9\-]+\\'" . makefile-mode) auto-mode-alist)) (setq auto-mode-alist (cons '("ChangeLog.*$" . change-log-mode) auto-mode-alist)) One more non-gcc related hint: You can set diff-cmd in your .subversion/config: ### This file configures various client-side behaviors. ... ### Section for configuring external helper applications. [helpers] ... diff-cmd=.../svndiff and the svndiff: #!/bin/sh diff=/usr/bin/diff extra_args="-F^(define" # Do not pass multiple -c/-u options case "$1" in -c*|-C*) exec ${diff} -p ${extra_args} "$@" ;; -u*|-U*) exec ${diff} -p ${extra_args} "$@" ;; *) exec ${diff} -up ${extra_args} "$@" ;; esac That way you get context of changes in diff for insns, expanders, peepholes, etc. (any line starting with (define starts a new context :-))
Wow, I have to try it!
Suppose addsa3 insn. It has 2 constraint alternatives: "r,0,r" and "d,0,i". If $2 is CONST, it comes to reload and there is no more d-register, reload has to satisfy the constraints in some way. As there is only one alternative that supports "r" in $0, it picks that constraint and reloads $2 to some register (which is always possible). That way, you need an SAmode register to load the constant which is 4 bytes. This increases register pressure which might lead to more spills/stack space and is suboptimal with respect to the generated instructions. Suppose you add 0xffff0001, then in the better case where the reload reg is r-reg: LDI r20, 0x01 ; movsa LDI r21, 0x00 LDI r22, 0xff LDI r23, 0xff ADD r10, r20 ADC r11, r21 ADC r12, r22 ADC r13, r23 and in the case where the reload reg is a l-reg: LDI r30, 0x01 ; movsa mov r6, r30 mov r7, __zero_reg__ LDI r30, 0xff mov r8, r30 mov r9, r30 ADD r10, r6 ADC r11, r7 ADC r12, r8 ADC r13, r9 With a QI clobber and using avr_out_plus_1, however, you get: LDI r30, 0xff sub r10, r30 sbc r11, r30 sbc r12, __zero_reg sbc r13, __zero_reg Notice that you can use avr_out_plus_1 to find out if adding is better or subtracting the negated constant. I'd suppose you don't include this in your initial patch. Better do tweaks in a follow-up patch
I agree. There are many other optimizations besides this one.
Wouldn't it be nice if we could just describe every possible opcode to gcc and what it did, then have it sit there and figure out the best way to do everything? Then it wouldn't be any extra work to support new types.The problem is too hard, many NP-complete problems all over the place. Therefore, GCC uses several passes to narrow down a good solution step-by-step. Notice that in GCC no algorithms with quadratic or even worse complexitx are accepted.
Yes, because it would be working backwards. I wrote a program which compounded assembly instructions and could convert sequences to shorter sequences. I got to about 5 instructions deep on the avr before the complexity was too great. It isn't really an easy problem, but I believe it is possible. Maybe a program which could generate rtl from an instruction set. It might take a while to accomplish this, but it's ok because it only needs to run once in a while.
Are add/sub insns really needed because they are not mapped to qi/hi/si if no insn is available? * If you are more convenient with non-ABI interface then just do it. I even dared to change the ABI for 32 bit multiplication to pass values in R26. If you are using FMUL* it might be better to use non-ABI interface to avoid moves.I only use FMUL on the fractional types with sign bits.== Potential Problems/Bugs == * You must provide ChangeLogI will add that once the patch is finalized. Is it part of the patch or a separate file?Don't include them in the patch, just write it as text and include it in the mail you send for review, see http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01820.html At the end of the mail (prior to the attachment) there is the ChangeLog as it will be used as commit message. This text is inserted into respective ChangeLog files after approval and committed as one change-set, see the respective commit (message) and changes to respective ChangeLogs: http://gcc.gnu.org/viewcvs?view=revision&revision=181482* You intend to supply test cases for test suite, too? Or is the overall fixed regression tests in the suite enough?I did not intend to supply any test cases, but it would be nice. I don't think the fixed regression tests are sufficient, but I need to take another look.There are still more problems: If there is no move insn, there must not be insns that might require respective reloads. Example: If you write an adddi3 insn to add 8-byte integers (an insn which would be more or less straight forward), you'd also need to supply movdi (writing it would by no means be straight forward). This is the reason why here is no native DImode support in avr-gcc: movdi is just too painful to write. It's similar with DAmode or TAmode. You must not write insns that might request respective reloads: A DA pseude might need 8-byte reload which is nowhere in avr.md. If reload needs to reload such a value, it need respective insn to accomplish the reload. So [U]DA, [U]TA in FIXED1/2 mode iterator are not appropriate.
Actually these modes should still work. I'm not too sure about the correctness of 128bit TAmode, because I tried to use it and it loaded the constant wrong. DAmode seems fine. It has compiled c code to support it so it would be rather slow, but it should still work. The saturating types are also probably supported with c routines. Maybe supporting saturating types isn't that hard.
* How is insn length adjusted for "fract_out"? See how it works for, e.g. $ grep -i addto_sp avr.* and observe how addto_sp from adjust_len attribute is used in avr.c:adjust_insn_length().I did an outline of this in the patch-atop-patch.
Sean
With this func, 4 extra mov instructions are emmited that need not be generated. accum func(short accum y) { return y; } func: /* prologue: function */ /* frame size = 0 */ /* stack size = 0 */ .L__stack_usage = 0 ; (insn 6 3 31 (set (reg:SA 24 r24 [45]) ; (fract_convert:SA (reg:HA 24 r24 [ y ]))) test.c:7 208 {fracthasa2} ; (nil)) clr r27 ; 6 fracthasa2 [length = 6] mov r26, r25 mov r25, r24 clr r24 sbrc r26,7 com r27 ; (insn 31 6 32 (set (reg:QI 22 r22 [55]) ; (reg:QI 24 r24 [orig:51 <retval> ] [51])) test.c:8 899 {movqi_insn} ; (expr_list:REG_DEAD (reg:QI 21 r21 [orig:51 <retval> ] [51]) ; (nil))) mov r22,r24 ; 31 movqi_insn/1 [length = 1] ; (insn 32 31 33 (set (reg:QI 23 r23 [+1 ]) ; (reg:QI 25 r25 [orig:52 <retval>+1 ] [52])) test.c:8 899 {movqi_insn} ; (expr_list:REG_DEAD (reg:QI 20 r20 [orig:52 <retval>+1 ] [52]) ; (nil))) mov r23,r25 ; 32 movqi_insn/1 [length = 1] ; (insn 33 32 34 (set (reg:QI 24 r24 [+2 ]) ; (reg:QI 26 r26 [orig:53 <retval>+2 ] [53])) test.c:8 899 {movqi_insn} ; (expr_list:REG_DEAD (reg:QI 19 r19 [orig:53 <retval>+2 ] [53]) ; (nil))) mov r24,r26 ; 33 movqi_insn/1 [length = 1] ; (insn 34 33 14 (set (reg:QI 25 r25 [+3 ]) ; (reg:QI 27 r27 [orig:54 <retval>+3 ] [54])) test.c:8 899 {movqi_insn} ; (expr_list:REG_DEAD (reg:QI 18 r18 [orig:54 <retval>+3 ] [54]) ; (nil))) mov r25,r27 ; 34 movqi_insn/1 [length = 1] ; (jump_insn 39 14 38 (return) test.c:8 1231 {return} ; (nil) ; -> return) ret ; 39 return [length = 1] Maybe because gcc has some alignment constraints and can't just ask fract_out for the accum in the return registers instead of some other register group which it then copies to the return registers. other way around, no problem. short accum func(accum y) { return y; } func: /* prologue: function */ /* frame size = 0 */ /* stack size = 0 */ .L__stack_usage = 0 ; (insn 6 3 14 (set (reg:HA 24 r24 [45]) ; (fract_convert:HA (reg/v:SA 22 r22 [orig:44 y ] [44]))) test.c:7 170 {fractsaha2} ; (expr_list:REG_DEAD (reg:QI 23 r23) ; (expr_list:REG_DEAD (reg:QI 22 r22) ; (nil)))) mov r25, r24 ; 6 fractsaha2 [length = 2] mov r24, r23 ; (jump_insn 31 14 30 (return) test.c:8 1231 {return} ; (nil) ; -> return) ret ; 31 return [length = 1]
avr-gcc-fixedpoint-13-12-2011.diff.bz2
Description: Binary data
_______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-gcc-list