Re: SH optimized software floating point routines

2010-07-22 Thread Christian Bruel

Christian Bruel wrote:

Hi Kaz,

Kaz Kojima wrote:



BTW, it looks like the softfp __unord?f2 routines check for signaling NaNs
only.  This makes __builtin_isnan return false for quiet NaNs, for
which the current fp-bit ones return true when -mieee is enabled.  Perhaps
that change of behavior might be OK for software FP.


I use the attached patch to handle QNaNs in the assembly soft-fp. 
It needs to be updated for trunk (and the dates in the changelogs updated). Will do.


Edited to apply on top of Joern's latest patch. Certainly not optimal, 
but it fixes the QNaN checks for builtins and inlined unordered 
comparisons for -mieee or -fno-finite-math-only.
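
To illustrate why a mask-only test can miss part of the NaN space, here is a
minimal C sketch. The mask value 0x7fc00000 and both helper names are
assumptions made for illustration; they are not taken from ieee-754-sf.S.

```c
#include <stdint.h>

/* Hypothetical mask: the exponent bits plus the top fraction bit.
   This value is an assumption, not the real SF_NAN_MASK.  */
#define ASSUMED_NAN_MASK 0x7fc00000u

/* Mask-style test: true only if every bit of the mask is set.  */
static int mask_is_nan (uint32_t bits)
{
  return (~bits & ASSUMED_NAN_MASK) == 0;
}

/* Full test: exponent all ones and a non-zero fraction.  */
static int full_is_nan (uint32_t bits)
{
  return (bits & 0x7f800000u) == 0x7f800000u
         && (bits & 0x007fffffu) != 0;
}
```

With these definitions, 0x7fc00000 is accepted by both tests, while
0x7fa00000 (a NaN whose top fraction bit is clear) is accepted only by
full_is_nan. Infinity (0x7f800000) is rejected by both.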


Best Regards

Christian


2010-07-22  Christian Bruel  

* gcc.dg/builtins-nan.c: New test.

2010-07-22  Christian Bruel  

* config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
* config/sh/ieee-754-sf.S (nesf2f): Likewise.
* config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise. 
(cmpnesf_i1, cmpnedf_i1): Clobber R2.

diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN 
gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S 
gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S   2010-07-21 
18:04:17.0 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S   2010-07-21 18:09:10.0 
+0200
@@ -92,11 +92,12 @@
 HIDDEN_FUNC(GLOBAL(nedf2))
 GLOBAL(nedf2):
        cmp/eq  DBL0L,DBL1L
-       mov.l   LOCAL(c_DF_NAN_MASK),r1
-       bf      LOCAL(ne)
+       bf.s    LOCAL(ne)
+       mov     #1,r0
        cmp/eq  DBL0H,DBL1H
+       mov.l   LOCAL(c_DF_NAN_MASK),r1
+       bt.s    LOCAL(check_nan)
        not     DBL0H,r0
-       bt      LOCAL(check_nan)
        mov     DBL0H,r0
        or      DBL1H,r0
        add     r0,r0
@@ -104,11 +105,17 @@
        or      DBL0L,r0
 LOCAL(check_nan):
        tst     r1,r0
-       rts
+       bt.s    LOCAL(nan)
+       mov     #12,r2
+       shll16  r2
+       xor     r2,r1
+       tst     r1,r0
+LOCAL(nan):
        movt    r0
 LOCAL(ne):
        rts
-       mov     #1,r0
+       nop
+
        .balign 4
 LOCAL(c_DF_NAN_MASK):
        .long DF_NAN_MASK
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN 
gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S 
gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S   2010-07-21 
18:04:18.0 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S   2010-07-21 18:09:10.0 
+0200
@@ -51,13 +51,19 @@
        cmp/eq  r4,r5
        mov.l   LOCAL(c_SF_NAN_MASK),r1
        not     r4,r0
-       bt      LOCAL(check_nan)
+       bt.s    LOCAL(check_nan)
        mov     r4,r0
        or      r5,r0
        rts
        add     r0,r0
 LOCAL(check_nan):
        tst     r1,r0
+       bt.s    LOCAL(nan)
+       mov     #96,r2
+       shll16  r2
+       xor     r2,r1
+       tst     r1,r0
+ LOCAL(nan):
        rts
        movt    r0
        .balign 4
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN 
gnu_trunk.ref/gcc/gcc/config/sh/sh.md gnu_trunk/gcc/gcc/config/sh/sh.md
--- gnu_trunk.ref/gcc/gcc/config/sh/sh.md   2010-07-21 18:06:25.0 
+0200
+++ gnu_trunk/gcc/gcc/config/sh/sh.md   2010-07-22 09:13:12.0 +0200
@@ -10262,6 +10262,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
   "jsr @%1%#"
@@ -10337,13 +10338,18 @@
 
 (define_insn "cmpunsf_i1"
   [(set (reg:SI T_REG)
-   (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
- (match_operand:SF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+   (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+"not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+\tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+\tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3
+ 0:"
+[(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.
@@ -11069,6 +11075,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr @%1%#"
@@ -11093,6 +11100,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "j

Re: Revisiting the use of cselib in alias.c for scheduling

2010-07-22 Thread Maxim Kuvyrkov

On 7/22/10 3:34 AM, Steven Bosscher wrote:

On Wed, Jul 21, 2010 at 10:09 PM, Maxim Kuvyrkov  wrote:

Cselib can /always/ be used during second scheduling pass


Except with the selective scheduler when it works on regions that are
not extended basic blocks, I suppose?


Right, I was considering sched-rgn scheduler, not sel-sched.




and on
single-block regions during the first scheduling pass (after RA sched-rgn
operates on single-block regions).

Modulo the bugs enabling cselib might surface, the only reason not to enable
cselib for single-block regions in sched-rgn may be increased compile time.
  That requires some benchmarking, but my gut feeling is that the benefits
would outweigh the compile-time cost.


So something like the following _should_ work? If so, I'll give it a
try on x86*.

Ciao!
Steven

Index: sched-rgn.c
===================================================================
--- sched-rgn.c (revision 162355)
+++ sched-rgn.c (working copy)
@@ -3285,8 +3285,11 @@
  rgn_setup_sched_infos (void)
  {
if (!sel_sched_p ())
-memcpy (&rgn_sched_deps_info,&rgn_const_sched_deps_info,
-   sizeof (rgn_sched_deps_info));
+{
+  memcpy (&rgn_sched_deps_info,&rgn_const_sched_deps_info,
+ sizeof (rgn_sched_deps_info));
+  rgn_sched_deps_info.use_cselib = reload_completed;



Yes, this should work.  You can also enable cselib for single-block 
regions for first scheduling pass too.  I.e.,


index 89743c3..047b717 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -2935,6 +2935,9 @@ schedule_region (int rgn)
   if (sched_is_disabled_for_current_region_p ())
 return;

+  gcc_assert (!reload_completed || current_nr_blocks == 1);
+  rgn_sched_deps_info.use_cselib = (current_nr_blocks == 1);
+
   sched_rgn_compute_dependencies (rgn);

   sched_rgn_local_init (rgn);
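
The two policies discussed in this exchange can be written down as a small
predicate. This is only a sketch: the enum, the function name and its
parameters are made up for illustration and are not part of sched-rgn.c.

```c
#include <stdbool.h>

/* Hypothetical names for the two variants proposed in this thread.  */
enum cselib_policy
{
  AFTER_RELOAD_ONLY,     /* use_cselib = reload_completed */
  SINGLE_BLOCK_REGIONS   /* use_cselib = (current_nr_blocks == 1) */
};

/* Returns true if cselib would be enabled for a region of NR_BLOCKS
   blocks under POLICY.  */
static bool use_cselib_p (enum cselib_policy policy,
                          bool reload_completed, int nr_blocks)
{
  switch (policy)
    {
    case AFTER_RELOAD_ONLY:
      return reload_completed;
    case SINGLE_BLOCK_REGIONS:
      return nr_blocks == 1;
    }
  return false;
}
```

Since sched-rgn only operates on single-block regions after register
allocation, both variants agree for the second scheduling pass; they differ
only for single-block regions in the first pass.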

Thanks,

--
Maxim Kuvyrkov
CodeSourcery
ma...@codesourcery.com
(650) 331-3385 x724


Re: SH optimized software floating point routines

2010-07-22 Thread Joern Rennecke

Quoting Christian Bruel :


Edited to apply on top of Joern's latest patch. Certainly not optimal,
but it fixes the QNaN checks for builtins and inlined unordered
comparisons for -mieee or -fno-finite-math-only.


You are still on the wrong track; as I said in my earlier message, we
should not emit the library call for SH4 in the first place.

Please try the attached patch instead.
--- predicates.md-20100718  2010-07-22 08:17:37.273500678 +0100
+++ predicates.md   2010-07-22 08:28:57.257502902 +0100
@@ -575,9 +575,10 @@
 ;; UNORDERED is only supported on SHMEDIA.
 
 (define_predicate "sh_float_comparison_operator"
-  (ior (match_operand 0 "ordered_comparison_operator")
-   (and (match_test "TARGET_SHMEDIA")
-   (match_code "unordered"))))
+  (if_then_else (match_test "TARGET_SHMEDIA")
+   (ior (match_operand 0 "ordered_comparison_operator")
+(match_code "unordered"))
+   (match_operand 0 "comparison_operator")))
 
 (define_predicate "shmedia_cbranch_comparison_operator"
   (ior (match_operand 0 "equality_comparison_operator")


GCC 4.5.1 Status Report (2010-07-22)

2010-07-22 Thread Richard Guenther

Status
======

The GCC 4.5 branch is now frozen for preparation of a GCC 4.5.1
release candidate.  Please refrain from checking in non-documentation
changes without release manager approval.  If everything goes
right GCC 4.5.1 will be released around Aug 1st.

Quality Data
============

Priority          #   Change from Last Report
--------        ---   -----------------------
P1                0
P2              103   + 13
P3                3   -  7
--------        ---   -----------------------
Total           106   +  6


Previous Report
===============

http://gcc.gnu.org/ml/gcc/2010-04/msg00321.html


The next status report will be sent by me.


-- 
Richard Guenther 
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


GCC 4.5 branch is frozen now

2010-07-22 Thread Richard Guenther

The 4.5 branch is frozen for preparation of a 4.5.1 release candidate
and the 4.5.1 release.  Please refrain from checking in non-documentation
changes without release manager approval.

Thanks,
Richard.


GCC 4.5.1 Release Candidate available from gcc.gnu.org

2010-07-22 Thread Richard Guenther

A release candidate for GCC 4.5.1 is available from

ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.1-RC-20100722/

and shortly its mirrors.  It has been generated from SVN revision 162408.

I have so far bootstrapped and tested the release candidate on
x86_64-unknown-linux-gnu.  Please test it and report any issues to
bugzilla.

The branch remains frozen and all checkins until after the final
release of GCC 4.5.1 require explicit RM approval.

If all goes well, I'd like to release 4.5.1 before Aug 1st.

Richard.

-- 
Richard Guenther 
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


Re: SH optimized software floating point routines

2010-07-22 Thread Christian Bruel

Joern Rennecke wrote:

Quoting Christian Bruel :


Edited to apply on top of Joern's latest patch. Certainly not optimal,
but it fixes the QNaN checks for builtins and inlined unordered
comparisons for -mieee or -fno-finite-math-only.


You are still on the wrong track; as I said in my earlier message, we
should not emit the library call for SH4 in the first place.



Please try the attached patch instead.



Hello, Sorry for the mails that crossed.

I think we are dealing with two different problems here that have the 
same root. The original one was about the undefined __unorddf2/__unordsf2 
regression, for which you said that the library functions should not be 
called. I agree, and my patch is not incompatible with yours in this regard.


I was dealing with functional issues in the SNaN bit checking in the 
cmpun_ patterns (in addition to the floating-point comparison 
functions), which are exposed by the regression test that I provided (for 
-m4-nofpu -mieee).


About the other part of your answer, not supporting SNaNs in 
fp-bit.c: it is a possibility that I didn't consider in my fix. This 
restriction is quite a surprise to me because, as far as NaNs are 
concerned, it is not what I would guess from the implementation of 
fp-bit.c's isnan function, which checks for both CLASS_SNAN and CLASS_QNAN.


See for example the result of

static int misnanf(float v)
{
  return (v != v);
}

called with either a QNaN or an SNaN. IMO the assembly model should have 
the same semantics as the C model, which is not the case today.
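
A minimal host-side sketch of that experiment follows. It assumes IEEE 754
single precision on the host; float_from_bits and nan_patterns_agree are
hypothetical helper names, and the two bit patterns are just examples of NaN
encodings that differ in the top fraction bit.

```c
#include <stdint.h>
#include <string.h>

/* Same self-comparison test as in the message above.  */
static int misnanf (float v)
{
  return (v != v);
}

/* Build a float from a raw 32-bit pattern without breaking aliasing rules.  */
static float float_from_bits (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Returns 1 if misnanf reports a NaN for both example encodings.  */
static int nan_patterns_agree (void)
{
  float a = float_from_bits (0x7fc00000u); /* top fraction bit set */
  float b = float_from_bits (0x7fa00000u); /* top fraction bit clear */
  return misnanf (a) == 1 && misnanf (b) == 1;
}
```

On a host with a conforming comparison, nan_patterns_agree returns 1; this
is the behaviour the C model shows and the assembly routines are expected
to match.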


Using -fsignaling-nans and possibly putting #ifdef __SUPPORT_SNAN__ 
around the check doesn't change anything, since the same call is made 
to the floating-point comparison function, which really needs to check 
for both formats. If you are concerned about the extra cycles needed in 
the nesf2f implementation (which is nothing anyway compared to the C 
model), we could certainly provide a specialized one just for 
-fsignaling-nans.


Best Regards

Christian


Re: SH optimized software floating point routines

2010-07-22 Thread Kaz Kojima
Joern Rennecke  wrote:
> That's a bug, then; we shouldn't use a library function there,
> but the cmpordered[sd]f_t_4 patterns.

Argh, I missed that the required patterns are already incorporated
in your patch.  I'll test it again with sh-softfp-predicate-fix
when the tests for 4.5.1-rc are done.  Thanks!

Regards,
kaz


Re: SH optimized software floating point routines

2010-07-22 Thread Christian Bruel

Oops, resending it with a small typo fix (a branch became delayed :-().

Just in case we accept that SNaNs and QNaNs are not exclusive and 
mimic the C model, here is a synthetic illustrative test case:


Compile with
sh-superh-elf-gcc -O2 -mieee -m4-nofpu snan.c snan2.c -g -o l.u ; 
sh-superh-elf-run l.u ; echo $?


Original 4.6 fp-bit C model:
OK

Using the ieee-sf.S implementation:
FAIL

Using the ieee-sf.S + this patch
OK

same for sh4-linux.

Best Regards,

Christian



Christian Bruel wrote:

Christian Bruel wrote:

Hi Kaz,

Kaz Kojima wrote:


BTW, it looks like the softfp __unord?f2 routines check for signaling NaNs
only.  This makes __builtin_isnan return false for quiet NaNs, for
which the current fp-bit ones return true when -mieee is enabled.  Perhaps
that change of behavior might be OK for software FP.

I use the attached patch to handle QNaNs in the assembly soft-fp. 
It needs to be updated for trunk (and the dates in the changelogs updated). Will do.


Edited to apply on top of Joern's latest patch. Certainly not optimal, 
but it fixes the QNaN checks for builtins and inlined unordered 
comparisons for -mieee or -fno-finite-math-only.


Best Regards

Christian



diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN 
gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S 
gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S   2010-07-21 
18:04:17.94995 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S   2010-07-21 18:09:10.602376000 
+0200
@@ -92,11 +92,12 @@
 HIDDEN_FUNC(GLOBAL(nedf2))
 GLOBAL(nedf2):
        cmp/eq  DBL0L,DBL1L
-       mov.l   LOCAL(c_DF_NAN_MASK),r1
-       bf      LOCAL(ne)
+       bf.s    LOCAL(ne)
+       mov     #1,r0
        cmp/eq  DBL0H,DBL1H
+       mov.l   LOCAL(c_DF_NAN_MASK),r1
+       bt.s    LOCAL(check_nan)
        not     DBL0H,r0
-       bt      LOCAL(check_nan)
        mov     DBL0H,r0
        or      DBL1H,r0
        add     r0,r0
@@ -104,11 +105,17 @@
        or      DBL0L,r0
 LOCAL(check_nan):
        tst     r1,r0
-       rts
+       bt.s    LOCAL(nan)
+       mov     #12,r2
+       shll16  r2
+       xor     r2,r1
+       tst     r1,r0
+LOCAL(nan):
        movt    r0
 LOCAL(ne):
        rts
-       mov     #1,r0
+       nop
+
        .balign 4
 LOCAL(c_DF_NAN_MASK):
        .long DF_NAN_MASK
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN 
gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S 
gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S   2010-07-22 
14:21:50.606831000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S   2010-07-22 15:30:17.928097000 
+0200
@@ -58,6 +58,12 @@
        add     r0,r0
 LOCAL(check_nan):
        tst     r1,r0
+       bt.s    LOCAL(nan)
+       mov     #96,r2
+       shll16  r2
+       xor     r2,r1
+       tst     r1,r0
+ LOCAL(nan):
        rts
        movt    r0
        .balign 4
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN 
gnu_trunk.ref/gcc/gcc/config/sh/sh.md gnu_trunk/gcc/gcc/config/sh/sh.md
--- gnu_trunk.ref/gcc/gcc/config/sh/sh.md   2010-07-21 18:06:25.978547000 
+0200
+++ gnu_trunk/gcc/gcc/config/sh/sh.md   2010-07-22 09:13:12.599669000 +0200
@@ -10262,6 +10262,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
   "jsr @%1%#"
@@ -10337,13 +10338,18 @@
 
 (define_insn "cmpunsf_i1"
   [(set (reg:SI T_REG)
-   (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
- (match_operand:SF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+   (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+ (match_operand:SF 1 "arith_reg_operand" "r")))
+ (use (match_operand:SI 2 "arith_reg_operand" "r"))
+ (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+"not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+\tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+\tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3
+ 0:"
+[(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.
@@ -11069,6 +11075,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr @%1%#"
@@ -11093,6 +11100,7 @@
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))
(clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
(use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr @%1%#"
@@ -0,13 +8,18 @@
 
 (define_in

Re: SH optimized software floating point routines

2010-07-22 Thread Joern Rennecke

Quoting Christian Bruel :

About the other part of your answer, not supporting SNaNs in
fp-bit.c: it is a possibility that I didn't consider in my fix. This
restriction is quite a surprise to me because, as far as NaNs are
concerned, it is not what I would guess from the implementation of
fp-bit.c's isnan function, which checks for both CLASS_SNAN and CLASS_QNAN.


Well, it looks like a classic top-down implementation, carving up the
problem in little sub-problems, and then not implementing some of these
so that the case distinction between CLASS_SNAN and CLASS_QNAN becomes
pointless.


See for example the result of

static int misnanf(float v)
{
  return (v != v);
}

called with either a QNaN or a SNaN. IMO The assembly model should have
the same semantic that the C model, which is not the case today.


I would consider the exact bit patterns used for NaNs an implementation
detail, which the user should not need to care about.
We only implement QNaNs.  fp-bit.c recognizes all NaN patterns, but treats
them all as QNaNs.


Using -fsignaling-nans and eventually putting #ifdef  __SUPPORT_SNAN__
around the checking doesn't change anything since the same call is done
to the floating point comparison function, that really needs to check
for both formats.


Considering that the signals don't work, wouldn't a better implementation
of -fsignaling-nans be to issue a diagnostic when using this for a software
floating point ABI in sh.h OVERRIDE_OPTIONS ?
And somehow make using __builtin_nans / __builtin_nansf give a
diagnostic, too.

Unless you want to go further and really implement the signals.
I suppose you could use config/soft-fp for that.


If you are concerned about the extra cycles needed


Both cycles and bytes.


in the nesf2f implementation (which is nothing anyway compared to the C
model),


fp-bit is so slow that it can't be taken seriously as a benchmark for
software floating point emulation speed.  The point of having a
hand-optimized assembly version is that you actually can show reasonable
performance for codes with light fpu usage, compared to a processor with
hardware floating point (which needs more die space and power, and might
not clock as high as the fpu-less version).
IIRC some EEMBC benchmarks are in that class, i.e. with the hand-optimized
software floating point they run several times faster than with fp-bit,
but going all the way to hardware floating point then gives diminishing
returns.


we could certainly provide a specialized one just for
-fsignaling-nans.


You'd also have to handle the other comparisons.  grep for F_NAN_MASK
in ieee-754-sf.S / ieee-754-df.S.

The original intent was that the faster & more compact NaN check would
be available for all the software emulation code, although I used a more
inclusive check if I saw it could be done with the same cycle count.
I can't remember if I ended up using the mask check anywhere but in
ieee-754-sf.S / ieee-754-df.S .

If you want all possible IEEE NaN patterns to be honoured, someone should
check all these checks in the config/sh/IEEE-754/m3 directory...


Re: SH optimized software floating point routines

2010-07-22 Thread Joern Rennecke

Quoting Christian Bruel :


Using the ieee-sf.S + this patch
OK


Is this only a proof-of-concept, because you only change the ne[sd]f2
implementation?  And you go out of your way to only accept a restricted
set of values.  Plus, the overuse of the arithmetic unit hurts SH4-100 /
SH4-200 instruction pairing.

AFAICT you need only one cycle penalty, in the check_nan path:

GLOBAL(nesf2):
        /* If the raw values are unequal, the result is unequal, unless
           both values are +-zero.
           If the raw values are equal, the result is equal, unless
           the values are NaN.  */
        cmp/eq  r4,r5
        mov.l   LOCAL(inf2),r1
        bt/s    LOCAL(check_nan)
        mov     r4,r0
        or      r5,r0
        rts
        add     r0,r0
LOCAL(check_nan):
        add     r0,r0
        cmp/hi  r1,r0
        rts
        movt    r0
        .balign 4
LOCAL(inf2):
        .long 0xff000000

You could even save four bytes by putting the check_nan label into the
delay slot, but I'm not sure if that'll discomfit any branch  
prediction mechanism.


Disclaimer: I've not tested this code.
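
For readers who prefer C, the logic of the nesf2 sketch above can be modelled
as follows. This is an illustration only: nesf2_model is a hypothetical name,
it works on raw 32-bit patterns, and it takes inf2 to be the +infinity
pattern shifted left by one (0xff000000), which is an assumption of this
model.

```c
#include <stdint.h>

/* Twice the bit pattern of +infinity (0x7f800000 << 1).  Doubling a value
   shifts its sign bit out, so any doubled pattern above this is a NaN.  */
#define INF2_SF 0xff000000u

/* C model of the nesf2 sketch: returns non-zero when a != b.  */
static uint32_t nesf2_model (uint32_t a, uint32_t b)
{
  if (a == b)
    /* Equal raw values compare equal unless they are NaN.  */
    return (uint32_t) (a << 1) > INF2_SF;
  /* Unequal raw values compare unequal unless both are +-zero.  */
  return (uint32_t) ((a | b) << 1);
}
```

The model accepts every NaN encoding, not just those matching one mask:
both 0x7fc00000 and 0x7f800001 compared with themselves yield 1, while
infinity compared with itself and +0 compared with -0 yield 0.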

For the DFmode case, what about NaNs denoted by the low word, e.g.
0x7ff00000 00000001?

If so, the DFmode code could become something like this:

GLOBAL(nedf2):
        cmp/eq  DBL0L,DBL1L
        mov.l   LOCAL(inf2),r1
        bf      LOCAL(ne)
        cmp/eq  DBL0H,DBL1H
        bt/s    LOCAL(check_nan)
        mov     DBL0H,r0
        or      DBL1H,r0
        add     r0,r0
        rts
        or      DBL0L,r0
LOCAL(check_nan):
        tst     DBL0L,DBL0L
        add     r0,r0
        subc    r1,r0
        mov     #-1,r0
        rts
        negc    r0,r0
LOCAL(ne):
        rts
        mov     #1,r0
        .balign 4
LOCAL(inf2):
        .long 0xffe00000
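
The DFmode variant can be modelled in C in the same way. Again this is only
a sketch: nedf2_model is a hypothetical name, the operands are passed as
separate high and low 32-bit words, and the constant is taken to be the high
word of +infinity shifted left by one (0xffe00000), which is an assumption
of this model.

```c
#include <stdint.h>

/* High word of +infinity (0x7ff00000) shifted left by one.  */
#define INF2_DF_HIGH 0xffe00000u

/* C model of the nedf2 sketch: returns non-zero when the doubles differ.  */
static uint32_t nedf2_model (uint32_t hi0, uint32_t lo0,
                             uint32_t hi1, uint32_t lo1)
{
  if (lo0 != lo1)
    return 1;                   /* the "ne" path */
  if (hi0 == hi1)
    {
      /* check_nan: the subc/negc sequence yields 1 exactly when the
         doubled high word exceeds the limit, or equals it while the
         low word is non-zero.  */
      uint32_t hi2 = (uint32_t) (hi0 << 1);
      return hi2 > INF2_DF_HIGH || (hi2 == INF2_DF_HIGH && lo0 != 0);
    }
  /* Unequal high words: unequal unless both values are +-zero.  */
  return (uint32_t) ((hi0 | hi1) << 1) | lo0;
}
```

In this model a NaN that is denoted only by the low word, such as high word
0x7ff00000 with low word 1, compares unequal to itself, while infinity
(low word 0) compares equal to itself.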


Re: SH optimized software floating point routines

2010-07-22 Thread Joern Rennecke

Quoting Christian Bruel :


oops, resending it with a small typo fix (a branch became delayed :-().


For an actual patch, you need to use the SL* macros from
config/sh/lib1funcs.h because the SH1 does not have delayed branches.


Re: Attributes

2010-07-22 Thread Joseph S. Myers
I refer you to the various issues I have raised with attributes proposals 
in WG14 in the past, some at least of which have also been forwarded to 
WG21.  The committees have generally chosen not to listen to compatibility 
concerns, except insofar as WG14 ended up not adopting general attribute 
syntax at all but instead adding new keywords for some particular uses of 
attributes.

I also quote here something I sent off-reflector to Herb Sutter 
(Microsoft) last September (see in particular the first paragraph 
regarding treating [[]] as having clearly separate rules from 
__attribute__):

  [...] I'm also concerned about things for users.  If I implement the 
  proposals in GCC, compatibility with existing code would inevitably mean 
  that __attribute__ follows existing rules for how it binds to syntax 
  constructs, and [[]] follows the different C1x rules, and 
  __attribute__((noreturn)) is part of the type system, and [[noreturn]] 
  isn't.  This avoids breaking existing code, but is bound to confuse users 
  when they try to change syntax (likely with macros such as those you 
  mention) if they happened to be using cases where the two syntaxes bind 
  differently.  When you have two similar but incompatible ways of doing 
  something, you have a no win situation for users - and it's not WG14's job 
  to create work for front-end writers implementing the different modes for 
  compatibility with different code, they should be helping users by 
  avoiding needing the different modes.

  The fact that there are some problems with existing implementations 
  (including bugs where it may not be at all clear what is actually the best 
  thing to do), such as questions of exactly how attributes should interact 
  with the type system, may mean that a standard version cannot be perfectly 
  compatible with existing implementations in all cases.  But by studying 
  what implementations actually do, and what real code using these features 
  does, and what problems have been noted in implementation over the years, 
  it ought to be possible to come up with something friendlier to users (and 
  less ambitious) than the existing proposals.  So far however the proposers 
  of attributes standardization do not seem to have been paying any 
  attention to my points [...]

  Issues we've observed in practice with how attributes interact with the 
  type system include those described in 
   and 
  .

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Error in GCC documentation page

2010-07-22 Thread Joseph S. Myers
On Thu, 8 Jul 2010, Robert Dewar wrote:

> For another take, though the Ada standard extensively uses the word
> integral, it does prefer integer type, by analogy with array type,
> record type etc, where no adjective is available.
> 
> But as noted the C++ standard prefers integral type, so might as well
> standardize on that when talking about C or C++.

C99 uses "integer type".  This was adopted consistently following DR#067 
pointing out the variation in C90 between "integer type" and "integral 
type".

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_067.html

-- 
Joseph S. Myers
jos...@codesourcery.com


gcc-4.5-20100722 is now available

2010-07-22 Thread gccadmin
Snapshot gcc-4.5-20100722 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100722/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 162431

You'll find:

gcc-4.5-20100722.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.5-20100722.tar.bz2 C front end and core compiler

gcc-ada-4.5-20100722.tar.bz2  Ada front end and runtime

gcc-fortran-4.5-20100722.tar.bz2  Fortran front end and runtime

gcc-g++-4.5-20100722.tar.bz2  C++ front end and runtime

gcc-java-4.5-20100722.tar.bz2 Java front end and runtime

gcc-objc-4.5-20100722.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.5-20100722.tar.bz2    The GCC testsuite

Diffs from 4.5-20100715 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


RE: SH optimized software floating point routines

2010-07-22 Thread Joseph S. Myers
On Fri, 16 Jul 2010, Joern Rennecke wrote:

> Quoting "Naveen H. S" :
> 
> > extendsfdf2 - gcc.c-torture/execute/conversion.c
> > gcc.dg/torture/fp-int-convert-float.c, gcc.dg/pr28796-2.c
> 
> Note that some tests invoke undefined behaviour; I've also come across this
> when doing optimized soft FP for ARCompact:
> 
> http://gcc.gnu.org/viewcvs/branches/arc-4_4-20090909-branch/gcc/testsuite/gcc.dg/torture/fp-int-convert.h?r1=151539&r2=151545

That diff does not appear to relate to undefined behavior.  GCC considers 
these out-of-range conversions to yield an unspecified value, possibly 
raising an exception, as per Annex F, and does not take the liberty of 
optimizing on the basis of them being undefined when not in an IEEE mode.
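
Code that does not want to depend on what an out-of-range conversion
produces can test the range before converting. A minimal sketch, assuming a
32-bit target integer type and IEEE single precision; the helper name
float_to_int32_checked is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stores the converted value in *OUT and returns true if F is within the
   range of int32_t; returns false otherwise.  NaN fails both comparisons,
   so it is rejected as well.  */
static bool float_to_int32_checked (float f, int32_t *out)
{
  /* -2^31 and 2^31 are both exactly representable as float.  */
  if (!(f >= -2147483648.0f && f < 2147483648.0f))
    return false;
  *out = (int32_t) f;
  return true;
}
```

The upper bound uses a strict comparison because 2^31 itself does not fit
in int32_t, while the lower bound -2^31 does.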

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: GFDL/GPL issues

2010-07-22 Thread Mark Mitchell
Benjamin Kosnik wrote:

> Is there a separate issue for libstdc++ doxygen? This situation is
> subtly different from the one outlined above: it is the application of
> a GPL'd tool over GPL'd sources, which the FSF + Red Hat legal have
> both told me for years results in GPL'd docs (and is clearly noted as
> such in the libstdc++ manual under Licensing.)  I consider this sane,
> actually, and would be most unhappily surprised if the act of generating
> the HTML changed the license to GFDL.

As far as I know, everything you say above is correct; the documentation
you're generating is GPL'd.  (IANAL, of course.)

In any case, that wasn't what the discussion with RMS was about.  It was
about two things:

1. What license should "manuals" have?

The FSF wants them to be GFDL.  However, RMS agreed that it's OK for
"cross-reference" information (as opposed to "manuals"), auto-generated
from source code, such as the documentation you're generating with
doxygen to be GPL'd.  So, the procedure you're using is fine, not just
from a "is this legal" point of view, but also from an FSF policy point
of view.

2. Can we move GPL'd code into GFDL'd manuals, or copy text from GFDL's
manuals into GPL'd code, or auto-generated GFDL's manuals from GPL'd code?

This got complicated; see previous postings.  But, it's not relevant to
your question, since you're not trying to do that.

-- 
Mark Mitchell
CodeSourcery
m...@codesourcery.com
(650) 331-3385 x713


Re: GFDL/GPL issues

2010-07-22 Thread Steven Bosscher
On Fri, Jul 23, 2010 at 1:22 AM, Mark Mitchell  wrote:
> 2. Can we move GPL'd code into GFDL'd manuals, or copy text from GFDL's
> manuals into GPL'd code, or auto-generated GFDL's manuals from GPL'd code?
>
> This got complicated; see previous postings.  But, it's not relevant to
> your question, since you're not trying to do that.

I would like to do this for the constraints.md files, but it's not
clear to me right now whether this is allowed or not. What do you
think?

Ciao!
Steven


Re: GFDL/GPL issues

2010-07-22 Thread Mark Mitchell
Steven Bosscher wrote:

>> 2. Can we move GPL'd code into GFDL'd manuals, or copy text from GFDL's
>> manuals into GPL'd code, or auto-generated GFDL's manuals from GPL'd code?
>>
>> This got complicated; see previous postings.  But, it's not relevant to
>> your question, since you're not trying to do that.
> 
> I would like to do this for the constraints.md files, but it's not
> clear to me right now whether this is allowed or not. What do you
> think?

I think it's allowed, but not a good idea, due to the fact that I think
it creates a "trap" for people.

The FSF has said that it's OK for *us* to do it, in the FSF repository,
because the FSF can itself relicense code.  But, it's said that it's not
OK for third parties to do it, because they can't.  And, the natural way
for us to do it is via generator programs.  This creates a situation
where a third party could rerun the generator program and end up with
something they couldn't distribute.  That seems very tricky to me.

I believe that the only real fix here is (a) for the FSF to abandon the
GFDL, and relicense manuals under the GPL, or (b) for the FSF to add an
exception to the GFDL, making it compatible with the GPL in some way.
However, I have no evidence that the FSF is considering either of these
ideas; RMS didn't provide encouraging feedback when I made such suggestions.

-- 
Mark Mitchell
CodeSourcery
m...@codesourcery.com
(650) 331-3385 x713


Re: GFDL/GPL issues

2010-07-22 Thread Joe Buck
On Thu, Jul 22, 2010 at 04:36:46PM -0700, Mark Mitchell wrote:
> Steven Bosscher wrote:
> 
> >> 2. Can we move GPL'd code into GFDL'd manuals, or copy text from GFDL's
> >> manuals into GPL'd code, or auto-generated GFDL's manuals from GPL'd code?
> >>
> >> This got complicated; see previous postings.  But, it's not relevant to
> >> your question, since you're not trying to do that.
> > 
> > I would like to do this for the constraints.md files, but it's not
> > clear to me right now whether this is allowed or not. What do you
> > think?
> 
> I think it's allowed, but not a good idea, due to the fact that I think
> it creates a "trap" for people.
> 
> The FSF has said that it's OK for *us* to do it, in the FSF repository,
> because the FSF can itself relicense code.  But, it's said that it's not
> OK for third parties to do it, because they can't.  And, the natural way
> for us to do it is via generator programs.  This creates a situation
> where a third party could rerun the generator program and end up with
> something they couldn't distribute.  That seems very tricky to me.
> 
> I believe that the only real fix here is (a) for the FSF to abandon the
> GFDL, and relicense manuals under the GPL, or (b) for the FSF to add an
> exception to the GFDL, making it compatible with the GPL in some way.
> However, I have no evidence that the FSF is considering either of these
> ideas; RMS didn't provide encouraging feedback when I made such suggestions.

RMS is unlikely to abandon the GFDL because the features that many object
to as non-free are intentionally chosen, in part to make sure that he can
get his message out even in situations where a distributor would not agree
with that message.  I think he hasn't gotten over ESR's attempts in the
late 90s to write him out of history, so he thinks he has to force people
to carry his message along with the GNU tools.

However, if we have text that is entirely generated from a GPL program
by some kind of generator program, that text can be distributed under
the GPL.  It just can't be combined with GFDL text, except by "mere
aggregation" (you can print the two "manuals" one after the other as
chapters, or publish them both from the same web site).

RMS didn't object to what he called a "cross reference" or an "index",
generated this way, to be distributed under the GPL.

Not a great solution, but perhaps it can be made to work for a while.


Re: Reload problems with only one base reg for "base + offset" addressing mode

2010-07-22 Thread redriver jiang
Hi,

You mean I should define an insn like this:

(define_insn "*iorqi3_imm"
  [(set (mem:QI (match_operand:HI 0 "register_operand" "b"))
        (ior:QI (mem:QI (match_operand:HI 1 "register_operand" "b"))
                (mem:QI (plus:HI (match_operand:HI 2 "register_operand" "f")
                                 (match_operand:HI 3 "immediate_operand" "K")))))]
  ""
  "..."
  [(set_attr "length" "1")])

with "b" for R16, R17, R18; "f" for R18; and "K" for an immediate operand
in the range 0-127?


Thanks!


2010/7/20 Ian Lance Taylor :
> redriver jiang  writes:
>
>> I am porting GCC to an 8-bit architecture, and I have a reload
>> problem with one addressing mode.
>> Besides 15 general registers, it has three 16-bit address registers,
>> R16, R17, R18.
>> R16, R17, R18 can all serve as the base register in "base" addressing
>> mode, but only R18 can be the base register in "base+offset(immediate)"
>> addressing mode.
>> I made a "BASE_REGS" class for R16, R17, R18 and a "POINTER_REG" class
>> for R18; the frame pointer is R18, and the maximum "offset" in
>> "base+offset" is 127.
>>
>> And now the test compiler sometimes generates the following errors:
>>
>> test3.c: In function 'OS_EventTaskWait':
>> test3.c:62: error: unable to find a register to spill in class 'POINTER_REG'
>> test3.c:62: error: this is the insn:
>> (insn 58 57 61 2 (set (mem/s:QI (plus:HI (reg:HI 16 R16 [51])
>>                 (const_int 5 [0x5])) [0 .OSEventTbl S1 A8])
>>         (ior:QI (mem/s:QI (plus:HI (reg:HI 16 R16 [51])
>>                     (const_int 5 [0x5])) [0 .OSEventTbl S1 A8])
>>             (mem/s:QI (plus:HI (reg:HI 17 R17 [orig:38 OSTCBCur.0 ] [38])
>>                     (const_int 14 [0xe])) [0 .OSTCBBitX+0 S1
>> A8]))) 25 {*iorqi3_noimm} (insn_list:REG_DEP_TRUE 51 (nil))
>>     (expr_list:REG_DEAD (reg:HI 17 R17 [orig:38 OSTCBCur.0 ] [38])
>>         (expr_list:REG_DEAD (reg:HI 16 R16 [51])
>> test3.c:62: confused by earlier errors, bailing out.
>
> You should be able to fix this by using constraints.  Define a
> constraint which uses the base register and define one which permits one
> of the indirect registers.  Write different alternatives such that only
> one operand uses the base register in each alternative.  Then reload
> should be able to pick the best one, and reload the other addresses into
> the indirect register.
>
> Ian
>


RE: SH optimized software floating point routines

2010-07-22 Thread Joern Rennecke

Quoting "Joseph S. Myers" :


That diff does not appear to relate to undefined behavior.  GCC considers
these out-of-range conversions to yield an unspecified value, possibly
raising an exception, as per Annex F, and does not take the liberty of
optimizing on the basis of them being undefined when not in an IEEE mode.


Well, still, the test is wrong in possibly raising an exception there,
with no provisions to ignore the exception or catch any signal raised.

For the ARCompact, in order to test the floating point emulation better,
I had (they are still there in #if 0 /*DEBUG */ blocks) small wrappers
for each function to evaluate it once with the hand-optimized version,
and once with fp-bit.c, and abort on getting different values.
Now, fp-bit generally tries to yield some value that the programmer thought
might mean something, whereas the hand-optimized version treats computations
of unspecified values as irrelevant.
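
A minimal C sketch of such a comparison wrapper, assuming the check is
limited to inputs whose result is specified. The names reference_fixuns,
candidate_fixuns and checked_fixuns are hypothetical stand-ins, not the
ARCompact wrappers or fp-bit.c; the two stand-ins deliberately disagree on
out-of-range input to mirror the difference described above.

```c
#include <stdint.h>
#include <stdlib.h>

/* True when double -> unsigned conversion has a specified result:
   the truncated value must fit in 32 bits.  NaN fails both tests. */
static int result_is_specified(double x)
{
    return x > -1.0 && x < 4294967296.0;
}

/* Stand-in for a reference routine that returns a "meaningful"
   value (here 0) for unspecified inputs.  Hypothetical. */
static uint32_t reference_fixuns(double x)
{
    if (!result_is_specified(x))
        return 0;
    return (uint32_t)x;
}

/* Stand-in for an optimized routine that does not care what it
   returns for unspecified inputs.  Hypothetical. */
static uint32_t candidate_fixuns(double x)
{
    if (!result_is_specified(x))
        return UINT32_MAX;
    return (uint32_t)(uint64_t)x;
}

/* Evaluate both versions and abort on a mismatch, but only where
   the result is specified; otherwise any value is acceptable. */
uint32_t checked_fixuns(double x)
{
    uint32_t got = candidate_fixuns(x);

    if (result_is_specified(x) && got != reference_fixuns(x))
        abort();
    return got;
}
```

Without the result_is_specified() guard, checked_fixuns(-5.0) would abort
even though both answers are permitted.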

Considering:

GLOBAL(fixunsdfsi):
mov.w   LOCAL(x413),r1  ! bias + 20
mov     DBL0H,r0
shll    DBL0H
mov.l   LOCAL(mask),r3
mov     #-21,r2
shld    r2,DBL0H        ! SH4-200 will start this insn in a new cycle
bt/s    LOCAL(ret0)
sub     r1,DBL0H
cmp/pl  DBL0H           ! SH4-200 will start this insn in a new cycle
and     r3,r0
bf/s    LOCAL(ignore_low)
addc    r3,r0           ! uses T == 1; sets implicit 1
mov     #11,r2
shld    DBL0H,r0        ! SH4-200 will start this insn in a new cycle
cmp/gt  r2,DBL0H
add     #-32,DBL0H
bt      LOCAL(retmax)
shld    DBL0H,DBL0L
rts
or      DBL0L,r0

and:

__fixunsdfsi:
bbit0 DBL0H,30,.Lret0or1
lsr r2,DBL0H,20
bmsk_s DBL0H,DBL0H,19
sub_s r2,r2,19; 0x3ff+20-0x400
neg_s r3,r2
btst_s r3,10
bset_s DBL0H,DBL0H,20
#ifdef __LITTLE_ENDIAN__
mov.ne DBL0L,DBL0H
asl DBL0H,DBL0H,r2
#else
asl.eq DBL0H,DBL0H,r2
lsr.ne DBL0H,DBL0H,r3
#endif
lsr DBL0L,DBL0L,r3
j_s.d [blink]
add.eq r0,r0,r1
.Lret0:
j_s.d [blink]
mov_l r0,0
.Lret0or1:
add_s DBL0H,DBL0H,0x10
lsr_s DBL0H,DBL0H,30
j_s.d [blink]
bmsk_l r0,DBL0H,0

You can see that an SH4-300 can perform software floating point
fixunsdfsi in ten cycles, and the SH4-400 (SH4-200 sans FPU)
and ARC700 in twelve.

Adding any code in order to compute nice, fluffy values for
unspecified results would cause a significant performance degradation.
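
To make the trade-off concrete, here is a rough C sketch of the same kind
of conversion, assuming IEEE 754 binary64 doubles. The function name
sketch_fixunsdfsi is hypothetical and this is not the SH or ARC routine
above; it only shows where a check for unspecified inputs would sit.

```c
#include <stdint.h>
#include <string.h>

/* Truncate a double to a 32-bit unsigned integer by hand.
   Hypothetical illustration, not the libgcc entry point. */
uint32_t sketch_fixunsdfsi(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);     /* reinterpret the bit pattern */

    int e = (int)((bits >> 52) & 0x7ff) - 1023;          /* unbiased exponent */
    uint64_t mant = (bits & ((UINT64_C(1) << 52) - 1))   /* 52 fraction bits */
                    | (UINT64_C(1) << 52);               /* implicit leading 1 */

    /* Negative input or |x| < 1.  For x <= -1 the result is
       unspecified; returning 0 is an arbitrary choice. */
    if ((bits >> 63) || e < 0)
        return 0;

    /* Too large, infinite, or NaN: unspecified result.  This extra
       compare and branch is the kind of code the hand-optimized
       routines leave out; the value chosen here is arbitrary. */
    if (e > 31)
        return UINT32_MAX;

    /* In range: shift the 53-bit significand down to an integer. */
    return (uint32_t)(mant >> (52 - e));
}
```

For in-range inputs only the exponent extraction and the final shift are
needed, which is why each added check is noticeable in a routine of ten to
twelve cycles.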


Re: GFDL/GPL issues

2010-07-22 Thread Joern Rennecke

Quoting Joe Buck :


RMS is unlikely to abandon the GFDL because the features that many object
to as non-free are intentionally chosen, in part to make sure that he can
get his message out even in situations where a distributor would not agree
with that message.  I think he hasn't gotten over ESR's attempts in the
late 90s to write him out of history, so he thinks he has to force people
to carry his message along with the GNU tools.


What about extending the allowed additional terms of the GPL under clause
7 (7b is closest right now) to allow an author / copyright holder to have
his soapbox in the software (i.e. preserve designated comments /
documentation in source and printed form), and then implementing the GFDL
on top of that?


Re: GCC 4.5.1 Release Candidate available from gcc.gnu.org

2010-07-22 Thread Dennis Clarke

>
> A release candidate for GCC 4.5.1 is available from
>
> ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.1-RC-20100722/
>
> and shortly its mirrors.  It has been generated from SVN revision 162408.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-unknown-linux-gnu.  Please test it and report any issues to
> bugzilla.
>
> The branch remains frozen and all checkins until after the final
> release of GCC 4.5.1 require explicit RM approval.
>
> If all goes well, I'd like to release 4.5.1 before Aug 1st.
>

FYI, bug 44455 is a showstopper in the Solaris world.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44455

-- 
Dennis Clarke
dcla...@opensolaris.ca  <- Email related to the open source Solaris
dcla...@blastwave.org   <- Email related to open source for Solaris




Re: GFDL/GPL issues

2010-07-22 Thread Mark Mitchell
Joe Buck wrote:

> However, if we have text that is entirely generated from a GPL program
> by some kind of generator program, that text can be distributed under
> the GPL.  

As a license statement, that's accurate.  As a policy statement, the FSF
seems to object if the output is a "manual", but not if it is a "cross
reference".  If we had a useful manual generated in this way, I'd argue
very strongly to the FSF that we should permit its distribution under
the GPL, but we don't have such a case, so there's no need for the
argument at this time.

> RMS didn't object to what he called a "cross reference" or an "index",
> generated this way, to be distributed under the GPL.

Right.

> Not a great solution, but perhaps it can be made to work for a while.

Certainly, for the purposes of libstdc++, we're OK.  Nothing has to
change to keep distributing the doxygen-generated cross-reference for
libstdc++.

I agree with you that RMS is unlikely to shift his position regarding
the GFDL.  However, I think it's goofy that we cannot auto-generate
parts of the internals manual, or the user's manual, from GPL'd source
code.  If the FSF's policy of using the GFDL on manuals means that we
can't have as good a user's manual as we would otherwise, then --
whatever its purported benefits -- the GFDL is not serving us well, and
we should continue making that case to the FSF.

-- 
Mark Mitchell
CodeSourcery
m...@codesourcery.com
(650) 331-3385 x713