Re: Does gengtyped gt-*.h depends upon the configuration of the compiler?

2010-03-12 Thread Laurynas Biveinis
>> You might want to look at the gengtype debugging dump support on
>> gc-improv branch, which I will submit shortly for 4.6 trunk.
>
> Thanks! Yes, I looked at your gengtype.c in your branch, and it is the kind
> of code I was dreaming of.
>  Usually, in persistency machinery, the code to reload data from the file is
> a bit more complex than the code to dump it. Do you have any ideas?

The dumping code is short and straightforward. The loading code
would be slightly more involved, yes, mostly because of patching
pointer target addresses between all the structs, but still I think
it's a reasonably small task.
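
To make the pointer patching concrete, here is a minimal sketch in C. It is
not code from gengtype or the gc-improv branch: struct node, struct disk_node,
dump_nodes and load_nodes are hypothetical names, and the sketch assumes all
structs live in one array so a pointer can be stored as an index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory struct; not a real gengtype type. */
struct node {
    int value;
    struct node *next;
};

/* Hypothetical on-disk form: the pointer becomes an array index,
   with -1 standing for NULL. */
struct disk_node {
    int value;
    long next_index;
};

/* Dump: translate each pointer into an index and write the record.
   Returns 0 on success, -1 on a write error. */
static int dump_nodes(FILE *f, const struct node *nodes, size_t n)
{
    assert(f != NULL);
    for (size_t i = 0; i < n; i++) {
        struct disk_node d = { nodes[i].value, -1L };
        if (nodes[i].next != NULL)
            d.next_index = (long)(nodes[i].next - nodes);
        if (fwrite(&d, sizeof d, 1, f) != 1)
            return -1;
    }
    return 0;
}

/* Load: read each record and patch the index back into an address
   inside the freshly allocated array.  Returns NULL on any error. */
static struct node *load_nodes(FILE *f, size_t n)
{
    assert(f != NULL);
    struct node *nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        struct disk_node d;
        if (fread(&d, sizeof d, 1, f) != 1
            || d.next_index < -1 || d.next_index >= (long)n) {
            free(nodes);
            return NULL;
        }
        nodes[i].value = d.value;
        nodes[i].next = d.next_index < 0 ? NULL : &nodes[d.next_index];
    }
    return nodes;
}
```

The load side is longer than the dump side only because it has to validate
each index and turn it back into an address. A real loader would also have to
cope with structs that are not in a single array.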

> And more significantly, do you think that my idea of persisting GTY-ed data
> descriptors in gengtype is good enough to have: tools (gengtype) & data
> (e.g. hypothetical gcc-gty-data-descr.json) installed in the GCC
> installation, and reused by gengtype invocation for plugins, so to remove
> the harsh constraint of keeping both source & build tree? Or did I forget
> something?

Gengtype dependency is a kludge. But since plugins need access to GC,
I really don't have any better idea.

> Also, what is a summary of the GTY & gengtype improvements (w.r.t plugins)
> in your gc-improv branch?

Debugging dump support is the closest thing to "gengtype improvement
wrt plugins" there. Not much else.


-- 
Laurynas


Testcase that causes excessive loads inserted by IRA

2010-03-12 Thread Christian Borntraeger
Hello Vladimir,

On s390x I have seen some testcase where IRA goes ballistic and loads a value
from stack (160(%r15)) over and over again:

[...]
  82:   e3 80 f0 a0 00 04   lg  %r8,160(%r15)   <--
  88:   e3 b0 f0 a0 00 04   lg  %r11,160(%r15)  <--
  8e:   e3 c0 f0 a0 00 04   lg  %r12,160(%r15)  <--
  94:   e3 90 f0 a0 00 04   lg  %r9,160(%r15)   <--
  9a:   e3 10 f0 a0 00 04   lg  %r1,160(%r15)   <--
  a0:   e3 30 f0 a0 00 04   lg  %r3,160(%r15)   <--
  a6:   e3 70 80 00 00 95   llh %r7,0(%r8)
  ac:   e3 00 b0 06 00 95   llh %r0,6(%r11)
  b2:   e3 a0 c0 08 00 95   llh %r10,8(%r12)
  b8:   e3 80 90 0a 00 95   llh %r8,10(%r9)
  be:   e3 50 10 02 00 95   llh %r5,2(%r1)
  c4:   b9 04 00 42 lgr %r4,%r2
  c8:   e3 20 30 04 00 95   llh %r2,4(%r3)
[...]

Afterwards all the six addresses are used immediately as a base address for
multiple memory accesses. So this testcase triggers 5 unnecessary loads
from stack (and might even cause some delay due to address generation in the
pipeline as the bypass stack has a limited amount of entries).

The smallest testcase I could create out of the existing code is
-- snip --
struct dummy {
    int a;
    int b;
} d;

static unsigned short *(*func) (unsigned short *,int, int, int, int);

extern int *field;
extern int sum;
extern unsigned short *p1, *p2;


void tester(void)
{
    unsigned short blocks[256], *orgp, *refp;
    int y, z;
    int part;
    unsigned short *x;


    int apply = ((d.a && (d.b == 0 || d.b == 1)) || d.b == 0);

    if (apply)
        x = p1;
    else
        x = p2;

    orgp = blocks;
    for (y = 0; y < 3; y++) {
        part = 0;
        for (z = 0; z < 3; z++) {
            refp = func(x, 0, 1, 2, 3);

            part += field[*refp++ - *orgp++];
            part += field[*refp++ - *orgp++];
            part += field[*refp++ - *orgp++];
            part += field[*refp++ - *orgp++];
            part += field[*refp++ - *orgp++];
            part += field[*refp++ - *orgp++];
            part += field[*refp++ - *orgp++];
            part += field[*refp++ - *orgp++];
        }
        sum = part*4;
    }
}

-- snip --

and if compiled on s390x with
-march=z9-109 -mtune=z10 -funroll-loops --param max-unrolled-insns=100 -O3
gcc creates the sequence above. The unrolling seems to be necessary to
trigger the right amount of register pressure.

Looking at the dumps, in 186r.sched we still have memory accesses from address
r103+2*x:
[...]
(insn 65 61 72 8 tester.c:34 (set (reg:SI 457)
(zero_extend:SI (mem:HI (reg/v/f:DI 103 [ orgp ]) [2 S2 A16]))) 166 
{*zero_extendhisi2_extimm} (nil))

(insn 72 65 79 8 tester.c:35 (set (reg:SI 462)
(zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 103 [ orgp ])
(const_int 2 [0x2])) [2 S2 A16]))) 166 
{*zero_extendhisi2_extimm} (nil))

(insn 79 72 86 8 tester.c:36 (set (reg:SI 467)
(zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 103 [ orgp ])
(const_int 4 [0x4])) [2 S2 A16]))) 166 
{*zero_extendhisi2_extimm} (nil))

(insn 86 79 93 8 tester.c:37 (set (reg:SI 472)
(zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 103 [ orgp ])
(const_int 6 [0x6])) [2 S2 A16]))) 166 
{*zero_extendhisi2_extimm} (nil))
[...]
and so on, which then gets all the additional loads in the 187r.ira step:

[...]
(insn 322 61 65 8 tester.c:34 (set (reg:DI 12 %r12)
(mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 160 [0xa0])) [8 %sfp+-624 S8 A64])) 62 {*movdi_64} 
(nil))

(insn 65 322 323 8 tester.c:34 (set (reg:SI 12 %r12)
(zero_extend:SI (mem:HI (reg:DI 12 %r12) [2 S2 A16]))) 166 
{*zero_extendhisi2_extimm} (nil))

(insn 323 65 324 8 tester.c:34 (set (mem/c:SI (plus:DI (reg/f:DI 15 %r15)
(const_int 176 [0xb0])) [8 %sfp+-608 S4 A64])
(reg:SI 12 %r12)) 66 {*movsi_zarch} (nil))

(insn 324 323 72 8 tester.c:35 (set (reg:DI 1 %r1)
(mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 160 [0xa0])) [8 %sfp+-624 S8 A64])) 62 {*movdi_64} 
(nil))

(insn 72 324 325 8 tester.c:35 (set (reg:SI 1 %r1)
(zero_extend:SI (mem:HI (plus:DI (reg:DI 1 %r1)
(const_int 2 [0x2])) [2 S2 A16]))) 166 
{*zero_extendhisi2_extimm} (nil))

(insn 325 72 326 8 tester.c:35 (set (mem/c:SI (plus:DI (reg/f:DI 15 %r15)
(const_int 192 [0xc0])) [8 %sfp+-592 S4 A32])
(reg:SI 1 %r1)) 66 {*movsi_zarch} (nil))

(insn 326 325 79 8 tester.c:36 (set (reg:DI 2 %r2)
(mem/c:DI (plus:DI (reg/f:DI 15 %r15)
(const_int 160 [0xa0])) [8 %sfp+-624 S8 A64])) 62 {*movdi_64} 
(nil))

(insn 79 326 327 

Re: Defining a libffi.so.4 ABI

2010-03-12 Thread Anthony Green
On 03/01/2010 04:47 PM, Rainer Orth wrote:
> If this is deemed acceptable, I'll probably go ahead and implement
> proper support for this in libffi, but only after providing a common
> symbol versioning infrastructure in GCC instead of again duplicating
> what we already have in several runtime libraries.
>   

Thanks Rainer.  This is very helpful.  Please go ahead.

I'll look into that raw api issue this weekend.

AG



Re: Use the wctype builtins functions

2010-03-12 Thread Daniel Jacobowitz
On Thu, Mar 11, 2010 at 10:46:42AM +0100, Paolo Bonzini wrote:
> On 03/05/2010 05:03 PM, Joseph S. Myers wrote:
> >I don't know if there's an existing free software implementation of UAX#14
> >(Unicode Line Breaking Algorithm) suitable for use in GCC; that would be
> >the very heavyweight approach.
> 
> Yes.  You can get it from gnulib like gdb does, or you can link
> libunistring (http://savannah.gnu.org/projects/libunistring).
> libunistring only supports UTF-{8,16,32} encodings though.

I don't think GDB actually does today.  But here's a prototype:

http://sourceware.org/ml/gdb-patches/2006-10/msg0.html

-- 
Daniel Jacobowitz
CodeSourcery


how do I achieve a weaker UNSPEC_VOLATILE?

2010-03-12 Thread Mat Hostetter
I've implemented some special insns that access hardware resources.
These insns have side effects so they cannot be deleted or reordered
with respect to each other.  I made them UNSPEC_VOLATILE, which
generates correct code.  Unfortunately, performance is poor.

The problem is that UNSPEC_VOLATILE is a scheduling barrier, so the
scheduler does not issue any other insn in the same cycle.  Since my
chip is a VLIW, I rely on the scheduler annotations to determine which
insns go in a bundle (same cycle == same bundle).  Due to the
scheduler barrier, none of these special insns ever get bundled with
anything else, which wastes valuable VLIW slots.

How should I achieve the effect I need (preserve these insns and their
relative ordering), while still allowing other insns to be bundled
with them?

One hack that occurs to me is to annotate the special insns to pretend
each one reads and writes a phony hardware register.  This would
preserve ordering and prevent them from being deleted, at least if a
phony hardware register would be considered live on exit from a
function, etc. (would it?)

But even if this works, I worry the phony dependencies and more
complex insn patterns might prevent 'combine' from ever combining two
of these special insns together, which is valuable and works now.

But perhaps there is a cleaner way.  Any advice?  Thanks!

-Mat


missing C++ typeinfo for __float128

2010-03-12 Thread Roman Kononov
Hi,

Typeinfo for __float128 is undefined. Is it a bug?

Thanks.


$ cat test.cpp
#include <cstring>
#include <typeinfo>
int main() {
  return strlen(typeid(__float128).name());
}
$ g++ test.cpp
/tmp/ccw01pnm.o: In function `main':
test.cpp:(.text+0x5): undefined reference to `typeinfo for __float128'
collect2: ld returned 1 exit status
$ g++ --version | head -1
g++ (GCC) 4.5.0 20100312 (experimental)
$ g++ -dumpmachine
x86_64-unknown-linux-gnu
$ ld --version | head -1
GNU ld (GNU Binutils) 2.20.1.20100303




LTO and asm specs...

2010-03-12 Thread David Miller

There is one g++ LTO test case (g++.lto/20090303) that fails on sparc:
it compiles the intermediate objects with -fPIC but the final
compilation creates an executable.

The problem is that when LTO re-instantiates the options for the
individual builds, the proper ASM specs of the target are not
executed, so in this case "-K PIC" is not passed down to the assembler
in response to "-fPIC".

As a consequence, relocations against _GLOBAL_OFFSET_TABLE_
in code like this:

sethi   %hi(_GLOBAL_OFFSET_TABLE_), %g1

use the R_SPARC_HI22 relocation instead of R_SPARC_PC22.

Thus the program crashes.
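
To illustrate why the wrong relocation type breaks the code, here is a small
C sketch of the two calculations as I understand them from the SPARC ELF ABI:
R_SPARC_HI22 stores (S + A) >> 10 and R_SPARC_PC22 stores (S + A - P) >> 10,
where S is the symbol value, A the addend and P the address of the
instruction. The function names and any addresses used with them are made up
for illustration.

```c
#include <stdint.h>

/* Mask for the 22-bit immediate field of a sethi instruction. */
#define IMM22_MASK 0x3fffffu

/* R_SPARC_HI22 (assumed formula): high bits of the absolute address S + A. */
static uint32_t reloc_hi22(uint64_t s, int64_t a)
{
    return (uint32_t)(((s + (uint64_t)a) >> 10) & IMM22_MASK);
}

/* R_SPARC_PC22 (assumed formula): high bits of the distance from the
   instruction at address P to S + A. */
static uint32_t reloc_pc22(uint64_t s, int64_t a, uint64_t p)
{
    return (uint32_t)(((s + (uint64_t)a - p) >> 10) & IMM22_MASK);
}
```

PIC code adds the program counter to the value it builds with sethi, so it
needs the PC-relative field. With the absolute field from R_SPARC_HI22 the sum
no longer points at the GOT, which is consistent with the crash.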

I couldn't figure out immediately how to fix this as the
way LTO does spec overriding and such looked non-trivial.

Thanks.