Re: Does gengtyped gt-*.h depends upon the configuration of the compiler?
>> You might want to look at the gengtype debugging dump support on
>> gc-improv branch, which I will submit shortly for 4.6 trunk.
>
> Thanks! Yes, I looked at your gengtype.c in your branch, and it is the kind
> of code I was dreaming of.
> Usually, in persistency machinery, the code to reload data from the file is
> a bit more complex than the code to dump it. Do you have any ideas?

The dumping code is short and straightforward. The loading code would be
slightly more involved, yes, mostly because of patching pointer target
addresses between all the structs, but still I think it's a reasonably small
task.

> And more significantly, do you think that my idea of persisting GTY-ed data
> descriptors in gengtype is good enough to have: tools (gengtype) & data
> (e.g. hypothetical gcc-gty-data-descr.json) installed in the GCC
> installation, and reused by gengtype invocation for plugins, so to remove
> the harsh constraint of keeping both source & build tree? Or did I forgot
> something?

Gengtype dependency is a kludge. But since plugins need access to GC, I
really don't have any better idea.

> Also, what is a summary of the GTY & gengtype improvements (w.r.t plugins)
> in your gc-improv branch?

Debugging dump support is the closest thing to "gengtype improvement wrt
plugins" there. Not much else.

--
Laurynas
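
For readers following the thread, the "patching pointer target addresses" step in the reply above can be sketched roughly as follows. This is only an illustration in C++, not code from gengtype or the gc-improv branch: `Node`, `Record`, `dump` and `load` are hypothetical names, and the sketch assumes every pointer target is itself one of the dumped structs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical in-memory struct holding a pointer to another struct.
struct Node {
    int value;
    Node* next;  // may be null
};

// Hypothetical on-disk record: the pointer is replaced by an index (-1 = null).
struct Record {
    int value;
    long next_index;
};

// Dumping is the easy direction: number the structs, then write indices.
// Assumes every non-null `next` points at a node that is also in `nodes`.
std::vector<Record> dump(const std::vector<Node*>& nodes) {
    std::map<const Node*, long> index_of;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        index_of[nodes[i]] = static_cast<long>(i);

    std::vector<Record> out;
    for (const Node* n : nodes) {
        long idx = n->next ? index_of.at(n->next) : -1;
        out.push_back(Record{n->value, idx});
    }
    return out;
}

// Loading needs two passes: allocate every struct first, then patch each
// stored index back into a real address once all targets exist.
std::vector<std::unique_ptr<Node>> load(const std::vector<Record>& records) {
    std::vector<std::unique_ptr<Node>> nodes;
    for (const Record& r : records)
        nodes.push_back(std::unique_ptr<Node>(new Node{r.value, nullptr}));

    for (std::size_t i = 0; i < records.size(); ++i) {
        long idx = records[i].next_index;
        if (idx >= 0) {
            assert(idx < static_cast<long>(records.size()));
            nodes[i]->next = nodes[static_cast<std::size_t>(idx)].get();
        }
    }
    return nodes;
}
```

The second pass in `load` is the extra work compared with dumping: forward references and cycles cannot be resolved until every struct has been allocated.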
Testcase that causes excessive loads inserted by IRA
Hello Vladimir,

On s390x I have seen some testcase where IRA goes ballistic and loads a value from stack (160(%r15)) over and over again:

[...]
  82: e3 80 f0 a0 00 04   lg   %r8,160(%r15)    <--
  88: e3 b0 f0 a0 00 04   lg   %r11,160(%r15)   <--
  8e: e3 c0 f0 a0 00 04   lg   %r12,160(%r15)   <--
  94: e3 90 f0 a0 00 04   lg   %r9,160(%r15)    <--
  9a: e3 10 f0 a0 00 04   lg   %r1,160(%r15)    <--
  a0: e3 30 f0 a0 00 04   lg   %r3,160(%r15)    <--
  a6: e3 70 80 00 00 95   llh  %r7,0(%r8)
  ac: e3 00 b0 06 00 95   llh  %r0,6(%r11)
  b2: e3 a0 c0 08 00 95   llh  %r10,8(%r12)
  b8: e3 80 90 0a 00 95   llh  %r8,10(%r9)
  be: e3 50 10 02 00 95   llh  %r5,2(%r1)
  c4: b9 04 00 42         lgr  %r4,%r2
  c8: e3 20 30 04 00 95   llh  %r2,4(%r3)
[...]

Afterwards all the six addresses are used immediately as a base address for multiple memory accesses. So this testcase triggers 5 unnecessary loads from stack (and might even cause some delay due to address generation in the pipeline as the bypass stack has a limited amount of entries).

The smallest testcase I could create out of the existing code is

-- snip --
struct dummy { int a; int b; } d;

static unsigned short *(*func) (unsigned short *,int, int, int, int);
extern int *field;
extern int sum;
extern unsigned short *p1, *p2;

void tester(void)
{
  unsigned short blocks[256], *orgp, *refp;
  int y, z;
  int part;
  unsigned short *x;
  int apply = ((d.a && (d.b == 0 || d.b == 1)) || d.b == 0);

  if (apply)
    x = p1;
  else
    x = p2;

  orgp = blocks;
  for (y = 0; y < 3; y++)
    {
      part = 0;
      for (z = 0; z < 3; z++)
        {
          refp = func(x, 0, 1, 2, 3);
          part += field[*refp++ - *orgp++];
          part += field[*refp++ - *orgp++];
          part += field[*refp++ - *orgp++];
          part += field[*refp++ - *orgp++];
          part += field[*refp++ - *orgp++];
          part += field[*refp++ - *orgp++];
          part += field[*refp++ - *orgp++];
          part += field[*refp++ - *orgp++];
        }
      sum = part*4;
    }
}
-- snip --

and if compiled on s390x with

  -march=z9-109 -mtune=z10 -funroll-loops --param max-unrolled-insns=100 -O3

gcc creates the sequence above.
The unrolling seems to be necessary to trigger the right amount of register pressure.

Looking at the dumps in 186r.sched we still have memory accesses from address r103+2*x

[...]
(insn 65 61 72 8 tester.c:34 (set (reg:SI 457)
        (zero_extend:SI (mem:HI (reg/v/f:DI 103 [ orgp ]) [2 S2 A16]))) 166 {*zero_extendhisi2_extimm} (nil))

(insn 72 65 79 8 tester.c:35 (set (reg:SI 462)
        (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 103 [ orgp ])
                    (const_int 2 [0x2])) [2 S2 A16]))) 166 {*zero_extendhisi2_extimm} (nil))

(insn 79 72 86 8 tester.c:36 (set (reg:SI 467)
        (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 103 [ orgp ])
                    (const_int 4 [0x4])) [2 S2 A16]))) 166 {*zero_extendhisi2_extimm} (nil))

(insn 86 79 93 8 tester.c:37 (set (reg:SI 472)
        (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 103 [ orgp ])
                    (const_int 6 [0x6])) [2 S2 A16]))) 166 {*zero_extendhisi2_extimm} (nil))
[...]

and so on which then gets all the additional loads in the 187r.ira step.

[...]
(insn 322 61 65 8 tester.c:34 (set (reg:DI 12 %r12)
        (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
                (const_int 160 [0xa0])) [8 %sfp+-624 S8 A64])) 62 {*movdi_64} (nil))

(insn 65 322 323 8 tester.c:34 (set (reg:SI 12 %r12)
        (zero_extend:SI (mem:HI (reg:DI 12 %r12) [2 S2 A16]))) 166 {*zero_extendhisi2_extimm} (nil))

(insn 323 65 324 8 tester.c:34 (set (mem/c:SI (plus:DI (reg/f:DI 15 %r15)
                (const_int 176 [0xb0])) [8 %sfp+-608 S4 A64])
        (reg:SI 12 %r12)) 66 {*movsi_zarch} (nil))

(insn 324 323 72 8 tester.c:35 (set (reg:DI 1 %r1)
        (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
                (const_int 160 [0xa0])) [8 %sfp+-624 S8 A64])) 62 {*movdi_64} (nil))

(insn 72 324 325 8 tester.c:35 (set (reg:SI 1 %r1)
        (zero_extend:SI (mem:HI (plus:DI (reg:DI 1 %r1)
                    (const_int 2 [0x2])) [2 S2 A16]))) 166 {*zero_extendhisi2_extimm} (nil))

(insn 325 72 326 8 tester.c:35 (set (mem/c:SI (plus:DI (reg/f:DI 15 %r15)
                (const_int 192 [0xc0])) [8 %sfp+-592 S4 A32])
        (reg:SI 1 %r1)) 66 {*movsi_zarch} (nil))

(insn 326 325 79 8 tester.c:36 (set (reg:DI 2 %r2)
        (mem/c:DI (plus:DI (reg/f:DI 15 %r15)
                (const_int 160 [0xa0])) [8 %sfp+-624 S8 A64])) 62 {*movdi_64} (nil))

(insn 79 326 327
Re: Defining a libffi.so.4 ABI
On 03/01/2010 04:47 PM, Rainer Orth wrote:

> If this is deemed acceptable, I'll probably go ahead and implement
> proper support for this in libffi, but only after providing a common
> symbol versioning infrastructure in GCC instead of again duplicating
> what we already have in several runtime libraries.
>

Thanks Rainer. This is very helpful. Please go ahead. I'll look into that raw api issue this weekend.

AG
Re: Use the wctype builtins functions
On Thu, Mar 11, 2010 at 10:46:42AM +0100, Paolo Bonzini wrote:
> On 03/05/2010 05:03 PM, Joseph S. Myers wrote:
> >I don't know if there's an existing free software implementation of UAX#14
> >(Unicode Line Breaking Algorithm) suitable for use in GCC; that would be
> >the very heavyweight approach.
>
> Yes. You can get it from gnulib like gdb does, or you can link
> libunistring (http://savannah.gnu.org/projects/libunistring).
> libunistring only supports UTF-{8,16,32} encodings though.

I don't think GDB actually does today. But here's a prototype:

http://sourceware.org/ml/gdb-patches/2006-10/msg0.html

--
Daniel Jacobowitz
CodeSourcery
how do I achieve a weaker UNSPEC_VOLATILE?
I've implemented some special insns that access hardware resources. These insns have side effects so they cannot be deleted or reordered with respect to each other. I made them UNSPEC_VOLATILE, which generates correct code.

Unfortunately, performance is poor. The problem is that UNSPEC_VOLATILE is a scheduling barrier, so the scheduler does not issue any other insn in the same cycle. Since my chip is a VLIW, I rely on the scheduler annotations to determine which insns go in a bundle (same cycle == same bundle). Due to the scheduler barrier, none of these special insns ever get bundled with anything else, which wastes valuable VLIW slots.

How should I achieve the effect I need (preserve these insns and their relative ordering), while still allowing other insns to be bundled with them?

One hack that occurs to me is to annotate the special insns to pretend each one reads and writes a phony hardware register. This would preserve ordering and prevent them from being deleted, at least if a phony hardware register would be considered live on exit from a function, etc. (would it?) But even if this works, I worry the phony dependencies and more complex insn patterns might prevent 'combine' from ever combining two of these special insns together, which is valuable and works now.

But perhaps there is a cleaner way. Any advice?

Thanks!

-Mat
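
To make the trade-off in the question above a bit more concrete, here is a toy model in C++ of the two options. It is not GCC code and makes no claim about how GCC's scheduler or 'combine' actually behave: `Insn`, `schedule_cycles`, the phony register number 99 and the two sample sequences are all assumptions made up for illustration. The model packs insns greedily, in program order, into bundles of a fixed width; a barrier insn forces everything before it into earlier cycles and everything after it into later ones, while a register dependency only separates the two insns that share the register.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy insn: registers read, registers written, and a "barrier" flag that
// stands in for a scheduling barrier (hypothetical model, not GCC's).
struct Insn {
    std::set<int> reads;
    std::set<int> writes;
    bool barrier;
};

static bool intersects(const std::set<int>& a, const std::set<int>& b) {
    for (int r : a)
        if (b.count(r)) return true;
    return false;
}

// True if `later` must stay after `earlier` because of a register hazard.
static bool depends(const Insn& later, const Insn& earlier) {
    return intersects(later.reads, earlier.writes)    // read after write
        || intersects(later.writes, earlier.writes)   // write after write
        || intersects(later.writes, earlier.reads);   // write after read
}

// Greedy in-order bundling; returns the number of cycles (bundles) used.
std::size_t schedule_cycles(const std::vector<Insn>& insns, std::size_t width) {
    assert(width > 0);
    std::vector<std::size_t> cycle(insns.size(), 0);
    std::vector<std::size_t> used;  // slots already filled in each cycle
    for (std::size_t i = 0; i < insns.size(); ++i) {
        std::size_t earliest = 0;
        for (std::size_t j = 0; j < i; ++j) {
            if (insns[i].barrier || insns[j].barrier || depends(insns[i], insns[j]))
                earliest = std::max(earliest, cycle[j] + 1);
        }
        std::size_t c = earliest;
        while (c < used.size() && used[c] >= width) ++c;  // skip full bundles
        if (c >= used.size()) used.resize(c + 1, 0);
        ++used[c];
        cycle[i] = c;
    }
    return used.size();
}

const int kPhonyReg = 99;  // made-up register number for the phony dependency

// Three special insns modelled as barriers, interleaved with ALU insns.
std::vector<Insn> barrier_sequence() {
    Insn special{{}, {}, true};
    return {special, Insn{{}, {1}, false},
            special, Insn{{}, {2}, false},
            special, Insn{{}, {3}, false}};
}

// The same sequence, but each special insn reads and writes the phony register.
std::vector<Insn> phony_register_sequence() {
    Insn special{{kPhonyReg}, {kPhonyReg}, false};
    return {special, Insn{{}, {1}, false},
            special, Insn{{}, {2}, false},
            special, Insn{{}, {3}, false}};
}
```

In this model, with a bundle width of 2, `schedule_cycles(barrier_sequence(), 2)` gives 6 cycles and `schedule_cycles(phony_register_sequence(), 2)` gives 3: the special insns keep their relative order in both cases, but only the phony-register variant lets the ALU insns share their bundles. Whether a real port sees the same benefit depends on the questions raised above (liveness of the phony register, effect on 'combine').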
missing C++ typeinfo for __float128
Hi,

Typeinfo for __float128 is undefined. Is it a bug? Thanks.

$ cat test.cpp
#include <typeinfo>
#include <string.h>
int main()
{
  return strlen(typeid(__float128).name());
}
$ g++ test.cpp
/tmp/ccw01pnm.o: In function `main':
test.cpp:(.text+0x5): undefined reference to `typeinfo for __float128'
collect2: ld returned 1 exit status
$ g++ --version | head -1
g++ (GCC) 4.5.0 20100312 (experimental)
$ g++ -dumpmachine
x86_64-unknown-linux-gnu
$ ld --version | head -1
GNU ld (GNU Binutils) 2.20.1.20100303
LTO and asm specs...
There is one g++ LTO test case (g++.lto/20090303) that fails on sparc. It compiles the intermediate objects with -fPIC but the final compilation creates an executable.

The problem is that when LTO re-instantiates the options for the individual builds, the proper ASM specs of the target are not executed, so in this case "-K PIC" is not passed down to the assembler in response to "-fPIC".

As a consequence, relocations against _GLOBAL_OFFSET_TABLE_ in code like this:

	sethi %hi(_GLOBAL_OFFSET_TABLE_), %g1

use the R_SPARC_HI22 relocation instead of R_SPARC_PC22. Thus the program crashes.

I couldn't figure out immediately how to fix this as the way LTO does spec overriding and such looked non-trivial.

Thanks.