Re: Adding Leon processor to the SPARC list of processors
> Following the recent comments by Eric, the patch now sketches the
> following setup:
>
> If multi-lib is wanted:
> configure --with-cpu=leon ... : creates multilib-dir soft|v8
> combinations using [-msoft-float|-mcpu=sparcleonv8] (MULTILIB_OPTIONS =
> msoft-float mcpu=sparcleonv8)
>
> If single-lib is wanted:
> configure --with-cpu=sparcleonv7 --with-float=soft --disable-multilib ... : (v7 | soft | no-multilib)
> configure --with-cpu=sparcleonv8 --with-float=soft --disable-multilib ... : (v8 | soft | no-multilib)
> configure --with-cpu=sparcleonv7 --with-float=hard --disable-multilib ... : (v7 | hard | no-multilib)
> configure --with-cpu=sparcleonv8 --with-float=hard --disable-multilib ... : (v8 | hard | no-multilib)
>
> Using --with-cpu=leon|sparcleonv7|sparcleonv8, the sparc_cpu is switched
> to PROCESSOR_LEON.

I'm mostly OK, but I don't think we need sparcleonv7 or sparcleonv8.  Attached is another proposal, which:

1. Adds -mtune/--with-tune=leon for all SPARC targets.  In particular, this means that if you configure --target=sparc-{elf,rtems} --with-tune=leon, you get a multilib-ed compiler defaulting to V7/FPU and -mtune=leon, with V8 and NO-FPU libraries.

2. Adds new targets sparc-leon-{elf,linux}: multilib-ed compiler defaulting to V8/FPU and -mtune=leon, with V7 and NO-FPU libraries.

3. Adds new targets sparc-leon3-{elf,linux}: multilib-ed compiler defaulting to V8/FPU and -mtune=leon, with NO-FPU libraries.

Singlelib-ed compilers are available through --disable-multilib and --with-cpu={v7,v8} --with-float={soft,hard} --with-tune=leon for sparc-{elf,rtems}, or just --with-cpu={v7,v8} --with-float={soft,hard} for sparc-leon*-*.

The rationale is that --with-cpu shouldn't change the set of multilibs; it is only the configure-time equivalent of -mcpu.  The set of multilibs should only depend on the target and the presence of --disable-multilib.

	* config.gcc (sparc-*-elf*): Deal with sparc-leon specifically.
	(sparc-*-linux*): Likewise.
	(sparc*-*-*): Remove obsolete sparc86x setting.
	(sparc-leon*): Default to --with-cpu=v8 and --with-tune=leon.
	* doc/invoke.texi (SPARC Options): Document -mcpu/-mtune=leon.
	* config/sparc/sparc.h (TARGET_CPU_leon): Define.
	(TARGET_CPU_sparc86x): Delete.
	(TARGET_CPU_cypress): Define as alias to TARGET_CPU_v7.
	(TARGET_CPU_f930): Define as alias to TARGET_CPU_sparclite.
	(TARGET_CPU_f934): Likewise.
	(TARGET_CPU_tsc701): Define as alias to TARGET_CPU_sparclet.
	(CPP_CPU_SPEC): Add entry for -mcpu=leon.
	(enum processor_type): Add PROCESSOR_LEON.
	* config/sparc/sparc.c (leon_costs): New cost array.
	(sparc_option_override): Add entry for TARGET_CPU_leon and
	-mcpu=leon.  Initialize cost array to leon_costs if -mtune=leon.
	* config/sparc/sparc.md (cpu attribute): Add leon.
	Include leon.md scheduling description.
	* config/sparc/leon.md: New file.
	* config/sparc/t-elf: Do not assemble Solaris startup files.
	* config/sparc/t-leon: New file.
	* config/sparc/t-leon3: Likewise.

-- 
Eric Botcazou

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 167022)
+++ doc/invoke.texi	(working copy)
@@ -16917,8 +16917,8 @@ the rules of the a...@.
 @opindex mcpu
 Set the instruction set, register set, and instruction scheduling
 parameters for machine type @var{cpu_type}.  Supported values for
 @var{cpu_type} are
-@samp{v7}, @samp{cypress}, @samp{v8}, @samp{supersparc}, @samp{sparclite},
-@samp{f930}, @samp{f934}, @samp{hypersparc}, @samp{sparclite86x},
+@samp{v7}, @samp{cypress}, @samp{v8}, @samp{supersparc}, @samp{hypersparc},
+@samp{leon}, @samp{sparclite}, @samp{f930}, @samp{f934}, @samp{sparclite86x},
 @samp{sparclet}, @samp{tsc701}, @samp{v9}, @samp{ultrasparc},
 @samp{ultrasparc3}, @samp{niagara} and @samp{niagara2}.

@@ -16931,7 +16931,7 @@ implementations.
 @smallexample
 v7: cypress
-v8: supersparc, hypersparc
+v8: supersparc, hypersparc, leon
 sparclite: f930, f934, sparclite86x
 sparclet: tsc701
 v9: ultrasparc, ultrasparc3, niagara, niagara2

@@ -16984,9 +16984,9 @@ option @option{-mcpu=@var{cpu_type}} wou
 The same values for @option{-mcpu=@var{cpu_type}} can be used for
 @option{-mtune=@var{cpu_type}}, but the only useful values are those
 that select a particular cpu implementation.  Those are @samp{cypress},
-@samp{supersparc}, @samp{hypersparc}, @samp{f930}, @samp{f934},
-@samp{sparclite86x}, @samp{tsc701}, @samp{ultrasparc},
-@samp{ultrasparc3}, @samp{niagara}, and @samp{niagara2}.
+@samp{supersparc}, @samp{hypersparc}, @samp{leon}, @samp{f930}, @samp{f934},
+@samp{sparclite86x}, @samp{tsc701}, @samp{ultrasparc}, @samp{ultrasparc3},
+@samp{niagara}, and @samp{niagara2}.

 @item -mv8plu
Loop-iv.c ICEs on subregs
Zdenek,

I'm investigating an ICE in loop-iv.c:get_biv_step().  I hope you can shed some light on what the correct fix would be.  The ICE happens when processing:

==
(insn 111 (set (reg:SI 304)
               (plus (subreg:SI (reg:DI 251) 4)
                     (const_int 1))))

(insn 177 (set (subreg:SI (reg:DI 251))
               (reg:SI 304)))
==

Code like the above does not occur on current mainline early enough for loop-iv.c to catch it.  The subregs above are produced by Tom's (CC'ed) extension elimination pass (scheduled before fwprop1), which is not in mainline yet [*].

The failure is due to this assert in loop-iv.c:get_biv_step():

==
gcc_assert ((*inner_mode == *outer_mode) != (*extend != UNKNOWN));
==

i.e., inner and outer modes can differ iff there's an extend in the chain.

get_biv_step_1() starts with insn 177, then gets to insn 111, then loops back to insn 177, at which point it stops, returns GRD_MAYBE_BIV, and sets:

* outer_mode == DImode == natural mode of (reg A);
* inner_mode == SImode == mode of (subreg (reg A)), set in get_biv_step_1:

==
if (GET_CODE (next) == SUBREG)
  {
    enum machine_mode amode = GET_MODE (next);

    if (GET_MODE_SIZE (amode) > GET_MODE_SIZE (*inner_mode))
      return false;

    *inner_mode = amode;
    *inner_step = simplify_gen_binary (PLUS, outer_mode,
                                       *inner_step, *outer_step);
    *outer_step = const0_rtx;
    *extend = UNKNOWN;
  }
==

* extend == UNKNOWN, as there are no extensions in the chain.

It seems to me that the computations of outer_mode and extend are correct; I'm not sure about inner_mode.

Zdenek, what do you think is the right way to handle the above case in loop analysis?

[*] http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01529.html

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
+1-650-331-3385 x724
RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this would not only cost a data load from the target vector, but would also inhibit optimizations that replace division / modulo / multiply with shift or mask operations.  So maybe we should look into having a few functional hooks that do common operations, i.e.

bits_in_units       x / BITS_PER_UNIT
bits_in_units_ceil  (x + BITS_PER_UNIT - 1) / BITS_PER_UNIT
bit_unit_remainder  x % BITS_PER_UNIT
units_in_bits       x * BITS_PER_UNIT

Although we currently have some HOST_WIDE_INT uses, I hope using unsigned HOST_WIDE_INT as the argument / return type will generally work.

tree.h also defines BITS_PER_UNIT_LOG, which (or its hook equivalent) should probably be used in all the places that use exact_log2 (BITS_PER_UNIT), and, if it could be relied upon to exist, we could also use it as a substitute for the above hooks.  However, this seems a bit iffy - we'd permanently forgo the possibility to have 6 / 7 / 36 bit etc. units.

Similar arrangements could be made for BITS_PER_WORD and UNITS_PER_WORD, although these macros seem not quite so prevalent in the tree optimizers.
Re: Method to disable code SSE2 generation but still use -msse2
The last mysterious error message went away when the same code was compiled on a machine with a more recent gcc (4.4.1).  Shortly after, I hit the next roadblock.  Here is foo.c (a modified version of sse2-cmpsd-1.c from the version 4.5.1 testsuite):

>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8
#ifndef CHECK_H
#define CHECK_H "sse2-check.h"
#endif

#ifndef TEST
#define TEST sse2_test
#endif

#include CHECK_H
#include 

static __m128d
__attribute__((noinline, unused))
test (__m128d s1, __m128d s2)
{
  printf("test s1.x"); _mm_dump_fd(s1);
  printf("test s2.x"); _mm_dump_fd(s2);
  return _mm_add_pd (s1, s2);
}

static void
TEST (void)
{
  union128d u, s1, s2;
  double e[2];

  s1.x = _mm_set_pd (2134.3343,1234.635654);
  s2.x = _mm_set_pd (41124.234,2344.2354);
  printf("s10 1 %lf %lf\n",s1.a[0],s1.a[1]);
  printf("s20 1 %lf %lf\n",s2.a[0],s2.a[1]);
  printf("s1.x"); _mm_dump_fd(s1.x);
  printf("s2.x"); _mm_dump_fd(s2.x);
  u.x = test (s1.x, s2.x);
  e[0] = s1.a[0] + s2.a[0];
  e[1] = s1.a[1] + s2.a[1];
  printf("s1.x"); _mm_dump_fd(s1.x);
  printf("s2.x"); _mm_dump_fd(s2.x);
  printf("expected e0 e1 %lf %lf\n",e[0],e[1]);
  printf("result r0 r1 %lf %lf\n",u.a[0],u.a[1]);
  if (check_union128d (u, e))
    abort ();
}
>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8>8>8<8

When compiled with -mno-sse2 the run fails.  Bizarrely, it seems to be passing data into the test function incorrectly: notice that in test the low double in s2 is the high double in s1, instead of the original low double in s2 from outside the calling function.  This erroneous value propagates into my inline code where it is added (correctly, but of course to the wrong final sum, since the inputs were wrong).

gcc -Wall -msse -mno-sse2 -I.
-lm -DSOFT_SSE2 -DEMMSOFTDBG -O1 -o foo_wno foo.c

./foo_wno
mm_set_pd, in 2134.334300 1234.635654
mm_set_pd, in 41124.234000 2344.235400
s10 1 1234.635654 2134.334300
s20 1 2344.235400 41124.234000
s1.xDEBUG m_d_fd: 1234.635654 2134.334300
s2.xDEBUG m_d_fd: 2344.235400 41124.234000
test s1.xDEBUG m_d_fd: 1234.635654 2134.334300
test s2.xDEBUG m_d_fd: 2134.334300 41124.234000
IN _mm_add_pd
__ADEBUG m_d_fd: 1234.635654 2134.334300
__BDEBUG m_d_fd: 2134.334300 41124.234000
s1.xDEBUG m_d_fd: 1234.635654 2134.334300
s2.xDEBUG m_d_fd: 2344.235400 41124.234000
expected e0 e1 3578.871054 43258.568300
result r0 r1 3368.969954 43258.568300
Aborted

When -msse2 is enabled, however, the parameters are passed appropriately into test (and my inlined function), and the program works.  Here the pass to the test function is correct, and that propagates into my inline function correctly too:

gcc -Wall -msse -msse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG -O1 -o foo_nono foo.c
[r...@newsaf i386]# ./foo_nono
mm_set_pd, in 2134.334300 1234.635654
mm_set_pd, in 41124.234000 2344.235400
s10 1 1234.635654 2134.334300
s20 1 2344.235400 41124.234000
s1.xDEBUG m_d_fd: 1234.635654 2134.334300
s2.xDEBUG m_d_fd: 2344.235400 41124.234000
test s1.xDEBUG m_d_fd: 1234.635654 2134.334300
test s2.xDEBUG m_d_fd: 2344.235400 41124.234000
IN _mm_add_pd
__ADEBUG m_d_fd: 1234.635654 2134.334300
__BDEBUG m_d_fd: 2344.235400 41124.234000
s1.xDEBUG m_d_fd: 1234.635654 2134.334300
s2.xDEBUG m_d_fd: 2344.235400 41124.234000
expected e0 e1 3578.871054 43258.568300
result r0 r1 3578.871054 43258.568300

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
I think quite a lot of front end uses of BITS_PER_UNIT should really be TYPE_PRECISION (char_type_node) (which in general I'd consider preferred to CHAR_TYPE_SIZE in the front ends).  Though it's pretty poorly defined what data structures should look like if target "char" in the front ends is wider than the instruction-set unit of BITS_PER_UNIT bits.

If something relates to an interface to a lower-level part of the compiler, then BITS_PER_UNIT is probably right - but if something relates to whether a type is a variant of char, or to alignment of a non-bit-field object (you can't have smaller than char alignment), or things like that, then TYPE_PRECISION (char_type_node) may be better.

Note that BITS_PER_UNIT is used in code built for the target (libgcc2.c, dfp-bit.h, fixed-bit.h, fp-bit.h, libobjc/encoding.c, ...), and converting it to a hook requires eliminating those uses.  __CHAR_BIT__ is a suitable replacement, at least if the code really cares about char - which is the case whenever the value is multiplied by the result of "sizeof".  Some questions about machine modes might most usefully be answered by predefined macros giving properties of particular machine modes.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
Quoting "Joseph S. Myers" : If something relates to an interface to a lower-level part of the compiler then BITS_PER_UNIT is probably right - but if somethis relates to whether a type is a variant of char, or to alignment of a non-bit-field object (you can't have smaller than char alignment), or things like that, then TYPE_PRECISION (char_type_node) may be better. Yes, I see examples for both in the C++ front end. The tree optimizers seem mostly (or entirely?) concerned with the addressable unit size. Note that BITS_PER_UNIT is used in code built for the target (libgcc2.c, dfp-bit.h, fixed-bit.h, fp-bit.h, libobjc/encoding.c, ...), and converting it to a hook requires eliminating those uses. Full conversion does. For the moment I would be content with a partial conversion so that not every tree optimizer that currently uses BITS_PER_UNIT has to include tm.h itself once the bogus tm.h includes from target.h / function.h / gimple.h are gone.
Re: Method to disable code SSE2 generation but still use -msse2
I have found several ways to "fix" the latest issue, but they all boil down to never passing an __m128d value on the call stack.  For instance, change

static __m128d
__attribute__((noinline, unused))
test (__m128d s1, __m128d s2)

to

static __m128d
test (__m128d s1, __m128d s2)

and the program works.  Similarly, change the function to

static __m128d
__attribute__((noinline))
test (__m128d *s1, __m128d *s2)
{
  return _mm_add_pd (*s1, *s2);
}

and it also works.

Things I tried to force a 16 byte stack alignment that didn't work:

1. -mstackrealign
2. -mpreferred-stack-boundary=4
3. -mincoming-stack-boundary=4
4. 2 and 3
5. 1 and 2 and 3

I guess the bigger question is why an __m128d can be passed on the call stack reliably when -msse2 is invoked, but not otherwise?  If the compiler cannot do this reliably, shouldn't it throw an error or warning?

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Re: Method to disable code SSE2 generation but still use -msse2
> Things I tried to force a 16 byte stack alignment that didn't work:
>
> 1. -mstackrealign
> 2. -mpreferred-stack-boundary=4
> 3. -mincoming-stack-boundary=4
> 4. 2 and 3
> 5. 1 and 2 and 3

And this is why they didn't work.  Change the test function to

static __m128d
__attribute__((noinline, aligned (16)))
test (__m128d s1, __m128d s2)
{
  printf("test s1"); _mm_dump_fd(s1);
  printf("test s2"); _mm_dump_fd(s2);
  printf("loc s1 %p\n",&s1);
  printf("loc s2 %p\n",&s2);
  return _mm_add_pd (s1, s2);
}

compile and run:

gcc -Wall -msse -mno-sse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG -O1 -o foo_wno foo.c
[r...@newsaf i386]# ./foo_wno
mm_set_pd, in 2134.334300 1234.635654
mm_set_pd, in 41124.234000 2344.235400
s10 1 1234.635654 2134.334300
s20 1 2344.235400 41124.234000
s1.xDEBUG m_d_fd: 1234.635654 2134.334300
s2.xDEBUG m_d_fd: 2344.235400 41124.234000
test s1DEBUG m_d_fd: 1234.635654 2134.334300
test s2DEBUG m_d_fd: 2134.334300 41124.234000
loc s1 0x7fff6b6ccb10  <--
loc s2 0x7fff6b6ccb00  <--
s1.xDEBUG m_d_fd: 1234.635654 2134.334300
s2.xDEBUG m_d_fd: 2344.235400 41124.234000
expected e0 e1 3578.871054 43258.568300
result r0 r1 3368.969954 43258.568300
Aborted

s1 and s2 within test are already 16 byte aligned, so the extra alignment switches did not help.  Somehow this code

u.x = test (s1.x, s2.x);

is putting the wrong values for s2 onto the call stack.  Bizarre.  Either I'm missing something, or turning off SSE2 is uncovering a compiler bug.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Re: Method to disable code SSE2 generation but still use -msse2
I renamed the test case gccprob.c and made two binaries and two assembler files:

gcc -Wall -msse -mno-sse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG \
  -O0 -o gccprob_wno gccprob.c
gcc -Wall -msse -mno-sse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG \
  -O0 -S -o gccprob_wno.s gccprob.c
gcc -Wall -msse -msse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG \
  -O0 -S -o gccprob_nono.s gccprob.c
gcc -Wall -msse -msse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG \
  -O0 -o gccprob_nono gccprob.c

The _wno variants have the problem passing __m128d on the stack; the _nono variants do not.  I packed up all 5 files and put them here (retrieve only directory, no directory listings in pickup):

http://saf.bio.caltech.edu/pub/pickup/gccprob.tar.gz

I am not an assembler programmer.  If one of you who is could have a look at the two .s files, maybe we can get to the bottom of this.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
GCC Intermodule Analysis for Go
Hello,

I have been working on my PhD thesis and I want to focus on the Go language.  I know Ian Taylor has done tons of work on the Go frontend for gcc.  Likewise, I know gcc implements SSA and even link-time optimization.  For my specific research I will need to do some intermodule analysis.  I know gcc has link-time optimization; however I might, for my purposes, need to add additional information to the object files that would allow my specific optimization of a Go program to aid other compiled modules/translation units.  Ideally, I hope my implementation would translate nicely to gogo and then to GIMPLE.  In the short term I would like to use this intermodule analysis to give the compiler enough information that, when a module/object file is recompiled, only the changed routines and their dependent routines are recompiled, instead of recompiling an entire object file each time a small change is made.

Thoughts?  Is this even feasible?

-Matt
Re: Method to disable code SSE2 generation but still use -msse2
The problem is specific to 64 bit environments.  I made these:

gcc -Wall -msse -mno-sse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG \
  -O0 -m32 -S -o gccprob_wno32.s gccprob.c
gcc -Wall -msse -mno-sse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG \
  -O0 -m32 -o gccprob_wno32 gccprob.c
gcc -Wall -msse -msse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG \
  -O0 -m32 -o gccprob_nono32 gccprob.c
gcc -Wall -msse -msse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG \
  -O0 -m32 -S -o gccprob_nono32.s gccprob.c

and both binaries work correctly.  I added them to the set here:

http://saf.bio.caltech.edu/pub/pickup/gccprob.tar.gz

Specifics on the environment where the problem is seen:

OS: Mandriva Linux release 2010.0 (Official) for x86_64
gcc (GCC) 4.4.1
Dual Dual-Core Opteron 280, Arima HDAMAI motherboard.

64 bit targets only; 32 bit is OK.

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
gcc-4.4-20101123 is now available
Snapshot gcc-4.4-20101123 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20101123/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch with the following options:
  svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 167096

You'll find:

gcc-4.4-20101123.tar.bz2              Complete GCC (includes all of below)
  MD5=03ae257bfd6a0adde7b2c6fff9a13c28
  SHA1=3afa1b3cdab91775e588f34a55a65e1908318fff

gcc-core-4.4-20101123.tar.bz2         C front end and core compiler
  MD5=b52fe749825c8a33f4390722f1bee788
  SHA1=d2943c6c6f72ebc73dc94e150990f59ea379a120

gcc-ada-4.4-20101123.tar.bz2          Ada front end and runtime
  MD5=e3a277eb349c166750083ac7d698b868
  SHA1=52133c3d40f7f997d676a846cd1999c8421eb4d4

gcc-fortran-4.4-20101123.tar.bz2      Fortran front end and runtime
  MD5=543a3b27e0701d674239511d8d0021b4
  SHA1=e1675a7f47f9a832181a57911906f7043565b46a

gcc-g++-4.4-20101123.tar.bz2          C++ front end and runtime
  MD5=752805fd4dff37ab24ed2afba1d4d626
  SHA1=f44c71e9785e8e2a79b188aadee74e033ac4b71d

gcc-java-4.4-20101123.tar.bz2         Java front end and runtime
  MD5=7bbeb90b4fd6fbb0ebcb2e484913f4aa
  SHA1=6e3ec6b34d093bb52448d510b8b3f328f99ceecd

gcc-objc-4.4-20101123.tar.bz2         Objective-C front end and runtime
  MD5=ecde0a1ac24b43b8d24ef8f8551c27c6
  SHA1=40b6e546b787333f9a28fdbd9d9efbe80cef8add

gcc-testsuite-4.4-20101123.tar.bz2    The GCC testsuite
  MD5=de6a29b6f6fd2e220e6646dd14b6fba7
  SHA1=5854bb5ab6d240d057c2fb2b022c4aa4f7198d22

Diffs from 4.4-20101116 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4 link is updated and a message is sent to the gcc list.  Please do not use a snapshot before it has been announced that way.
Re: Method to test all sse2 calls?
What is:

__builtin_ia32_vec_ext_v2df

???  It wasn't in the original emmintrin.h, so presumably it isn't actually part of SSE2, but it is present in the testsuite, and it is not visible to the compiler when -mno-sse2 is set.  See for instance the files sse2-vec-#.c.  (Randomly selected) example from sse2-vec-4.c:

res[2] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 2);

gcc -Wall -msse -mno-sse2 -I. -m32 -lm -DSOFT_SSE2 -o foo sse2-vec-4.c
sse2-vec-4.c: In function 'sse2_test':
sse2-vec-4.c:27: warning: implicit declaration of function '__builtin_ia32_vec_ext_v8hi'
/root/tmp/ccYAq3IB.o: In function `sse2_test':
sse2-vec-4.c:(.text+0x58c): undefined reference to `__builtin_ia32_vec_ext_v8hi'
.
.
.
/root/tmp/ccYAq3IB.o:sse2-vec-4.c:(.text+0x613): more undefined references to `__builtin_ia32_vec_ext_v8hi' follow
collect2: ld returned 1 exit status

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Re: GCC Intermodule Analysis for Go
Matt Davis writes:

> I have been working on my PhD thesis and I want to focus on the Go
> language.  I know Ian Taylor has done tons of work regarding the Go
> frontend for gcc.  Likewise, I know gcc implements SSA and even
> link-time optimization.  For my specific research I will need to do
> some intermodule analysis.  I know gcc has link-time optimization,
> however I might, for my purposes, need to add additional information
> to the object files that would allow my specific optimization of a Go
> program to aid other compiled modules/translation-units.  Ideally, my
> implementation, I would hope, would translate nicely to gogo then to
> GIMPLE.  In the short term I would like to use this intermodule
> analysis to give enough information to the compiler so that when a
> module/object-file is recompiled the changed routines and dependent
> routines would be the only aspects recompiled, instead of having to
> recompile an entire object file each time a small change is made.
>
> Thoughts?  Is this even feasible?

I think the frontend work is entirely feasible in Go.  It would be difficult to do entirely correctly in C++ because of the complex name lookup rules.  But Go has simple name lookup, so identifying which parts of a program depend on which other parts should be more or less straightforward.

As far as translating the information to GIMPLE, and taking advantage of it in the optimizers, it kind of depends on what kind of information you are thinking about.

Ian
Re: Method to test all sse2 calls?
"David Mathog" writes: > What is: > > __builtin_ia32_vec_ext_v2df > > ??? It's a gcc builtin function, not to be confused with an SSE intrinsic function. > It wasn't in the original emmintrin.h, so presumably isn't actually > part of SSE2, but it is present in the testsuite, and it is not visible > to the compiler when -mno-sse2 is set. See for instance the files > sse2-vec-#.c. (Randomly selected) Example: > > sse2-vec-4.c: res[2] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 2); Tests that directly invoke __builtin functions are not appropriate for your replacement for emmintrin.h. Ian