Re: Using LTO-enabled libgfortran
On Wed, Aug 17, 2016 at 8:36 AM, Thomas Koenig wrote:
> On 16.08.2016 at 20:57, Richard Biener wrote:
>>
>> On August 16, 2016 7:11:26 PM GMT+02:00, Thomas Koenig
>> wrote:
>>>
>>> What would it take to use an LTO-enabled version of gfortran?
>>>
>>> It could turn out to be quite useful for speeding up programs,
>>> especially where I/O or array intrinsics are used.
>>>
>>> I also expect many issues to surface where libgfortran is
>>> playing with types in a way that could break LTO, so I
>>> would not expect this to be an easy thing.
>>>
>>> So, ideas anybody?  I don't think any other library included
>>> with gcc does this, correct?
>>
>>
>> Correct.  My advice is to simply try.
>
>
> I did that, with some interesting results.
>
> Just putting -flto -ffat-lto-objects into every CFLAG and FFLAG
> available into the Makefile of libgfortran (autoconf can come later :-),
> recompiling gfortran and trying out the program
>
> program main
>   real, dimension(10) :: a
>   call random_number(a)
>   write (*,'(E12.5)',advance="no") a
> end program main
>
> led to
>
> lto1: warning: type of '_gfortran_st_write' does not match original
> declaration [-Wlto-type-mismatch]
> ../../../trunk/libgfortran/io/transfer.c:3746:1: note: 'st_write' was
> previously declared here
> ../../../trunk/libgfortran/io/transfer.c:3746:1: note: code may be
> misoptimized unless -fno-strict-aliasing is used
> lto1: warning: type of '_gfortran_transfer_array_write' does not match
> original declaration [-Wlto-type-mismatch]
> ../../../trunk/libgfortran/io/transfer.c:2195:1: note:
> 'transfer_array_write' was previously declared here
> ../../../trunk/libgfortran/io/transfer.c:2195:1: note: code may be
> misoptimized unless -fno-strict-aliasing is used
> lto1: warning: type of '_gfortran_st_write_done' does not match original
> declaration [-Wlto-type-mismatch]
> ../../../trunk/libgfortran/io/transfer.c:3756:1: note: 'st_write_done' was
> previously declared here
> ../../../trunk/libgfortran/io/transfer.c:3756:1: note: code may be
> misoptimized unless -fno-strict-aliasing is used
>
> So, the expected surprises appeared...

I suspect the FE is currently just not very careful about exactly
replicating the implementation's structure types, something that
shouldn't be too hard to fix.  OTOH, if this is another case of
variable-length vs. fixed-length, then it might be harder.

> From the disassembly, I could also see that LTO had done some things;
> there were references to functions like
> _gfortrani_gf_strerror.constprop.18, write_decimal.isra.5.constprop.11,
> nml_parse_qualifier.constprop.15 and _gfortran_arandom_r4.constprop.3.

Good - that means it "worked" to some extent.

> So, worth investigating.  I'll open a PR.

Thanks.

>> Note this will work only for static libgfortran.  Also note that since
>> the LTO option scheme changed to preserve compile-time optimization
>> and target attributes LTOing libgfortran will be less useful than
>> before (you won't get any advantage from extra available ISAs).
>
>
> Once every other problem has been solved :-) I am sure we can address
> this one.

Yeah, I think Honza decided that this usage model for fat objects
(shipping libraries with bytecode) wasn't useful and thus the behavior
change.  I suspect we could add some fancy -flto-allow-option-override
mechanics to avoid "fixing" the options in selected cases.  It might be
a little complicated to get it fool-proof (we can't allow arbitrary
changes in late options, esp. for target options).

Richard.

> Regards
>
> Thomas
Re: Possible missed optimization opportunity with const?
On 17/08/16 02:21, Toshi Morita wrote:
> I was involved in a discussion over the semantics of "const" in C, and
> the following code was posted:
>
> #include <stdio.h>
> int foo = 0;
> const int *pfoo = &foo;
> void bar (void)
> {
>     foo +=3D;

I assume that's a typo?

> }
> int main(void)
> {
>     int a, b;
>     a = *pfoo;
>     bar();
>     b = *pfoo;
>     printf("a: %d, b: %d\n", a, b);
> }
>
> This code when compiled with gcc 4.8.2 using the optimization option -O3
> produces:
>
> a: 0, b: 1
>
> So it appears even though pfoo is a const int *, the value *pfoo is read
> twice.
>
> Would it be valid for the code to print a:0, b: 0?
> If so, is this a missed optimization opportunity?

No, it would not be valid.

Declaring pfoo as a "const int*" tells the compiler "I will not change
anything via this pointer - and you can optimise based on that promise".
It does /not/ tell the compiler "the thing that this points to will not
change".

So the compiler is correct in reading *pfoo twice.
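The distinction drawn in the reply can be sketched in a few lines of C. This is only an illustration, not code from the thread: the names counter, view, limit, bump and the check function are all made up for the example. It contrasts a pointer-to-const aimed at a mutable object (which must be reloaded after a call that may change the object) with an object that is itself declared const (whose value cannot legally change, so a compiler may fold the two reads).

```c
#include <assert.h>

static int counter = 0;               /* ordinary, mutable object (hypothetical name) */
static const int *view = &counter;    /* pointer-to-const aimed at that object */
static const int limit = 10;          /* here the object itself is const */

/* Changes counter directly, never through view. */
void bump(void)
{
    counter += 1;
}

/* Reads *view, calls bump(), reads *view again; returns the difference.
   The second read has to observe the change made by bump(). */
int change_seen_through_view(void)
{
    int before = *view;
    bump();
    int after = *view;
    return after - before;
}

/* Same shape, but reading a const object: no valid program can change
   limit, so both reads yield the same value. */
int change_seen_in_const_object(void)
{
    int before = limit;
    bump();
    int after = limit;
    return after - before;
}

/* Small self-check tying the two cases together. */
void check_const_semantics(void)
{
    assert(change_seen_through_view() == 1);
    assert(change_seen_in_const_object() == 0);
}
```

Calling check_const_semantics() from a test or from main should pass at any optimization level, since both results follow from the language rules rather than from what a particular compiler happens to do.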
Re: Possible missed optimization opportunity with const?
In your example the compiler is not given the guarantee that the object
'foo' in question can only be modified through the pointer.
We can give such a guarantee by adding the `restrict` qualifier to the
pointer, like this:

    const int *restrict pfoo = &foo;

With -O3 on GCC 6.1 the modified code produces:

    a: 1, b: 1

However, as long as there is a restrict pointer pointing to an object,
modifying it _not_ through that pointer results in undefined behavior.

--
Best regards,
lh_mouse
2016-08-17

-
From: Toshi Morita
Sent: 2016-08-17 08:21
To: gcc@gcc.gnu.org
Cc:
Subject: Possible missed optimization opportunity with const?

I was involved in a discussion over the semantics of "const" in C, and
the following code was posted:

    #include <stdio.h>
    int foo = 0;
    const int *pfoo = &foo;
    void bar (void)
    {
        foo +=3D;
    }
    int main(void)
    {
        int a, b;
        a = *pfoo;
        bar();
        b = *pfoo;
        printf("a: %d, b: %d\n", a, b);
    }

This code when compiled with gcc 4.8.2 using the optimization option -O3
produces:

    a: 0, b: 1

So it appears even though pfoo is a const int *, the value *pfoo is read
twice.

Would it be valid for the code to print a:0, b: 0?
If so, is this a missed optimization opportunity?

Toshi
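For contrast with the file-scope pointer above, here is a minimal sketch of a use of `restrict` where the promise is actually kept. It is an illustration only; add_into and check_add_into are hypothetical names, not anything from the thread. The qualifier tells the compiler that, during the call, the objects reached through dst are not accessed through src or by any other means, and it is the caller's job to make that true.

```c
#include <assert.h>
#include <stddef.h>

/* Adds src[i] into dst[i] for i in [0, n).
   restrict is a promise made by the caller: the two arrays do not overlap.
   Passing overlapping arrays would be undefined behavior. */
void add_into(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* Self-check with two distinct arrays, so the promise holds. */
void check_add_into(void)
{
    int totals[3] = {1, 2, 3};
    const int deltas[3] = {10, 20, 30};

    add_into(totals, deltas, 3);

    assert(totals[0] == 11);
    assert(totals[1] == 22);
    assert(totals[2] == 33);
}
```

In the thread's example the promise is broken, because bar() modifies foo directly while pfoo is a live restrict pointer to it, which is why any output, including "a: 1, b: 1", is possible there.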
Name of libgcc.so.1 with suffix?
Hi,

If I run configure with "--program-suffix=6", I get gcc6, gfortran6, etc.
When ldd looks for libgcc.so.1 on FreeBSD, she finds the wrong one.

% cat foo.f90
program foo
  print *, 'Hello'
end program
% gfortran6 -o z foo.f90 && ./z
/lib/libgcc_s.so.1: version GCC_4.6.0 required by \
/usr/local/lib/gcc6/libgfortran.so.3 not found
% ldconfig -r | grep libgcc
        6:-lgcc_s.1 => /lib/libgcc_s.so.1
        735:-lgcc_s.1 => /usr/local/lib/gcc6/libgcc_s.so.1

Is it possible to add a suffix to libgcc.so, e.g., libgcc6.so.1?

--
Steve
Re: Supporting subreg style patterns
On 08/16/2016 03:10 AM, shmuel gutl wrote:
> My hardware directly supports instructions of the form
>
> subreg:SI(reg:VEC v1,3) = SI:a1

Subregs of hard registers should be avoided.  They are primarily useful
for pseudo regs.  Subregs that aren't lowpart subregs should be avoided
also.  Except when you have a subreg of a pseudo that maps to multiple
hard regs, and can eventually become a lowpart subreg after the pseudo
gets allocated to a hard reg and gets simplified.

It isn't clear where the subregs are coming from, but what you are doing
sounds like a bit-field extract/insert, and these are not operations that
the register allocator will add to the code.

Depending on what exactly you are trying to do, I have two general
suggestions.

1) Define the vector registers as 32-bit registers, and define vector
   operations as using aligned groups of these 32-bit registers.  This
   exposes the 32-bit registers to the register allocator so that it can
   use them directly.

2) Use zero_extract and/or vec_select instead of subreg, which requires
   that you have patterns that emit the zero_extract/vec_select
   operations, patterns that recognize them, and possibly builtin
   functions that the user can call to get these zero_extract/vec_select
   operations emitted into the rtl.  There is a named pattern vec_extract
   that the vectorizer can use to generate these rtl operations.  For
   examples of this, in the aarch64 port, see for instance the
   aarch64_movdi_* patterns in the aarch64.md file, and the
   aarch64_get_lane* patterns in the aarch64-simd.md file.

Jim
Re: Help with implementing Wine optimization experiment
On 08/15/2016 05:46 AM, Florian Weimer wrote:
> On 08/14/2016 08:23 AM, Daniel Santos wrote:
>>
>> ms_abi_push_regs:
>>     pop    %rax
>>     push   %rdi
>>     push   %rsi
>>     sub    $0xa8,%rsp
>>     movaps %xmm6,(%rsp)
>>     movaps %xmm7,0x10(%rsp)
>>     movaps %xmm8,0x20(%rsp)
>>     movaps %xmm9,0x30(%rsp)
>>     movaps %xmm10,0x40(%rsp)
>>     movaps %xmm11,0x50(%rsp)
>>     movaps %xmm12,0x60(%rsp)
>>     movaps %xmm13,0x70(%rsp)
>>     movaps %xmm14,0x80(%rsp)
>>     movaps %xmm15,0x90(%rsp)
>>     jmp    *(%rax)
>
> I think this will be quite slow because it breaks the return stack
> optimization in the CPU.  I think you should push the return address and
> use RET.
>
> Florian

Looks like I forgot to reply-all on my last reply, but thanks again for
the advice here.  Would there be any performance hit to reshuffling the
push/pops to save the 8 byte alignment padding?  My assumption is that
the stack will always be 16-byte aligned with the 8-byte return address
of the last call on it, so offset by 8 bytes.  (Also, not sure that I
need the .type directive, was copying other code in libgcc :)

    .text
    .global __msabi_save
    .hidden __msabi_save

#ifdef __ELF__
    .type   __msabi_save,@function
#endif

/* TODO: implement vmovaps when supported?*/
__msabi_save:
#ifdef __x86_64__
    pop    %rax
    push   %rdi
    sub    $0xa0,%rsp
    movaps %xmm6,(%rsp)
    movaps %xmm7,0x10(%rsp)
    movaps %xmm8,0x20(%rsp)
    movaps %xmm9,0x30(%rsp)
    movaps %xmm10,0x40(%rsp)
    movaps %xmm11,0x50(%rsp)
    movaps %xmm12,0x60(%rsp)
    movaps %xmm13,0x70(%rsp)
    movaps %xmm14,0x80(%rsp)
    movaps %xmm15,0x90(%rsp)
    push   %rsi
    push   %rax
#endif /* __x86_64__ */
    ret

    .text
    .global __msabi_restore
    .hidden __msabi_restore

#ifdef __ELF__
    .type   __msabi_restore,@function
#endif

__msabi_restore:
#ifdef __x86_64__
    pop    %rsi
    movaps (%rsp),%xmm6
    movaps 0x10(%rsp),%xmm7
    movaps 0x20(%rsp),%xmm8
    movaps 0x30(%rsp),%xmm9
    movaps 0x40(%rsp),%xmm10
    movaps 0x50(%rsp),%xmm11
    movaps 0x60(%rsp),%xmm12
    movaps 0x70(%rsp),%xmm13
    movaps 0x80(%rsp),%xmm14
    movaps 0x90(%rsp),%xmm15
    add    $0xa0,%rsp
    pop    %rdi
#endif /* __x86_64__ */
    ret

Thanks!
Daniel
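The alignment arithmetic behind the reshuffle can be checked with a small C model of the stack pointer. This is a sketch under one stated assumption: that %rsp is a multiple of 16 on entry to the helper, i.e. the calling function had %rsp at 8 modulo 16 (just its own return address pushed) and the call to the helper pushed another 8 bytes. All function names and the sample address are hypothetical. It says nothing about the performance question, only about whether movaps sees a 16-byte aligned address and how many bytes each layout uses.

```c
#include <assert.h>
#include <stdint.h>

enum { SLOT = 8, XMM_BYTES = 10 * 16 };   /* xmm6..xmm15, 16 bytes each = 0xa0 */

/* %rsp at the first movaps in the first posted version. */
uint64_t movaps_base_original(uint64_t rsp)
{
    rsp += SLOT;     /* pop  %rax  (return address) */
    rsp -= SLOT;     /* push %rdi */
    rsp -= SLOT;     /* push %rsi */
    rsp -= 0xa8;     /* sub  $0xa8,%rsp  (0xa0 of data + 8 bytes padding) */
    return rsp;
}

/* %rsp at the first movaps in the reshuffled version. */
uint64_t movaps_base_reshuffled(uint64_t rsp)
{
    rsp += SLOT;       /* pop  %rax */
    rsp -= SLOT;       /* push %rdi */
    rsp -= XMM_BYTES;  /* sub  $0xa0,%rsp  (no padding) */
    return rsp;
}

/* %rsp as the caller sees it after the reshuffled helper returns. */
uint64_t rsp_after_reshuffled_save(uint64_t rsp)
{
    rsp = movaps_base_reshuffled(rsp);
    rsp -= SLOT;     /* push %rsi */
    rsp -= SLOT;     /* push %rax  (return address back on the stack) */
    rsp += SLOT;     /* ret pops it again */
    return rsp;
}

int is_aligned16(uint64_t addr)
{
    return (addr & 15) == 0;
}

void check_alignment_model(void)
{
    const uint64_t entry = 0x7fffffffe000ULL;   /* arbitrary, 16-byte aligned */

    assert(is_aligned16(movaps_base_original(entry)));
    assert(is_aligned16(movaps_base_reshuffled(entry)));

    /* Original layout: 0xb0 bytes below entry, 8 of them padding. */
    assert(entry - movaps_base_original(entry) == 0xb0);
    /* Reshuffled layout: two GPRs plus ten xmm registers, 0xa8 below entry. */
    assert(entry - rsp_after_reshuffled_save(entry) == 0xa8);
}
```

Under that assumption both layouts keep the movaps stores aligned, and the reshuffled one leaves the caller with 0xa8 bytes below the helper's entry %rsp instead of 0xb0, which matches the 8 bytes of padding the question asks about.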
Re: Help with implementing Wine optimization experiment
I'm stuck on generating a jmp to the epilogue as I can't find any
examples of this.  This is the summarized version of what I'm doing:

    rtx msabi_restore_fn, jump_insn;

    msabi_restore_fn = gen_rtx_SYMBOL_REF (Pmode, "__msabi_restore");
    SYMBOL_REF_FLAGS (msabi_restore_fn) |= SYMBOL_FLAG_LOCAL;
    jump_insn = gen_rtx_SET (VOIDmode, pc_rtx,
                             gen_rtx_MEM (QImode, msabi_restore_fn));
    emit_insn (jump_insn);

Unfortunately, it dies with:

../a.c: In function ‘my_ms_sysv’:
../a.c:7:1: error: unrecognizable insn:
 }
 ^
(insn 15 14 8 2 (set/f (pc)
        (mem:QI (symbol_ref:DI ("__msabi_restore") [flags 0x2]) [0 S1 A8])) ../a.c:7 -1
     (nil))
../a.c:7:1: internal compiler error: in extract_insn, at recog.c:2343
0xc195e1 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ../../gcc/rtl-error.c:110
0xc19622 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ../../gcc/rtl-error.c:118
0xbcd683 extract_insn(rtx_insn*)
        ../../gcc/recog.c:2343
0xbcd37c extract_constrain_insn(rtx_insn*)
        ../../gcc/recog.c:2244
0xbdc310 copyprop_hardreg_forward_1
        ../../gcc/regcprop.c:793
0xbdd97d execute
        ../../gcc/regcprop.c:1289

How should I generate this jmp?  All of the various helper functions for
generating a jump appear to be tailored for using a label and I'm using
a symbol.  I haven't attached all of the various notes to the insn yet.
My call to the prologue routine is working great though!

Thanks,
Daniel