Re: Using LTO-enabled libgfortran

2016-08-17 Thread Richard Biener
On Wed, Aug 17, 2016 at 8:36 AM, Thomas Koenig  wrote:
> Am 16.08.2016 um 20:57 schrieb Richard Biener:
>>
>> On August 16, 2016 7:11:26 PM GMT+02:00, Thomas Koenig
>>  wrote:
>>>
>>> What would it take to use an LTO-enabled version of gfortran?
>>>
>>> It could turn out to be quite useful for speeding up programs,
>>> especially where I/O or array intrinsics are used.
>>>
>>> I also expect many issues to surface where libgfortran is
>>> playing with types in a way that could break LTO, so I
>>> would not expect this to be an easy thing.
>>>
>>> So, ideas anybody?  I don't think any other library included
>>> with gcc does this, correct?
>>
>>
>> Correct.  My advice is to simply try.
>
>
> I did that, with some interesting results.
>
> Just putting -flto -ffat-lto-objects into every CFLAG and FFLAG
> available into the Makefile of libgfortran (autoconf can come later :-),
> recompiling gfortran and trying out the program
>
> program main
>   real, dimension(10) :: a
>   call random_number(a)
>   write (*,'(E12.5)',advance="no") a
> end program main
>
> led to
>
> lto1: warning: type of '_gfortran_st_write' does not match original
> declaration [-Wlto-type-mismatch]
> ../../../trunk/libgfortran/io/transfer.c:3746:1: note: 'st_write' was
> previously declared here
> ../../../trunk/libgfortran/io/transfer.c:3746:1: note: code may be
> misoptimized unless -fno-strict-aliasing is used
> lto1: warning: type of '_gfortran_transfer_array_write' does not match
> original declaration [-Wlto-type-mismatch]
> ../../../trunk/libgfortran/io/transfer.c:2195:1: note:
> 'transfer_array_write' was previously declared here
> ../../../trunk/libgfortran/io/transfer.c:2195:1: note: code may be
> misoptimized unless -fno-strict-aliasing is used
> lto1: warning: type of '_gfortran_st_write_done' does not match original
> declaration [-Wlto-type-mismatch]
> ../../../trunk/libgfortran/io/transfer.c:3756:1: note: 'st_write_done' was
> previously declared here
> ../../../trunk/libgfortran/io/transfer.c:3756:1: note: code may be
> misoptimized unless -fno-strict-aliasing is used
>
> So, the expected surprises appeared...

I suspect the FE is currently just not very careful in exactly replicating the
implementation's structure types - something that shouldn't be too hard to fix.
OTOH if this is another case of variable-length vs. fixed-length then it
might be harder.

> From the disassembly, I could also see that LTO had done some things;
> there were references to functions like
> _gfortrani_gf_strerror.constprop.18, write_decimal.isra.5.constprop.11,
> nml_parse_qualifier.constprop.15 and _gfortran_arandom_r4.constprop.3.

Good - that means it "worked" to some extent.

> So, worth investigating. I'll open a PR.

Thanks.

>>  Note this will work only for static libgfortran.  Also note that since
>> the  LTO option scheme changed to preserve compile-time optimization
>> and target attributes LTOing libgfortran will be less useful than
>> before (you won't get any advantage from extra available ISAs).
>
>
> Once every other problem has been solved :-) I am sure we can address
> this one.

Yeah, I think Honza decided that this usage model for fat objects (shipping
libraries with bytecode) wasn't useful and thus the behavior change.  I suspect
we could add some fancy -flto-allow-option-override mechanics to avoid
"fixing" the options in selected cases.  It might be a little complicated to
get it fool-proof (we can't allow arbitrary changes in late options, esp. for
target options).

Richard.

> Regards
>
> Thomas


Re: Possible missed optimization opportunity with const?

2016-08-17 Thread David Brown

On 17/08/16 02:21, Toshi Morita wrote:
> I was involved in a discussion over the semantics of "const" in C, and the
> following code was posted:
>
> #include <stdio.h>
> int foo = 0;
> const int *pfoo = &foo;
> void bar (void)
> {
>   foo +=3D;

I assume that's a typo?

> }
> int main(void)
> {
>   int a, b;
>   a = *pfoo;
>   bar();
>   b = *pfoo;
>   printf("a: %d, b: %d\n", a, b);
> }
>
> This code when compiled with gcc 4.8.2 using the optimization option -O3
> produces:
>
> a: 0, b: 1
>
> So it appears even though pfoo is a const int *, the value *pfoo is read twice.
>
> Would it be valid for the code to print a:0, b: 0?
> If so, is this a missed optimization opportunity?


No, it would not be valid.  Declaring pfoo as a "const int*" tells the 
compiler "I will not change anything via this pointer - and you can 
optimise based on that promise".  It does /not/ tell the compiler "the 
thing that this points to will not change".


So the compiler is correct in reading *pfoo twice.



Re: Possible missed optimization opportunity with const?

2016-08-17 Thread lhmouse
In your example the compiler is not given the guarantee that
the object 'foo' in question can only be modified through the pointer.

We can make such guarantee by adding the `restrict` qualifier
to the pointer, like this:

const int *restrict pfoo = &foo;

With -O3 on GCC 6.1 the modified code produces:

a: 1, b: 1

However as long as there is a restrict pointer pointing to an object,
modifying it _not_ through that pointer results in undefined behavior.

--   
Best regards,
lh_mouse
2016-08-17

-
From: Toshi Morita
Sent: 2016-08-17 08:21
To: gcc@gcc.gnu.org
Cc:
Subject: Possible missed optimization opportunity with const?

I was involved in a discussion over the semantics of "const" in C, and the 
following code was posted: 

#include <stdio.h>
int foo = 0;
const int *pfoo = &foo;
void bar (void)
{
foo +=3D;
}
int main(void)
{
   int a, b;
   a = *pfoo;
 bar();
 b = *pfoo;
   printf("a: %d, b: %d\n", a, b);
}
 

This code when compiled with gcc 4.8.2 using the optimization option -O3 
produces: 

a: 0, b: 1 


So it appears even though pfoo is a const int *, the value *pfoo is read twice. 

Would it be valid for the code to print a:0, b: 0?
If so, is this a missed optimization opportunity?

Toshi 




Name of libgcc.so.1 with suffix?

2016-08-17 Thread Steve Kargl
Hi,

If I run configure with "--program-suffix=6", I get gcc6, gfortran6, etc.
When ldd looks for libgcc.so.1 on FreeBSD, it finds the wrong one.

% cat foo.f90
program foo
   print *, 'Hello'
end program
% gfortran6 -o z foo.f90 && ./z
/lib/libgcc_s.so.1: version GCC_4.6.0 required by \
/usr/local/lib/gcc6/libgfortran.so.3 not found

% ldconfig -r | grep libgcc
6:-lgcc_s.1 => /lib/libgcc_s.so.1
735:-lgcc_s.1 => /usr/local/lib/gcc6/libgcc_s.so.1

Is it possible to add a suffix to libgcc.so, e.g., libgcc6.so.1? 

-- 
Steve


Re: Supporting subreg style patterns

2016-08-17 Thread Jim Wilson

On 08/16/2016 03:10 AM, shmuel gutl wrote:

My hardware directly supports instructions of the form
subreg:SI(reg:VEC v1,3) = SI:a1


Subregs of hard registers should be avoided.  They are primarily useful 
for pseudo regs.  Subregs that aren't lowpart subregs should be avoided 
also.  Except when you have a subreg of a pseudo that maps to multiple 
hard regs, and can eventually become a lowpart subreg after the pseudo 
gets allocated to a hard reg and gets simplified.


It isn't clear where the subregs are coming from, but what you are doing 
sounds like a bit-field extract/insert, and these are not operations 
that the register allocator will add to the code.  Depending on what 
exactly you are trying to do, I have two general suggestions.


1) Define the vector registers as 32-bit registers, and define vector 
operations as using aligned groups of these 32-bit registers.  This 
exposes the 32-bit registers to the register allocator so that it can 
use them directly.


2) Use zero_extract and/or vec_select instead of subreg, which requires 
that you have patterns that emit the zero_extract/vec_select operations, 
patterns that recognize them, and possibly builtin functions that the 
user can call to get these zero_extract/vec_select operations emitted 
into the rtl.  There is a named pattern vec_extract that the vectorizer 
can use to generate these rtl operations.  For examples of this, in the 
aarch64 port, see for instance the aarch64_movdi_* patterns in the 
aarch64.md file, and the aarch64_get_lane* patterns in the 
aarch64-simd.md file.


Jim



Re: Help with implementing Wine optimization experiment

2016-08-17 Thread Daniel Santos

On 08/15/2016 05:46 AM, Florian Weimer wrote:
> On 08/14/2016 08:23 AM, Daniel Santos wrote:
>>
>> ms_abi_push_regs:
>>     pop    %rax
>>     push   %rdi
>>     push   %rsi
>>     sub    $0xa8,%rsp
>>     movaps %xmm6,(%rsp)
>>     movaps %xmm7,0x10(%rsp)
>>     movaps %xmm8,0x20(%rsp)
>>     movaps %xmm9,0x30(%rsp)
>>     movaps %xmm10,0x40(%rsp)
>>     movaps %xmm11,0x50(%rsp)
>>     movaps %xmm12,0x60(%rsp)
>>     movaps %xmm13,0x70(%rsp)
>>     movaps %xmm14,0x80(%rsp)
>>     movaps %xmm15,0x90(%rsp)
>>     jmp    *(%rax)
>
> I think this will be quite slow because it breaks the return stack
> optimization in the CPU.  I think you should push the return address
> and use RET.
>
> Florian



Looks like I forgot to reply-all on my last reply, but thanks again for 
the advice here. Would there be any performance hit to reshuffling the 
push/pops to save the 8 byte alignment padding? My assumption is that 
the stack will always be 16-byte aligned with the 8-byte return address 
of the last call on it, so offset by 8 bytes. (Also, not sure that I 
need the .type directive, was copying other code in libgcc :)


    .text
    .global __msabi_save
    .hidden __msabi_save

#ifdef __ELF__
    .type __msabi_save,@function
#endif

/* TODO: implement vmovaps when supported? */
__msabi_save:
#ifdef __x86_64__
    pop    %rax
    push   %rdi
    sub    $0xa0,%rsp
    movaps %xmm6,(%rsp)
    movaps %xmm7,0x10(%rsp)
    movaps %xmm8,0x20(%rsp)
    movaps %xmm9,0x30(%rsp)
    movaps %xmm10,0x40(%rsp)
    movaps %xmm11,0x50(%rsp)
    movaps %xmm12,0x60(%rsp)
    movaps %xmm13,0x70(%rsp)
    movaps %xmm14,0x80(%rsp)
    movaps %xmm15,0x90(%rsp)
    push   %rsi
    push   %rax
#endif /* __x86_64__ */
    ret

    .text
    .global __msabi_restore
    .hidden __msabi_restore
#ifdef __ELF__
    .type __msabi_restore,@function
#endif

__msabi_restore:
#ifdef __x86_64__
    pop    %rsi
    movaps (%rsp),%xmm6
    movaps 0x10(%rsp),%xmm7
    movaps 0x20(%rsp),%xmm8
    movaps 0x30(%rsp),%xmm9
    movaps 0x40(%rsp),%xmm10
    movaps 0x50(%rsp),%xmm11
    movaps 0x60(%rsp),%xmm12
    movaps 0x70(%rsp),%xmm13
    movaps 0x80(%rsp),%xmm14
    movaps 0x90(%rsp),%xmm15
    add    $0xa0,%rsp
    pop    %rdi
#endif /* __x86_64__ */
    ret

Thanks!
Daniel


Re: Help with implementing Wine optimization experiment

2016-08-17 Thread Daniel Santos
I'm stuck on generating a jmp to the epilogue as I can't find any 
examples of this. This is the summarized version of what I'm doing:


rtx msabi_restore_fn, jump_insn;

msabi_restore_fn = gen_rtx_SYMBOL_REF (Pmode, "__msabi_restore");
SYMBOL_REF_FLAGS (msabi_restore_fn) |= SYMBOL_FLAG_LOCAL;
jump_insn = gen_rtx_SET (VOIDmode, pc_rtx,
                         gen_rtx_MEM (QImode, msabi_restore_fn));

emit_insn (jump_insn);

Unfortunately, it dies with:

../a.c: In function ‘my_ms_sysv’:
../a.c:7:1: error: unrecognizable insn:
 }
 ^
(insn 15 14 8 2 (set/f (pc)
        (mem:QI (symbol_ref:DI ("__msabi_restore") [flags 0x2]) [0  S1 A8])) ../a.c:7 -1
     (nil))
../a.c:7:1: internal compiler error: in extract_insn, at recog.c:2343
0xc195e1 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ../../gcc/rtl-error.c:110
0xc19622 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ../../gcc/rtl-error.c:118
0xbcd683 extract_insn(rtx_insn*)
        ../../gcc/recog.c:2343
0xbcd37c extract_constrain_insn(rtx_insn*)
        ../../gcc/recog.c:2244
0xbdc310 copyprop_hardreg_forward_1
        ../../gcc/regcprop.c:793
0xbdd97d execute
        ../../gcc/regcprop.c:1289

How should I generate this jmp? All of the various helper functions for 
generating a jump appear to be tailored for using a label and I'm using 
a symbol.
I haven't attached all of the various notes to the insn yet. My call
to the prologue routine is working great though!


Thanks,
Daniel