Re: gcc-4.9.2: Assembly for i386 Target

2015-10-12 Thread Stefan Ring
On Mon, Oct 12, 2015 at 11:11 AM, Abhishek Aggarwal
 wrote:
> I was befuddled by the following 3 assembly instructions (generated
> right in the beginning of 'main' function):
>     lea    0x4(%esp), %ecx
>     and    0xfff0, %esp
>     pushl  -0x4(%ecx)
>
> I am not able to understand the purpose of these 3 instructions. Can
> anyone explain me about them? I didn't observe these 3 instructions
> when I compiled the same code with '-m64' switch instead of '-m32'.

This is for aligning the stack to 16 bytes. IIRC, the x86 ABI on Linux
does not mandate a stack alignment greater than 4 bytes (or 8 -- not
sure about the exact number), whereas the one for x86_64 does.
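
To make the effect of the masking step concrete, here is a minimal sketch in C of what an "and" with an alignment mask does to a stack pointer value. The helper name align_down and the sample addresses are made up for illustration; they are not taken from the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round an address down to a multiple of
 * `alignment`, which must be a power of two. For alignment == 16 this
 * clears the low four bits, the same arithmetic an `and` with an
 * all-ones-except-low-nibble mask performs on the stack pointer. */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

Since the stack grows downwards on x86, rounding the pointer down only ever reserves a few extra bytes. It never overlaps data that is already on the stack.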


Re: gcc-4.9.2: Assembly for i386 Target

2015-10-12 Thread Stefan Ring
On Mon, Oct 12, 2015 at 1:06 PM, Abhishek Aggarwal
 wrote:
> @Jonathan: The reason I started this discussion is due to my suspicion
> of a potential bug in gcc-4.9.2. However, I may be wrong. Here is the
> explanation:

I think everything is alright. The code is only emitted for the main
function, and the stack is assumed to be aligned for every other
function. This is probably done because of compatibility
considerations with older environments.

So you can rename your function and watch the instructions disappear.
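
A small sketch for trying this out, assuming a made-up function name (compute) and body. Compile it once as shown and once with the function renamed to main, using something like "gcc -m32 -O2 -S", and compare the two prologues. I have not checked what other gcc versions emit.

```c
/* Hypothetical test function. The volatile local only forces the
 * compiler to keep a stack slot, so that there is a prologue to
 * look at in the assembly output. */
int compute(void)
{
    volatile int local = 42;
    return local + 1;
}
```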


Re: how to tweak x86 code generation to instrument certain opcodes with CC trap?

2015-10-27 Thread Stefan Ring
On Mon, Oct 26, 2015 at 8:47 PM, Yasser Shalabi  wrote:
> So back to square one. Any tips on what code/config-files I need to
> modify with to get GCC to emit additional opcodes for certain
> instructions?

Maybe you should try cross-compiling. It looks like you have already
succeeded with the instrumentation; it is just that the generated
code is not in good enough shape to run during the build process. This
should not be an issue when cross-compiling.


Re: [WWWDocs] Deprecate support for non-thumb ARM devices

2016-02-25 Thread Stefan Ring
On Thu, Feb 25, 2016 at 10:20 AM, Richard Earnshaw (lists)
 wrote:
> The point is to permit the compiler to use interworking compatible
> sequences of code when generating ARM code, not to force users to use
> Thumb code.  The necessary instruction (BX) is available in armv5 and
> armv5e, even though Thumb is not supported in those architecture variants.
>
> It might be worth deprecating v5 and v5e at some point in the future: to
> the best of my knowledge no v5 class device without Thumb has ever
> existed - but it's not a decision that needs to be related to this proposal.

Slightly off topic, but related: What does the "e" stand for? Also,
what does "l" stand for in armv5tel, which is what I usually get --
little endian?

I have no idea if there is an authoritative source for these host
specifications and cannot find any. config.guess seems to just rely on
uname -m.
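
For reference, the string that "uname -m" prints is the machine field of the POSIX uname() call, so it can be read directly from C. A minimal sketch follows; the wrapper name machine_name is made up for illustration.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical wrapper: copy the kernel-reported machine string
 * (e.g. something like "armv5tel" on the boards mentioned above)
 * into buf. Returns 0 on success, -1 on failure. */
int machine_name(char *buf, size_t len)
{
    struct utsname u;

    if (buf == NULL || len == 0 || uname(&u) != 0)
        return -1;
    snprintf(buf, len, "%s", u.machine);
    return 0;
}
```

The string comes from the running kernel, so this gives no information about where the individual letters are formally specified.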


Re: [WWWDocs] Deprecate support for non-thumb ARM devices

2016-02-25 Thread Stefan Ring
On Thu, Feb 25, 2016 at 3:15 PM, David Brown  wrote:
> The "t" is thumb, "e" means "DSP-like extensions", and I suspect the "l"
> is a misprint for "j", meaning the Jazelle (Java) acceleration instructions.

I doubt that as "armv5tejl" is also quite common.


Re: [WWWDocs] Deprecate support for non-thumb ARM devices

2016-02-25 Thread Stefan Ring
On Thu, Feb 25, 2016 at 3:15 PM, David Brown  wrote:
> 

Great link, thanks!


Re: no response to cfarm request

2014-12-16 Thread Stefan Ring
On Tue, Dec 16, 2014 at 10:13 AM, Jay Foad  wrote:
> I've pinged again and waited another week with no response. Is there
> no-one else who can administer compile farm accounts?

Maybe you should try the gcc-cfarm mailing list:
https://mail.gna.org/listinfo/gcc-cfarm-users

It seems very responsive.


Re: Building on gcc112 is stuck in msgfmt

2017-08-29 Thread Stefan Ring
On Tue, Aug 29, 2017 at 9:32 AM, Martin Liška  wrote:
> On 08/28/2017 09:15 PM, Martin Liška wrote:
>> On 08/28/2017 04:06 PM, Jeff Law wrote:
>>> On 08/28/2017 01:16 AM, Martin Liška wrote:
>>>> Hello.
>>>>
>>>> I've just repeatedly seen stuck in build process:
>>>>
>>>> make[5]: Entering directory
>>>> `/home/marxin/Programming/gcc/objdir/powerpc64le-unknown-linux-gnu/libstdc++-v3/po'
>>>> msgfmt -o de.mo ../../../../libstdc++-v3/po/de.po
>>>>
>>>> 49        __asm volatile ("sc; mfcr %0"
>>>> Missing separate debuginfos, use: debuginfo-install
>>>> gettext-0.18.2.1-4.el7.ppc64le
>>>> (gdb) bt
>>>> #0  0x3fff85d8bac8 in sys_futex0 (val=-1, op=128, addr=0x3fff85db0520
>>>> ) at ../../../libgomp/config/linux/powerpc/futex.h:49
>>>> #1  futex_wait (val=-1, addr=0x3fff85db0520 ) at
>>>> ../../../libgomp/config/linux/powerpc/futex.h:62
>>>> #2  do_wait (val=-1, addr=) at
>>>> ../../../libgomp/config/linux/wait.h:67
>>>> #3  gomp_mutex_lock_slow (mutex=0x3fff85db0520 ,
>>>> oldval=) at ../../../libgomp/config/linux/mutex.c:63
>>>> #4  0x3fff85d98b04 in gomp_mutex_lock (mutex=0x3fff85db0520
>>>> ) at ../../../libgomp/config/linux/mutex.h:57
>>>> #5  goacc_register (disp=0x3fff85db0090 ) at
>>>> ../../../libgomp/oacc-init.c:74
>>>> #6  0x3fff85d983fc in goacc_host_init () at
>>>> ../../../libgomp/oacc-host.c:265
>>>> #7  0x3fff85d99c88 in goacc_runtime_initialize () at
>>>> ../../../libgomp/oacc-init.c:657
>>>> #8  0x3fff85d7882c in initialize_env () at ../../../libgomp/env.c:1340
>>>> #9  0x3fff86525c74 in _dl_init_internal () from /lib64/ld64.so.2
>>>> #10 0x3fff865119cc in _dl_start_user () from /lib64/ld64.so.2

>> I did the same with the same result. Note that I can see the same problem
>> on gcc110 machine :/
>>
[...]
>
> Looks it uses different invocation of futex syscall. In order to have it 
> working I needed to configure gettext w/ --disable-openmp.
> Note that the former invocation of msgfmt contains just a single futex 
> syscall, so should not be blocked by anything else.

Then it looks very much like libgomp got its memory barriers wrong for
powerpc64.


Missed possible branch elimination

2017-10-26 Thread Stefan Ring
While poring over the Transport Tycoon Deluxe disassembly, commonly
known to have been hand-written in assembler, I stumbled across this
tidbit, which I think is kinda neat:

004057F7 83 7D B8 01        cmp   dword ptr [ebp-48h],1
004057FB 1B C0              sbb   eax,eax
004057FD F7 D8              neg   eax
004057FF 85 05 20 A9 41 00  test  dword ptr ds:[41A920h],eax
00405805 0F 84 91 00 00 00  je    0040589C

which basically says:

if (((DWORD*) $ebp)[-0x12] == 0 && (*(DWORD*) 0x41a920 & 1)) {  }

... leaving aside possible side effects of the memory access, so
treating short circuit eval like an arithmetic operation might not be
legal in this specific case. But for the function

void a(void (*fnc)(void), int *x, int y) { if ((*x == 0) && (y&1)) fnc(); }

& and && should be absolutely equivalent. In fact, gcc realizes this
equivalency and produces the exact same code for both variants. It is
just using the two-branch version instead of a one-branch strategy
(x86_64 now):

 :
   0:   8b 06                   mov    (%rsi),%eax
   2:   85 c0                   test   %eax,%eax
   4:   75 0a                   jne    10
   6:   83 e2 01                and    $0x1,%edx
   9:   74 05                   je     10
   b:   ff e7                   jmpq   *%rdi
   d:   0f 1f 00                nopl   (%rax)
  10:   c3                      retq

I would rather see:

 :
   0:   31 c0                   xor    %eax,%eax
   2:   8b 0e                   mov    (%rsi),%ecx
   4:   85 c9                   test   %ecx,%ecx
   6:   0f 94 c0                sete   %al
   9:   85 c2                   test   %eax,%edx
   b:   74 03                   je     10
   d:   ff e7                   jmpq   *%rdi
   f:   90                      nop
  10:   c3                      retq

Either that, or even:

 :
   0:   8b 0e                   mov    (%rsi),%ecx
   2:   85 c9                   test   %ecx,%ecx
   4:   0f 94 c0                sete   %al
   7:   84 c2                   test   %al,%dl
   9:   74 05                   je     10
   b:   ff e7                   jmpq   *%rdi
   d:   0f 1f 00                nopl   (%rax)
  10:   c3                      retq

Although this is touching an entirely different topic now.

I'm just wondering if it should not rather lean towards eliminating
the branch. I'd guess that this is almost always a worthy goal.
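
For anyone who wants to reproduce this, here is a self-contained sketch of the two source variants plus a C rendering of the cmp/sbb/neg/test sequence from the Transport Tycoon snippet. The names a_logical, a_bitwise, tt_condition, count_call and call_count are made up for this example. The emulation is my reading of the disassembly, not anything taken from the original program.

```c
#include <stdint.h>

/* Short-circuit variant, as in the function shown above. */
void a_logical(void (*fnc)(void), int *x, int y)
{
    if ((*x == 0) && (y & 1))
        fnc();
}

/* Bitwise variant. Both operands are 0 or 1 and neither has side
 * effects, so the result is the same as with &&. */
void a_bitwise(void (*fnc)(void), int *x, int y)
{
    if ((*x == 0) & (y & 1))
        fnc();
}

/* Step-by-step emulation of the hand-written sequence:
 *   cmp  local,1     carry is set iff local == 0 (unsigned borrow)
 *   sbb  eax,eax     eax = 0xFFFFFFFF if carry, else 0
 *   neg  eax         eax = 1 if carry, else 0
 *   test global,eax  nonzero iff local == 0 and bit 0 of global is set
 * Returns 1 when the guarded body would run (the je is not taken). */
int tt_condition(uint32_t local, uint32_t global)
{
    uint32_t carry = (local < 1u);
    uint32_t eax = 0u - carry;
    eax = 0u - eax;
    return (global & eax) != 0;
}

/* Hypothetical callback used to observe whether fnc was called. */
int call_count;
void count_call(void) { call_count++; }
```

Compiling both variants with -O2 and running objdump -d on the result should allow comparing the generated code for a given gcc version.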


Re: Missed possible branch elimination

2017-11-17 Thread Stefan Ring
On Thu, Oct 26, 2017 at 8:23 PM, Stefan Ring  wrote:
> While poring over the Transport Tycoon Deluxe disassembly, commonly
> known to have been hand-written in assembler, I stumbled across this
> tidbit, which I think is kinda neat:
>
> 004057F7 83 7D B8 01        cmp   dword ptr [ebp-48h],1
> 004057FB 1B C0              sbb   eax,eax
> 004057FD F7 D8              neg   eax
> 004057FF 85 05 20 A9 41 00  test  dword ptr ds:[41A920h],eax
> 00405805 0F 84 91 00 00 00  je    0040589C
>
> which basically says:
>
> if (((DWORD*) $ebp)[-0x12] == 0 && (*(DWORD*) 0x41a920 & 1)) {  }
>
> ... leaving aside possible side effects of the memory access, so
> treating short circuit eval like an arithmetic operation might not be
> legal in this specific case. But for the function
>
> void a(void (*fnc)(void), int *x, int y) { if ((*x == 0) && (y&1)) fnc(); }
>
> & and && should be absolutely equivalent. In fact, gcc realizes this
> equivalency and produces the exact same code for both variants. It is
> just using the two-branch version instead of a one-branch strategy
> (x86_64 now):
>
>  :
>    0:   8b 06                   mov    (%rsi),%eax
>    2:   85 c0                   test   %eax,%eax
>    4:   75 0a                   jne    10
>    6:   83 e2 01                and    $0x1,%edx
>    9:   74 05                   je     10
>    b:   ff e7                   jmpq   *%rdi
>    d:   0f 1f 00                nopl   (%rax)
>   10:   c3                      retq
>
> I would rather see:
>
>  :
>    0:   31 c0                   xor    %eax,%eax
>    2:   8b 0e                   mov    (%rsi),%ecx
>    4:   85 c9                   test   %ecx,%ecx
>    6:   0f 94 c0                sete   %al
>    9:   85 c2                   test   %eax,%edx
>    b:   74 03                   je     10
>    d:   ff e7                   jmpq   *%rdi
>    f:   90                      nop
>   10:   c3                      retq
>
> Either that, or even:
>
>  :
>    0:   8b 0e                   mov    (%rsi),%ecx
>    2:   85 c9                   test   %ecx,%ecx
>    4:   0f 94 c0                sete   %al
>    7:   84 c2                   test   %al,%dl
>    9:   74 05                   je     10
>    b:   ff e7                   jmpq   *%rdi
>    d:   0f 1f 00                nopl   (%rax)
>   10:   c3                      retq
>
> Although this is touching an entirely different topic now.
>
> I'm just wondering if it should not rather lean towards eliminating
> the branch. I'd guess that this is almost always a worthy goal.

I'm really curious about that. Why are two branches preferred over
one, even though the compiler seems to recognize the equivalence,
given that it replaces an arithmetic operation (the & operator) with a
branch? This looks like a case of if-conversion, albeit in the wrong
direction for my taste.

Is this question not appropriate here?
Did nobody know what to make of it? Or my expectations? ;)


RedHat patch not found in mainline gcc

2014-03-17 Thread Stefan Ring
At the company where I work, we have a large program using Boost
Python (1.54). We do our product builds for RHEL 5 and recently
started building using gcc 4.8 from RedHat devtoolset 2 for
performance. This works well, except for one system where it would
deterministically crash. I traced it to an old version of libgcc, and
specifically this patch, which RedHat applied to its 5.5 release in
2009: 
.
I built libgcc myself with and without the patch, with the program
crashing reliably without the patch, and no crash with the patch
applied. Unfortunately, gdb does not show a meaningful stack trace, at
least not the old version from RHEL 5.

When trying to find out a bit more about the patch, I was rather
surprised to see that (1) it is not applied to the mainline gcc code
and (2) it still applies cleanly. Since I don't have a good stack
trace, I cannot even try to build a suitable reproducer at the moment.

Is there a good reason for not having it in mainline gcc? I suppose it
got lost or forgotten somehow, and that it would be good to have it
applied.


Re: RedHat patch not found in mainline gcc

2014-03-18 Thread Stefan Ring
> I don't remember it well, but from re-reading the gcc-patches threads around
> that time like:
> http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00368.html

That thread is from 2009.

> it seems that the actually committed fix for the bug that the
> gcc41-unwind-restore-state.patch was meant to fix was
> http://gcc.gnu.org/ml/gcc-patches/2006-01/msg00617.html committed as
> http://gcc.gnu.org/r118068

And this one from 2006. Are you sure they are related?

> If you are talking about devtoolset, that compiler doesn't come with it's
> own libgcc_s.so.1, uses the system one, just its own libgcc_eh.a, so e.g. if
> you'd use the default -shared-libgcc (for g++), then it should make no
> difference at all, because you'd be using the same system unwinder all the
> time.

I don't quite understand. If -shared-libgcc is default, I certainly
have not touched it. And since patching the system unwinder in
/lib64/libgcc_s-4.1.2-20080825.so.1 is what affects the crashing
behavior, it seems to be using the system unwinder already.

To me, it looks like the newer gcc from devtoolset generates unwind
instructions that the old unwinder does not understand (or does not
interpret correctly), whereas with gcc41-unwind-restore-state.patch
applied, it does.


Re: RedHat patch not found in mainline gcc

2014-03-18 Thread Stefan Ring
>> http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00368.html
>
> That thread is from 2009.
>
>> it seems that the actually committed fix for the bug that the
>> gcc41-unwind-restore-state.patch was meant to fix was
>> http://gcc.gnu.org/ml/gcc-patches/2006-01/msg00617.html committed as
>> http://gcc.gnu.org/r118068
>
> And this one from 2006. Are you sure they are related?

Ok, there is a reference in the newer thread, so it looks like you are
right here.