[Bug ld/30930] New: ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-01 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

            Bug ID: 30930
           Summary: ld-2.41 links mame in a way which gets stuck on aarch64
           Product: binutils
           Version: 2.41
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: ld
          Assignee: unassigned at sourceware dot org
          Reporter: belegdol at gmail dot com
  Target Milestone: ---

Hello,

it appears that there is an issue with binutils-2.41 on aarch64: mame gets
linked in such a way that its validation run hangs forever.
I discussed this with mame upstream here:
https://github.com/mamedev/mame/issues/11587
As suggested, I tried building with -fuse-ld=lld which made the problem go
away.
I only have reproduction instructions for Fedora; I am not sure whether other
distributions are affected:
1. Install Fedora on an aarch64 system
2. fedpkg clone --anonymous mame
3. cd mame
4. fedpkg srpm
5. mock -r fedora-rawhide-aarch64 mame-0.259-1.fc40.src.rpm
6. wait

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-01 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #1 from Julian Sikorski ---
Validation gets stuck on the following function:

#0  0xb5bddb08 in ___ZN4bgfx12VertexLayoutC1Ev_bti_veneer ()
#1  0xf5870b2c in call_init (env=, argv=0xf388, argc=1) at ../csu/libc-start.c:145
#2  __libc_start_main_impl (main=0xaeedadc0 , argc=1, argv=0xf388, init=, fini=, rtld_fini=, stack_end=) at ../csu/libc-start.c:347
#3  0xaef01570 in _start ()



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-02 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #3 from Julian Sikorski ---
I will try, however my problem is that the issue only appears to happen with a
full (as opposed to single-driver) build. It takes close to 3 hours on the only
aarch64 machine I have access to so far, which makes experimentation somewhat
challenging.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-02 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #5 from Julian Sikorski ---
(In reply to Nick Clifton from comment #4)
> You may find it useful to compare a broken-linked-with-ld.bfd binary
> with a working-linked-with-lld binary.  In particular the contents 
> of whatever init sections they have, and the ordering of function
> pointers therein.

I am downloading the broken binary from the test system now. How can I do the
above?



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-02 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #6 from Julian Sikorski ---
(In reply to Sam James from comment #2)
> Could you try give some instructions to reproduce manually from source,
> without using Fedora and Fedora specific tooling?
> 
> Bisecting binutils using 'git bisect run' + timeout would be helpful too if
> you can.

This might work, however I cannot test it unfortunately:
1. git clone https://github.com/mamedev/mame.git
2. Install deps according to the distro's mechanism of choice
3. cd mame
4. make -O -j2 V=1 VERBOSE=1 NOWERROR=1 OPTIMIZE=2 PYTHON_EXECUTABLE=python3
QT_HOME=/usr/lib64/qt6 USE_SYSTEM_LIB_ASIO=1 USE_SYSTEM_LIB_EXPAT=1
USE_SYSTEM_LIB_FLAC=1 USE_SYSTEM_LIB_GLM=1 USE_SYSTEM_LIB_JPEG=1
USE_SYSTEM_LIB_PORTAUDIO=1 USE_SYSTEM_LIB_PORTMIDI=1 USE_SYSTEM_LIB_PUGIXML=1
USE_SYSTEM_LIB_RAPIDJSON=1 USE_SYSTEM_LIB_SQLITE3=1 USE_SYSTEM_LIB_UTF8PROC=1
USE_SYSTEM_LIB_ZLIB=1 'SDL_INI_PATH=/etc/mame;' TOOLS=1 'OPT_FLAGS=-O2
-fexceptions -g1 -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang
-Werror=format-security -Werror=implicit-function-declaration
-Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard
-fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer' 'LDOPTS=-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1
-specs=/usr/lib/rpm/redhat/redhat-package-notes'
5. ./mame -validate



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #8 from Julian Sikorski ---
(In reply to Nick Clifton from comment #7)
> (In reply to Julian Sikorski from comment #5)
> > (In reply to Nick Clifton from comment #4)
> > > You may find it useful to compare a broken-linked-with-ld.bfd binary
> > > with a working-linked-with-lld binary.  In particular the contents 
> > > of whatever init sections they have, and the ordering of function
> > > pointers therein.
> > 
> > I am downloading the broken binary from the test system now. How can I do
> > the above?
> 
> Well first you can compare the disassembly of the .init section to make sure
> that it is the same in both binaries:
> 
>   objdump -D -j .init mame
> 
> Next I was going to suggest that you check the contents of the .init_array
> section but it appears to be all zeros, which is a bit strange.
> 
> You could be paranoid and check that the hardware property notes are the
> same on both binaries:
> 
>   readelf -n -W mame | grep -e .note.gnu.property -A 4
> 
> But I doubt if that show any discrepancies.
> 
> But I suspect that the only real way you are going to get some traction on
> this problem is if you bring in the glibc folks.  Maybe file a bug report
> telling them that mame is hanging during initialization and that you need
> their help finding out where things have gone wrong ?  Let them know about
> the new version of binutils of course, but do ask them if they can track
> down exactly what the linker has done wrong in order to cause the init code
> to hang.

How would I bring in help from glibc folks? Should I just reassign the bug to
glibc?
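
A minimal sketch of how the checks quoted above could be run against both
binaries and compared side by side. The function name diff_output and the file
names mame.lld and mame.bfd are made up for illustration; only the objdump and
readelf invocations come from the quoted comment.

```shell
# diff_output GOOD BAD CMD [ARGS...]
# Run "CMD ARGS FILE" once per binary and print a unified diff of the output.
# Returns 0 when the two outputs are identical, non-zero otherwise.
diff_output() {
    good=$1; bad=$2; shift 2
    tmp=$(mktemp -d) || return 2
    "$@" "$good" > "$tmp/good.txt" 2>&1
    "$@" "$bad"  > "$tmp/bad.txt"  2>&1
    diff -u "$tmp/good.txt" "$tmp/bad.txt" && rc=0 || rc=$?
    rm -rf "$tmp"
    return "$rc"
}

# Example calls (hypothetical file names for the lld and ld.bfd links):
#   diff_output mame.lld mame.bfd objdump -D -j .init
#   diff_output mame.lld mame.bfd readelf -n -W
```

Some differences are expected even between two healthy binaries: objdump
prints the file name in its header, and the two linkers are unlikely to place
code at the same addresses.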



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #10 from Julian Sikorski ---
Done: https://bugzilla.redhat.com/show_bug.cgi?id=2241902

I managed to set up an aarch64 rawhide instance on Oracle Cloud but I cannot
connect to it yet :( If I manage to get it working, I can see if I can set up a
bisect mentioned in comment #2.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #11 from Julian Sikorski ---
With a non-mock build (fedpkg compile) on Fedora rawhide aarch64 running on OCI,
the backtrace is slightly different:

#0  0xb5bd4fb0 in ___ZN3emu6detail16device_registrar15register_deviceERNS0_21device_type_impl_baseE_bti_veneer ()
#1  0xaec52368 in device_type_impl_base () at ../../../../../src/emu/device.h:240
#2  device_type_impl () at ../../../../../src/emu/device.h:283
#3  __static_initialization_and_destruction_0 () at ../../../../../src/mame/acorn/z88_impexp.cpp:34
#4  _GLOBAL__sub_I_Z88_IMPEXP () at ../../../../../src/mame/acorn/z88_impexp.cpp:278
#5  0xf5870b2c in call_init (env=, argv=0xf258, argc=2) at ../csu/libc-start.c:145
#6  __libc_start_main_impl (main=0xaeedadc0 , argc=2, argv=0xf258, init=, fini=, rtld_fini=, stack_end=) at ../csu/libc-start.c:347
#7  0xaef01570 in _start



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #13 from Julian Sikorski ---
Thanks! The patch does not revert cleanly unfortunately and the changes are
complicated enough that I do not feel comfortable running git mergetool. Would
someone please be so kind and provide a patch I can apply against 2.41?



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #14 from Julian Sikorski ---
(In reply to Julian Sikorski from comment #10)
> Done: https://bugzilla.redhat.com/show_bug.cgi?id=2241902
> 
> I managed to set up an aarch64 rawhide instance on Oracle Cloud but I cannot
> connect to it yet :( If I manage to get it working, I can see if I can set
> up a bisect mentioned in comment #2.

Is there a straightforward way of disabling -Werror for non-releases? I got my
cloud instance running but I cannot build a mid-release snapshot due to this.
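
One option is the --disable-werror switch accepted by the binutils configure
script. A sketch assuming an out-of-tree build; the function name and the
paths in the example are hypothetical.

```shell
# configure_snapshot SRC BUILD [EXTRA...]
# Configure a binutils checkout in a separate build directory with -Werror
# turned off. SRC must be an absolute path; EXTRA is passed to configure as-is.
configure_snapshot() {
    src=$1; build=$2; shift 2
    mkdir -p "$build" || return 1
    # Subshell so the caller's working directory is left alone.
    ( cd "$build" && "$src/configure" --disable-werror "$@" )
}

# Example (hypothetical paths; --disable-gdb skips gdb in the combined tree):
#   configure_snapshot "$HOME/binutils-gdb" "$HOME/build-binutils" --disable-gdb
#   make -C "$HOME/build-binutils"
```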



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-03 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #17 from Julian Sikorski ---
(In reply to Nick Clifton from comment #16)
> Created attachment 15152 [details]
> Proposed patch
> 
> (In reply to Julian Sikorski from comment #13)
> > Thanks! The patch does not revert cleanly unfortunately and the changes are
> > complicated enough that I do not feel comfortable running git mergetool.
> > Would someone please be so kind and provide a patch I can apply against 
> > 2.41?
> 
> Please try this patch.

Thanks! With this patch applied the linked mame binary no longer gets stuck.



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-04 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #21 from Julian Sikorski ---
(In reply to Szabolcs Nagy from comment #20)
> seems they made the build use lld, so now i have to undo that.
> will look at it tomorrow

Sorry about that, I should have mentioned it here. You do not need to do git
revert. You can do

$ fedpkg switch-branch f39
$ fedpkg srpm
$ mock -r fedora-rawhide-aarch64 mame-0.259-1.fc39.src.rpm



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-06 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #23 from Julian Sikorski ---
I was able to complete the git bisect in the meantime; it also points to
15b4f66b0a9a3be6caf1898d22a13c39e662006f as the first bad commit.
Interestingly enough, I was not able to reproduce the issue with a simple make
from mame's git snapshot, which indicates that the erroneous behaviour is being
triggered by Fedora RPM packaging options and/or compiler and/or linker flags.
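
For reference, a bisect of a hang like this one can be automated with
'git bisect run' plus timeout, as suggested in comment #2. A rough sketch, not
the procedure actually used here: the function names and the 600-second limit
are assumptions, and any non-hang failure is treated as "skip".

```shell
# bisect_verdict STATUS
# Map the exit status of a timed validation run to the code that
# 'git bisect run' expects: 0 = good, 1 = bad, 125 = cannot test, skip.
bisect_verdict() {
    case "$1" in
        0)   echo 0 ;;     # validation finished normally
        124) echo 1 ;;     # timeout(1) gave up on a hung process
        *)   echo 125 ;;   # anything else is not the bug being chased
    esac
}

# bisect_step LIMIT CMD [ARGS...]
# Run CMD under timeout(1) and return the bisect verdict as exit status.
bisect_step() {
    limit=$1; shift
    timeout "$limit" "$@" && status=0 || status=$?
    return "$(bisect_verdict "$status")"
}

# In the script handed to 'git bisect run', after rebuilding ld and
# relinking mame with it:
#   bisect_step 600 ./mame -validate; exit $?
```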



[Bug ld/30930] ld-2.41 links mame in a way which gets stuck on aarch64

2023-10-06 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #24 from Julian Sikorski ---
I was able to reproduce the problem with the following make call, without the
need to use the RPM tooling:

make -j16 VERBOSE=1 NOWERROR=1 SYMBOLS=1 SYMLEVEL=1 OPTIMIZE=2 OPT_FLAGS="-O2
-fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang
-Werror=format-security -Werror=implicit-function-declaration
-Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -mbranch-protection=standard
-fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer" LDOPTS="-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1"



[Bug ld/30930] Broken BTI veneers: ld-2.41 links mame in a way which gets stuck on aarch64

2023-11-04 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=30930

--- Comment #28 from Julian Sikorski ---
Thank you. I can confirm that these 5 patches allow mame to link successfully
on aarch64 when applied on top of Fedora binutils-2.41-10.fc40.



[Bug ld/23304] ARM linker fails to combine identical typeinfo sections

2018-06-18 Thread belegdol at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=23304

Julian Sikorski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |belegdol at gmail dot com

___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils