Hey folks, I thought I'd post a follow-up here in case it's of wider interest.
First, my colleague Nicolas Savoire ran a Git bisect and identified the commit[0] that stopped GCC from choosing AArch64 FP registers for pointer storage. He also created a reproducer[1] on Godbolt that shows the difference between the code emitted by GCC 8.5 and GCC 9.3.

Second (this might be of interest to Mark), the customer who had crashes with the build that stored pointers in FP registers ran some experiments. They did week-long runs on AWS Fargate ECS on-demand instances in the US Oregon region, then switched to AWS Fargate ECS spot-priced instances in the same region. The on-demand instances never crashed during a week-long run, but the spot-priced instances always crashed within 24 hours. The customer then used lscpu to try to identify the processors and found that in that region the Amazon scheduler places spot-priced instances on a particular CPU which, based on the lscpu output[2], looks like Graviton 3, while the never-crashing on-demand instances always got what looks like Graviton 2. It's quite possible there's a batch of problematic Graviton 3 CPUs and Amazon just happens to have deployed them only in the US Oregon region.

Within Datadog, at least, we fixed the issue by upgrading the toolchain for this particular product from GCC 8 to GCC 9, so our code no longer contains the instructions that trigger the fault. We probably won't investigate this much further, but if anyone working more closely on Graviton CPUs reads this, we're happy to share more details.

Attila.
---
[0] https://github.com/gcc-mirror/gcc/commit/2eb2847ec54a3262f303f47697c5e5cbe3cc089d
[1] https://godbolt.org/z/jWPxMnYE9
[2] https://gist.github.com/szegedi/2ea5dc9a1ca300f58283884a0eaea26b

On Mon, Feb 24, 2025 at 6:51 PM Mark Rutland <mark.rutl...@arm.com> wrote:
> On Mon, Feb 24, 2025 at 10:46:42AM +0100, Attila Szegedi wrote:
> > Hi folks,
>
> Hi,
>
> I've been pointed at this thread due to the reference to my Linux patch
> series fixing some KVM FPSIMD/SVE/SME issues.
>
> > I'm looking for a bit of a historic context for a fun GCC behavior we
> > stumbled across. For... reasons we build some of our binaries using an
> > older version of GCC (8.3.1, yes, we'll be upgrading soon, and no, this
> > message is not about helping with an ancient version :-) )
> >
> > We noticed that this version of GCC compiling on aarch64 will happily use
> > FP registers to temporarily store/load pointers, so there'd be
> > "fmov d9, x1" to store a pointer, and then later when it's used as a
> > parameter to a function call we'll see "fmov x1, d9" etc. We noticed this
> > while investigating some crashes that seemed to always occur in functions
> > called with parameters loaded through this mechanism, on certain specific
> > models of aarch64 CPUs.
>
> Hmmm... IIUC d9 specifically should be preserved by callees per AAPCS64;
> do you see this with specific registers? e.g. v8 to v15?
>
> Are you able to share any more information about the configuration(s)
> that you see this with, e.g.
>
> * Which CPU(s)?
>
>   If you're not able to say which CPU(s) specifically, knowing whether
>   SVE and/or SME are present would be helpful.
>
> * Which kernel version(s), assuming this is with Linux?
>
>   If virtualization is involved, knowing the guest and host kernel
>   versions would be helpful.
>
> Thanks,
> Mark.