On Tue, Feb 9, 2021 at 5:35 PM Gian-Carlo Pascutto <g...@mozilla.com> wrote:
>
> On 3/02/2021 10:51, Henri Sivonen wrote:
> > I came across 
> > https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level/
> > . Previously, when microbenchmarking Rust code that used count_ones()
> > in an inner loop (can't recall what code this was), I noticed 4x
> > runtime speed when compiling for target_cpu=nehalem and running on a
> > much later CPU.
>
> That's an extreme edge case though.

It is an extreme edge case, but it's also a case where run-time
dispatch doesn't make sense. The interesting question is how much
wins like this, plus LLVM using newer instructions on its own, would
add up across the code base.

> > I'm wondering:
> >
> > Have we done benchmark comparisons with libxul compiled for the
> > newly-defined x86_64 levels?
>
> No. Should be easy to do

In that case, it seems worth trying.

> but I don't expect much to come of it. The
> main change (that is broadly applicable, unlike POPCNT) in recent years
> would be AVX. Do we have much floating point code in critical paths? I
> was wondering about the JS' engine usage of double for value storage -
> but it's what comes out of the JIT that matters, right?

AVX is much more recent than what's available after SSE2, which is our
current baseline.

Chrome is moving to SSE3 as the unconditional baseline, which I
personally find surprising:
https://docs.google.com/document/d/1QUzL4MGNqX4wiLvukUwBf6FdCL35kCDoEJTm2wMkahw/edit#

A quick and very unscientific look at Searchfox suggests that, as far
as explicit SSE3 usage goes, an unconditional SSE3 baseline would
mainly eliminate conditional/dynamic dispatch on YUV conversion code
paths. No idea how much SSE3 usage LLVM would insert on its own.

> Media codecs don't count - they should detect at runtime. Same applies
> to crypto code, that - I really hope - would be using runtime detection
> for their SIMD implementations or even hardware AES/SHA routines.
>
> > For macOS and Android, do we actively track the baseline CPU age that
> > Firefox-compatible OS versions run on and adjust the compiler options
> > accordingly when we drop compatibility for older OS versions?
>
> Android only recently added 64-bit builds, and 32-bit would be limited
> to ARMv7-A. There used to be people on non-NEON devices, but those are
> probably gone by now. Google says "For NDK r21 and newer Neon is enabled
> by default for all API levels." - note that should be the NDK used for
> 64-bit builds.
>
> So it's possible Android could now assume NEON even on 32-bit, if it
> isn't already. Most of the code that cares (i.e. media) will already be
> doing runtime detection though.

I meant tracking baseline CPU age on the x86/x86_64 Android side. We
have required NEON on Android ARMv7 for quite a while already.

> For macOS, Apple Silicon is a hard break. For macOS on x86, I guess AVX
> is also the breaking point. There was an open question if any non-AVX
> hardware is still supported on Big Sur because Rosetta doesn't support
> AVX code, but given that we support (much) older macOS releases I don't
> think we can assume AVX presence regardless. We support back to macOS
> 10.12, which runs on "MacBook Late 2009", which was a Core 2 Duo. Guess
> we could assume SSSE3 but nothing more.

That's older than I expected, but it still seems worthwhile to make
our compiler settings for Mac reflect that if they don't already.
Also, doesn't the whole Core 2 Duo family have SSE4.1?


--
Henri Sivonen
hsivo...@mozilla.com
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
