speed choices for internal functions

Richard Sandiford Wed, 25 Nov 2015 04:20:51 -0800

This series fixes PR 68432, a regression caused by my internal-functions-
for-optabs series.  Some of the libm optabs in i386.md have a true HAVE_*
condition but conditionally FAIL if we're optimising for size:


  if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
      && !flag_trapping_math)
    {
      if (TARGET_ROUND)
        emit_insn (gen_sse4_1_round<mode>2
                   (operands[0], operands[1], GEN_INT (ROUND_MXCSR)));
      else if (optimize_insn_for_size_p ())
        FAIL;
      else
        ix86_expand_rint (operands[0], operands[1]);
    }

This is going to cause problems if we want to make more decisions
related to optabs at the gimple level.

We've already had the same problem in rtl, where some patterns used
to check optimize_insn_for_size_p in their C condition, and would then
fail to match if an instruction were moved from a hot block to a cold
block.  Now rtl has two attributes, preferred_for_size and
preferred_for_speed, that say whether a particular alternative of a
particular instruction should be used when optimising for size or speed
respectively.  We try to honour a "false" value as much as possible,
but it's not an absolute guarantee.

The point of this series is to extend preferred_for_size and
preferred_for_speed to define_expands, so that the attributes
can be tested for optabs too.

enabled, preferred_for_size and preferred_for_speed are supposed
to be an inavariant property of a given (code, alternative) pair.
They're not supposed to depend on a particular insn or its operands.
However, the attribute functions still take an rtx_insn * as argument,
so mistakes are only caught at runtime if we see a specific instruction
for which the attributes conflict with the cached ones
(recog.c:check_bool_attrs).

Extending the attributes to define_expands means that we finally
need to "fix" that and make the attributes take a (code, alternative)
pair instead.  Most ports were already structured to allow that.
The two exceptions are ARM and NDS32.

The problem for NDS32 is that "enabled" depends on "length", which
needs access to an instruction in order to calculate branch lengths.
This isn't a problem in practice because all instructions with
operand-dependent lengths force "enabled" to 1.  However,
it's easier to enforce the restriction at compile time if we
have an "is_16bit" attribute, to go along with the TARGET_16_BIT
condition that controls whether 16-bit instructions can be used.

The problem for ARM is that "enabled" depends on "type" and
"use_literal_pool", both of which use match_operand tests in some cases.
I think the "type" match_operands were actually OK for "enabled" in
practice, but I think the "use_literal_pool" ones are a genuine bug.
They are used when we have one alternative that accepts both memory and
immediate operands.  The alternative is supposed to be disabled for
immediate operands when arm_disable_literal_pool is true, but not
disabled for memory operands.  However, the "enabled" cache only cares
about the alternative number, so we can end up disabling memory operands
if we cached based on an immediate operand, or allow immediate operands
if we cached based on a memory operand.  The series fixes that by
splitting alternatives where necessary.

Due to the define_subst patches, it's already syntactically possible to
attach attributes to define_expands, but genattrtab.c currently ignores
them.  The series restricts these attributes to the new "code,alternative"
style and then handles them in the same way as define_insn attributes.

I realise this is rather invasive for stage 3, but I think it's
worth fixing the bug "properly" rather than papering over it.
The ARM "use_literal_pool" bug described above shows how easy
it is for the enabled attribute to go wrong in subtle ways.

The series is split into five parts:

  (1) Make the ARM and NDS32 changes described above
  (2) Add support for "code,alternative" attributes
  (3) Make all ports use them for enabled, preferred_for_size and
      preferred_for_speed
  (4) Use preferred_for_size and preferred_for_speed to decide whether
      to use internal functions
  (5) Convert the internal-function-related i386 patterns to use
      preferred_for_size instead of FAILing.

(3) is a purely mechanical change.  I think it counts as obvious if the
other parts are OK.

Tested by building GCC before and after the series on:

    aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
    arm-linux-gnueabihf avr-rtems bfin-elf c6x-elf cr16-elf cris-elf
    epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
    ia64-linux-gnu iq2000-elf lm32-elf m32c-elf m32r-elf
    m68k-linux-gnu mcore-elf mep-elf microblaze-elf mips-linux-gnu
    mmix mn10300-elf moxie-elf msp430-elf nds32le-elf
    hppa64-hp-hpux11.23 nios2-linux-gnu nvptx-none pdp11
    powerpc-linux-gnu powerpc-eabispe rl78-elf rx-elf s390-linux-gnu
    sh-linux-gnu sparc-linux-gnu spu-elf tilegx-elf tilepro-elf
    xstormy16-elf v850-elf vax-netbsdelf visium-elf xtensa-elf
    x86_64-darwin

and comparing the assembly output for gcc.dg, g++.dg and gcc.c-torture
at -O2.  There were no differences besides the usual timestamps.

Also tested on x86_64-linux-gnu and arm-linux-gnueabihf.  I will test
on powerpc64-linux-gnu as well before committing.  OK to install?

Thanks,
Richard

[PR68432 00/26] Handle size/speed choices for internal functions

Reply via email to