Re: [PATCH, testsuite] Fix for PR47440 - Use LCM for vzeroupper insertion

Uros Bizjak Wed, 20 Jul 2011 00:12:57 -0700

Hello!

> >         * a/gcc/gcse.c (alloc_gcse_mem): Added code to run in PRE2.
>
> And this is necessary because...???
>
> Why not just make it a separate pass in ix86-reorg that uses LCM? Look at 
> mode switching for an example.


I was also expecting that vzeroupper would be inserted in the same way
as i387 mode switching instructions are inserted. To expand on
Steven's suggestion, please see OPTIMIZE_MODE_SWITCHING and the macros
that follow it in i386.h.

At the moment, there are four independent entities that handle mode
switching for x87, one for each mode of the fistp or frndint
instruction. Mode switching inserts the calculation of the x87 control
word (CW) at optimal points and stores this new CW (together with the
old CW) in a known stack slot, to be consumed by the fistp/frndint
insn.

You can add a new entity to enum ix86_entity (say, AVX_VZEROUPPER)
and update OPTIMIZE_MODE_SWITCHING to perform mode insertion for the
AVX_VZEROUPPER entity when needed. The number of modes for
AVX_VZEROUPPER is given in NUM_MODES_FOR_MODE_SWITCHING, mode
transitions are defined in MODE_NEEDED and insn insertions in
EMIT_MODE_SET.
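
To make the suggestion a bit more concrete, here is a rough standalone
sketch of what the new entity could look like. It is not GCC code: every
toy_* name, the three modes and the rule "calls want the upper halves
clean" are assumptions made up for illustration, and the real macros take
an rtx insn rather than this small struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for enum ix86_entity.  TOY_AVX_VZEROUPPER is the
   proposed addition; the first four mirror the existing x87 entities.  */
enum toy_entity
{
  TOY_I387_TRUNC = 0,
  TOY_I387_FLOOR,
  TOY_I387_CEIL,
  TOY_I387_MASK_PM,
  TOY_AVX_VZEROUPPER,
  TOY_MAX_ENTITIES
};

/* Hypothetical modes for the new entity.  The last value doubles as
   the mode count and as "no mode needed".  */
enum toy_avx_mode
{
  TOY_AVX_UPPER_DIRTY = 0,
  TOY_AVX_UPPER_CLEAN,
  TOY_AVX_UPPER_ANY
};

/* Placeholder mode count for the x87 entities in this sketch.  */
#define TOY_I387_CW_ANY 5

/* Plays the role of NUM_MODES_FOR_MODE_SWITCHING: one count per entity.  */
static const int toy_num_modes[TOY_MAX_ENTITIES] =
  { TOY_I387_CW_ANY, TOY_I387_CW_ANY, TOY_I387_CW_ANY, TOY_I387_CW_ANY,
    TOY_AVX_UPPER_ANY };

/* Minimal insn description, invented for the example.  */
struct toy_insn
{
  bool is_call;
  bool uses_256bit_avx;
};

/* Plays the role of MODE_NEEDED; only the new entity is modelled.  */
static int
toy_mode_needed (enum toy_entity entity, const struct toy_insn *insn)
{
  assert (insn != NULL);
  if (entity != TOY_AVX_VZEROUPPER)
    return toy_num_modes[entity];   /* x87 entities: no requirement here */
  if (insn->uses_256bit_avx)
    return TOY_AVX_UPPER_DIRTY;
  if (insn->is_call)
    return TOY_AVX_UPPER_CLEAN;     /* assumed: callee expects clean uppers */
  return TOY_AVX_UPPER_ANY;
}

/* Plays the role of EMIT_MODE_SET: returns true when a vzeroupper
   would be emitted for a switch to MODE.  */
static bool
toy_emit_mode_set (enum toy_entity entity, int mode)
{
  return entity == TOY_AVX_VZEROUPPER && mode == TOY_AVX_UPPER_CLEAN;
}
```

In this sketch only a switch to the "clean" mode emits anything; a switch
to "dirty" is free, which is why a single entity with two real modes
would probably be enough.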

Please note that LCM handles all entities in parallel, so there is no
need for extra passes. The real worker for mode switching is
ix86_mode_needed, but don't forget that you can disable the mode
switching pass per function, when it is not needed, through the
OPTIMIZE_MODE_SWITCHING macro.
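
The per-function gating can be pictured roughly as follows. This is a
simplified standalone model, not the code from mode-switching.c: the
toy_* names and the flag array are assumptions, standing in for whatever
per-function state OPTIMIZE_MODE_SWITCHING consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_N_ENTITIES 5

/* Hypothetical per-function flags, one per entity.  A flag is set
   when an insn that needs that entity is emitted.  */
static bool toy_optimize_mode_switching[TOY_N_ENTITIES];

#define TOY_OPTIMIZE_MODE_SWITCHING(ENTITY) \
  (toy_optimize_mode_switching[(ENTITY)])

/* Collect the entities that need work in the current function.
   Returns how many were found; zero means the whole pass can be
   skipped for this function.  All collected entities are then
   handled together in one pass.  */
static int
toy_collect_entities (int *entity_map)
{
  int n_entities = 0;

  assert (entity_map != NULL);
  for (int e = 0; e < TOY_N_ENTITIES; e++)
    if (TOY_OPTIMIZE_MODE_SWITCHING (e))
      entity_map[n_entities++] = e;
  return n_entities;
}
```

With no flag set the function returns 0 and nothing else has to run, so
a function without AVX code would pay nothing for the new entity.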

FYI: Existing x87 CW initialization insertion works this way:
- fistp/frndint is inserted into the insn stream and the corresponding
OPTIMIZE_MODE_SWITCHING flag is set.
- The inserted insn has an i387_cw attribute that defines the requested
mode in which the insn operates. Based on this attribute, MODE_NEEDED
handles mode transitions for each entity (please note that there are
four independent entities).
- EMIT_MODE_SET emits CW initializations. These are further optimized
by follow-up optimization passes, so two consecutive initializations
at the same place are CSEd, etc.
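
The flow above can be modelled on a straight-line insn sequence. Note
that this is only a sketch under simplifying assumptions: it walks a flat
array instead of running LCM over the CFG, all toy_* names are invented,
and treating a call as leaving the CW state unknown is an assumption of
the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy CW modes; the first four line up with the four entities.  */
enum toy_cw
{
  TOY_CW_TRUNC = 0,
  TOY_CW_FLOOR,
  TOY_CW_CEIL,
  TOY_CW_MASK_PM,
  TOY_CW_UNINITIALIZED,
  TOY_CW_ANY
};

#define TOY_MAX_ENTITIES 4

/* Invented insn record: a call flag plus the CW attribute.  */
struct toy_insn
{
  bool is_call;
  enum toy_cw cw_attr;
};

/* Plays the role of MODE_NEEDED: entity E only reacts to insns whose
   attribute names E's own mode.  */
static enum toy_cw
toy_mode_needed (int entity, const struct toy_insn *insn)
{
  if (insn->is_call)
    return TOY_CW_UNINITIALIZED;   /* assumed: state unknown after a call */
  if ((int) insn->cw_attr == entity)
    return insn->cw_attr;
  return TOY_CW_ANY;
}

/* Walk the insns once, tracking all entities side by side, and count
   how many CW initializations a toy EMIT_MODE_SET would emit.  */
static int
toy_count_mode_sets (const struct toy_insn *insns, size_t n)
{
  enum toy_cw current[TOY_MAX_ENTITIES];
  int emitted = 0;

  assert (insns != NULL || n == 0);
  for (int e = 0; e < TOY_MAX_ENTITIES; e++)
    current[e] = TOY_CW_UNINITIALIZED;

  for (size_t i = 0; i < n; i++)
    for (int e = 0; e < TOY_MAX_ENTITIES; e++)
      {
        enum toy_cw need = toy_mode_needed (e, &insns[i]);

        if (need == TOY_CW_ANY)
          continue;                 /* insn does not care about entity E */
        if (need != TOY_CW_UNINITIALIZED && current[e] != need)
          emitted++;                /* a CW initialization goes here */
        current[e] = need;
      }
  return emitted;
}
```

For two back-to-back truncating insns the model emits a single
initialization; a call in between forces a second one.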

Uros.
