Hello,

2019-02-01  H.J. Lu  <hongjiu...@intel.com>
            Hongtao Liu  <hongtao....@intel.com>
            Sunil K Pandey  <sunil.k.pan...@intel.com>

        PR target/87007
        * config/i386/i386-passes.def: Add
        pass_remove_partial_avx_dependency.
        * config/i386/i386-protos.h
        (make_pass_remove_partial_avx_dependency): New.
        * config/i386/i386.c (remove_partial_avx_dependency):
        New function.
        (pass_data_remove_partial_avx_dependency): New.
        (pass_remove_partial_avx_dependency): Likewise.
        (make_pass_remove_partial_avx_dependency): Likewise.
        * config/i386/i386.md (partial_xmm_update): New attribute.
        (*extendsfdf2): Add partial_xmm_update.
        (truncdfsf2): Likewise.
        (*float<SWI48:mode><MODEF:mode>2): Likewise.
        (SF/DF conversion splitters): Disabled for TARGET_AVX.

gcc/testsuite/

2019-02-01  H.J. Lu  <hongjiu...@intel.com>
            Hongtao Liu  <hongtao....@intel.com>
            Sunil K Pandey  <sunil.k.pan...@intel.com>

        PR target/87007
        * gcc.target/i386/pr87007-1.c: New test.
        * gcc.target/i386/pr87007-2.c: Likewise.


It seems to me that a more systematic way would be to use the mode switching
pass, which uses the LCM framework, and possibly tweak LCM to do the right
thing with respect to loops (an easy solution would be to lift insertion
points to dominators with smaller frequency, even if there may be a path
that does not execute the instruction needing the pxor).

Teaching the LCM framework to do this is, however, more intrusive than a
self-contained minipass.  Since the patch solves a regression and is
self-contained, I guess we should go ahead with it for this release and
look for more systematic solutions later.

Patch is OK with the following change.

+static unsigned int
+remove_partial_avx_dependency (void)
+{
+  timevar_push (TV_MACH_DEP);
+
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_set_flags (DF_DEFER_INSN_RESCAN);
+  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
+  df_md_add_problem ();
+  df_analyze ();

Please delay the initialization until you hit the first instruction that
needs processing.  The pass is run unconditionally and in many functions
it will do nothing.  Can you also gate the pass to run only if AVX is
enabled?
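
To illustrate the two requests, here is a small self-contained model of
the control flow.  It is a sketch, not GCC code: struct model_insn,
setup_analysis, model_gate, model_execute and model_run_pass are all
hypothetical names, and setup_analysis merely stands in for the
calculate_dominance_info/df_* sequence quoted above.  The gate is assumed
to depend only on whether AVX is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an insn; only the field the sketch needs.  */
struct model_insn
{
  bool partial_xmm_update;	/* models the partial_xmm_update attribute */
};

/* Counts how often the expensive analysis setup ran.  */
static int analysis_setups;

/* Stand-in for the dominance and dataflow initialization.  */
static void
setup_analysis (void)
{
  analysis_setups++;
}

/* Stand-in for the pass gate: run only when AVX is enabled.  */
static bool
model_gate (bool avx_enabled)
{
  return avx_enabled;
}

/* Model of the pass body.  The setup is deferred until the first insn
   that needs processing is seen, so functions without such insns never
   pay for it.  Returns the number of insns processed.  */
static unsigned int
model_execute (const struct model_insn *insns, size_t n)
{
  bool initialized = false;
  unsigned int processed = 0;

  assert (insns != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      if (!insns[i].partial_xmm_update)
	continue;

      if (!initialized)
	{
	  /* First candidate found: initialize now, exactly once.  */
	  setup_analysis ();
	  initialized = true;
	}

      processed++;
    }

  return processed;
}

/* Gate first, then execute, as the pass manager would.  */
static unsigned int
model_run_pass (bool avx_enabled, const struct model_insn *insns, size_t n)
{
  if (!model_gate (avx_enabled))
    return 0;
  return model_execute (insns, n);
}
```

With this shape, a function that contains no candidate instruction, or
that is compiled without AVX, never reaches setup_analysis.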

Patch is OK with this change.  Please wait a day for possible reactions
from Uros or the RM.  Sorry for the delayed reaction.
Honza
