On Wed, Aug 25, 2021 at 5:14 AM Segher Boessenkool <seg...@kernel.crashing.org> wrote: > > Hi! > > On Tue, Aug 24, 2021 at 04:55:30PM +0800, liuhongt wrote: > > This patch extend change_zero_ext to change illegitimate constant > > into constant pool, this will enable simplification of below: > > It should be in a separate function. recog_for_combine will call both. > But not both for the same RTL! This is important. Originally, combine > tried only one thing for every combination of input instructions it got. > Every extra attempt causes quite a bit more garbage (not to mention it > takes time as well, recog isn't super cheap), so we should try to not > make recog_for_combine exponential in the variants it tries.. > > And of course the function name should always be descriptive of what a > function does :-) > > > change_zero_ext (rtx pat) > > @@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat) > > { > > rtx x = **iter; > > scalar_int_mode mode, inner_mode; > > + machine_mode const_mode = GET_MODE (x); > > + > > + /* Change illegitimate constant into memref of constant pool. */ > > + if (CONSTANT_P (x) > > + && !const_vec_duplicate_p (x) > > This is x86-specific? It makes no sense in generic code, anyway. Or if > it does it needs a big fat comment :-) > > > + && const_mode != BLKmode > > + && GET_CODE (x) != HIGH > > + && GET_MODE_SIZE (const_mode).is_constant () > > + && !targetm.legitimate_constant_p (const_mode, x) > > + && !targetm.cannot_force_const_mem (const_mode, x)) > > You should only test that it did not recog, and then force a constant > to memory. You do not want to do this for every constant (rotate by 42 > won't likely match better if you force 42 to memory) so some > sophistication will help here, but please do not make it target- > specific. > > > + { > > + x = force_const_mem (GET_MODE (x), x); > > That mode is const_mode. > > > Ideally you will have some example where it benefits some other target, > too. Running recog twice for a big fraction of all combine attempts, > for no benefit at all, is not a good idea. The zext* thing is there > because combine *itself* creates a lot of extra zext*, whether those > exist for the target or not. So this isn't obvious precedent (and that > wouldn't mean it is a good idea anyway ;-) ) When trying to construct a testcase for another backend, I realized that the issue can also be solved by folding _mm_shuffle_ps into gimple vec_perm_expr. Let me try that way.
> > > Segher -- BR, Hongtao