Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, Oct 29, 2020 at 2:46 AM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Hongtao Liu <crazy...@gmail.com> writes:
>> > On Tue, Oct 27, 2020 at 7:13 PM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >>
>> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> > Hi:
>> >> >   For inline asm, there could be an operand like (not (mem:)), it's
>> >> > not a valid operand for normal memory constraint.
>> >> >   Bootstrap is ok, regression test is ok for make check
>> >> > RUNTESTFLAGS="--target_board='unix{-m32,}'"
>> >> >
>> >> > gcc/ChangeLog
>> >> >         PR target/97540
>> >> >         * ira.c: (ira_setup_alts): Extract memory from operand only
>> >> >         for special memory constraint.
>> >> >         * recog.c (asm_operand_ok): Ditto.
>> >> >         * lra-constraints.c (process_alt_operands): MEM_P is
>> >> >         required for normal memory constraint.
>> >> >
>> >> > gcc/testsuite/ChangeLog
>> >> >         * gcc.target/i386/pr97540.c: New test.
>> >>
>> >> Sorry to stick my oar in, but I think we should reconsider the
>> >> bcst_mem_operand approach.  It seems like these patches (and the
>> >> previous one) are fighting against the principle that operands
>> >> cannot be arbitrary expressions.
>> >>
>> >> This kind of thing was attempted long ago (even before my time!)
>> >> for SIGN_EXTEND on MIPS.  It ended up causing more problems than
>> >> it solved and in the end it had to be taken out.  I'm worried that
>> >> we might end up going through the same cycle again.
>> >>
>> >> Also, this LRA code is extremely performance-sensitive in terms
>> >> of compile time: it's often at the top or near the top of the profile.
>> >> So adding calls to new functions like extract_mem_from_operand for
>> >> a fairly niche case probably isn't a good trade-off.
>> >>
>> >> I think we should instead find a nice(?) syntax for generating separate
>> >> patterns for the two bcst_vector_operand alternatives from a single
>> >> .md pattern.  That would fit the existing model much more closely.
>> >>
>> >
>> > We have define_subst for RTL template transformations, but it's not
>> > suitable for this case(too many define_subst templates need to
>> > be added, and it doesn't show any advantage compared to adding
>> > separate bcst patterns.). I don't find other workable existing syntax for 
>> > it.
>>
>> Yeah, I think it would need to be new syntax.  I was wondering if it
>> would help if we had somethine like (strawman suggestion):
>>
>>           (one_of 0
>>             [(match_operand:VI_AVX2 1 "vector_operand" "...")
>>              (vec_duplicate:VI_AVX2
>>                (match_operand:<...> 1 "..." "..."))]
>>
>> where all instances of (one_of N ...) for a given N are required
>> to have the same number of alternatives.
>>
>> This could be handled in a similar way to define_subst, with the
>> one_of being expanded before the main generator routines see it.
>>
>> But maybe it wouldn't help that much.  E.g. for:
>>
>> (define_insn "*<plusminus_insn><mode>3"
>>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
>>         (plusminus:VI_AVX2
>>           (match_operand:VI_AVX2 1 "bcst_vector_operand" "<comm>0,v")
>>           (match_operand:VI_AVX2 2 "bcst_vector_operand" "xBm,vmBr")))]
>>
>> the vec_duplicate version should only really have one alternative.
> It would be guaranteed by it's attribute (set_attr "isa" "noavx,avx"),
> since bcst_mem_operand implies avx512f, which of course implies avx,
> therefore the first alternative would never be enabled under avx.

Ah, OK.

>> I guess we could handle that by using a:
>>
>>   (one_of 0
>>     [(set_attr "enabled" "*,*")
>>      (set_attr "enabled" "0,*")])
>>
>> or some variant of that that uses a derived attribute.  But it feels
>> a bit clunky…
>>
>> Without that, I guess the only pattern that would benefit directly is:
>>
>> (define_insn "avx512dq_mul<mode>3<mask_name>"
>>   [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
>>         (mult:VI8_AVX512VL
>>           (match_operand:VI8_AVX512VL 1 "bcst_vector_operand" "%v")
>>           (match_operand:VI8_AVX512VL 2 "bcst_vector_operand" "vmBr")))]
>>
>> > So suppose I should revert my former 2 patches and add separate bcst 
>> > patterns.
>>
>> Are there going to more patterns that need bcst_vector_operand,
>
> Almost all AVX512 instructions need corresponding bcst patterns except
> for those with 8-bit/16-bit data elements.

OK, that changes things.  Sorry, not knowing the architecture,
I wasn't sure how far this was from being complete.

>> or is the current set complete?
>>
>> I definitely think we should have a better way of handling this in the
>> .md files, and I'd be happy to hack something up on the generator side
>> (given that I'm being the awkward one here).  But I guess the answer to
>> the question above will decide whether it make things better or not.
>>
>> FWIW, I think having separate patterns (whether they're produced from
>> one .md construct or from several) might better optimisation results.
>
> With proper extending for special memory constraint, they should be 
> equivalent.
> At least unit tests under i386 backend would generate embedded
> broadcast instruction as before.

I guess my main objection is that we have a special memory constraint
that isn't in fact matching a MEM (at least not directly).  That seems
odd and feels like it's going to come back to bite us.

>From an RTL perspective, the MEM in the bcst_memory_operand isn't that
special.  It's really just a normal memory that happens to appear in a
VEC_DUPLICATE.

With the MIPS thing, the SIGN_EXTEND was around a register rather
than a memory, and the register constraints applied to the register
as normal.  In other words, the SIGN_EXTEND was not part of the
constraint: target-independent code removed the SIGN_EXTEND
before matching against the constraints.

I think we could view the bcst_mem_operand case in a similar way.
The constraint process ends up having two steps: an extraction
step (for the REG inside a SIGN_EXTEND for MIPS, for the MEM
inside a VEC_DUPLICATE for AVX512) and the normal matching step on
the result.  This could either happen on a constraint-by-constraint
basis or (perhaps more efficiently) as a separate step before
applying the constraints.  Not sure off-hand which is better,
would need to think more about it.

But I think this extraction process should be described directly in the
.md file somehow.  I've had a few ideas but I'm not happy enough with
any of them yet to post.

Thanks,
Richard

Reply via email to