Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On Thu, Oct 29, 2020 at 2:46 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> Hongtao Liu <crazy...@gmail.com> writes: >> > On Tue, Oct 27, 2020 at 7:13 PM Richard Sandiford >> > <richard.sandif...@arm.com> wrote: >> >> >> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> >> > Hi: >> >> > For inline asm, there could be an operand like (not (mem:)), it's >> >> > not a valid operand for normal memory constraint. >> >> > Bootstrap is ok, regression test is ok for make check >> >> > RUNTESTFLAGS="--target_board='unix{-m32,}'" >> >> > >> >> > gcc/ChangeLog >> >> > PR target/97540 >> >> > * ira.c: (ira_setup_alts): Extract memory from operand only >> >> > for special memory constraint. >> >> > * recog.c (asm_operand_ok): Ditto. >> >> > * lra-constraints.c (process_alt_operands): MEM_P is >> >> > required for normal memory constraint. >> >> > >> >> > gcc/testsuite/ChangeLog >> >> > * gcc.target/i386/pr97540.c: New test. >> >> >> >> Sorry to stick my oar in, but I think we should reconsider the >> >> bcst_mem_operand approach. It seems like these patches (and the >> >> previous one) are fighting against the principle that operands >> >> cannot be arbitrary expressions. >> >> >> >> This kind of thing was attempted long ago (even before my time!) >> >> for SIGN_EXTEND on MIPS. It ended up causing more problems than >> >> it solved and in the end it had to be taken out. I'm worried that >> >> we might end up going through the same cycle again. >> >> >> >> Also, this LRA code is extremely performance-sensitive in terms >> >> of compile time: it's often at the top or near the top of the profile. >> >> So adding calls to new functions like extract_mem_from_operand for >> >> a fairly niche case probably isn't a good trade-off. >> >> >> >> I think we should instead find a nice(?) syntax for generating separate >> >> patterns for the two bcst_vector_operand alternatives from a single >> >> .md pattern. That would fit the existing model much more closely. >> >> >> > >> > We have define_subst for RTL template transformations, but it's not >> > suitable for this case(too many define_subst templates need to >> > be added, and it doesn't show any advantage compared to adding >> > separate bcst patterns.). I don't find other workable existing syntax for >> > it. >> >> Yeah, I think it would need to be new syntax. I was wondering if it >> would help if we had somethine like (strawman suggestion): >> >> (one_of 0 >> [(match_operand:VI_AVX2 1 "vector_operand" "...") >> (vec_duplicate:VI_AVX2 >> (match_operand:<...> 1 "..." "..."))] >> >> where all instances of (one_of N ...) for a given N are required >> to have the same number of alternatives. >> >> This could be handled in a similar way to define_subst, with the >> one_of being expanded before the main generator routines see it. >> >> But maybe it wouldn't help that much. E.g. for: >> >> (define_insn "*<plusminus_insn><mode>3" >> [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v") >> (plusminus:VI_AVX2 >> (match_operand:VI_AVX2 1 "bcst_vector_operand" "<comm>0,v") >> (match_operand:VI_AVX2 2 "bcst_vector_operand" "xBm,vmBr")))] >> >> the vec_duplicate version should only really have one alternative. > It would be guaranteed by it's attribute (set_attr "isa" "noavx,avx"), > since bcst_mem_operand implies avx512f, which of course implies avx, > therefore the first alternative would never be enabled under avx.
Ah, OK. >> I guess we could handle that by using a: >> >> (one_of 0 >> [(set_attr "enabled" "*,*") >> (set_attr "enabled" "0,*")]) >> >> or some variant of that that uses a derived attribute. But it feels >> a bit clunky… >> >> Without that, I guess the only pattern that would benefit directly is: >> >> (define_insn "avx512dq_mul<mode>3<mask_name>" >> [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v") >> (mult:VI8_AVX512VL >> (match_operand:VI8_AVX512VL 1 "bcst_vector_operand" "%v") >> (match_operand:VI8_AVX512VL 2 "bcst_vector_operand" "vmBr")))] >> >> > So suppose I should revert my former 2 patches and add separate bcst >> > patterns. >> >> Are there going to more patterns that need bcst_vector_operand, > > Almost all AVX512 instructions need corresponding bcst patterns except > for those with 8-bit/16-bit data elements. OK, that changes things. Sorry, not knowing the architecture, I wasn't sure how far this was from being complete. >> or is the current set complete? >> >> I definitely think we should have a better way of handling this in the >> .md files, and I'd be happy to hack something up on the generator side >> (given that I'm being the awkward one here). But I guess the answer to >> the question above will decide whether it make things better or not. >> >> FWIW, I think having separate patterns (whether they're produced from >> one .md construct or from several) might better optimisation results. > > With proper extending for special memory constraint, they should be > equivalent. > At least unit tests under i386 backend would generate embedded > broadcast instruction as before. I guess my main objection is that we have a special memory constraint that isn't in fact matching a MEM (at least not directly). That seems odd and feels like it's going to come back to bite us. >From an RTL perspective, the MEM in the bcst_memory_operand isn't that special. It's really just a normal memory that happens to appear in a VEC_DUPLICATE. With the MIPS thing, the SIGN_EXTEND was around a register rather than a memory, and the register constraints applied to the register as normal. In other words, the SIGN_EXTEND was not part of the constraint: target-independent code removed the SIGN_EXTEND before matching against the constraints. I think we could view the bcst_mem_operand case in a similar way. The constraint process ends up having two steps: an extraction step (for the REG inside a SIGN_EXTEND for MIPS, for the MEM inside a VEC_DUPLICATE for AVX512) and the normal matching step on the result. This could either happen on a constraint-by-constraint basis or (perhaps more efficiently) as a separate step before applying the constraints. Not sure off-hand which is better, would need to think more about it. But I think this extraction process should be described directly in the .md file somehow. I've had a few ideas but I'm not happy enough with any of them yet to post. Thanks, Richard