Michael Matz <m...@suse.de> writes:
> Hi,
>
> On Thu, 23 Nov 2017, Richard Sandiford wrote:
>
>> > I don't want variable-size vector special-casing everywhere. I want
>> > it to be somehow naturally integrating with existing stuff.
>>
>> It's going to be a special case whatever happens though.
>
> It wouldn't have to be this way. It's like saying that loops with a
> constant upper bound should be represented in a different way than loops
> with an invariant upper bound. That would seem like a bad idea.

The difference is that with a loop, each iteration follows a set pattern.
But:

(1) for constant-length VEC_PERM_EXPRs, each element of the permutation
    vector is independent of the others: you can't predict what the
    selector for element i is given the selectors for the other elements.

(2) for variable-length permutes, the elements *do* have to follow
    a set pattern that can be extended indefinitely.

Or do you mean that we should use the new representation of interleave
masks even for constant-length vectors, rather than using a VECTOR_CST?
I suppose that would be more consistent, but we'd then have to check
when generating a VEC_PERM_EXPR of a VECTOR_CST whether it should be
represented in this new way instead. I think we then lose the benefit
of using a single tree code.

The decision for VEC_DUPLICATE_CST and VEC_SERIES_CST was to restrict
them only to variable-length vectors.

>> If it's a VEC_PERM_EXPR then it'll be a new form of VEC_PERM_EXPR.
>
> No, it'd be a VEC_PERM_EXPR where the magic mask is generated by a new
> EXPR type, instead of being a mere constant.

(See [1] below)

>> The advantage of the internal functions and optabs is that they map to a
>> concept that already exists. The code that generates the permutation
>> already knows that it's generating an interleave lo/hi, and like you
>> say, it used to do that directly via special tree codes. I agree that
>> having a VEC_PERM_EXPR makes more sense for the constant-length case,
>> but the concept is still there.
>>
>> And although using VEC_PERM_EXPR in gimple makes sense, I think not
>> having the optabs is a step backwards, because it means that every
>> target with interleave lo/hi has to duplicate the detection logic.
>
> The middle end can provide helper routines to make detection easy. The
> RTL expander could also match VEC_PERM_EXPR to specific optabs, if we
> really really want to add optab over optab for each specific kind of
> permutation in the future.
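
(Illustration only, not GCC code.) As a rough sketch of what a
detection helper of the kind mentioned in the quoted text might look
like, the standalone C++ below builds an interleave lo/hi selector for
any even element count and checks a fixed-length selector against it.
It assumes the usual VEC_PERM_EXPR numbering, where indices 0..N-1 pick
from the first input and N..2N-1 from the second, and it ignores
endianness. The names interleave_selector and is_interleave are
hypothetical. It also shows the contrast drawn in (1) and (2) above:
the interleave selector comes from a rule that works for any N, whereas
an arbitrary fixed-length selector has to be compared element by
element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the selector that interleaves the low (high == false) or high
// (high == true) halves of two NELTS-element vectors.  Indices
// 0..NELTS-1 refer to the first input, NELTS..2*NELTS-1 to the second.
// Hypothetical helper, for illustration only.
std::vector<unsigned>
interleave_selector (std::size_t nelts, bool high)
{
  assert (nelts % 2 == 0);
  const std::size_t base = high ? nelts / 2 : 0;
  std::vector<unsigned> sel (nelts);
  for (std::size_t i = 0; i < nelts / 2; ++i)
    {
      // Even result elements come from the first input...
      sel[2 * i] = static_cast<unsigned> (base + i);
      // ...and odd result elements from the second input.
      sel[2 * i + 1] = static_cast<unsigned> (nelts + base + i);
    }
  return sel;
}

// Hypothetical detector: return true if the fixed-length selector SEL
// is exactly the interleave lo (or hi) pattern for its length.
bool
is_interleave (const std::vector<unsigned> &sel, bool high)
{
  if (sel.empty () || sel.size () % 2 != 0)
    return false;
  return sel == interleave_selector (sel.size (), high);
}
```

For example, with four elements the "lo" selector under these
assumptions is { 0, 4, 1, 5 } and the "hi" selector is { 2, 6, 3, 7 }.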

Adding a new optab doesn't seem like a big deal to me, if it's for
something that target-independent code already needs to worry about.
(The reason we need these specific optabs is because the
target-independent code already generates these particular permutes.)

The overhead attached to adding an optab isn't really any higher than
adding a new detector function, especially on targets that don't
implement the optab.

> In a way the difference boils down to have
>   PERM(x,y, TYPE)
> (with TYPE being, say, HI_LO, EXTR_EVEN/ODD, REVERSE, and what not)
> vs.
>   PERM_HI_LO(x,y)
>   PERM_EVEN(x,y)
>   PERM_ODD(x,y)
>   PERM_REVERSE(x,y)
>   ...
>
> The former way seems saner for an intermediate representation. In this
> specific case TYPE would be detected by magicness of the constant, and if
> extended to SVE by magicness of the definition of the variably-sized
> invariant.

[1] That sounds similar to the way that COND_EXPR and VEC_COND_EXPR can
embed the comparison in the first operand. I think Richard has complained
about that in the past (and it does cause some ugliness in the way mask
types are calculated during vectorisation).

>> > I'm not suggesting to expose it as an operation. I'm suggesting that
>> > if the target can vec_perm_const_ok () with an "interleave/extract"
>> > permutation then we should be able to represent that with
>> > VEC_PERM_EXPR and thus also represent the permutation vector.
>>
>> But vec_perm_const_ok () takes a fixed-length mask, so it can't be
>> used here. It would need to be a new hook (and thus a new special
>> case for variable-length vectors).
>
> Why do you reject extending vec_perm_const_ok to _do_ take an invariant
> mask?

What kind of interface were you thinking of though?

Note that the current interface is independent of the tree or rtl
levels, since it's called by both gimple optimisers and vec_perm_const
expanders. I assume we'd want to keep that.

>> > But is there more?
>> > Possibly a generic permute but for it you'd have
>> > to explicitly construct a permutation vector using some primitives
>> > like that "series" instruction? So for that case it's reasonable to
>> > have GIMPLE like
>> >
>> >   perm_vector_1 = VEC_SERIES_EXPR <...>
>> >   ...
>> >   v_2 = VEC_PERM_EXPR <.., .., perm_vector_1>;
>> >
>> > that is, it's not required to pretend the VEC_PERM_EXPR is a single
>> > instruction or the permutation vector is "constant"?
>>
>> ...by taking this approach, we're saying that we need to ensure that
>> there is always a way of representing every directly-supported
>> variable-length permutation mask as a constant, so that it doesn't get
>> split from VEC_PERM_EXPR.
>
> I'm having trouble understanding this. Why would splitting away the
> definition of perm_vector_1 from VEC_PERM_EXPR be a problem? It's still
> the same VEC_SERIES_EXPR, and hence still recognizable as a special
> permutation (if it is one). The optimizer won't touch VEC_SERIES_EXPR, or
> if they do (e.g. combine two of them), and they feed a VEC_PERM_EXPR they
> will make sure the combined result still is supported by the target.

The problem with splitting it out is that it just becomes any old
gassign, and you don't normally have to check the uses of an SSA_NAME
before optimising the definition.

> In a way, on targets which support only specific forms of permutation for
> the vector type in question, this invariant mask won't be explicitly
> generated in code, it's an abstract tag in the IR to specify the type of
> the transformation. Hence moving the def for that tag around is no
> problem.

But why should the def exist as a separate gimple statement in that case?
If it's one operation then it seems better to keep it as one operation,
both for code-gen and for optimisation.

>> I don't see why that's better than having internal
>> functions.
>
> The real difference isn't internal functions vs. expression nodes, but
> rather multiple node types vs. a single node type.

But maybe more specifically: multiple node types that each have a single
form vs. one node type that has multiple forms.

And at that level it seems like it's a difference between:

    gimple_assign_rhs_code (x) == VEC_PERM_EXPR

vs.

    gimple_vec_perm_p (x)

Thanks,
Richard
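
(Illustration only, not GCC code.) The closing comparison can be
sketched with a toy statement model. Everything below is a made-up
stand-in: Stmt, StmtKind, Code and InternalFn are not GCC types, and
vec_perm_p is only a guess at what a predicate like the
gimple_vec_perm_p mentioned above might check. The point is just that
callers end up writing one test either way.

```cpp
// Toy stand-ins for statement kinds, tree codes and internal functions.
enum class StmtKind { Assign, InternalCall };
enum class Code { PlusExpr, VecPermExpr };
enum class InternalFn { None, VecInterleaveLo, VecInterleaveHi };

struct Stmt
{
  StmtKind kind;
  Code rhs_code;   // Only meaningful when kind == Assign.
  InternalFn ifn;  // Only meaningful when kind == InternalCall.
};

// Style 1: a single node type with multiple forms.  The check is a
// comparison against one code; the form is found by looking at the mask.
bool
is_vec_perm_assign (const Stmt &s)
{
  return s.kind == StmtKind::Assign && s.rhs_code == Code::VecPermExpr;
}

// Style 2: several node types that each have a single form, hidden
// behind one predicate (hypothetical, in the spirit of gimple_vec_perm_p).
bool
vec_perm_p (const Stmt &s)
{
  if (is_vec_perm_assign (s))
    return true;
  return (s.kind == StmtKind::InternalCall
          && (s.ifn == InternalFn::VecInterleaveLo
              || s.ifn == InternalFn::VecInterleaveHi));
}
```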