on 2020/5/27 下午6:02, Richard Sandiford wrote: > "Kewen.Lin" <li...@linux.ibm.com> writes: >> Hi Richard, >> >> Thanks for your comments! >> >> on 2020/5/26 锟斤拷锟斤拷8:49, Richard Sandiford wrote: >>> "Kewen.Lin" <li...@linux.ibm.com> writes: >>>> @@ -626,6 +645,12 @@ public: >>>> /* True if have decided to use a fully-masked loop. */ >>>> bool fully_masked_p; >>>> >>>> + /* Records whether we still have the option of using a length access >>>> loop. */ >>>> + bool can_with_length_p; >>>> + >>>> + /* True if have decided to use length access for the loop fully. */ >>>> + bool fully_with_length_p; >>> >>> Rather than duplicate the flags like this, I think we should have >>> three bits of information: >>> >>> (1) Can the loop operate on partial vectors? Starts off optimistically >>> assuming "yes", gets set to "no" when we find a counter-example. >>> >>> (2) If we do decide to use partial vectors, will we need loop masks? >>> >>> (3) If we do decide to use partial vectors, will we need lengths? >>> >>> Vectorisation using partial vectors succeeds if (1) && ((2) != (3)) >>> >>> LOOP_VINFO_CAN_FULLY_MASK_P currently tracks (1) and >>> LOOP_VINFO_MASKS currently tracks (2). In pathological cases it's >>> already possible to have (1) && !(2), see r9-6240 for an example. >>> >>> With the new support, LOOP_VINFO_LENS tracks (3). >>> >>> So I don't think we need the can_with_length_p. What is now >>> LOOP_VINFO_CAN_FULLY_MASK_P can continue to track (1) for both >>> approaches, with the final choice of approach only being made >>> at the end. Maybe it would be worth renaming it to something >>> more generic though, now that we have two approaches to partial >>> vectorisation. >> >> I like this idea! I could be wrong, but I'm afraid that we >> can not have one common flag to be shared for both approaches, >> the check criterias could be different for both approaches, one >> counter example for length could be acceptable for masking, such >> as length can only allow CONTIGUOUS related modes, but masking >> can support more. When we see acceptable VMAT_LOAD_STORE_LANES, >> we leave LOOP_VINFO_CAN_FULLY_MASK_P true, later should length >> checking turn it to false? I guess no, assuming still true, then >> LOOP_VINFO_CAN_FULLY_MASK_P will mean only partial vectorization >> for masking, not for both. We can probably clean LOOP_VINFO_LENS >> when the length checking is false, but we just know the vec is empty, >> not sure we are unable to do partial vectorization with length, >> when we see LOOP_VINFO_CAN_FULLY_MASK_P true, we could still >> record length into it if possible. > > Let's call the flag in (1) CAN_USE_PARTIAL_VECTORS_P rather than > CAN_FULLY_MASK_P to (try to) avoid any confusion from the current name. > > What I meant is that each vectorizable_* routine has the responsibility > of finding a way of coping with partial vectorisation, or setting > CAN_USE_PARTIAL_VECTORS_P to false if it can't. > > vectorizable_load chooses the VMAT first, and then decides based on that > whether partial vectorisation is supported. There's no influence in > the other direction (partial vectorisation doesn't determine the VMAT). > > So once it has chosen a VMAT, vectorizable_load needs to try to find a way > of handling the operation with partial vectorisation. Currently the only > way of doing that for VMAT_LOAD_STORE_LANES is using masks. So at the > moment there are two possible outcomes: > > - The target supports the necessary IFN_MASK_LOAD_LANES function. > If so, we can use partial vectorisation for the statement, so we > leave CAN_USE_PARTIAL_VECTORS_P true and record the necessary masks > in LOOP_VINFO_MASKS. > > - The target doesn't support the necessary IFN_MASK_LOAD_LANES function. > If so, we can't use partial vectorisation, so we clear > CAN_USE_PARTIAL_VECTORS_P. > > That's how things work at the moment. It would work in the same way > for lengths if we ever supported IFN_LEN_LOAD_LANES: we'd check whether > IFN_LEN_LOAD_LANES is available and record the length in LOOP_VINFO_LENS > if so. If partial vectorisation isn't supported (via masks or lengths), > we'd continue to clear CAN_USE_PARTIAL_VECTORS_P. > > But equally, if we never add support for IFN_LEN_LOAD_LANES, the current > code continues to work with length-based approaches. We'll continue to > clear CAN_USE_PARTIAL_VECTORS_P for VMAT_LOAD_STORE_LANES when the > target provides no IFN_MASK_LOAD_LANES function. >
Thanks a lot for your detailed explanation! This proposal looks good based on the current implementation of both masking and length. I may think too much, but I had a bit concern as below when some targets have both masking and length supports in future, such as ppc adds masking support like SVE. I assumed that you meant each vectorizable_* routine should record the objs for any available partial vectorisation approaches. If one target supports both, we would have both recorded but decide not to do partial vectorisation finally since both have records. The target can disable length like through optab to resolve it, but there is one possibility that the masking support can be imperfect initially since ISA support could be gradual, it further leads some vectorizable_* check or final verification to fail for masking, and length approach may work here but it gets disabled. We can miss to use partial vectorisation here. The other assumption is that each vectorizable_* routine record the first available partial vectorisation approach, let's assume masking takes preference, then it's fine to record just one here even if one target supports both approaches, but we still have the possiblity to miss the partial vectorisation chance as some check/verify fail with masking but fine with length. Does this concern make sense? BR, Kewen > As I say, this is all predicated on the assumption that we don't need > to mix both masks and lengths in the same loop, and so can decide not > to use partial vectorisation when both masks and lengths have been > recorded. This is a check that would happen at the end, after all > statements have been analysed. > > (There's no reason in principle why we *couldn't* support both > approaches in the same loop, but it's not worth adding the code > for that until there's a use case.) > > Thanks, > Richard >