"Kewen.Lin" <li...@linux.ibm.com> writes:
> Hi Richard,
>
> Thanks for your comments!
>
> on 2020/5/26 锟斤拷锟斤拷8:49, Richard Sandiford wrote:
>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>> @@ -626,6 +645,12 @@ public:
>>>    /* True if have decided to use a fully-masked loop.  */
>>>    bool fully_masked_p;
>>>  
>>> +  /* Records whether we still have the option of using a length access 
>>> loop.  */
>>> +  bool can_with_length_p;
>>> +
>>> +  /* True if have decided to use length access for the loop fully.  */
>>> +  bool fully_with_length_p;
>> 
>> Rather than duplicate the flags like this, I think we should have
>> three bits of information:
>> 
>> (1) Can the loop operate on partial vectors?  Starts off optimistically
>>     assuming "yes", gets set to "no" when we find a counter-example.
>> 
>> (2) If we do decide to use partial vectors, will we need loop masks?
>> 
>> (3) If we do decide to use partial vectors, will we need lengths?
>> 
>> Vectorisation using partial vectors succeeds if (1) && ((2) != (3))
>> 
>> LOOP_VINFO_CAN_FULLY_MASK_P currently tracks (1) and
>> LOOP_VINFO_MASKS currently tracks (2).  In pathological cases it's
>> already possible to have (1) && !(2), see r9-6240 for an example.
>> 
>> With the new support, LOOP_VINFO_LENS tracks (3).
>> 
>> So I don't think we need the can_with_length_p.  What is now
>> LOOP_VINFO_CAN_FULLY_MASK_P can continue to track (1) for both
>> approaches, with the final choice of approach only being made
>> at the end.  Maybe it would be worth renaming it to something
>> more generic though, now that we have two approaches to partial
>> vectorisation.
>
> I like this idea!  I could be wrong, but I'm afraid that we
> can not have one common flag to be shared for both approaches,
> the check criterias could be different for both approaches, one
> counter example for length could be acceptable for masking, such
> as length can only allow CONTIGUOUS related modes, but masking
> can support more.  When we see acceptable VMAT_LOAD_STORE_LANES,
> we leave LOOP_VINFO_CAN_FULLY_MASK_P true, later should length
> checking turn it to false?  I guess no, assuming still true, then 
> LOOP_VINFO_CAN_FULLY_MASK_P will mean only partial vectorization
> for masking, not for both.  We can probably clean LOOP_VINFO_LENS
> when the length checking is false, but we just know the vec is empty,
> not sure we are unable to do partial vectorization with length,
> when we see LOOP_VINFO_CAN_FULLY_MASK_P true, we could still
> record length into it if possible.

Let's call the flag in (1) CAN_USE_PARTIAL_VECTORS_P rather than
CAN_FULLY_MASK_P to (try to) avoid any confusion from the current name.

What I meant is that each vectorizable_* routine has the responsibility
of finding a way of coping with partial vectorisation, or setting
CAN_USE_PARTIAL_VECTORS_P to false if it can't.

vectorizable_load chooses the VMAT first, and then decides based on that
whether partial vectorisation is supported.  There's no influence in
the other direction (partial vectorisation doesn't determine the VMAT).

So once it has chosen a VMAT, vectorizable_load needs to try to find a way
of handling the operation with partial vectorisation.  Currently the only
way of doing that for VMAT_LOAD_STORE_LANES is using masks.  So at the
moment there are two possible outcomes:

- The target supports the necessary IFN_MASK_LOAD_LANES function.
  If so, we can use partial vectorisation for the statement, so we
  leave CAN_USE_PARTIAL_VECTORS_P true and record the necessary masks
  in LOOP_VINFO_MASKS.

- The target doesn't support the necessary IFN_MASK_LOAD_LANES function.
  If so, we can't use partial vectorisation, so we clear
  CAN_USE_PARTIAL_VECTORS_P.

That's how things work at the moment.  It would work in the same way
for lengths if we ever supported IFN_LEN_LOAD_LANES: we'd check whether
IFN_LEN_LOAD_LANES is available and record the length in LOOP_VINFO_LENS
if so.  If partial vectorisation isn't supported (via masks or lengths),
we'd continue to clear CAN_USE_PARTIAL_VECTORS_P.

But equally, if we never add support for IFN_LEN_LOAD_LANES, the current
code continues to work with length-based approaches.  We'll continue to
clear CAN_USE_PARTIAL_VECTORS_P for VMAT_LOAD_STORE_LANES when the
target provides no IFN_MASK_LOAD_LANES function.

As I say, this is all predicated on the assumption that we don't need
to mix both masks and lengths in the same loop, and so can decide not
to use partial vectorisation when both masks and lengths have been
recorded.  This is a check that would happen at the end, after all
statements have been analysed.

(There's no reason in principle why we *couldn't* support both
approaches in the same loop, but it's not worth adding the code
for that until there's a use case.)

Thanks,
Richard

Reply via email to