RFC: vectorizer cost model

Linthicum, Tony Fri, 16 Feb 2007 09:05:55 -0800

Hello all,

Dorit Nuzman and I have been collaborating on a plan for a cost model
for the vectorizer.  Included below is an overview of the design for 
the initial implementation.  We would welcome any input those of you 
on the list might have.


Thanks in advance for your help.

Dorit and Tony

----------------------------------------------------------------------

In order to make the best optimization decisions possible, the
vectorizer requires cost information.  This cost model will allow it
do two things: 1) avoid vectorizing unprofitable loops and, 2) to
choose the most profitable method for vectorization when multiple
alternatives are available, e.g. alternative methods for misalignment
handling, and different schemes for reduction prologs and epilogs.  We
describe a first implementation of such a cost model below.

We want to enable a framework quickly so that others can build on it
(and help improve it).  With that in mind, the initial implementation
of the cost model will focus on avoiding unprofitable vectorization
only, assuming current heuristics/decisions on vectorization schemes.
As a fisrt step, we will use a simplistic cost analysis for each
statement.  That is, a cost of "1" will be associated with each
statement in the scalar and vector loops (as well as with each stmt in
peel/prolog/epilog code).  Later, we will refine this model, most
probably by adding an interface to access target machine data for this
purpose.  At that time we will discuss that particular interface
further on the list as that may prove useful outside of the
vectorizer.

Also coming up as a later step, we will add the functionality to query
the cost model as to the costs of different alternatives for
vectorizing individual statments, e.g. misaligned accesses vs peeling.
Our first implementation, though, when given a vectorized loop will
answer one basic question: Is the cost of the vectorized loop less
than the cost of the scalar version of the loop?  More precisely, we
want the minimum number of iterations (niters) for which:

  vec_loop_cost < scalar_loop_cost + delta

where:

  vect_loop_cost = sum_of_non_vect_loop_costs +
                   (niters/VF * sum_of_vect_loop_costs) + peel_cost +
                   versioning_cost

  scalar_loop_cost = sum_of_non_loop_costs +
                     (niters * sum_of_loop_costs)

  delta = TBD (represents things like cost of any code size increase,
          etc).

There are some alternatives to some of the above as well as some open
questions that should be mentioned here.  Among these are:

  * Would using a tree-level API like estimate_num_insns be superior
    to using a simple cost of 1 per statment?

  * What is the best way to access target level cost information?

  * How do we measure the costs of any code size increase or the
    benefits of any code size decrease?

  * How do we assign costs to any necessary guard code?

  * How do we compare the costs of if-converted vectorized code to
    it's scalar counterpart?

We would appreciate any comments, thoughts or ideas on any of these
topics and on the plan for the cost model in general.

For those interested, we have included some specifics on the first
implementation of the cost model as described above.  The interface to
the cost model will be a a query function that will access statement
level data to return costs associated with the entire loop.  The data
for a given statement will be gathered during the analysis phase of
the vectorizer via the vectorizeable_* functions
(e.g. vectorizable_operation, vectorizable_load, etc ...).  These
gathered costs will be associated with the statement through new
fields in the stmt_vinfo structure.  The intent is that this query
function can be called for a loop after the vectorizer's analysis has
been done, but before any transformations have been performed.

Each type of vector statement, as defined by stmt_vec_info_type, will
have the follwing costs associated with them:

  * loop_costs:     cost of any statements generated inside the loop
                    to vectorize the given statement.
  * non_loop_costs: cost of any statements generated outside
                    the loop to vectorize this statement.

Other cost model data will be gathered from the loop_vec_info
structure for the loop.  These include:

  * peeling_for_alignment: used to calculate peeling costs.
  * vectorization_factor:  used to calculate overall vector loop cost.
  * num_iters:             used to calculate overall vector loop cost.

As stated above, vectorization cost data will be gathered on a per
statement basis during the vector analysis phase in the vectorizable_*
functions.  If multiple vector statements are generated from a single
statement in order to vectorize it, the costs for each vector
statement will be summed.  If the statement(s) occur within the
vectorized loop, then they will be held in the newly added loop_costs
field.  If any statements are generated in loop prolog or epilog code
(e.g. vectorized reduction prolog and epilog code), then that data
will be stored in the non_loop_costs field.

The loop_vec_info structure will be the starting point for all loop
level cost analysis.  Loop level costs will be calculated utilizing
both data already stored in the structure itself, and by iterating
over the statement level data described above.  As indicated, the cost
of peeling will be calculated using the peeling factor stored here,
and the number of iterations and the vectorization factor will be
utilized along with the statement level data to calculate the overall
cost of the vector loop as compared to the scalar loop.  These costs
will then be used to determine the minimum trip count needed for the
vector loop to be profitable.

RFC: vectorizer cost model

Reply via email to