Fig (1) below shows an implementation of the SSQ kernel from the BLAS
library in ATLAS.
Fig (2) shows the conversion of the IF-THEN-ELSE in Fig (1) to vectorized code.
Normally, in automatic vectorization an IF-THEN-ELSE is vectorized only after
if-conversion, which converts control flow into data flow and thereby exposes
the opportunity for vectorization.
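
For comparison, here is a minimal sketch of what if-conversion does to a simple
loop. It is not taken from the article or from any compiler; the function names
clamp_branch and clamp_select and the clamping operation are hypothetical,
chosen only for illustration. The second form has no branch in the loop body,
so each iteration is straight-line code that maps onto a vector compare plus a
select/blend.

```c
#include <stddef.h>

/* Branchy form: control flow decides which assignment executes. */
void clamp_branch(const double *x, double *y, size_t n, double limit)
{
    for (size_t i = 0; i < n; i++) {
        if (x[i] <= limit)
            y[i] = x[i];
        else
            y[i] = limit;
    }
}

/* If-converted form: both candidate values are computed and a
 * predicate (mask) selects one, so control flow becomes data flow. */
void clamp_select(const double *x, double *y, size_t n, double limit)
{
    for (size_t i = 0; i < n; i++) {
        double then_val = x[i];
        double else_val = limit;
        int mask = (x[i] <= limit);
        y[i] = mask ? then_val : else_val;
    }
}
```

This works well when both paths are cheap and independent across iterations.
It does not help much for the SSQ loop, where the ELSE path rewrites scal and
ssq for all later iterations.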

Fig (1) and Fig (2) are taken from the article

"Vectorization Past Dependent Branches through Speculation", Qing Yi et al.

The conversion given in the above article examines the IF-THEN and ELSE paths
separately and checks whether each is a candidate for vectorization.
If the IF-THEN path can be vectorized and the ELSE path cannot, the conversion
in Fig (2) shows how the loop with its conditional branch can still be
vectorized.

In the case below the IF-THEN-ELSE need not be if-converted; the conditional
branch is vectorized by the transformation shown in Fig (2).

ssq = *ssq0;
scal = *scal0;

for (i = 0; i < N; i++)
{
    ax = x[i];
    ax = ABS & ax;          // clear the sign bit: ax = |x[i]|
    if (ax <= scal)
    {
        t0 = ax / scal;
        ssq += t0 * t0;
    }
    else
    {
        t0 = scal / ax;
        t0 = t0 * t0;
        t1 = ssq * t0;
        ssq = 1.0 + t1;
        scal = ax;
    }
}
*ssq0 = ssq;
*scal0 = scal;

Fig (1)

Transformed to:

VECTOR_PROLOGUE:
// ssq and scal are loaded from *ssq0 and *scal0 as in Fig (1).
VABS  = {ABS, ABS, ABS, ABS};
Vssq  = {ssq, 0.0, 0.0, 0.0};
Vscal = {scal, scal, scal, scal};

VECTOR_LOOP:
for (i = 0; i < N4; i += 4)
{
    Vax = x[i:i+3];
    Vax = VABS & Vax;
    if (VEC_ANY(Vax > Vscal))
        goto SCALAR_RESTART;

    // Speculated path: all 4 elements take the IF-THEN branch.
    Vt0 = Vax / Vscal;
    Vssq += Vt0 * Vt0;
    continue;

SCALAR_RESTART:
    // Vector to scalar.
    ssq = sum(Vssq[0:3]);

    // Scalar loop over the 4 elements of this vector iteration.
    for (j = 0; j < 4; j++)
    {
        ax = x[i+j];
        ax = ABS & ax;
        if (ax > scal)
        {
            t0 = scal / ax;
            t0 = t0 * t0;
            t1 = ssq * t0;
            ssq = 1.0 + t1;
            scal = ax;
        }
        else
        {
            t0 = ax / scal;
            ssq += t0 * t0;
        }
    }
    // Scalar to vector.
    Vssq  = {ssq, 0.0, 0.0, 0.0};
    Vscal = {scal, scal, scal, scal};
}

VECTOR_EPILOGUE:
ssq = sum(Vssq[0:3]);
scal = Vscal[0];

Fig (2)
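
To make the transformation easier to experiment with, here is a rough sketch in
plain C that emulates the 4-wide vector operations with short inner loops. It
is my own rendering under stated assumptions, not code from the article or from
ATLAS: the names ssq_step, ssq_scalar and ssq_speculative are hypothetical,
fabs() stands in for the ABS sign-bit mask, and a scalar cleanup loop for the
N % 4 leftover elements is added, which the figures do not show.

```c
#include <math.h>
#include <stddef.h>

#define VLEN 4  /* assumed vector length, as in the 4-wide figures */

/* One iteration of the scalar loop body from the first figure. */
static void ssq_step(double xi, double *ssq, double *scal)
{
    double ax = fabs(xi);
    if (ax <= *scal) {
        double t0 = ax / *scal;
        *ssq += t0 * t0;
    } else {
        double t0 = *scal / ax;
        *ssq = 1.0 + *ssq * (t0 * t0);
        *scal = ax;
    }
}

/* Reference: the original scalar loop. */
void ssq_scalar(const double *x, size_t n, double *ssq0, double *scal0)
{
    double ssq = *ssq0, scal = *scal0;
    for (size_t i = 0; i < n; i++)
        ssq_step(x[i], &ssq, &scal);
    *ssq0 = ssq;
    *scal0 = scal;
}

/* Speculative form: assume every lane takes the IF-THEN path and
 * fall back to scalar code for a block when any lane does not. */
void ssq_speculative(const double *x, size_t n, double *ssq0, double *scal0)
{
    double scal = *scal0;
    double vssq[VLEN] = { *ssq0, 0.0, 0.0, 0.0 };  /* prologue */
    size_t n4 = n - n % VLEN;
    size_t i;

    for (i = 0; i < n4; i += VLEN) {
        double vax[VLEN];
        int any = 0;                       /* emulates VEC_ANY(Vax > Vscal) */
        for (int l = 0; l < VLEN; l++) {
            vax[l] = fabs(x[i + l]);
            any |= (vax[l] > scal);
        }
        if (!any) {                        /* speculated vector path */
            for (int l = 0; l < VLEN; l++) {
                double t0 = vax[l] / scal;
                vssq[l] += t0 * t0;
            }
            continue;
        }
        /* Scalar restart: reduce the lanes, redo this block in order. */
        double ssq = vssq[0] + vssq[1] + vssq[2] + vssq[3];
        for (int l = 0; l < VLEN; l++)
            ssq_step(x[i + l], &ssq, &scal);
        vssq[0] = ssq;
        vssq[1] = vssq[2] = vssq[3] = 0.0;
    }

    /* Epilogue, plus scalar cleanup for the leftover elements. */
    double ssq = vssq[0] + vssq[1] + vssq[2] + vssq[3];
    for (; i < n; i++)
        ssq_step(x[i], &ssq, &scal);
    *ssq0 = ssq;
    *scal0 = scal;
}
```

Both versions should return the same scal, and ssq should agree up to rounding,
since the lane-wise accumulation reorders the additions. As in the figures,
there is no guard for scal == 0 with a zero input element, so the usual starting
values (scal = 0, ssq = 1) are only safe when the first element is nonzero.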

This looks interesting to me: the IF-THEN-ELSE inside the loop is vectorized
as shown above, without if-conversion.
I am not sure how often such code sequences occur in the SPEC benchmarks, but
it looks worth implementing.

Thoughts, please?

Thanks & Regards
Ajit
