Fig (1) below shows an implementation of the SSQ kernel from the BLAS library in ATLAS. Fig (2) shows how the IF-THEN-ELSE in Fig (1) is converted to vectorized code. Normally, automatic vectorization handles an IF-THEN-ELSE only after if-conversion, which turns the control flow into data flow and thereby exposes the vectorization opportunity.
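For reference, here is a minimal sketch in plain C of what if-conversion does to a loop body of this shape. The function names ssq_branchy and ssq_ifconverted are made up for illustration, and fabs() stands in for the ABS mask; this is not code from the article or from ATLAS. It only shows the idea: both arms are computed unconditionally and the results are picked with selects.

```c
#include <math.h>
#include <stddef.h>

/* Branchy reference: scaled sum-of-squares update with an IF-THEN-ELSE.
   fabs() is used here in place of the ABS bit mask (an assumption). */
void ssq_branchy(const double *x, size_t n, double *ssq0, double *scal0)
{
    double ssq = *ssq0, scal = *scal0;
    for (size_t i = 0; i < n; i++) {
        double ax = fabs(x[i]);
        if (ax <= scal) {
            double t0 = ax / scal;
            ssq += t0 * t0;
        } else {
            double t0 = scal / ax;
            ssq = 1.0 + ssq * (t0 * t0);
            scal = ax;
        }
    }
    *ssq0 = ssq;
    *scal0 = scal;
}

/* If-converted form (hypothetical name): control flow becomes data flow.
   Both arms are evaluated every iteration and a select keeps one result,
   so the loop body is straight-line code. */
void ssq_ifconverted(const double *x, size_t n, double *ssq0, double *scal0)
{
    double ssq = *ssq0, scal = *scal0;
    for (size_t i = 0; i < n; i++) {
        double ax = fabs(x[i]);
        int take_then = (ax <= scal);

        /* THEN arm, computed unconditionally */
        double t_then = ax / scal;
        double ssq_then = ssq + t_then * t_then;

        /* ELSE arm, computed unconditionally; its value may be inf/NaN
           when ax == 0, but it is discarded by the select in that case */
        double t_else = scal / ax;
        double ssq_else = 1.0 + ssq * (t_else * t_else);

        ssq  = take_then ? ssq_then : ssq_else;
        scal = take_then ? scal : ax;
    }
    *ssq0 = ssq;
    *scal0 = scal;
}
```

Two things are visible in the sketch: the if-converted body pays for both divisions on every iteration, and ssq and scal are still carried from one iteration to the next, so removing the branch does not by itself remove the dependence.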
Fig (1) and Fig (2) are taken from the article "Vectorization Past Dependent Branches through Speculation" by Qing Yi et al. The transformation described in the article examines the THEN path and the ELSE path separately to see which of them are candidates for vectorization. When the THEN path can be vectorized and the ELSE path cannot, the conversion in Fig (2) shows how the loop with its conditional branches can still be vectorized. In the case below, the IF-THEN-ELSE need not be if-converted; the conditional branches are vectorized by the transformation shown in Fig (2).

    ssq = *ssq0;
    scal = *scal0;
    for (i = 0; i < N; i++)
    {
        ax = x[i];
        ax = ABS & ax;
        if (ax <= scal)
        {
            t0 = ax / scal;
            ssq += t0 * t0;
        }
        else
        {
            t0 = scal / ax;
            t0 = t0 * t0;
            t1 = ssq * t0;
            ssq = 1.0 + t1;
            scal = ax;
        }
    }
    *ssq0 = ssq;
    *scal0 = scal;

Fig (1)

Transformed to:

    VECTOR_PROLOGUE:
        VABS  = {ABS, ABS, ABS, ABS};
        Vssq  = {ssq, 0.0, 0.0, 0.0};
        Vscal = {scal, scal, scal, scal};

    VECTOR_LOOP:
        // N4 is presumably N rounded down to a multiple of 4
        for (i = 0; i < N4; i += 4)
        {
            Vax = x[i:i+3];
            Vax = VABS & Vax;
            if (VEC_ANY(Vax > Vscal))
                goto SCALAR_RESTART;
            Vt0 = Vax / Vscal;
            Vssq += Vt0 * Vt0;
            continue;

    SCALAR_RESTART:
            // Vector to scalar.
            ssq = sum(Vssq[0:3]);
            // Scalar loop over the four elements of this vector iteration.
            for (j = 0; j < 4; j++)
            {
                ax = x[i+j];
                ax = ABS & ax;
                if (ax > scal)
                {
                    t0 = scal / ax;
                    t0 = t0 * t0;
                    t1 = ssq * t0;
                    ssq = 1.0 + t1;
                    scal = ax;
                }
                else
                {
                    t0 = ax / scal;
                    ssq += t0 * t0;
                }
            }
            // Scalar back to vector.
            Vssq  = {ssq, 0.0, 0.0, 0.0};
            Vscal = {scal, scal, scal, scal};
        }

    VECTOR_EPILOGUE:
        ssq = sum(Vssq[0:3]);
        scal = Vscal[0];

Fig (2)

This looks interesting to me: the IF-THEN-ELSE inside the loop is vectorized as shown above without if-conversion. I am not sure how many such code sequences would be triggered in the SPEC benchmarks, but it looks worth implementing.

Thoughts, please?

Thanks & Regards
Ajit