Re: SPEC 456.hmmer vectorization question

2017-03-07 Thread Michael Matz
Hi Steve,

On Mon, 6 Mar 2017, Steve Ellcey wrote:

> I was looking at the spec 456.hmmer benchmark and this email string
> from Jeff Law and Michael Matz:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01970.html
> 
> and was wondering if anyone was looking at what more it would take
> for GCC to vectorize the loop in P7Viterbi.

It takes what I wrote in there.  There are two important things that need 
to happen to get the best performance (at least from an analysis I did in 
2011, but nothing material should have changed since then):

(1) loop distribution to make some memory streams vectorizable (and leave 
the others in non-vectorized form).
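(1a) loop splitting based on conditional (to remove the k < M conditional)

[...]

for (k = 1; k < M; k++) {
  mc[k] = mpp[k-1]   + tpmm[k-1];
  if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;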
  if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
  if ((sc = xmb  + bp[k]) > mc[k])  mc[k] = sc;
  mc[k] += ms[k];
  if (mc[k] < -INFTY) mc[k] = -INFTY;
}

for (k = 1; k < M; k++) {
  dc[k] = dc[k-1] + tpdd[k-1];
  if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
  if (dc[k] < -INFTY) dc[k] = -INFTY;
}

for (k = 1; k < M; k++) {
  ic[k] = mpp[k] + tpmi[k];
  if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
  ic[k] += is[k];
  if (ic[k] < -INFTY) ic[k] = -INFTY;
}
/* last iteration of original loop */
k = M;
  mc[k] = mpp[k-1]   + tpmm[k-1];
  if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
  if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
  if ((sc = xmb  + bp[k]) > mc[k])  mc[k] = sc;
  mc[k] += ms[k];
  if (mc[k] < -INFTY) mc[k] = -INFTY;

  dc[k] = dc[k-1] + tpdd[k-1];
  if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
  if (dc[k] < -INFTY) dc[k] = -INFTY;

(Note again that this is only valid with disambiguation.)  Adding 
restrict qualifiers at the top of the routine, like so:

#define R __restrict
  int  * R mc, * R dc, * R ic;    /* pointers to rows of mmx, dmx, imx */
  int  * R ms, * R is;            /* pointers to msc[i], isc[i] */
  int  * R mpp, * R mpc, * R ip;  /* ptrs to mmx[i-1], mmx[i], imx[i-1] */
  int  * R bp;                    /* ptr into bsc[] */
  int  * R ep;                    /* ptr into esc[] */
  int  * R dpp;                   /* ptr into dmx[i-1] (previous row) */
  int  * R tpmm, * R tpmi, * R tpmd, * R tpim, * R tpii, * R tpdm, * R tpdd;
                                  /* ptrs into tsc */

helps to vectorize this.  To get the remaining performance, this 
transformation also needs to happen on the dc[] loop:

dctemp=dc[0];
for (k = 1; k < M; k++) {
  dctemp = dctemp + tpdd[k-1];
  if ((sc = mc[k-1] + tpmd[k-1]) > dctemp) dctemp = sc;
  if (dctemp < -INFTY) dctemp = -INFTY;
  dc[k] = dctemp;
}
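
As a minimal sketch (not taken from the hmmer sources), the two forms of 
the dc[] loop can be compared directly.  The value of INFTY, the array 
contents and the function names below are assumptions made only for this 
illustration; the point is that carrying dc[k-1] in a scalar gives the 
same result as reloading it from memory, provided dc[] does not alias the 
input streams.

```c
#include <assert.h>
#include <string.h>

#define INFTY 987654321   /* assumed sentinel value for this sketch */
enum { LEN = 8 };

/* Arbitrary read-only input streams (illustration data only). */
static const int mc_in[LEN]   = { -INFTY, 5, -3, 9, 0, -INFTY, 4, 2 };
static const int tpmd_in[LEN] = { -2, -1, -4, -1, -3, -2, -1, -5 };
static const int tpdd_in[LEN] = { -5, -1, -1, -20, -1, -2, -1, -3 };

/* dc[] loop as distributed: dc[k-1] is reloaded from memory each time. */
static void dc_loop_mem(int *dc, const int *mc, const int *tpdd,
                        const int *tpmd, int m)
{
  int k, sc;
  assert(m >= 1);               /* dc[0] must exist */
  for (k = 1; k < m; k++) {
    dc[k] = dc[k-1] + tpdd[k-1];
    if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
    if (dc[k] < -INFTY) dc[k] = -INFTY;
  }
}

/* Same loop with the loop-carried value kept in a scalar. */
static void dc_loop_scalar(int *dc, const int *mc, const int *tpdd,
                           const int *tpmd, int m)
{
  int k, sc, dctemp;
  assert(m >= 1);
  dctemp = dc[0];
  for (k = 1; k < m; k++) {
    dctemp = dctemp + tpdd[k-1];
    if ((sc = mc[k-1] + tpmd[k-1]) > dctemp) dctemp = sc;
    if (dctemp < -INFTY) dctemp = -INFTY;
    dc[k] = dctemp;             /* single store, no reload */
  }
}

/* Returns 1 if both variants produce identical dc[] rows. */
int dc_variants_agree(void)
{
  int a[LEN] = { -INFTY }, b[LEN] = { -INFTY };
  dc_loop_mem(a, mc_in, tpdd_in, tpmd_in, LEN);
  dc_loop_scalar(b, mc_in, tpdd_in, tpmd_in, LEN);
  return memcmp(a, b, sizeof a) == 0;
}
```

The equivalence only holds because dc[] is a separate array here; if a 
store to dc[k] could change mc[] or the tp* streams, the scalar form 
would not be a valid replacement.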

Our loop distribution should actually already be able to split off the 
three memory streams when restrict is added everywhere, but in the 2011 
time frame it didn't do so (and I haven't looked at whether it is able to 
do that now).
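
To make the distribution concrete, here is a small self-contained check 
that a fused loop and the distributed loops compute the same rows.  The 
fused form is an assumption reconstructed from the distributed code (one 
loop up to k <= M, with the ic[] part guarded by k < M); the array 
contents, INFTY value and function names are likewise made up for the 
sketch, and all streams are distinct arrays, i.e. already disambiguated.

```c
#include <assert.h>
#include <string.h>

#define INFTY 987654321            /* assumed sentinel value */
enum { M = 6, N = M + 1 };         /* rows are indexed 0..M */

struct rows { int mc[N], dc[N], ic[N]; };   /* the three output streams */

/* Read-only input streams; arbitrary illustration data. */
static const int mpp[N]  = { -INFTY, 4, -2, 7, 0, 3, 1 };
static const int ip[N]   = { 1, -INFTY, 5, 2, -4, 6, 0 };
static const int dpp[N]  = { 0, 3, -INFTY, 1, 8, -5, 2 };
static const int bp[N]   = { 0, -1, 2, -3, 4, -5, 6 };
static const int ms[N]   = { 0, 2, -2, 1, -1, 3, -3 };
static const int is[N]   = { 0, 1, 1, -1, -1, 2, -2 };
static const int tpmm[N] = { -10, -1, -2, -1, -3, -1, 0 };
static const int tpim[N] = { -1, -2, -1, -4, -1, -2, 0 };
static const int tpdm[N] = { -3, -1, -1, -2, -6, -1, 0 };
static const int tpmd[N] = { -2, -2, -1, -3, -1, -2, 0 };
static const int tpdd[N] = { -5, -1, -9, -1, -2, -1, 0 };
static const int tpmi[N] = { -1, -3, -1, -2, -1, -4, 0 };
static const int tpii[N] = { -2, -1, -2, -1, -3, -1, 0 };
static const int xmb = -3;

/* Fused form (assumed shape): one loop, ic[] guarded by k < M. */
static void viterbi_fused(struct rows *r)
{
  int *mc = r->mc, *dc = r->dc, *ic = r->ic, k, sc;
  for (k = 1; k <= M; k++) {
    mc[k] = mpp[k-1] + tpmm[k-1];
    if ((sc = ip[k-1]  + tpim[k-1]) > mc[k]) mc[k] = sc;
    if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
    if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc;
    mc[k] += ms[k];
    if (mc[k] < -INFTY) mc[k] = -INFTY;
    dc[k] = dc[k-1] + tpdd[k-1];
    if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
    if (dc[k] < -INFTY) dc[k] = -INFTY;
    if (k < M) {
      ic[k] = mpp[k] + tpmi[k];
      if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
      ic[k] += is[k];
      if (ic[k] < -INFTY) ic[k] = -INFTY;
    }
  }
}

/* Distributed form: one loop per stream, last iteration peeled. */
static void viterbi_distributed(struct rows *r)
{
  int *mc = r->mc, *dc = r->dc, *ic = r->ic, k, sc, last;
  for (last = 0; last < 2; last++) {        /* pass 0: k < M, pass 1: k = M */
    int lo = last ? M : 1, hi = last ? M + 1 : M;
    for (k = lo; k < hi; k++) {             /* mc[] stream */
      mc[k] = mpp[k-1] + tpmm[k-1];
      if ((sc = ip[k-1]  + tpim[k-1]) > mc[k]) mc[k] = sc;
      if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
      if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc;
      mc[k] += ms[k];
      if (mc[k] < -INFTY) mc[k] = -INFTY;
    }
    for (k = lo; k < hi; k++) {             /* dc[] stream */
      dc[k] = dc[k-1] + tpdd[k-1];
      if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
      if (dc[k] < -INFTY) dc[k] = -INFTY;
    }
    for (k = lo; k < hi && !last; k++) {    /* ic[] stream, only for k < M */
      ic[k] = mpp[k] + tpmi[k];
      if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
      ic[k] += is[k];
      if (ic[k] < -INFTY) ic[k] = -INFTY;
    }
  }
}

/* Returns 1 if both forms fill mc[], dc[] and ic[] identically. */
int distributed_matches_fused(void)
{
  struct rows a, b;
  memset(&a, 0, sizeof a);
  a.mc[0] = a.dc[0] = a.ic[0] = -INFTY;
  b = a;
  viterbi_fused(&a);
  viterbi_distributed(&b);
  assert(a.ic[M] == 0);         /* ic[M] is never written by either form */
  return memcmp(&a, &b, sizeof a) == 0;
}
```

The peeled last iteration is written here as a second pass over the same 
loop bodies purely to keep the sketch short; it performs the same 
statements as the explicit "k = M" code.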

Predictive commoning could do the dc[] transformation (part (2)), except 
that it can't without disambiguation.  That adding restrict doesn't help 
here is PR50419, but ultimately it would have to work on the disambiguated 
loop (without the restrict pointers).

So really the prerequisite for optimizing hmmer is loop disambiguation, 
even with the many streams (and hence conditionals) that are there.  And 
it needs to happen well before the loop vectorizer, because loop splitting 
and distribution, _and_ predictive commoning, all have disambiguation as a 
prerequisite in this testcase.

After that, someone needs to look at why loop distribution doesn't want to 
distribute the streams, and then a variant of PR50419 needs to be fixed 
based on disambiguation info (not based on restrict).  For that we need 
infrastructure that lets us disambiguate memory accesses in the "good" 
version after loop nest versioning has happened.
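
What loop versioning with a runtime disambiguation check amounts to can 
be sketched by hand for the dc[] loop.  This is only an illustration of 
the idea, not what GCC generates: the helper names, the INFTY value and 
the conservative whole-range overlap test (done on integer-converted 
addresses) are all assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INFTY 987654321   /* assumed sentinel value */

/* Nonzero if [a, a+n) and [b, b+n) share any element (conservative). */
static int ranges_overlap(const int *a, const int *b, size_t n)
{
  uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
  uintptr_t len = (uintptr_t)(n * sizeof(int));
  return pa < pb + len && pb < pa + len;
}

/* "Bad" version: nothing is known about aliasing, dc[k-1] is reloaded. */
static void dc_loop_generic(int *dc, const int *mc, const int *tpdd,
                            const int *tpmd, int M)
{
  int k, sc;
  for (k = 1; k < M; k++) {
    dc[k] = dc[k-1] + tpdd[k-1];
    if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
    if (dc[k] < -INFTY) dc[k] = -INFTY;
  }
}

/* "Good" version: streams are known to be disjoint, so the carried
   value can live in a scalar.  */
static void dc_loop_noalias(int *restrict dc, const int *restrict mc,
                            const int *restrict tpdd,
                            const int *restrict tpmd, int M)
{
  int k, sc, dctemp = dc[0];
  for (k = 1; k < M; k++) {
    dctemp = dctemp + tpdd[k-1];
    if ((sc = mc[k-1] + tpmd[k-1]) > dctemp) dctemp = sc;
    if (dctemp < -INFTY) dctemp = -INFTY;
    dc[k] = dctemp;
  }
}

/* Versioned entry point: returns 1 if the "good" version ran, else 0. */
int dc_loop_versioned(int *dc, const int *mc, const int *tpdd,
                      const int *tpmd, int M)
{
  size_t n;
  assert(M >= 1);               /* dc[0] is always read */
  n = (size_t)M;
  /* Only the written stream dc[] has to be checked against the reads. */
  if (!ranges_overlap(dc, mc, n) && !ranges_overlap(dc, tpdd, n)
      && !ranges_overlap(dc, tpmd, n)) {
    dc_loop_noalias(dc, mc, tpdd, tpmd, M);
    return 1;
  }
  dc_loop_generic(dc, mc, tpdd, tpmd, M);
  return 0;
}
```

The missing piece described above is that, inside the compiler, the 
passes running after such a check need to know that the accesses in the 
first branch are independent, which hand-written restrict expresses here.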


Ciao,
Michael.


diagnostics: %<%s%> vs. %qs

2017-03-07 Thread Roland Illig
Hi,

in the diagnostics, the %qs specifier is used in most cases, but there
are some cases left where the more complicated %<%s%> is used. Is
there a good reason to prefer the complicated spelling?

Same for %<%T%> and %qT, and similar letters.

Regards,
Roland


gcc-5-20170307 is now available

2017-03-07 Thread gccadmin
Snapshot gcc-5-20170307 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20170307/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch 
revision 245963

You'll find:

 gcc-5-20170307.tar.bz2   Complete GCC

  SHA256=301cbc0b886c6c1afdd86eb242dafe0667d8dc2837d441500da52ceb2d56dfcb
  SHA1=0b9ca28f5a60f75a7bdc8306917af278e4d34be4

Diffs from 5-20170228 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.