On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote:
> On 16/09/15 15:28, Bill Schmidt wrote:
> > 2015-09-16 Bill Schmidt <wschm...@linux.vnet.ibm.com>
> >
> > * config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDUC_SMIN,
> > UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, UNSPEC_REDUC_SMAX_SCAL,
> > UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL,
> > UNSPEC_REDUC_UMIN_SCAL): New enumerated constants.
> > (reduc_smax_v2di): New define_expand.
> > (reduc_smax_scal_v2di): Likewise.
> > (reduc_smin_v2di): Likewise.
> > (reduc_smin_scal_v2di): Likewise.
> > (reduc_umax_v2di): Likewise.
> > (reduc_umax_scal_v2di): Likewise.
> > (reduc_umin_v2di): Likewise.
> > (reduc_umin_scal_v2di): Likewise.
> > (reduc_smax_v4si): Likewise.
> > (reduc_smin_v4si): Likewise.
> > (reduc_umax_v4si): Likewise.
> > (reduc_umin_v4si): Likewise.
> > (reduc_smax_v8hi): Likewise.
> > (reduc_smin_v8hi): Likewise.
> > (reduc_umax_v8hi): Likewise.
> > (reduc_umin_v8hi): Likewise.
> > (reduc_smax_v16qi): Likewise.
> > (reduc_smin_v16qi): Likewise.
> > (reduc_umax_v16qi): Likewise.
> > (reduc_umin_v16qi): Likewise.
> > (reduc_smax_scal_<mode>): Likewise.
> > (reduc_smin_scal_<mode>): Likewise.
> > (reduc_umax_scal_<mode>): Likewise.
> > (reduc_umin_scal_<mode>): Likewise.
>
> You shouldn't need the non-_scal reductions. Indeed, they shouldn't be used
> if the _scal are present. The non-_scal's were previously defined as
> producing a vector with one element holding the result and the other
> elements all zero, and this was only ever used with a vec_extract
> immediately after; the _scal pattern now includes the vec_extract as well.
> Hence the non-_scal patterns are deprecated / considered legacy, as per
> md.texi.

Thanks -- I had misread the description of the non-scalar versions,
missing the part where the other elements are zero. What I really
want/need is an optab defined as computing the maximum value in all
elements of the vector. This seems like a strange thing to want, but
Alan Hayward's proposed patch will cause us to generate the scalar
version, followed by a broadcast of the result back to a vector. Since
our patterns already generate the maximum value in all positions, this
creates an unnecessary extract followed by an unnecessary broadcast.

As discussed elsewhere, we *could* remove the unnecessary code by
recognizing this in simplify-rtx, etc., but the vectorization cost
modeling would still be wrong. It would still tell us to model this as a
vec_to_scalar for the reduc_max_scal, and a vec_stmt for the broadcast.
This would overcount the cost of the reduction compared to what we would
actually generate.

To get this right for all targets, one could envision having a new optab
for a reduction-to-vector, which most targets wouldn't implement, but
PowerPC and AArch32, at least, would. If a target has a
reduction-to-vector, the vectorizer would have to generate a different
GIMPLE code that maps to it; otherwise it would do the REDUC_MAX_EXPR
and the broadcast. This obviously starts to get complicated, since
adding a GIMPLE code certainly has a nontrivial cost. :/

Perhaps the practical thing is to have the vectorizer also do an
add_stmt_cost with some new token that indicates the cost model should
make an adjustment if the back end doesn't need the extract/broadcast.
Targets like PowerPC and AArch32 could then subtract the unnecessary
cost, and remove the unnecessary code in simplify-rtx.

Copying Richi and ARM folks for opinions on the best design. I want to
be able to model this stuff as accurately as possible, but obviously we
need to avoid unnecessary effects on other architectures.
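
To make the cost-model concern concrete, here is a minimal sketch in
plain C rather than GCC internals. It models the semantics of a scalar
max reduction, a broadcast, and a reduction-to-vector, and then tallies
costs the way the proposal above suggests. The unit costs, the
REDUC_BCAST_ADJUST token, and all function names are made up for
illustration; only the vec_to_scalar and vec_stmt kinds come from the
discussion above.

```c
#include <assert.h>

#define N 4  /* elements per vector, e.g. a V4SI-like shape */

/* Scalar max reduction: returns the maximum of all elements. */
int reduc_max_scal(const int v[N])
{
  int m = v[0];
  for (int i = 1; i < N; i++)
    if (v[i] > m)
      m = v[i];
  return m;
}

/* Broadcast a scalar into every element of a vector. */
void broadcast(int s, int out[N])
{
  for (int i = 0; i < N; i++)
    out[i] = s;
}

/* Hypothetical reduction-to-vector: the maximum lands in every element,
   so no separate extract or broadcast is needed afterwards. */
void reduc_max_to_vector(const int v[N], int out[N])
{
  int m = reduc_max_scal(v);
  for (int i = 0; i < N; i++)
    out[i] = m;
}

/* Made-up unit costs, for illustration only. */
enum { REDUCE_COST = 1, EXTRACT_COST = 1, BROADCAST_COST = 1 };

/* VEC_TO_SCALAR and VEC_STMT mirror the cost kinds named in the text;
   REDUC_BCAST_ADJUST is the hypothetical new token. */
enum cost_kind { VEC_TO_SCALAR, VEC_STMT, REDUC_BCAST_ADJUST };

/* Stand-in for a target's per-statement cost hook. */
int stmt_cost(enum cost_kind kind, int has_reduc_to_vector)
{
  switch (kind)
    {
    case VEC_TO_SCALAR:
      return REDUCE_COST + EXTRACT_COST;
    case VEC_STMT:
      return BROADCAST_COST;
    case REDUC_BCAST_ADJUST:
      /* Targets that need the extract and broadcast make no adjustment. */
      return has_reduc_to_vector ? -(EXTRACT_COST + BROADCAST_COST) : 0;
    }
  return 0;
}

/* Total cost recorded for "reduce, then broadcast" on a given target. */
int modeled_cost(int has_reduc_to_vector)
{
  return stmt_cost(VEC_TO_SCALAR, has_reduc_to_vector)
         + stmt_cost(VEC_STMT, has_reduc_to_vector)
         + stmt_cost(REDUC_BCAST_ADJUST, has_reduc_to_vector);
}
```

With these made-up numbers, a target that needs the extract and
broadcast is charged the full reduce + extract + broadcast cost, while a
target with a reduction-to-vector is charged only for the reduction,
which matches the code it would actually generate.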

In any case, I will drop the implementation of the deprecated optabs,
and I'll also try to look at Alan L's patch shortly.

Thanks,
Bill

>
> I proposed a patch to migrate PPC off the old patterns, but have forgotten to
> ping it recently - last at
> https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html ... (ping?!)
>
> --Alan
>