On Tue, Sep 08, 2015 at 09:21:08AM +0100, James Greenhalgh wrote:
> On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
> > On 04/09/15 13:32, James Greenhalgh wrote:
> > > In that case, these should be implemented as inline assembly blocks. As it
> > > stands, the code generation for these intrinsics will be very poor with
> > > this patch applied.
> > >
> > > I'm going to hold off OKing this until I see a follow-up to fix the code
> > > generation, either replacing those particular intrinsics with inline asm,
> > > or doing the more comprehensive fix in the back-end.
> > >
> > > Thanks,
> > > James
> > 
> > In that case, here is the follow-up now ;). This fixes each of the following
> > functions to generate a single instruction followed by ret:
> >   * vld1_dup_f16, vld1q_dup_f16
> >   * vset_lane_f16, vsetq_lane_f16
> >   * vget_lane_f16, vgetq_lane_f16
> >   * For IN of type either float16x4_t or float16x8_t, and constant C:
> > return (float16x4_t) {in[C], in[C], in[C], in[C]};
> >   * Similarly,
> > return (float16x8_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], 
> > in[C]};
> > (These correspond intuitively to what one might expect for "vdup_lane_f16",
> > "vdup_laneq_f16", "vdupq_lane_f16" and "vdupq_laneq_f16" intrinsics,
> > although such intrinsics do not actually exist.)
> > 
> > This patch does not deal with equivalents to vdup_n_s16 and other intrinsics
> > that load immediates, rather than using elements of pre-existing vectors.
> 
> What is code generation like for these then? If I remember correctly it
> was the vdup_n_f16 implementation that looked most objectionable before.

Ah, I see what you are saying here. You mean: if there were intrinsics
equivalent to vdup_n_s16 (which there are not), then this patch would not
handle them. I was confused because vld1_dup_f16 does not use an element of
a pre-existing vector, and may well load an immediate, yet is handled by
your patch.

Sorry for the noise.

James
