On Tue, Sep 08, 2015 at 09:21:08AM +0100, James Greenhalgh wrote:
> On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
> > On 04/09/15 13:32, James Greenhalgh wrote:
> > > In that case, these should be implemented as inline assembly blocks. As it
> > > stands, the code generation for these intrinsics will be very poor with
> > > this patch applied.
> > >
> > > I'm going to hold off OKing this until I see a follow-up to fix the code
> > > generation, either replacing those particular intrinsics with inline asm,
> > > or doing the more comprehensive fix in the back-end.
> > >
> > > Thanks,
> > > James
> >
> > In that case, here is the follow-up now ;). This fixes each of the following
> > functions to generate a single instruction followed by ret:
> >   * vld1_dup_f16, vld1q_dup_f16
> >   * vset_lane_f16, vsetq_lane_f16
> >   * vget_lane_f16, vgetq_lane_f16
> >   * For IN of type either float16x4_t or float16x8_t, and constant C:
> >       return (float16x4_t) {in[C], in[C], in[C], in[C]};
> >   * Similarly,
> >       return (float16x8_t) {in[C], in[C], in[C], in[C],
> >                             in[C], in[C], in[C], in[C]};
> > (These correspond intuitively to what one might expect for "vdup_lane_f16",
> > "vdup_laneq_f16", "vdupq_lane_f16" and "vdupq_laneq_f16" intrinsics,
> > although such intrinsics do not actually exist.)
> >
> > This patch does not deal with equivalents to vdup_n_s16 and other intrinsics
> > that load immediates, rather than using elements of pre-existing vectors.
>
> What is code generation like for these, then? If I remember correctly, it
> was the vdup_n_f16 implementation that looked most objectionable before.
Ah, I see what you are saying here. You mean: if there were intrinsics
equivalent to vdup_n_s16 (which there are not), then this patch would not
handle them. I was confused, as vld1_dup_f16 does not use an element of a
pre-existing vector, and may well load an immediate, but is handled by your
patch. Sorry for the noise.

James