Re: [PATCH][AArch64] Improve code generation for float16 vector code

James Greenhalgh Tue, 08 Sep 2015 01:21:47 -0700

On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
> On 04/09/15 13:32, James Greenhalgh wrote:
> > In that case, these should be implemented as inline assembly blocks. As it
> > stands, the code generation for these intrinsics will be very poor with this
> > patch applied.
> >
> > I'm going to hold off OKing this until I see a follow-up to fix the code
> > generation, either replacing those particular intrinsics with inline asm,
> > or doing the more comprehensive fix in the back-end.
> >
> > Thanks,
> > James
> 
> In that case, here is the follow-up now ;). This fixes each of the following
> functions to generate a single instruction followed by ret:
>   * vld1_dup_f16, vld1q_dup_f16
>   * vset_lane_f16, vsetq_lane_f16
>   * vget_lane_f16, vgetq_lane_f16
>   * For IN of type either float16x4_t or float16x8_t, and constant C:
>       return (float16x4_t) {in[C], in[C], in[C], in[C]};
>   * Similarly,
>       return (float16x8_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], in[C]};
> (These correspond intuitively to what one might expect for "vdup_lane_f16",
> "vdup_laneq_f16", "vdupq_lane_f16" and "vdupq_laneq_f16" intrinsics,
> although such intrinsics do not actually exist.)
> 
> This patch does not deal with equivalents to vdup_n_s16 and other intrinsics
> that load immediates, rather than using elements of pre-existing vectors.
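
For readers following along, the "duplicate lane C" pattern quoted above can be
sketched in portable C. This is only an illustration under stated assumptions:
it uses GCC/Clang generic vectors of int16_t as a stand-in for float16x4_t and
float16x8_t so that it builds on targets other than AArch64, and the type and
function names (v4hi_t, v8hi_t, dup_lane, dupq_lane, dup_laneq, dupq_laneq) are
hypothetical, not intrinsics from arm_neon.h. It shows the source-level shape
of the four cases, not the code generation the patch produces.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for float16x4_t and float16x8_t (assumption: int16_t lanes,
   GCC/Clang vector extension), so the sketch is not tied to AArch64.  */
typedef int16_t v4hi_t __attribute__ ((vector_size (8)));
typedef int16_t v8hi_t __attribute__ ((vector_size (16)));

/* The constant lane index "C" from the description above.  */
#define C 1

/* 4-lane input, 4-lane result: what "vdup_lane_f16" might look like.  */
v4hi_t dup_lane (v4hi_t in)
{
  return (v4hi_t) {in[C], in[C], in[C], in[C]};
}

/* 4-lane input, 8-lane result: what "vdupq_lane_f16" might look like.  */
v8hi_t dupq_lane (v4hi_t in)
{
  return (v8hi_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], in[C]};
}

/* 8-lane input, 4-lane result: what "vdup_laneq_f16" might look like.  */
v4hi_t dup_laneq (v8hi_t in)
{
  return (v4hi_t) {in[C], in[C], in[C], in[C]};
}

/* 8-lane input, 8-lane result: what "vdupq_laneq_f16" might look like.  */
v8hi_t dupq_laneq (v8hi_t in)
{
  return (v8hi_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], in[C]};
}

/* Check that every result lane equals lane C of the input.  */
void self_check (void)
{
  v4hi_t a = {10, 20, 30, 40};
  v8hi_t b = {1, 2, 3, 4, 5, 6, 7, 8};
  v4hi_t r0 = dup_lane (a);
  v8hi_t r1 = dupq_lane (a);
  v4hi_t r2 = dup_laneq (b);
  v8hi_t r3 = dupq_laneq (b);
  for (int i = 0; i < 4; i++)
    assert (r0[i] == a[C] && r2[i] == b[C]);
  for (int i = 0; i < 8; i++)
    assert (r1[i] == a[C] && r3[i] == b[C]);
}
```

On AArch64 with arm_neon.h, the same function bodies would use float16x4_t and
float16x8_t directly, which is the form the quoted description refers to.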


What is code generation like for these then? If I remember correctly, it
was the vdup_n_f16 implementation that looked most objectionable before.

> I'd welcome thoughts/opinions on what testcase would be appropriate.
> Correctness of all the intrinsics is already tested by the advsimd-intrinsics
> testsuite, and the only way I can see to verify code generation is to
> scan-assembler looking for particular instructions; do we wish to see more
> scan-assembler tests?

I think these are fine without a test case; as you say, correctness is
already handled elsewhere.

> Bootstrapped + check-gcc on aarch64-none-linux-gnu.

OK,

Thanks,
James

> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>,
>       aarch64_dup_lane<mode>, aarch64_dup_lane_<vswap_width_name><mode>,
>       aarch64_simd_vec_set<mode>, vec_set<mode>, vec_perm_const<mode>,
>       vec_init<mode>, *aarch64_simd_ld1r<mode>, vec_extract<mode>): Add
>       V4HF and V8HF variants to iterator.
> 
>       * config/aarch64/aarch64.c (aarch64_evpc_dup): Add V4HF and V8HF cases.
> 
>       * config/aarch64/iterators.md (VDQF_F16): New.
>       (VSWAP_WIDTH, vswap_width_name): Add V4HF and V8HF cases.
