On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
> On 04/09/15 13:32, James Greenhalgh wrote:
> > In that case, these should be implemented as inline assembly blocks. As it
> > stands, the code generation for these intrinsics will be very poor with this
> > patch applied.
> >
> > I'm going to hold off OKing this until I see a follow-up to fix the code
> > generation, either replacing those particular intrinsics with inline asm,
> > or doing the more comprehensive fix in the back-end.
> >
> > Thanks,
> > James
>
> In that case, here is the follow-up now ;). This fixes each of the following
> functions to generate a single instruction followed by ret:
>   * vld1_dup_f16, vld1q_dup_f16
>   * vset_lane_f16, vsetq_lane_f16
>   * vget_lane_f16, vgetq_lane_f16
>   * For IN of type either float16x4_t or float16x8_t, and constant C:
>       return (float16x4_t) {in[C], in[C], in[C], in[C]};
>   * Similarly,
>       return (float16x8_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], in[C]};
>     (These correspond intuitively to what one might expect for "vdup_lane_f16",
>     "vdup_laneq_f16", "vdupq_lane_f16" and "vdupq_laneq_f16" intrinsics,
>     although such intrinsics do not actually exist.)
>
> This patch does not deal with equivalents to vdup_n_s16 and other intrinsics
> that load immediates, rather than using elements of pre-existing vectors.

What is code generation like for these, then? If I remember correctly, it was
the vdup_n_f16 implementation that looked most objectionable before.

> I'd welcome thoughts/opinions on what testcase would be appropriate.
> Correctness of all the intrinsics is already tested by the advsimd-intrinsics
> testsuite, and the only way I can see to verify code generation, is to
> scan-assembler looking for particular instructions; do we wish to see more
> scan-assembler tests?

I think these are fine without a test case; as you say, correctness is already
handled elsewhere.

> Bootstrapped + check-gcc on aarch64-none-linux-gnu.

OK,

Thanks,
James

> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>,
> 	aarch64_dup_lane<mode>, aarch64_dup_lane_<vswap_width_name><mode>,
> 	aarch64_simd_vec_set<mode>, vec_set<mode>, vec_perm_const<mode>,
> 	vec_init<mode>, *aarch64_simd_ld1r<mode>, vec_extract<mode>): Add
> 	V4HF and V8HF variants to iterator.
>
> 	* config/aarch64/aarch64.c (aarch64_evpc_dup): Add V4HF and V8HF cases.
>
> 	* config/aarch64/iterators.md (VDQF_F16): New.
> 	(VSWAP_WIDTH, vswap_width_name): Add V4HF and V8HF cases.