https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is a target issue dealing with how uint64_t -> float/double conversions are done.

On aarch64 we get good code for cvt_f64_std at -O3:

cvt_f64_std(std::array<double, 16ul>&, std::array<unsigned long, 16ul> const&):
        ldp     q7, q6, [x1]
        ldp     q5, q4, [x1, 32]
        ldp     q3, q2, [x1, 64]
        ldp     q1, q0, [x1, 96]
        ucvtf   v7.2d, v7.2d
        ucvtf   v6.2d, v6.2d
        ucvtf   v5.2d, v5.2d
        ucvtf   v4.2d, v4.2d
        ucvtf   v3.2d, v3.2d
        ucvtf   v2.2d, v2.2d
        stp     q7, q6, [x0]
        ucvtf   v1.2d, v1.2d
        stp     q5, q4, [x0, 32]
        ucvtf   v0.2d, v0.2d
        stp     q3, q2, [x0, 64]
        stp     q1, q0, [x0, 96]
        ret

The other function, cvt_f32_std, gets one scalar ucvtf per element, with each result inserted into a vector lane:

cvt_f32_std(std::array<float, 16ul>&, std::array<unsigned long, 16ul> const&):
        ldp     x3, x2, [x1]
        ucvtf   s7, x2
        ucvtf   s3, x3
        ldp     x3, x2, [x1, 32]
        ins     v3.s[1], v7.s[0]
        ucvtf   s6, x2
        ucvtf   s2, x3
        ldp     x3, x2, [x1, 64]
        ins     v2.s[1], v6.s[0]
        ucvtf   s5, x2
        ucvtf   s1, x3
        ldp     x3, x2, [x1, 96]
        ins     v1.s[1], v5.s[0]
        ucvtf   s4, x2
        ucvtf   s0, x3
        ldr     x2, [x1, 48]
        ldr     x3, [x1, 16]
        ucvtf   s17, x2
        ldr     x2, [x1, 112]
        ucvtf   s18, x3
        ldr     x3, [x1, 80]
        ins     v0.s[1], v4.s[0]
        ucvtf   s4, x2
        ucvtf   s16, x3
        ldr     x2, [x1, 24]
        ldr     x3, [x1, 56]
        ucvtf   s7, x2
        ldr     x2, [x1, 88]
        ucvtf   s6, x3
        ldr     x1, [x1, 120]
        ucvtf   s5, x2
        ins     v3.s[2], v18.s[0]
        ins     v2.s[2], v17.s[0]
        ins     v1.s[2], v16.s[0]
        ins     v0.s[2], v4.s[0]
        ucvtf   s4, x1
        ins     v3.s[3], v7.s[0]
        ins     v2.s[3], v6.s[0]
        ins     v1.s[3], v5.s[0]
        ins     v0.s[3], v4.s[0]
        stp     q3, q2, [x0]
        stp     q1, q0, [x0, 32]
        ret
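For reference, here is a minimal C++ sketch of what the two functions might look like. This comment does not quote the original testcase, so the bodies are an assumption. They are inferred from the demangled signatures and the generated code, and each is a plain element-wise conversion loop. std::uint64_t stands in for unsigned long; the two are the same type on LP64 targets such as aarch64 Linux.

```cpp
#include <array>
#include <cassert>   // used by the usage checks that exercise these functions
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of the testcase: bodies are assumed, only the
// signatures come from the assembly listings above.

// u64 -> f64, same element width: the listing shows vector ucvtf v.2d.
void cvt_f64_std(std::array<double, 16>& dst,
                 const std::array<std::uint64_t, 16>& src) {
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = static_cast<double>(src[i]);
}

// u64 -> f32, narrowing: the listing shows scalar ucvtf s, x plus lane inserts.
void cvt_f32_std(std::array<float, 16>& dst,
                 const std::array<std::uint64_t, 16>& src) {
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = static_cast<float>(src[i]);
}
```

To compare the two listings, compile this for an aarch64 target with `g++ -O3 -S`. The exact instruction sequence will vary with the GCC version.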