On Sunday, 7 March 2021 at 22:54:32 UTC, tsbockman wrote:
import std.meta : Repeat;
void euclideanDistanceFixedSizeArray(V)(ref Repeat!(3, const(V)) a, ref Repeat!(3, const(V)) b, out V result)
    if(is(V : __vector(float[length]), size_t length))
...

Resulting asm with is(V == __vector(float[16])):

.LCPI1_0:
        .long   0x7fc00000
pure nothrow @nogc void app.euclideanDistanceFixedSizeArray!(__vector(float[16])).euclideanDistanceFixedSizeArray(ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), out __vector(float[16])):
        mov     rax, qword ptr [rsp + 8]
        vbroadcastss    zmm0, dword ptr [rip + .LCPI1_0]
...

Apparently the optimizer is not smart enough to skip the redundant float.nan broadcast (the default initialization of `result` to V.init) when `result` is an `out` parameter, so just make it `ref V result` instead for better code gen:

pure nothrow @nogc void app.euclideanDistanceFixedSizeArray!(__vector(float[16])).euclideanDistanceFixedSizeArray(ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref __vector(float[16])):
        mov     rax, qword ptr [rsp + 8]
        vmovaps zmm0, zmmword ptr [rax]
        vmovaps zmm1, zmmword ptr [r9]
        vmovaps zmm2, zmmword ptr [r8]
        vsubps  zmm0, zmm0, zmmword ptr [rcx]
        vmulps  zmm0, zmm0, zmm0
        vsubps  zmm1, zmm1, zmmword ptr [rdx]
        vsubps  zmm2, zmm2, zmmword ptr [rsi]
        vaddps  zmm0, zmm0, zmm0
        vfmadd231ps     zmm0, zmm1, zmm1
        vfmadd231ps     zmm0, zmm2, zmm2
        vmovaps zmmword ptr [rdi], zmm0
        vsqrtps zmm0, zmm0
        vmovaps zmmword ptr [rdi], zmm0
        vzeroupper
        ret
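
For anyone wondering where that broadcast comes from: an `out` parameter is reset to its type's `.init` value on function entry, and for floating-point types `.init` is NaN (the `.long 0x7fc00000` constant above is a quiet-NaN bit pattern). A minimal sketch of the difference, using scalar `float` instead of `__vector` so it runs anywhere; `viaOut` and `viaRef` are hypothetical names for illustration only, not part of the code quoted above:

```d
import std.math : isNaN;

// Hypothetical helper: `result` is reset to float.init (NaN) on entry,
// before any statement in the body runs.
void viaOut(out float result, bool write)
{
    if (write)
        result = 1.0f;
}

// Hypothetical helper: `result` keeps whatever the caller stored in it.
void viaRef(ref float result, bool write)
{
    if (write)
        result = 1.0f;
}

void main()
{
    float x = 42.0f;
    viaOut(x, false);
    assert(isNaN(x));   // caller's value was replaced by float.init

    float y = 42.0f;
    viaRef(y, false);
    assert(y == 42.0f); // caller's value is untouched
}
```

The trade-off is that with `ref` the caller's previous value is visible inside the function, so the function has to assign `result` on every path for the two versions to behave the same.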
