> By using xmm# you're not taking into account any x86inc SWAPing, so this is
> using xmm0 and xmm1 where the single scalar float input arguments reside (at
> least on unix64), instead of xm0 and xm1 (xmm16 and xmm17) where the
> broadcasted scalars were stored.
> This, again, only worked by chance on unix64 because you're using scalar
> fmadd, and shouldn't work at all on win64.
> 
> Also, all these as is are being encoded as VEX, not EVEX, but it should be
> fine leaving them untouched instead of using xm#, since they will be shorter
> (five bytes instead of six for some) by using the lower, non callee-saved
> regs.

Thanks for the help. I'm not familiar with WIN64 asm. So what I need to do is
change the WIN64 swap from:
SWAP xmm0, xmm2
SWAP xmm1, xmm3
To:
VBROADCASTSS m0, xmm2
VBROADCASTSS m1, xmm3

Is that correct?

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with 
subject "unsubscribe".