> By using xmm# you're not taking into account any x86inc SWAPing, so this is
> using xmm0 and xmm1 where the single scalar float input arguments reside (at
> least on unix64), instead of xm0 and xm1 (xmm16 and xmm17) where the
> broadcasted scalars were stored.
> This, again, only worked by chance on unix64 because you're using scalar
> fmadd, and shouldn't work at all on win64.
>
> Also, all these as is are being encoded as VEX, not EVEX, but it should be
> fine leaving them untouched instead of using xm#, since they will be shorter
> (five bytes instead of six for some) by using the lower, non callee-saved
> regs.
Thanks for the help. I'm not familiar with WIN64 asm. So what I need to do is
change the WIN64 swap from:

    SWAP xmm0, xmm2
    SWAP xmm1, xmm3

To:

    VBROADCASTSS m0, xmm2
    VBROADCASTSS m1, xmm3

Is that correct?

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".