Hi Strager,

> I think the solution is for the DLL's delaylib trampoline to
> save xmm1 on the stack before calling __delayLoadHelper2.
> I made a patch which does this, and it fixes the bug for my
> code.

Thanks very much for taking the time to track down the cause
of this problem, and for creating a patch. :-)

> See attached patch. I think my patch has two problems:
>
> 1. AVX/vmovupd/ymm might not be usable on the target
>     machine, but saving just xmm isn't enough. Should we
>     perform a CPUID check?

This would only work if dlltool is run on the same machine,
or the same type of machine, as the target.  A safer solution
would probably be to add a new command line option to select
the extended trampoline.  Then it is up to the user to select
the correct trampoline type.

To be really paranoid: if the new option is not enabled and
dlltool is running on an x86_64 host, then it could run a
CPUID check and, if extended registers are available, issue
a warning message reminding the user of the possible
problem.
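For reference, a host-side check of the kind described above might look roughly like the C below.  This is only a sketch, assuming a GCC or Clang build where <cpuid.h> is available; host_supports_avx() and maybe_warn_about_trampoline() are hypothetical names, not existing dlltool functions, and the warning text is made up.  It tests the AVX and OSXSAVE bits from CPUID leaf 1 and then reads XCR0 with XGETBV to confirm that the OS actually preserves YMM state.  As noted above, it can only describe the host, not the eventual target.

```c
#include <stdbool.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction.  */
#endif

/* Hypothetical helper, not part of dlltool: true if the *host* CPU
   supports AVX and the OS saves the full YMM registers.  */
static bool
host_supports_avx (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;

  /* CPUID leaf 1 reports feature flags in ECX.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return false;

  /* ECX bit 27 = OSXSAVE (XGETBV usable), bit 28 = AVX.  */
  if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
    return false;

  /* XGETBV with ECX = 0 reads XCR0; bits 1 (SSE) and 2 (AVX) must both
     be set for the OS to preserve the upper halves of the YMM registers.  */
  unsigned int xcr0_lo, xcr0_hi;
  __asm__ volatile ("xgetbv" : "=a" (xcr0_lo), "=d" (xcr0_hi) : "c" (0));
  (void) xcr0_hi;
  return (xcr0_lo & 0x6u) == 0x6u;
#else
  return false;  /* Not an x86 host, so there is no CPUID to ask.  */
#endif
}

/* Hypothetical: print the reminder when the (hypothetical) extended
   trampoline option was not given but the host has AVX.  */
static void
maybe_warn_about_trampoline (bool extended_trampoline_requested)
{
  if (!extended_trampoline_requested && host_supports_avx ())
    fprintf (stderr,
             "warning: host supports AVX; if the target does too, consider "
             "the extended delay-load trampoline\n");
}
```

The extended_trampoline_requested parameter stands in for whatever the new command line option would set; the check deliberately only warns, leaving the choice of trampoline to the user.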


> 2. We store unaligned with vmovupd. Storing aligned with
>     vmovapd would be better. I haven't looked into how to
>     align ymm registers when storing on the stack.

"Better" as in better performance, yes?  I think that in this
case safety is more important than speed, so sticking with
unaligned moves should be OK.
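If aligned vmovapd stores were wanted anyway, the usual approach is to remember the original %rsp (for example in a callee-saved register such as %rbp), reserve the save area, and then round %rsp down to a 32-byte boundary, since a 256-bit vmovapd faults on an address that is not 32-byte aligned.  The C below is a hypothetical sketch of that arithmetic only; align_down() and ymm_save_area_base() are illustrative names, not part of dlltool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one YMM register in bytes; a 256-bit vmovapd needs this
   alignment for its memory operand.  */
#define YMM_BYTES 32u

/* Round ADDR down to a multiple of ALIGN (a power of two).  This is the
   C equivalent of "and $-32, %rsp" in a trampoline.  */
static uintptr_t
align_down (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return addr & ~(align - 1);
}

/* Given the incoming stack pointer SP, return the base of a save area
   for NREGS YMM registers in which every slot is 32-byte aligned.
   The stack grows down, so reserve the space first, then align.  */
static uintptr_t
ymm_save_area_base (uintptr_t sp, size_t nregs)
{
  return align_down (sp - nregs * YMM_BYTES, YMM_BYTES);
}
```

Because the padding depends on the incoming %rsp, the trampoline cannot undo it with a constant add; it has to restore the saved %rsp afterwards, which is extra code the unaligned vmovupd version avoids.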


> I'd love to get this bug fixed so others don't spend two
> days debugging assembly code!

Would you be willing to work on the improvements suggested
above and submit a revised patch?  The catch here is that
such a patch would need a copyright assignment from you
before we could accept it.  The links below provide more
details on this.

Cheers
  Nick

https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html#Legally-Significant
https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob_plain;f=doc/Copyright/request-assign.future;hb=HEAD

