AntonYudintsev added inline comments.
================
Comment at: clang/test/CodeGen/x86_32-arguments-win32.c:77
+// CHECK-LABEL: define dso_local void @receive_vec_256(<8 x float> inreg %x, <8 x float> inreg %y, <8 x float> inreg %z, <8 x float>* %0, <8 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_512(<16 x float> inreg %x, <16 x float> inreg %y, <16 x float> inreg %z, <16 x float>* %0, <16 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_1024(<32 x float>* %0, <32 x float>* %1, <32 x float>* %2, <32 x float>* %3, <32 x float>* %4)
----------------
rjmccall wrote:
> rnk wrote:
> > rjmccall wrote:
> > > rnk wrote:
> > > > craig.topper wrote:
> > > > > What happens in the backend with inreg if 512-bit vectors aren't legal?
> > > > LLVM splits the vector up using the largest legal vector size. As many pieces as possible are passed in available XMM/YMM registers, and the rest are passed in memory. MSVC, of course, assumes the user wanted the larger vector size, and uses whatever vector instructions it needs to move the arguments around.
> > > >
> > > > Previously, I advocated for a model where calling an Intel intrinsic function had the effect of implicitly marking the caller with the target attributes of the intrinsic. This falls down if the user tries to write a single function that conditionally branches between code paths that use different instruction set extensions: you can imagine the SSE2 code path accidentally using AVX instructions because the compiler thinks they are better. I'm told that ICC models CPU micro-architectural features in the CFG, but I don't ever expect that LLVM will do that. If we're stuck with per-function CPU feature settings, it seems like it would be nice to try to do what the user asked by default, and warn the user if we see them doing a cpuid check in a function that has been implicitly blessed with some target attributes. You could imagine doing a similar thing when variables of large vector types are used: if a large vector argument or local is used, implicitly enable the target features needed to move vectors of that size around.
> > > >
> > > > This idea didn't get anywhere, and the current situation has persisted.
> > > >
> > > > ---
> > > >
> > > > You know, maybe we should just keep clang the way it is, and just set up a warning in the backend that says "hey, I split your large vector. You probably didn't want that." And then we just continue doing what we do now. Nobody likes backend warnings, but it seems better than the current direction of the frontend knowing every detail of x86 vector extensions.
> > > If target attributes affect ABI, it seems really dangerous to implicitly set attributes based on what intrinsics are called.
> > >
> > > The local CPU-testing problem seems similar to the problems with local `#pragma STDC FENV_ACCESS` blocks that the constrained-FP people are looking into. They both have a "this operation is normally fully optimizable, but we might need to be more careful in specific functions" aspect to them. I wonder if there's a reasonable way to unify the approaches, or at least benefit from lessons learned.
> > I agree, we wouldn't want intrinsic usage to change the ABI. But does anybody actually want the behavior that LLVM implements today, where large vectors get split across registers and memory? I think most users who pass a 512-bit vector want it to either be passed in ZMM registers or fail to compile. Why do we even support passing 1024-bit vectors? Could we make that an error?
> >
> > Anyway, major redesigns aside, should clang do something when large vectors are passed? Maybe we should warn here? Passing by address is usually the safest way to pass something, so that's an option. Implementing splitting logic in clang doesn't seem worth it.
> > I agree, we wouldn't want intrinsic usage to change the ABI. But does anybody actually want the behavior that LLVM implements today, where large vectors get split across registers and memory?
>
> I take it you're implying that the actual (Windows-only?) platform ABI doesn't say anything about this because other compilers don't allow large vectors. How large are the vectors we do have ABI rules for? Do they have the same problem as the SysV ABI, where the ABI rules are sensitive to compiler flags?
>
> Anyway, I didn't realize the i386 Windows ABI *ever* used registers for arguments. (Whether you can convince LLVM to do so for a function signature that Clang isn't supposed to emit for ABI-conforming functions is a separate story.) You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?

> You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?

Sorry to cut in (I am the one who reported the issue, and so I am looking forward to this patch being merged).
Yes: the MSVC x86 Windows ABI passes the first three vector arguments in registers.

https://godbolt.org/z/PZ3dBa
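For illustration, a minimal sketch of the kind of source that shows this (hypothetical; the function name and body are made up here, not the exact code behind the link):

  #include <immintrin.h>

  /* With MSVC targeting 32-bit x86 (AVX enabled), x, y, and z arrive
     in YMM0-YMM2, while w and q are passed in memory by address,
     matching the CHECK lines above where only the first three
     parameters are inreg and the trailing ones are lowered to
     pointers. */
  __m256 pass_five(__m256 x, __m256 y, __m256 z, __m256 w, __m256 q) {
    __m256 t = _mm256_add_ps(_mm256_add_ps(x, y), _mm256_add_ps(z, w));
    return _mm256_add_ps(t, q);
  }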
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72114/new/

https://reviews.llvm.org/D72114

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits