r330408 - Fix -Wunused-variable warnings after r330377.
Author: adibiagio
Date: Fri Apr 20 02:47:03 2018
New Revision: 330408

URL: http://llvm.org/viewvc/llvm-project?rev=330408&view=rev
Log: Fix -Wunused-variable warnings after r330377.

Modified:
    cfe/trunk/lib/Analysis/ConstructionContext.cpp

Modified: cfe/trunk/lib/Analysis/ConstructionContext.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Analysis/ConstructionContext.cpp?rev=330408&r1=330407&r2=330408&view=diff
==
--- cfe/trunk/lib/Analysis/ConstructionContext.cpp (original)
+++ cfe/trunk/lib/Analysis/ConstructionContext.cpp Fri Apr 20 02:47:03 2018
@@ -80,7 +80,7 @@ const ConstructionContext *ConstructionC
       return create(C, BTE, MTE);
     }
     // This is a constructor into a function argument. Not implemented yet.
-    if (auto *CE = dyn_cast<CallExpr>(ParentLayer->getTriggerStmt()))
+    if (isa<CallExpr>(ParentLayer->getTriggerStmt()))
      return nullptr;
    // This is C++17 copy-elided construction into return statement.
    if (auto *RS = dyn_cast<ReturnStmt>(ParentLayer->getTriggerStmt())) {
@@ -118,7 +118,7 @@ const ConstructionContext *ConstructionC
      return create(C, RS);
    }
    // This is a constructor into a function argument. Not implemented yet.
-    if (auto *CE = dyn_cast<CallExpr>(TopLayer->getTriggerStmt()))
+    if (isa<CallExpr>(TopLayer->getTriggerStmt()))
      return nullptr;
    llvm_unreachable("Unexpected construction context with statement!");
  } else if (const CXXCtorInitializer *I = TopLayer->getTriggerInit()) {
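As background for this cleanup, here is a small self-contained C++ sketch of the pattern being fixed. It uses plain dynamic_cast and invented types rather than LLVM's dyn_cast/isa, so it illustrates the idea only, not the actual Clang code:

  struct Stmt { virtual ~Stmt() = default; };
  struct CallExpr : Stmt {};

  bool isConstructorIntoFunctionArgument(const Stmt *S) {
    // Before: the pointer bound in the condition is never referenced again,
    // so compilers that check this report it as an unused variable.
    //   if (const auto *CE = dynamic_cast<const CallExpr *>(S))
    //     return true;
    //   return false;

    // After: a pure type test needs no named binding (isa<> in LLVM).
    return dynamic_cast<const CallExpr *>(S) != nullptr;
  }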
r328968 - Fix unused variable warning introduced at revision 328910.
Author: adibiagio
Date: Mon Apr 2 05:04:37 2018
New Revision: 328968

URL: http://llvm.org/viewvc/llvm-project?rev=328968&view=rev
Log: Fix unused variable warning introduced at revision 328910.

Modified:
    cfe/trunk/lib/Analysis/LiveVariables.cpp

Modified: cfe/trunk/lib/Analysis/LiveVariables.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Analysis/LiveVariables.cpp?rev=328968&r1=328967&r2=328968&view=diff
==
--- cfe/trunk/lib/Analysis/LiveVariables.cpp (original)
+++ cfe/trunk/lib/Analysis/LiveVariables.cpp Mon Apr 2 05:04:37 2018
@@ -383,8 +383,7 @@ void TransferFunctions::VisitDeclRefExpr
   bool InAssignment = LV.inAssignment[DR];
   if (const auto *BD = dyn_cast<BindingDecl>(D)) {
     if (!InAssignment)
-      val.liveBindings =
-          LV.BSetFact.add(val.liveBindings, cast<BindingDecl>(D));
+      val.liveBindings = LV.BSetFact.add(val.liveBindings, BD);
   } else if (const auto *VD = dyn_cast<VarDecl>(D)) {
     if (!InAssignment && !isAlwaysAlive(VD))
       val.liveDecls = LV.DSetFact.add(val.liveDecls, VD);
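This commit shows the second common form of the same warning: the typed pointer is bound in the condition, but the body re-casts the original declaration instead of reusing it. A minimal self-contained restatement, again with plain C++ RTTI and invented types in place of LLVM's casting utilities:

  #include <set>

  struct Decl { virtual ~Decl() = default; };
  struct BindingDecl : Decl {};

  void addLiveBinding(std::set<const BindingDecl *> &LiveBindings, const Decl *D) {
    if (const auto *BD = dynamic_cast<const BindingDecl *>(D)) {
      // Before: LiveBindings.insert(dynamic_cast<const BindingDecl *>(D));
      //         leaves BD unused and repeats the cast.
      // After: reuse the pointer the condition already produced.
      LiveBindings.insert(BD);
    }
  }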
[PATCH] D26335: [ms] Reinstate https://reviews.llvm.org/D14748 after https://reviews.llvm.org/D20291
andreadb added inline comments.

Comment at: lib/Headers/x86intrin.h:49
+static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__))
+__tzcnt_u32(unsigned int __X) { return __X ? __builtin_ctz(__X) : 32; }
+#ifdef __x86_64__

hans wrote:
> probinson wrote:
> > hans wrote:
> > > I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a
> > > faster encoding for BSF, but now we're putting a conditional in the way,
> > > so this will be slower actually. On the other hand, the alternative is
> > > weird too :-/
> > I thought there was a peephole to notice a guard like this and do the right
> > thing? In which case having the guard is fine.
> But for non-BMI targets it won't be able to remove the guard (unless it can
> prove the argument is zero).
>
> MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg
> wants.

I understand your point. However, the semantics of __tzcnt_u32 (at least for BMI) are well defined on zero input. Changing those semantics for non-BMI targets sounds a bit odd/confusing to me.

I guess ffmpeg relies on other compilers either preserving the intrinsic call until instruction selection, or folding it to a meaningful value (i.e. 32 in this case) when the input is known to be zero.

If we remove the zero check from __tzcnt_u32 (for non-BMI targets), we let the optimizer enable rules that aggressively fold calls to __tzcnt_u32 to undef when the input is zero.

As a side note: I noticed that gcc provides the intrinsic `__bsfd` in ia32intrin.h. Maybe we should do something similar?


https://reviews.llvm.org/D26335
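For readers following the thread, here is a small standalone C sketch, not part of the patch and with invented function names, contrasting the two behaviours being debated. The guarded form is well defined for a zero input and matches the BMI TZCNT result; the unguarded form maps to a bare BSF, but __builtin_ctz(0) is undefined, which the optimizer may exploit:

  /* Guarded variant (as in the patch): defined for a zero input, returning
     the operand width, which matches BMI TZCNT semantics. */
  static inline unsigned tzcnt32_guarded(unsigned x) {
    return x ? (unsigned)__builtin_ctz(x) : 32;
  }

  /* Unguarded variant (what ffmpeg effectively wants on non-BMI targets):
     lowers to a bare BSF, but a known-zero input may be folded to an
     arbitrary value because __builtin_ctz(0) is undefined. */
  static inline unsigned tzcnt32_unguarded(unsigned x) {
    return (unsigned)__builtin_ctz(x);
  }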
r249142 - Make test more resilient to FastIsel changes. NFC.
Author: adibiagio
Date: Fri Oct 2 10:10:22 2015
New Revision: 249142

URL: http://llvm.org/viewvc/llvm-project?rev=249142&view=rev
Log: Make test more resilient to FastIsel changes. NFC.

Currently FastISel doesn't know how to select vector bitcasts. During instruction selection, fast-isel always falls back to SelectionDAG every time it encounters a vector bitcast.

As a consequence of this, all the 'packed vector shift by immediate count' test cases in avx2-builtins.c are optimized by the DAGCombiner. In particular, the DAGCombiner would always fold trivial stack loads of constant shift counts into the operands of packed shift builtins.

This behavior would start changing as soon as I reapply revision 249121. That revision would teach x86 fast-isel how to select bitcasts between vector types of the same size. As a consequence of that change, fast-isel would less often fall back to SelectionDAG. More importantly, DAGCombiner would no longer be able to simplify the code by folding the stack reload of a constant.

No functional change.

Modified:
    cfe/trunk/test/CodeGen/avx2-builtins.c

Modified: cfe/trunk/test/CodeGen/avx2-builtins.c
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/avx2-builtins.c?rev=249142&r1=249141&r2=249142&view=diff
==
--- cfe/trunk/test/CodeGen/avx2-builtins.c (original)
+++ cfe/trunk/test/CodeGen/avx2-builtins.c Fri Oct 2 10:10:22 2015
@@ -574,7 +574,7 @@ __m256i test_mm256_bslli_epi128(__m256i
 
 __m256i test_mm256_slli_epi16(__m256i a) {
   // CHECK: @llvm.x86.avx2.pslli.w
-  // CHECK-ASM: vpsllw $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsllw {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_slli_epi16(a, 3);
 }
 
@@ -586,7 +586,7 @@ __m256i test_mm256_sll_epi16(__m256i a,
 
 __m256i test_mm256_slli_epi32(__m256i a) {
   // CHECK: @llvm.x86.avx2.pslli.d
-  // CHECK-ASM: vpslld $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpslld {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_slli_epi32(a, 3);
 }
 
@@ -610,7 +610,7 @@ __m256i test_mm256_sll_epi64(__m256i a,
 
 __m256i test_mm256_srai_epi16(__m256i a) {
   // CHECK: @llvm.x86.avx2.psrai.w
-  // CHECK-ASM: vpsraw $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsraw {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_srai_epi16(a, 3);
 }
 
@@ -622,7 +622,7 @@ __m256i test_mm256_sra_epi16(__m256i a,
 
 __m256i test_mm256_srai_epi32(__m256i a) {
   // CHECK: @llvm.x86.avx2.psrai.d
-  // CHECK-ASM: vpsrad $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsrad {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_srai_epi32(a, 3);
 }
 
@@ -646,7 +646,7 @@ __m256i test_mm256_bsrli_epi128(__m256i
 
 __m256i test_mm256_srli_epi16(__m256i a) {
   // CHECK: @llvm.x86.avx2.psrli.w
-  // CHECK-ASM: vpsrlw $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsrlw {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_srli_epi16(a, 3);
 }
 
@@ -658,7 +658,7 @@ __m256i test_mm256_srl_epi16(__m256i a,
 
 __m256i test_mm256_srli_epi32(__m256i a) {
   // CHECK: @llvm.x86.avx2.psrli.d
-  // CHECK-ASM: vpsrld $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsrld {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
  return _mm256_srli_epi32(a, 3);
 }
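To make the intent of the new check lines concrete, here is a hedged, reduced example in the style of avx2-builtins.c (not taken from the tree). The alternation accepts either the immediate form emitted when DAGCombiner folds the constant shift count, or the register form emitted when the constant is reloaded from the stack:

  #include <immintrin.h>

  __m256i shift_left_by_3(__m256i a) {
    // CHECK: @llvm.x86.avx2.pslli.w
    // Matches "vpsllw $3, %ymm0, %ymm0" as well as "vpsllw %xmm1, %ymm0, %ymm0".
    // CHECK-ASM: vpsllw {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
    return _mm256_slli_epi16(a, 3);
  }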
[PATCH] D13861: [x86] fix wrong maskload/store intrinsic definitions in avxintrin.h (follow up of D13776).
andreadb created this revision.
andreadb added reviewers: bruno, mkuper, delena, qcolombet.
andreadb added a subscriber: cfe-commits.

Hi,

This patch is a follow-up of D13776.

According to the Intel documentation, the mask operand of the maskload and maskstore intrinsics is always a vector of packed integer/long integer values.

This patch introduces the following two changes:
1) It fixes the avx maskload/store intrinsic definitions in avxintrin.h;
2) It changes BuiltinsX86.def to match the correct gcc definitions for avx maskload/store (see D13776 for more details).

Please let me know if it is okay to submit.

Thanks,
Andrea


http://reviews.llvm.org/D13861

Files:
  include/clang/Basic/BuiltinsX86.def
  lib/Headers/avxintrin.h
  test/CodeGen/builtins-x86.c

Index: test/CodeGen/builtins-x86.c
===
--- test/CodeGen/builtins-x86.c
+++ test/CodeGen/builtins-x86.c
@@ -465,14 +465,14 @@
   __builtin_ia32_movntdq256(tmp_V4LLip, tmp_V4LLi);
   __builtin_ia32_movntpd256(tmp_dp, tmp_V4d);
   __builtin_ia32_movntps256(tmp_fp, tmp_V8f);
-  tmp_V2d = __builtin_ia32_maskloadpd(tmp_V2dCp, tmp_V2d);
-  tmp_V4f = __builtin_ia32_maskloadps(tmp_V4fCp, tmp_V4f);
-  tmp_V4d = __builtin_ia32_maskloadpd256(tmp_V4dCp, tmp_V4d);
-  tmp_V8f = __builtin_ia32_maskloadps256(tmp_V8fCp, tmp_V8f);
-  __builtin_ia32_maskstorepd(tmp_V2dp, tmp_V2d, tmp_V2d);
-  __builtin_ia32_maskstoreps(tmp_V4fp, tmp_V4f, tmp_V4f);
-  __builtin_ia32_maskstorepd256(tmp_V4dp, tmp_V4d, tmp_V4d);
-  __builtin_ia32_maskstoreps256(tmp_V8fp, tmp_V8f, tmp_V8f);
+  tmp_V2d = __builtin_ia32_maskloadpd(tmp_V2dCp, tmp_V2LLi);
+  tmp_V4f = __builtin_ia32_maskloadps(tmp_V4fCp, tmp_V4i);
+  tmp_V4d = __builtin_ia32_maskloadpd256(tmp_V4dCp, tmp_V4LLi);
+  tmp_V8f = __builtin_ia32_maskloadps256(tmp_V8fCp, tmp_V8i);
+  __builtin_ia32_maskstorepd(tmp_V2dp, tmp_V2LLi, tmp_V2d);
+  __builtin_ia32_maskstoreps(tmp_V4fp, tmp_V4i, tmp_V4f);
+  __builtin_ia32_maskstorepd256(tmp_V4dp, tmp_V4LLi, tmp_V4d);
+  __builtin_ia32_maskstoreps256(tmp_V8fp, tmp_V8i, tmp_V8f);
 
 #ifdef USE_3DNOW
   tmp_V8c = __builtin_ia32_pavgusb(tmp_V8c, tmp_V8c);
Index: lib/Headers/avxintrin.h
===
--- lib/Headers/avxintrin.h
+++ lib/Headers/avxintrin.h
@@ -835,53 +835,53 @@
 
 /* Conditional load ops */
 static __inline __m128d __DEFAULT_FN_ATTRS
-_mm_maskload_pd(double const *__p, __m128d __m)
+_mm_maskload_pd(double const *__p, __m128i __m)
 {
-  return (__m128d)__builtin_ia32_maskloadpd((const __v2df *)__p, (__v2df)__m);
+  return (__m128d)__builtin_ia32_maskloadpd((const __v2df *)__p, (__v2di)__m);
 }
 
 static __inline __m256d __DEFAULT_FN_ATTRS
-_mm256_maskload_pd(double const *__p, __m256d __m)
+_mm256_maskload_pd(double const *__p, __m256i __m)
 {
   return (__m256d)__builtin_ia32_maskloadpd256((const __v4df *)__p,
-                                               (__v4df)__m);
+                                               (__v4di)__m);
 }
 
 static __inline __m128 __DEFAULT_FN_ATTRS
-_mm_maskload_ps(float const *__p, __m128 __m)
+_mm_maskload_ps(float const *__p, __m128i __m)
 {
-  return (__m128)__builtin_ia32_maskloadps((const __v4sf *)__p, (__v4sf)__m);
+  return (__m128)__builtin_ia32_maskloadps((const __v4sf *)__p, (__v4si)__m);
 }
 
 static __inline __m256 __DEFAULT_FN_ATTRS
-_mm256_maskload_ps(float const *__p, __m256 __m)
+_mm256_maskload_ps(float const *__p, __m256i __m)
 {
-  return (__m256)__builtin_ia32_maskloadps256((const __v8sf *)__p, (__v8sf)__m);
+  return (__m256)__builtin_ia32_maskloadps256((const __v8sf *)__p, (__v8si)__m);
 }
 
 /* Conditional store ops */
 static __inline void __DEFAULT_FN_ATTRS
-_mm256_maskstore_ps(float *__p, __m256 __m, __m256 __a)
+_mm256_maskstore_ps(float *__p, __m256i __m, __m256 __a)
 {
-  __builtin_ia32_maskstoreps256((__v8sf *)__p, (__v8sf)__m, (__v8sf)__a);
+  __builtin_ia32_maskstoreps256((__v8sf *)__p, (__v8si)__m, (__v8sf)__a);
 }
 
 static __inline void __DEFAULT_FN_ATTRS
-_mm_maskstore_pd(double *__p, __m128d __m, __m128d __a)
+_mm_maskstore_pd(double *__p, __m128i __m, __m128d __a)
 {
-  __builtin_ia32_maskstorepd((__v2df *)__p, (__v2df)__m, (__v2df)__a);
+  __builtin_ia32_maskstorepd((__v2df *)__p, (__v2di)__m, (__v2df)__a);
 }
 
 static __inline void __DEFAULT_FN_ATTRS
-_mm256_maskstore_pd(double *__p, __m256d __m, __m256d __a)
+_mm256_maskstore_pd(double *__p, __m256i __m, __m256d __a)
 {
-  __builtin_ia32_maskstorepd256((__v4df *)__p, (__v4df)__m, (__v4df)__a);
+  __builtin_ia32_maskstorepd256((__v4df *)__p, (__v4di)__m, (__v4df)__a);
 }
 
 static __inline void __DEFAULT_FN_ATTRS
-_mm_maskstore_ps(float *__p, __m128 __m, __m128 __a)
+_mm_maskstore_ps(float *__p, __m128i __m, __m128 __a)
 {
-  __builtin_ia32_maskstoreps((__v4sf *)__p, (__v4sf)__m, (__v4sf)__a);
+  __builtin_ia32_maskstoreps((__v4sf *)__p, (__v4si)__m, (__v4sf)__a);
 }
 
 /* Cacheability support ops */
Index: include/clang/Basic/BuiltinsX86.def
==
r250816 - [x86] Fix maskload/store intrinsic definitions in avxintrin.h
Author: adibiagio
Date: Tue Oct 20 06:19:54 2015
New Revision: 250816

URL: http://llvm.org/viewvc/llvm-project?rev=250816&view=rev
Log: [x86] Fix maskload/store intrinsic definitions in avxintrin.h

According to the Intel documentation, the mask operand of the maskload and maskstore intrinsics is always a vector of packed integer/long integer values.

This patch introduces the following two changes:
1. It fixes the avx maskload/store intrinsic definitions in avxintrin.h.
2. It changes BuiltinsX86.def to match the correct gcc definitions for avx maskload/store (see D13861 for more details).

Differential Revision: http://reviews.llvm.org/D13861

Modified:
    cfe/trunk/include/clang/Basic/BuiltinsX86.def
    cfe/trunk/lib/Headers/avxintrin.h
    cfe/trunk/test/CodeGen/builtins-x86.c

Modified: cfe/trunk/include/clang/Basic/BuiltinsX86.def
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/BuiltinsX86.def?rev=250816&r1=250815&r2=250816&view=diff
==
--- cfe/trunk/include/clang/Basic/BuiltinsX86.def (original)
+++ cfe/trunk/include/clang/Basic/BuiltinsX86.def Tue Oct 20 06:19:54 2015
@@ -503,14 +503,14 @@ TARGET_BUILTIN(__builtin_ia32_lddqu256,
 TARGET_BUILTIN(__builtin_ia32_movntdq256, "vV4LLi*V4LLi", "", "avx")
 TARGET_BUILTIN(__builtin_ia32_movntpd256, "vd*V4d", "", "avx")
 TARGET_BUILTIN(__builtin_ia32_movntps256, "vf*V8f", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskloadpd, "V2dV2dC*V2d", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskloadps, "V4fV4fC*V4f", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskloadpd256, "V4dV4dC*V4d", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskloadps256, "V8fV8fC*V8f", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskstorepd, "vV2d*V2dV2d", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskstoreps, "vV4f*V4fV4f", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskstorepd256, "vV4d*V4dV4d", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskstoreps256, "vV8f*V8fV8f", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskloadpd, "V2dV2dC*V2LLi", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskloadps, "V4fV4fC*V4i", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskloadpd256, "V4dV4dC*V4LLi", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskloadps256, "V8fV8fC*V8i", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskstorepd, "vV2d*V2LLiV2d", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskstoreps, "vV4f*V4iV4f", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskstorepd256, "vV4d*V4LLiV4d", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskstoreps256, "vV8f*V8iV8f", "", "avx")
 
 // AVX2
 TARGET_BUILTIN(__builtin_ia32_mpsadbw256, "V32cV32cV32cIc", "", "avx2")

Modified: cfe/trunk/lib/Headers/avxintrin.h
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/avxintrin.h?rev=250816&r1=250815&r2=250816&view=diff
==
--- cfe/trunk/lib/Headers/avxintrin.h (original)
+++ cfe/trunk/lib/Headers/avxintrin.h Tue Oct 20 06:19:54 2015
@@ -835,53 +835,53 @@ _mm256_storeu_si256(__m256i *__p, __m256
 
 /* Conditional load ops */
 static __inline __m128d __DEFAULT_FN_ATTRS
-_mm_maskload_pd(double const *__p, __m128d __m)
+_mm_maskload_pd(double const *__p, __m128i __m)
 {
-  return (__m128d)__builtin_ia32_maskloadpd((const __v2df *)__p, (__v2df)__m);
+  return (__m128d)__builtin_ia32_maskloadpd((const __v2df *)__p, (__v2di)__m);
 }
 
 static __inline __m256d __DEFAULT_FN_ATTRS
-_mm256_maskload_pd(double const *__p, __m256d __m)
+_mm256_maskload_pd(double const *__p, __m256i __m)
 {
   return (__m256d)__builtin_ia32_maskloadpd256((const __v4df *)__p,
-                                               (__v4df)__m);
+                                               (__v4di)__m);
 }
 
 static __inline __m128 __DEFAULT_FN_ATTRS
-_mm_maskload_ps(float const *__p, __m128 __m)
+_mm_maskload_ps(float const *__p, __m128i __m)
 {
-  return (__m128)__builtin_ia32_maskloadps((const __v4sf *)__p, (__v4sf)__m);
+  return (__m128)__builtin_ia32_maskloadps((const __v4sf *)__p, (__v4si)__m);
 }
 
 static __inline __m256 __DEFAULT_FN_ATTRS
-_mm256_maskload_ps(float const *__p, __m256 __m)
+_mm256_maskload_ps(float const *__p, __m256i __m)
 {
-  return (__m256)__builtin_ia32_maskloadps256((const __v8sf *)__p, (__v8sf)__m);
+  return (__m256)__builtin_ia32_maskloadps256((const __v8sf *)__p, (__v8si)__m);
 }
 
 /* Conditional store ops */
 static __inline void __DEFAULT_FN_ATTRS
-_mm256_maskstore_ps(float *__p, __m256 __m, __m256 __a)
+_mm256_maskstore_ps(float *__p, __m256i __m, __m256 __a)
 {
-  __builtin_ia32_maskstoreps256((__v8sf *)__p, (__v8sf)__m, (__v8sf)__a);
+  __builtin_ia32_maskstoreps256((__v8sf *)__p, (__v8si)__m, (__v8sf)__a);
 }
 
 static __inline void __DEFAULT_FN_ATTRS
-_mm_maskstore_pd(double *__p, __m128d __m, __m128d __a)
+_mm_maskstore_pd(double *__p, __m128i __m, __m128d __a)
 {
-  __builtin_ia32_maskstorepd((__v2df *)__p, (__v2df)__m, (__v2df)__a);
+  __builtin_
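A minimal usage sketch of the corrected signatures (not part of the commit): the mask is now an integer vector whose per-element sign bit selects the lanes to load or store. The helper below is hypothetical and assumes AVX is enabled (e.g. -mavx):

  #include <immintrin.h>

  void copy_lanes_0_and_2(const double *src, double *dst) {
    /* _mm256_set_epi64x lists elements from most- to least-significant;
       an all-ones element (-1) enables the lane, 0 disables it. */
    __m256i mask = _mm256_set_epi64x(0, -1, 0, -1);
    __m256d v = _mm256_maskload_pd(src, mask);
    _mm256_maskstore_pd(dst, mask, v);
  }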
Re: [PATCH] D13861: [x86] fix wrong maskload/store intrinsic definitions in avxintrin.h (follow up of D13776).
andreadb added a comment.

In http://reviews.llvm.org/D13861#270811, @delena wrote:

> LGTM

Thanks Elena! Committed revision 250816.


Repository:
  rL LLVM

http://reviews.llvm.org/D13861
Re: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
Hi Simon (and all),

I noticed that this patch changes the definition of intrinsic _mm_cvtsd_ss in emmintrin.h. Is that intentional? My understanding is that your patch should have only addressed float-to-integer conversions.

Was this change to _mm_cvtsd_ss motivated by the fact that (V)CVTSD2SS depends on the rounding mode (control bits in the MXCSR register) for inexact conversions? That would explain why the LLVM part (r275981) also added a codegen test for that double-to-float conversion (I guess that none of the reviewers spotted that extra test).

The problem I am seeing is that your change is causing a crash in the backend (at -O1 and above). See the reproducer below:

/
#include <emmintrin.h>

__m128 test(__m128 a, const __m128d &b) {
  return _mm_cvtsd_ss(a, b);
}
/

Alternatively, here is the LLVM IR:

define <4 x float> @test(<4 x float> %a, <2 x double>* nocapture readonly %b) {
entry:
  %0 = load <2 x double>, <2 x double>* %b, align 16
  %1 = tail call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a, <2 x double> %0)
  ret <4 x float> %1
}

; Function Attrs: nounwind readnone
declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>)

With your patch, we now always generate a call to @llvm.x86.sse2.cvtsd2ss when expanding the builtin call from _mm_cvtsd_ss. ISel would then select the `Int_CVTSD2SSrm` instruction, which however is `CodeGenOnly`. Unfortunately that pseudo is never further expanded/replaced before compilation reaches the object emitter stage. So, before we execute Pass 'X86 Assembly / Object Emitter', we see machine code like this:

BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %XMM0
        %XMM0 = Int_VCVTSD2SSrm %XMM0, %RDI, 1, %noreg, 0, %noreg; mem:LD16[%b]
        RETQ %XMM0

.. which then causes the following assertion failure:

[2016-07-25 13:25:11.830351700] 0x7bad5f87e0     Executing Pass 'X86 Assembly / Object Emitter' on Function 'test'...
Cannot encode all operands of: >

Overall, I agree that the compiler shouldn't make assumptions on the rounding mode when converting from double to float. However, the RM variant of Int_VCVTSD2SS doesn't seem to be correctly handled/expanded by the backend.

Can we revert the change to the double-to-single convert? Alternatively, can we fix the codegen issue exposed by this change (and add extra test coverage)? My opinion is that the change to the double-to-float conversion intrinsic should have been part of a separate patch.

Thanks,
Andrea

On Fri, Jul 22, 2016 at 2:18 PM, Hans Wennborg via cfe-commits <cfe-commits@lists.llvm.org> wrote:

> On Thu, Jul 21, 2016 at 6:34 PM, Robinson, Paul via cfe-commits
> wrote:
> >
> >> -----Original Message-----
> >> From: cfe-commits [mailto:cfe-commits-boun...@lists.llvm.org] On Behalf Of
> >> Simon Pilgrim via cfe-commits
> >> Sent: Wednesday, July 20, 2016 3:18 AM
> >> To: cfe-commits@lists.llvm.org
> >> Subject: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics
> >> instead of using generic IR
> >>
> >> Author: rksimon
> >> Date: Wed Jul 20 05:18:01 2016
> >> New Revision: 276102
> >>
> >> URL: http://llvm.org/viewvc/llvm-project?rev=276102&view=rev
> >> Log:
> >> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using
> >> generic IR
> >>
> >> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ
> >> truncating conversions with generic IR instead.
> >>
> >> It turns out that the behaviour of these intrinsics is different enough
> >> from generic IR that this will cause problems. INF/NAN/out-of-range values
> >> are guaranteed to result in a 0x80000000 value - which plays havoc with
> >> constant folding, which converts them to either zero or UNDEF. This is also
> >> an issue with the scalar implementations (which were already generic IR
> >> and what I was trying to match).
> >
> > Are the problems enough that this should be merged to the 3.9 release
> > branch?
> > --paulr
>
> IIUC, this is the Clang-side of r275981, and if we merge that this
> should probably be merged too.
>
> Thanks,
> Hans
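As a footnote to the rounding-mode point above, here is a hedged standalone sketch (not from the thread) of why (V)CVTSD2SS cannot be treated as a plain IR fptrunc: an inexact conversion honors the MXCSR rounding control, which fesetround() updates on x86-64. Depending on the optimization level, the calls may still be constant-folded at compile time, which is exactly the hazard being discussed:

  #include <emmintrin.h>
  #include <fenv.h>
  #include <stdio.h>

  int main(void) {
    const double d = 1.0000000001;   /* not exactly representable as float */
    __m128  a = _mm_setzero_ps();
    __m128d b = _mm_set_sd(d);

    fesetround(FE_DOWNWARD);
    float down = _mm_cvtss_f32(_mm_cvtsd_ss(a, b));

    fesetround(FE_UPWARD);
    float up = _mm_cvtss_f32(_mm_cvtsd_ss(a, b));

    fesetround(FE_TONEAREST);
    printf("down=%.9g up=%.9g\n", down, up);   /* differ if MXCSR is honored */
    return 0;
  }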