r330408 - Fix -Wunused-variable warnings after r330377.
Author: adibiagio
Date: Fri Apr 20 02:47:03 2018
New Revision: 330408

URL: http://llvm.org/viewvc/llvm-project?rev=330408&view=rev
Log: Fix -Wunused-variable warnings after r330377.

Modified:
    cfe/trunk/lib/Analysis/ConstructionContext.cpp

Modified: cfe/trunk/lib/Analysis/ConstructionContext.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Analysis/ConstructionContext.cpp?rev=330408&r1=330407&r2=330408&view=diff
==
--- cfe/trunk/lib/Analysis/ConstructionContext.cpp (original)
+++ cfe/trunk/lib/Analysis/ConstructionContext.cpp Fri Apr 20 02:47:03 2018
@@ -80,7 +80,7 @@ const ConstructionContext *ConstructionC
       return create(C, BTE, MTE);
     }
     // This is a constructor into a function argument. Not implemented yet.
-    if (auto *CE = dyn_cast<CallExpr>(ParentLayer->getTriggerStmt()))
+    if (isa<CallExpr>(ParentLayer->getTriggerStmt()))
      return nullptr;
    // This is C++17 copy-elided construction into return statement.
    if (auto *RS = dyn_cast<ReturnStmt>(ParentLayer->getTriggerStmt())) {
@@ -118,7 +118,7 @@ const ConstructionContext *ConstructionC
      return create(C, RS);
    }
    // This is a constructor into a function argument. Not implemented yet.
-    if (auto *CE = dyn_cast<CallExpr>(TopLayer->getTriggerStmt()))
+    if (isa<CallExpr>(TopLayer->getTriggerStmt()))
      return nullptr;
    llvm_unreachable("Unexpected construction context with statement!");
  } else if (const CXXCtorInitializer *I = TopLayer->getTriggerInit()) {
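As background for this cleanup, here is a small self-contained C++ sketch of the pattern being fixed. It uses plain dynamic_cast and invented types rather than LLVM's dyn_cast/isa, so it illustrates the idea only, not the actual Clang code:

  struct Stmt { virtual ~Stmt() = default; };
  struct CallExpr : Stmt {};

  bool isConstructorIntoFunctionArgument(const Stmt *S) {
    // Before: the pointer bound in the condition is never referenced again,
    // so compilers that check this report it as an unused variable.
    //   if (const auto *CE = dynamic_cast<const CallExpr *>(S))
    //     return true;
    //   return false;

    // After: a pure type test needs no named binding (isa<> in LLVM).
    return dynamic_cast<const CallExpr *>(S) != nullptr;
  }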
r328968 - Fix unused variable warning introduced at revision 328910.
Author: adibiagio
Date: Mon Apr 2 05:04:37 2018
New Revision: 328968

URL: http://llvm.org/viewvc/llvm-project?rev=328968&view=rev
Log: Fix unused variable warning introduced at revision 328910.

Modified:
    cfe/trunk/lib/Analysis/LiveVariables.cpp

Modified: cfe/trunk/lib/Analysis/LiveVariables.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Analysis/LiveVariables.cpp?rev=328968&r1=328967&r2=328968&view=diff
==
--- cfe/trunk/lib/Analysis/LiveVariables.cpp (original)
+++ cfe/trunk/lib/Analysis/LiveVariables.cpp Mon Apr 2 05:04:37 2018
@@ -383,8 +383,7 @@ void TransferFunctions::VisitDeclRefExpr
   bool InAssignment = LV.inAssignment[DR];
   if (const auto *BD = dyn_cast<BindingDecl>(D)) {
     if (!InAssignment)
-      val.liveBindings =
-          LV.BSetFact.add(val.liveBindings, cast<BindingDecl>(D));
+      val.liveBindings = LV.BSetFact.add(val.liveBindings, BD);
   } else if (const auto *VD = dyn_cast<VarDecl>(D)) {
     if (!InAssignment && !isAlwaysAlive(VD))
       val.liveDecls = LV.DSetFact.add(val.liveDecls, VD);
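This commit shows the second common form of the same warning: the typed pointer is bound in the condition, but the body re-casts the original declaration instead of reusing it. A minimal self-contained restatement, again with plain C++ RTTI and invented types in place of LLVM's casting utilities:

  #include <set>

  struct Decl { virtual ~Decl() = default; };
  struct BindingDecl : Decl {};

  void addLiveBinding(std::set<const BindingDecl *> &LiveBindings, const Decl *D) {
    if (const auto *BD = dynamic_cast<const BindingDecl *>(D)) {
      // Before: LiveBindings.insert(dynamic_cast<const BindingDecl *>(D));
      //         leaves BD unused and repeats the cast.
      // After: reuse the pointer the condition already produced.
      LiveBindings.insert(BD);
    }
  }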
[PATCH] D26335: [ms] Reinstate https://reviews.llvm.org/D14748 after https://reviews.llvm.org/D20291
andreadb added inline comments.

Comment at: lib/Headers/x86intrin.h:49
+static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__))
+__tzcnt_u32(unsigned int __X) { return __X ? __builtin_ctz(__X) : 32; }
+#ifdef __x86_64__

hans wrote:
> probinson wrote:
> > hans wrote:
> > > I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a
> > > faster encoding for BSF, but now we're putting a conditional in the way,
> > > so this will be slower actually. On the other hand, the alternative is
> > > weird too :-/
> > I thought there was a peephole to notice a guard like this and do the right
> > thing? In which case having the guard is fine.
> But for non-BMI targets it won't be able to remove the guard (unless it can
> prove the argument is zero).
>
> MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg
> wants.

I understand your point. However, the semantics of __tzcnt_u32 (at least for BMI) are well defined on zero input. Changing those semantics for non-BMI targets sounds a bit odd/confusing to me.

I guess ffmpeg relies on other compilers either preserving the intrinsic call until instruction selection, or folding it to a meaningful value (i.e. 32 in this case) when the input is known to be zero.

If we remove the zero check from __tzcnt_u32 (for non-BMI targets), we let the optimizer enable rules that aggressively fold calls to __tzcnt_u32 to undef when the input is zero.

As a side note: I noticed that gcc provides the intrinsic `__bsfd` in ia32intrin.h. Maybe we should do something similar?


https://reviews.llvm.org/D26335
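For readers following the thread, here is a small standalone C sketch, not part of the patch and with invented function names, contrasting the two behaviours being debated. The guarded form is well defined for a zero input and matches the BMI TZCNT result; the unguarded form maps to a bare BSF, but __builtin_ctz(0) is undefined, which the optimizer may exploit:

  /* Guarded variant (as in the patch): defined for a zero input, returning
     the operand width, which matches BMI TZCNT semantics. */
  static inline unsigned tzcnt32_guarded(unsigned x) {
    return x ? (unsigned)__builtin_ctz(x) : 32;
  }

  /* Unguarded variant (what ffmpeg effectively wants on non-BMI targets):
     lowers to a bare BSF, but a known-zero input may be folded to an
     arbitrary value because __builtin_ctz(0) is undefined. */
  static inline unsigned tzcnt32_unguarded(unsigned x) {
    return (unsigned)__builtin_ctz(x);
  }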
r249142 - Make test more resilient to FastIsel changes. NFC.
Author: adibiagio
Date: Fri Oct 2 10:10:22 2015
New Revision: 249142

URL: http://llvm.org/viewvc/llvm-project?rev=249142&view=rev
Log: Make test more resilient to FastIsel changes. NFC.

Currently FastISel doesn't know how to select vector bitcasts. During instruction selection, fast-isel always falls back to SelectionDAG every time it encounters a vector bitcast.

As a consequence of this, all the 'packed vector shift by immediate count' test cases in avx2-builtins.c are optimized by the DAGCombiner. In particular, the DAGCombiner would always fold trivial stack loads of constant shift counts into the operands of packed shift builtins.

This behavior would start changing as soon as I reapply revision 249121. That revision would teach x86 fast-isel how to select bitcasts between vector types of the same size. As a consequence of that change, fast-isel would less often fall back to SelectionDAG. More importantly, DAGCombiner would no longer be able to simplify the code by folding the stack reload of a constant.

No functional change.

Modified:
    cfe/trunk/test/CodeGen/avx2-builtins.c

Modified: cfe/trunk/test/CodeGen/avx2-builtins.c
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/avx2-builtins.c?rev=249142&r1=249141&r2=249142&view=diff
==
--- cfe/trunk/test/CodeGen/avx2-builtins.c (original)
+++ cfe/trunk/test/CodeGen/avx2-builtins.c Fri Oct 2 10:10:22 2015
@@ -574,7 +574,7 @@ __m256i test_mm256_bslli_epi128(__m256i
 
 __m256i test_mm256_slli_epi16(__m256i a) {
   // CHECK: @llvm.x86.avx2.pslli.w
-  // CHECK-ASM: vpsllw $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsllw {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_slli_epi16(a, 3);
 }
 
@@ -586,7 +586,7 @@ __m256i test_mm256_sll_epi16(__m256i a,
 
 __m256i test_mm256_slli_epi32(__m256i a) {
   // CHECK: @llvm.x86.avx2.pslli.d
-  // CHECK-ASM: vpslld $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpslld {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_slli_epi32(a, 3);
 }
 
@@ -610,7 +610,7 @@ __m256i test_mm256_sll_epi64(__m256i a,
 
 __m256i test_mm256_srai_epi16(__m256i a) {
   // CHECK: @llvm.x86.avx2.psrai.w
-  // CHECK-ASM: vpsraw $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsraw {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_srai_epi16(a, 3);
 }
 
@@ -622,7 +622,7 @@ __m256i test_mm256_sra_epi16(__m256i a,
 
 __m256i test_mm256_srai_epi32(__m256i a) {
   // CHECK: @llvm.x86.avx2.psrai.d
-  // CHECK-ASM: vpsrad $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsrad {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_srai_epi32(a, 3);
 }
 
@@ -646,7 +646,7 @@ __m256i test_mm256_bsrli_epi128(__m256i
 
 __m256i test_mm256_srli_epi16(__m256i a) {
   // CHECK: @llvm.x86.avx2.psrli.w
-  // CHECK-ASM: vpsrlw $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsrlw {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
   return _mm256_srli_epi16(a, 3);
 }
 
@@ -658,7 +658,7 @@ __m256i test_mm256_srl_epi16(__m256i a,
 
 __m256i test_mm256_srli_epi32(__m256i a) {
   // CHECK: @llvm.x86.avx2.psrli.d
-  // CHECK-ASM: vpsrld $3, %ymm{{.*}}, %ymm{{.*}}
+  // CHECK-ASM: vpsrld {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
  return _mm256_srli_epi32(a, 3);
 }
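To make the intent of the new check lines concrete, here is a hedged, reduced example in the style of avx2-builtins.c (not taken from the tree). The alternation accepts either the immediate form emitted when DAGCombiner folds the constant shift count, or the register form emitted when the constant is reloaded from the stack:

  #include <immintrin.h>

  __m256i shift_left_by_3(__m256i a) {
    // CHECK: @llvm.x86.avx2.pslli.w
    // Matches "vpsllw $3, %ymm0, %ymm0" as well as "vpsllw %xmm1, %ymm0, %ymm0".
    // CHECK-ASM: vpsllw {{\$3|%xmm[0-9]+}}, %ymm{{.*}}, %ymm{{.*}}
    return _mm256_slli_epi16(a, 3);
  }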
[PATCH] D13861: [x86] fix wrong maskload/store intrinsic definitions in avxintrin.h (follow up of D13776).
andreadb created this revision.
andreadb added reviewers: bruno, mkuper, delena, qcolombet.
andreadb added a subscriber: cfe-commits.

Hi,

This patch is a follow-up of D13776.

According to the Intel documentation, the mask operand of the maskload and maskstore intrinsics is always a vector of packed integer/long integer values.

This patch introduces the following two changes:
1) It fixes the avx maskload/store intrinsic definitions in avxintrin.h;
2) It changes BuiltinsX86.def to match the correct gcc definitions for avx maskload/store (see D13776 for more details).

Please let me know if it is okay to submit.

Thanks,
Andrea


http://reviews.llvm.org/D13861

Files:
  include/clang/Basic/BuiltinsX86.def
  lib/Headers/avxintrin.h
  test/CodeGen/builtins-x86.c

Index: test/CodeGen/builtins-x86.c
===
--- test/CodeGen/builtins-x86.c
+++ test/CodeGen/builtins-x86.c
@@ -465,14 +465,14 @@
   __builtin_ia32_movntdq256(tmp_V4LLip, tmp_V4LLi);
   __builtin_ia32_movntpd256(tmp_dp, tmp_V4d);
   __builtin_ia32_movntps256(tmp_fp, tmp_V8f);
-  tmp_V2d = __builtin_ia32_maskloadpd(tmp_V2dCp, tmp_V2d);
-  tmp_V4f = __builtin_ia32_maskloadps(tmp_V4fCp, tmp_V4f);
-  tmp_V4d = __builtin_ia32_maskloadpd256(tmp_V4dCp, tmp_V4d);
-  tmp_V8f = __builtin_ia32_maskloadps256(tmp_V8fCp, tmp_V8f);
-  __builtin_ia32_maskstorepd(tmp_V2dp, tmp_V2d, tmp_V2d);
-  __builtin_ia32_maskstoreps(tmp_V4fp, tmp_V4f, tmp_V4f);
-  __builtin_ia32_maskstorepd256(tmp_V4dp, tmp_V4d, tmp_V4d);
-  __builtin_ia32_maskstoreps256(tmp_V8fp, tmp_V8f, tmp_V8f);
+  tmp_V2d = __builtin_ia32_maskloadpd(tmp_V2dCp, tmp_V2LLi);
+  tmp_V4f = __builtin_ia32_maskloadps(tmp_V4fCp, tmp_V4i);
+  tmp_V4d = __builtin_ia32_maskloadpd256(tmp_V4dCp, tmp_V4LLi);
+  tmp_V8f = __builtin_ia32_maskloadps256(tmp_V8fCp, tmp_V8i);
+  __builtin_ia32_maskstorepd(tmp_V2dp, tmp_V2LLi, tmp_V2d);
+  __builtin_ia32_maskstoreps(tmp_V4fp, tmp_V4i, tmp_V4f);
+  __builtin_ia32_maskstorepd256(tmp_V4dp, tmp_V4LLi, tmp_V4d);
+  __builtin_ia32_maskstoreps256(tmp_V8fp, tmp_V8i, tmp_V8f);
 
 #ifdef USE_3DNOW
   tmp_V8c = __builtin_ia32_pavgusb(tmp_V8c, tmp_V8c);
Index: lib/Headers/avxintrin.h
===
--- lib/Headers/avxintrin.h
+++ lib/Headers/avxintrin.h
@@ -835,53 +835,53 @@
 
 /* Conditional load ops */
 static __inline __m128d __DEFAULT_FN_ATTRS
-_mm_maskload_pd(double const *__p, __m128d __m)
+_mm_maskload_pd(double const *__p, __m128i __m)
 {
-  return (__m128d)__builtin_ia32_maskloadpd((const __v2df *)__p, (__v2df)__m);
+  return (__m128d)__builtin_ia32_maskloadpd((const __v2df *)__p, (__v2di)__m);
 }
 
 static __inline __m256d __DEFAULT_FN_ATTRS
-_mm256_maskload_pd(double const *__p, __m256d __m)
+_mm256_maskload_pd(double const *__p, __m256i __m)
 {
   return (__m256d)__builtin_ia32_maskloadpd256((const __v4df *)__p,
-                                               (__v4df)__m);
+                                               (__v4di)__m);
 }
 
 static __inline __m128 __DEFAULT_FN_ATTRS
-_mm_maskload_ps(float const *__p, __m128 __m)
+_mm_maskload_ps(float const *__p, __m128i __m)
 {
-  return (__m128)__builtin_ia32_maskloadps((const __v4sf *)__p, (__v4sf)__m);
+  return (__m128)__builtin_ia32_maskloadps((const __v4sf *)__p, (__v4si)__m);
 }
 
 static __inline __m256 __DEFAULT_FN_ATTRS
-_mm256_maskload_ps(float const *__p, __m256 __m)
+_mm256_maskload_ps(float const *__p, __m256i __m)
 {
-  return (__m256)__builtin_ia32_maskloadps256((const __v8sf *)__p, (__v8sf)__m);
+  return (__m256)__builtin_ia32_maskloadps256((const __v8sf *)__p, (__v8si)__m);
 }
 
 /* Conditional store ops */
 static __inline void __DEFAULT_FN_ATTRS
-_mm256_maskstore_ps(float *__p, __m256 __m, __m256 __a)
+_mm256_maskstore_ps(float *__p, __m256i __m, __m256 __a)
 {
-  __builtin_ia32_maskstoreps256((__v8sf *)__p, (__v8sf)__m, (__v8sf)__a);
+  __builtin_ia32_maskstoreps256((__v8sf *)__p, (__v8si)__m, (__v8sf)__a);
 }
 
 static __inline void __DEFAULT_FN_ATTRS
-_mm_maskstore_pd(double *__p, __m128d __m, __m128d __a)
+_mm_maskstore_pd(double *__p, __m128i __m, __m128d __a)
 {
-  __builtin_ia32_maskstorepd((__v2df *)__p, (__v2df)__m, (__v2df)__a);
+  __builtin_ia32_maskstorepd((__v2df *)__p, (__v2di)__m, (__v2df)__a);
 }
 
 static __inline void __DEFAULT_FN_ATTRS
-_mm256_maskstore_pd(double *__p, __m256d __m, __m256d __a)
+_mm256_maskstore_pd(double *__p, __m256i __m, __m256d __a)
 {
-  __builtin_ia32_maskstorepd256((__v4df *)__p, (__v4df)__m, (__v4df)__a);
+  __builtin_ia32_maskstorepd256((__v4df *)__p, (__v4di)__m, (__v4df)__a);
 }
 
 static __inline void __DEFAULT_FN_ATTRS
-_mm_maskstore_ps(float *__p, __m128 __m, __m128 __a)
+_mm_maskstore_ps(float *__p, __m128i __m, __m128 __a)
 {
-  __builtin_ia32_maskstoreps((__v4sf *)__p, (__v4sf)__m, (__v4sf)__a);
+  __builtin_ia32_maskstoreps((__v4sf *)__p, (__v4si)__m, (__v4sf)__a);
 }
 
 /* Cacheability support ops */
Index: include/clang/Basic/BuiltinsX86.def
==
r250816 - [x86] Fix maskload/store intrinsic definitions in avxintrin.h
Author: adibiagio
Date: Tue Oct 20 06:19:54 2015
New Revision: 250816

URL: http://llvm.org/viewvc/llvm-project?rev=250816&view=rev
Log: [x86] Fix maskload/store intrinsic definitions in avxintrin.h

According to the Intel documentation, the mask operand of the maskload and maskstore intrinsics is always a vector of packed integer/long integer values.

This patch introduces the following two changes:
1. It fixes the avx maskload/store intrinsic definitions in avxintrin.h.
2. It changes BuiltinsX86.def to match the correct gcc definitions for avx maskload/store (see D13861 for more details).

Differential Revision: http://reviews.llvm.org/D13861

Modified:
    cfe/trunk/include/clang/Basic/BuiltinsX86.def
    cfe/trunk/lib/Headers/avxintrin.h
    cfe/trunk/test/CodeGen/builtins-x86.c

Modified: cfe/trunk/include/clang/Basic/BuiltinsX86.def
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/BuiltinsX86.def?rev=250816&r1=250815&r2=250816&view=diff
==
--- cfe/trunk/include/clang/Basic/BuiltinsX86.def (original)
+++ cfe/trunk/include/clang/Basic/BuiltinsX86.def Tue Oct 20 06:19:54 2015
@@ -503,14 +503,14 @@ TARGET_BUILTIN(__builtin_ia32_lddqu256,
 TARGET_BUILTIN(__builtin_ia32_movntdq256, "vV4LLi*V4LLi", "", "avx")
 TARGET_BUILTIN(__builtin_ia32_movntpd256, "vd*V4d", "", "avx")
 TARGET_BUILTIN(__builtin_ia32_movntps256, "vf*V8f", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskloadpd, "V2dV2dC*V2d", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskloadps, "V4fV4fC*V4f", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskloadpd256, "V4dV4dC*V4d", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskloadps256, "V8fV8fC*V8f", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskstorepd, "vV2d*V2dV2d", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskstoreps, "vV4f*V4fV4f", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskstorepd256, "vV4d*V4dV4d", "", "avx")
-TARGET_BUILTIN(__builtin_ia32_maskstoreps256, "vV8f*V8fV8f", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskloadpd, "V2dV2dC*V2LLi", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskloadps, "V4fV4fC*V4i", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskloadpd256, "V4dV4dC*V4LLi", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskloadps256, "V8fV8fC*V8i", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskstorepd, "vV2d*V2LLiV2d", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskstoreps, "vV4f*V4iV4f", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskstorepd256, "vV4d*V4LLiV4d", "", "avx")
+TARGET_BUILTIN(__builtin_ia32_maskstoreps256, "vV8f*V8iV8f", "", "avx")
 
 // AVX2
 TARGET_BUILTIN(__builtin_ia32_mpsadbw256, "V32cV32cV32cIc", "", "avx2")

Modified: cfe/trunk/lib/Headers/avxintrin.h
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/avxintrin.h?rev=250816&r1=250815&r2=250816&view=diff
==
--- cfe/trunk/lib/Headers/avxintrin.h (original)
+++ cfe/trunk/lib/Headers/avxintrin.h Tue Oct 20 06:19:54 2015
@@ -835,53 +835,53 @@ _mm256_storeu_si256(__m256i *__p, __m256
 
 /* Conditional load ops */
 static __inline __m128d __DEFAULT_FN_ATTRS
-_mm_maskload_pd(double const *__p, __m128d __m)
+_mm_maskload_pd(double const *__p, __m128i __m)
 {
-  return (__m128d)__builtin_ia32_maskloadpd((const __v2df *)__p, (__v2df)__m);
+  return (__m128d)__builtin_ia32_maskloadpd((const __v2df *)__p, (__v2di)__m);
 }
 
 static __inline __m256d __DEFAULT_FN_ATTRS
-_mm256_maskload_pd(double const *__p, __m256d __m)
+_mm256_maskload_pd(double const *__p, __m256i __m)
 {
   return (__m256d)__builtin_ia32_maskloadpd256((const __v4df *)__p,
-                                               (__v4df)__m);
+                                               (__v4di)__m);
 }
 
 static __inline __m128 __DEFAULT_FN_ATTRS
-_mm_maskload_ps(float const *__p, __m128 __m)
+_mm_maskload_ps(float const *__p, __m128i __m)
 {
-  return (__m128)__builtin_ia32_maskloadps((const __v4sf *)__p, (__v4sf)__m);
+  return (__m128)__builtin_ia32_maskloadps((const __v4sf *)__p, (__v4si)__m);
 }
 
 static __inline __m256 __DEFAULT_FN_ATTRS
-_mm256_maskload_ps(float const *__p, __m256 __m)
+_mm256_maskload_ps(float const *__p, __m256i __m)
 {
-  return (__m256)__builtin_ia32_maskloadps256((const __v8sf *)__p, (__v8sf)__m);
+  return (__m256)__builtin_ia32_maskloadps256((const __v8sf *)__p, (__v8si)__m);
 }
 
 /* Conditional store ops */
 static __inline void __DEFAULT_FN_ATTRS
-_mm256_maskstore_ps(float *__p, __m256 __m, __m256 __a)
+_mm256_maskstore_ps(float *__p, __m256i __m, __m256 __a)
 {
-  __builtin_ia32_maskstoreps256((__v8sf *)__p, (__v8sf)__m, (__v8sf)__a);
+  __builtin_ia32_maskstoreps256((__v8sf *)__p, (__v8si)__m, (__v8sf)__a);
 }
 
 static __inline void __DEFAULT_FN_ATTRS
-_mm_maskstore_pd(double *__p, __m128d __m, __m128d __a)
+_mm_maskstore_pd(double *__p, __m128i __m, __m128d __a)
 {
-  __builtin_ia32_maskstorepd((__v2df *)__p, (__v2df)__m, (__v2df)__a);
+  __builtin_
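A minimal usage sketch of the corrected signatures (not part of the commit): the mask is now an integer vector whose per-element sign bit selects the lanes to load or store. The helper below is hypothetical and assumes AVX is enabled (e.g. -mavx):

  #include <immintrin.h>

  void copy_lanes_0_and_2(const double *src, double *dst) {
    /* _mm256_set_epi64x lists elements from most- to least-significant;
       an all-ones element (-1) enables the lane, 0 disables it. */
    __m256i mask = _mm256_set_epi64x(0, -1, 0, -1);
    __m256d v = _mm256_maskload_pd(src, mask);
    _mm256_maskstore_pd(dst, mask, v);
  }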
Re: [PATCH] D13861: [x86] fix wrong maskload/store intrinsic definitions in avxintrin.h (follow up of D13776).
andreadb added a comment.

In http://reviews.llvm.org/D13861#270811, @delena wrote:

> LGTM

Thanks Elena! Committed revision 250816.


Repository:
  rL LLVM

http://reviews.llvm.org/D13861
Re: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
Hi Simon (and all),

I noticed that this patch changes the definition of intrinsic _mm_cvtsd_ss in emmintrin.h. Is that intentional? My understanding is that your patch should have only addressed float-to-integer conversions.

Was this change to _mm_cvtsd_ss motivated by the fact that (V)CVTSD2SS depends on the rounding mode (control bits in the MXCSR register) for inexact conversions? That would explain why the LLVM part (r275981) also added a codegen test for that double-to-float conversion (I guess that none of the reviewers spotted that extra test).

The problem I am seeing is that your change is causing a crash in the backend (at -O1 and above). See the reproducer below:

/
#include <emmintrin.h>

__m128 test(__m128 a, const __m128d &b) {
  return _mm_cvtsd_ss(a, b);
}
/

Alternatively, here is the LLVM IR:

define <4 x float> @test(<4 x float> %a, <2 x double>* nocapture readonly %b) {
entry:
  %0 = load <2 x double>, <2 x double>* %b, align 16
  %1 = tail call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a, <2 x double> %0)
  ret <4 x float> %1
}

; Function Attrs: nounwind readnone
declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>)

With your patch, we now always generate a call to @llvm.x86.sse2.cvtsd2ss when expanding the builtin call from _mm_cvtsd_ss. ISel would then select the `Int_CVTSD2SSrm` instruction, which however is `CodeGenOnly`. Unfortunately that pseudo is never further expanded/replaced before compilation reaches the object emitter stage. So, before we execute Pass 'X86 Assembly / Object Emitter', we see machine code like this:

BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %XMM0
        %XMM0 = Int_VCVTSD2SSrm %XMM0, %RDI, 1, %noreg, 0, %noreg; mem:LD16[%b]
        RETQ %XMM0

.. which then causes the following assertion failure:

[2016-07-25 13:25:11.830351700] 0x7bad5f87e0     Executing Pass 'X86 Assembly / Object Emitter' on Function 'test'...
Cannot encode all operands of: >

Overall, I agree that the compiler shouldn't make assumptions on the rounding mode when converting from double to float. However, the RM variant of Int_VCVTSD2SS doesn't seem to be correctly handled/expanded by the backend.

Can we revert the change to the double-to-single convert? Alternatively, can we fix the codegen issue exposed by this change (and add extra test coverage)? My opinion is that the change to the double-to-float conversion intrinsic should have been part of a separate patch.

Thanks,
Andrea

On Fri, Jul 22, 2016 at 2:18 PM, Hans Wennborg via cfe-commits <cfe-commits@lists.llvm.org> wrote:

> On Thu, Jul 21, 2016 at 6:34 PM, Robinson, Paul via cfe-commits
> wrote:
> >
> >> -----Original Message-----
> >> From: cfe-commits [mailto:cfe-commits-boun...@lists.llvm.org] On Behalf Of
> >> Simon Pilgrim via cfe-commits
> >> Sent: Wednesday, July 20, 2016 3:18 AM
> >> To: cfe-commits@lists.llvm.org
> >> Subject: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics
> >> instead of using generic IR
> >>
> >> Author: rksimon
> >> Date: Wed Jul 20 05:18:01 2016
> >> New Revision: 276102
> >>
> >> URL: http://llvm.org/viewvc/llvm-project?rev=276102&view=rev
> >> Log:
> >> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using
> >> generic IR
> >>
> >> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ
> >> truncating conversions with generic IR instead.
> >>
> >> It turns out that the behaviour of these intrinsics is different enough
> >> from generic IR that this will cause problems. INF/NAN/out-of-range values
> >> are guaranteed to result in a 0x80000000 value - which plays havoc with
> >> constant folding, which converts them to either zero or UNDEF. This is also
> >> an issue with the scalar implementations (which were already generic IR
> >> and what I was trying to match).
> >
> > Are the problems enough that this should be merged to the 3.9 release
> > branch?
> > --paulr
>
> IIUC, this is the Clang-side of r275981, and if we merge that this
> should probably be merged too.
>
> Thanks,
> Hans
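As a footnote to the rounding-mode point above, here is a hedged standalone sketch (not from the thread) of why (V)CVTSD2SS cannot be treated as a plain IR fptrunc: an inexact conversion honors the MXCSR rounding control, which fesetround() updates on x86-64. Depending on the optimization level, the calls may still be constant-folded at compile time, which is exactly the hazard being discussed:

  #include <emmintrin.h>
  #include <fenv.h>
  #include <stdio.h>

  int main(void) {
    const double d = 1.0000000001;   /* not exactly representable as float */
    __m128  a = _mm_setzero_ps();
    __m128d b = _mm_set_sd(d);

    fesetround(FE_DOWNWARD);
    float down = _mm_cvtss_f32(_mm_cvtsd_ss(a, b));

    fesetround(FE_UPWARD);
    float up = _mm_cvtss_f32(_mm_cvtsd_ss(a, b));

    fesetround(FE_TONEAREST);
    printf("down=%.9g up=%.9g\n", down, up);   /* differ if MXCSR is honored */
    return 0;
  }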