https://bugs.freedesktop.org/show_bug.cgi?id=52209
--- Comment #7 from Roland Scheidegger <srol...@vmware.com> 2012-07-17 23:25:56 PDT ---

Since the test doesn't use any vectors whose size depends on cpu caps, LP_NATIVE_VECTOR_WIDTH shouldn't affect anything.

Here's the IR of a test which fails:

define void @fetch_r32g32_sscaled_unorm8(<4 x i8>*, i8*, i32, i32) {
entry:
  %4 = bitcast i8* %1 to <2 x i32>*
  %5 = load <2 x i32>* %4, align 4
  %6 = shufflevector <2 x i32> %5, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
  %7 = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %6, <4 x i32> zeroinitializer)
  %8 = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %7, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
  %9 = ashr <4 x i32> %8, <i32 -1, i32 -1, i32 -1, i32 -1>
  %10 = sub <4 x i32> %8, %9
  %11 = extractelement <4 x i32> %10, i32 0
  %12 = extractelement <4 x i32> %10, i32 1
  %13 = extractelement <4 x i32> %10, i32 2
  %14 = extractelement <4 x i32> %10, i32 3
  %15 = bitcast i32 %11 to <2 x i16>
  %16 = bitcast i32 %12 to <2 x i16>
  %17 = shufflevector <2 x i16> %15, <2 x i16> %16, <2 x i32> <i32 0, i32 2>
  %18 = bitcast i32 %13 to <2 x i16>
  %19 = bitcast i32 %14 to <2 x i16>
  %20 = shufflevector <2 x i16> %18, <2 x i16> %19, <2 x i32> <i32 0, i32 2>
  %21 = bitcast <2 x i16> %17 to <4 x i8>
  %22 = bitcast <2 x i16> %20 to <4 x i8>
  %23 = shufflevector <4 x i8> %21, <4 x i8> %22, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %24 = shl <4 x i8> %23, <i8 8, i8 8, i8 8, i8 8>
  %25 = sub <4 x i8> %24, %23
  %26 = bitcast <4 x i8> %25 to i32
  %27 = and i32 %26, 65535
  %28 = or i32 bitcast (<4 x i8> <i8 0, i8 0, i8 0, i8 -1> to i32), %27
  %29 = bitcast i32 %28 to <4 x i8>
  store <4 x i8> %29, <4 x i8>* %0
  ret void
}

With llvm 3.1 it passes but not with 2.9/3.0. But there's more to it: with 2.9 AND a cpu which isn't sse41-capable it also passes (and on top of that the generated code is way _better_, even though it can't use the pminsd/pmaxsd intrinsics, but those aren't the issue).
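As a point of reference, here is a rough scalar model of what the fetch_r32g32_sscaled_unorm8 IR appears to be meant to compute. It is only a sketch of my reading of the IR, not code from Mesa: the function names are hypothetical, a little-endian byte order is assumed for the final bitcast, and the arithmetic is done in plain Python integers, so it sidesteps the narrow-vector handling entirely.

```python
def clamp01(x: int) -> int:
    """Mirror the pmaxsd/pminsd pair: clamp a signed value to [0, 1]."""
    return min(max(x, 0), 1)


def to_unorm8(x01: int) -> int:
    """Scale a clamped 0/1 value to 8 bits as (x << 8) - x, i.e. x * 255."""
    return ((x01 << 8) - x01) & 0xFF


def fetch_r32g32_sscaled_unorm8_ref(r: int, g: int) -> bytes:
    """Hypothetical reference: the four bytes the final store should write.

    Assumes little-endian layout, so byte 3 of the 'or' constant is alpha.
    """
    red = to_unorm8(clamp01(r))
    green = to_unorm8(clamp01(g))
    # 'and i32 ..., 65535' keeps only red/green; the 'or' constant sets
    # the fourth byte to 0xFF and leaves the third byte at zero.
    return bytes([red, green, 0x00, 0xFF])
```

For example, fetch_r32g32_sscaled_unorm8_ref(7, -3) yields the bytes ff 00 00 ff: red clamps to 1 and scales to 255, green clamps to 0.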
So with an sse41 or avx capable cpu llvm 3.1 generates correct but crappy code, whereas it is crappy and wrong with 2.9/3.0. Only on a cpu that isn't sse41-capable does it produce correct and good code...

I believe the issue here is the use of non-native vectors toward the end (2x16, 4x8), since llvm uses padded vector elements for them (a 4xi8 vector looks like 4xi32), so it has to do lots of weird shuffles (those harmless looking bitcasts cause lots of unpacks, shuffles etc.). Well, that's the explanation for the crappy code (probably some optimization wasn't available without sse41, and the fallback turned out to be much better in the end). Fortunately it shouldn't happen with llvmpipe since we don't generally use such vectors (we always fetch a multiple of 4 values).

This doesn't explain why it isn't correct though. Maybe we're relying somewhere on some properties of those values when resizing which don't hold true if the vector elements aren't packed but padded.

There's another issue with this code, which may or may not be related to this bug:

  %9 = ashr <4 x i32> %8, <i32 -1, i32 -1, i32 -1, i32 -1>

(the uscaled formats will have a lshr instead). This shift is illegal, since shifts with counts larger than or equal to the element bit width (which this is) are undefined in llvm (ok, not illegal, just the result is undefined). However, llvm itself doesn't care, and with sse2 it just happily issues the psrad 255 instruction, which has defined (and reasonable) behavior (in the non-vector domain the hardware would just use the low bits of the count, which would still work).

This comes from lp_build_conv(), line 594 (since src_shift is zero, src_offset is 0 and dst_offset is 1). So something seems wrong with this calculation; maybe we'd need to do something different if the destination is a normalized format instead.
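The out-of-range shift count can be illustrated with a small model. This is a sketch under assumptions: the count is taken to be src_shift - 1 wrapped to 32 bits, which is inferred from the -1 in the IR and the src_shift value mentioned above rather than quoted from lp_build_conv(), and all function names are hypothetical. The psrad emulation follows the documented SSE2 behaviour that counts above 31 fill each lane with its sign bit.

```python
ELEMENT_BITS = 32


def shift_count(src_shift: int) -> int:
    """Assumed form of the count: src_shift - 1, wrapped to 32 bits."""
    return (src_shift - 1) & 0xFFFFFFFF


def is_defined_in_llvm(count: int, bits: int = ELEMENT_BITS) -> bool:
    """LLVM shifts only have a defined result for counts below the bit width."""
    return count < bits


def psrad_lane(x: int, count: int) -> int:
    """Emulate one 32-bit lane of psrad: counts above 31 give the sign fill."""
    if count > 31:
        return -1 if x < 0 else 0
    return x >> count  # Python's >> on ints is an arithmetic shift


def compensate(x: int, count: int) -> int:
    """The ashr/sub pair from the IR, applied to one lane."""
    return x - psrad_lane(x, count)
```

With src_shift = 0 the count wraps to 0xffffffff, which is_defined_in_llvm() rejects, and its low 8 bits give the 255 seen in the psrad immediate. Since the lanes were already clamped to [0, 1], psrad_lane() returns 0 for them and compensate() leaves the values unchanged, which would be consistent with the hardware result being reasonable even though the IR-level result is undefined.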