[libc] [llvm] [compiler-rt] [libcxx] [lld] [lldb] [flang] [clang-tools-extra] [libcxxabi] [clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2024-01-25 Thread Justin Fargnoli via cfe-commits
https://github.com/justinfargnoli edited https://github.com/llvm/llvm-project/pull/67866 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/67866

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: clang-format failure on GitHub is weird -- it just silently exits with an error. I ran the same command locally and fixed one place it was not happy about. The buildkite failure somewhere in RISC-V appears to be unrelated. https://github.com/llvm/llvm-project/pull/67866

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: Found another issue. We merge four independent byte loads with `align 1` into a 32-bit load, which fails at runtime on misaligned pointers. ``` %t0 = type { [17 x i8] } @shared_storage = linkonce_odr local_unnamed_addr addrspace(3) global %t0 undef, align 1 define <4 x i8> @i
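The alignment hazard described above can be illustrated on the host side. This is a minimal sketch, not code from the PR: the helper names `load_v4i8_bytewise`, `is_aligned_for_u32`, and `load_v4i8_wide` are hypothetical, and the wide load stands in for the merged 32-bit load that is only valid when the pointer is 4-byte aligned.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: assemble a 32-bit value (little-endian byte order)
// from four independent byte loads. Each load needs only 1-byte alignment,
// so this is safe for any pointer, including `align 1` storage.
inline std::uint32_t load_v4i8_bytewise(const unsigned char *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: true when a single 32-bit load from p would be
// naturally aligned.
inline bool is_aligned_for_u32(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0;
}

// Hypothetical helper: models the merged load. The precondition is the part
// that the four-byte-loads-to-one-load merge must be able to prove.
// The result is in host byte order.
inline std::uint32_t load_v4i8_wide(const unsigned char *p) {
  assert(is_aligned_for_u32(p) && "wide load requires 4-byte alignment");
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```

The point of the sketch is that merging the byte loads is only a valid transformation when the alignment precondition is known to hold; with `align 1` it is not.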

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,1248 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3 +; ## Support i16x2 instructions +; RUN: llc < %s -mtriple=nvptx64-nvidia-cuda -mcpu=sm_90 -mattr=+ptx80 \ +; RUN: -O0 -disable-post-ra -frame-pointer=

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Benjamin Kramer via cfe-commits
https://github.com/d0k approved this pull request. https://github.com/llvm/llvm-project/pull/67866

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: > > I see one suspicious failure in tensorflow tests. I suspect I've messed > > something up in v4i8 comparison. > > Yup, there is a problem: > > ``` > Successfully custom legalized node > ... replacing: t10: v4i8 = BUILD_VECTOR Constant:i16<-128>, > Constant:i16<-128>, Consta

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -2150,58 +2179,94 @@ NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const { return DAG.getBuildVector(Node->getValueType(0), dl, Ops); } -// We can init constant f16x2 with a single .b32 move. Normally it +// We can init constant f16x2/v2i16/v4i
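The idea in the comment above, initializing a small constant vector with a single 32-bit move, amounts to packing the elements into one 32-bit immediate. The following is a rough sketch under stated assumptions, not the code from the patch: `pack_v4i8` is a hypothetical name, elements are assumed to arrive as wider (here 16-bit) signed constants, and element 0 is assumed to occupy the lowest byte.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: pack four 8-bit lanes, supplied as wider signed
// constants, into one 32-bit immediate. Each element is masked to its low
// 8 bits first, so the sign-extended upper bits of a value such as -128
// cannot spill into neighbouring lanes.
constexpr std::uint32_t pack_v4i8(const std::array<std::int16_t, 4> &elts) {
  std::uint32_t packed = 0;
  for (unsigned i = 0; i < 4; ++i) {
    const std::uint32_t byte = static_cast<std::uint32_t>(elts[i]) & 0xFFu;
    packed |= byte << (8 * i);  // assumed layout: lane i in byte i
  }
  return packed;
}
```

With this layout, four lanes of -128 pack to `0x80808080`; without the mask, the first sign-extended element alone would set all upper bits.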

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B deleted https://github.com/llvm/llvm-project/pull/67866

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -2150,58 +2179,94 @@ NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const { return DAG.getBuildVector(Node->getValueType(0), dl, Ops); } -// We can init constant f16x2 with a single .b32 move. Normally it +// We can init constant f16x2/v2i16/v4i

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: > I see one suspicious failure in tensorflow tests. I suspect I've messed > something up in v4i8 comparison. Yup, there is a problem: ``` Successfully custom legalized node ... replacing: t10: v4i8 = BUILD_VECTOR Constant:i16<-128>, Constant:i16<-128>, Constant:i16<-128>, Const

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: I see one suspicious failure in tensorflow tests. I suspect I've messed something up in v4i8 comparison. https://github.com/llvm/llvm-project/pull/67866

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,1248 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3 +; ## Support i16x2 instructions +; RUN: llc < %s -mtriple=nvptx64-nvidia-cuda -mcpu=sm_90 -mattr=+ptx80 \ +; RUN: -O0 -disable-post-ra -frame-pointer=

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-05 Thread Thomas Raoux via cfe-commits
https://github.com/ThomasRaoux approved this pull request. Looks like it required quite a lot of cases to be handled :( Thanks for doing this, it solves some of the problems triton had with latest LLVM. Changes look good to me. https://github.com/llvm/llvm-project/pull/67866

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-05 Thread Thomas Raoux via cfe-commits
@@ -0,0 +1,1248 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3 +; ## Support i16x2 instructions +; RUN: llc < %s -mtriple=nvptx64-nvidia-cuda -mcpu=sm_90 -mattr=+ptx80 \ +; RUN: -O0 -disable-post-ra -frame-pointer=

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-05 Thread Thomas Raoux via cfe-commits
ThomasRaoux wrote: I ran the patch on our triton kernels and I don't see any functional problems left. https://github.com/llvm/llvm-project/pull/67866