Thanks @Laurawly for the proposal. I agree this is good to have, as it brings 
together the best of all worlds.

The MNK=2048 benchmark seems to suggest a limitation in the auto-scheduler's 
search space construction: the handwritten [topi 
recipe](https://github.com/apache/tvm/tree/main/apps/topi_recipe/gemm) gets 
closer to peak, reaching about 80-90% of cuBLAS on a Titan X.

Complementary to this effort, we will also explore the possibility of using 
the primitives in CUTLASS, including those for the FMA computation and memory 
movement, via tensorization. The aim is to see whether we can leverage these 
sub-functions at different, finer-grained levels, eventually bringing the 
insights from CUTLASS to auto-scheduling.




