I spent a lot of time optimizing the sort/argsort kernel for GPUs, and we get pretty good performance on GPUs from multiple vendors, competitive with those vendors' hand-tuned libraries.
If these TIR kernels are well optimized, they shouldn't end up being the bottleneck in models.