I spent a lot of time optimizing the sort/argsort kernel for GPUs, and we now 
get pretty good performance on GPUs from multiple vendors, competitive with 
those vendors' hand-tuned libraries.

If these TIR kernels are well optimized, they shouldn't end up being the 
bottleneck in models.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/autoscheduler-do-we-have-plan-to-support-auto-schedule-externop/10346/8)
 to respond.
