I've hit a nasty issue. On CPU targets, our sorting-related ops are implemented in C++ (https://github.com/apache/tvm/blob/main/src/runtime/contrib/sort/sort.cc#L436), and they don't support fp16. So ops like `topk`, `argsort`, `nms`, etc. do not work with the fp16 + CPU target combination. We could add all of them to the NEVER list, but that would introduce unnecessary casts on GPU targets, where sorting is implemented in TIR and has no issues with fp16.
Maybe we need to add a specialized CPU sort for fp16 or rewrite CPU sort in TIR...
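To make the "specialized CPU sort for fp16" idea a bit more concrete, here is a rough sketch of one common way to do it: sort the raw 16-bit patterns through an order-preserving integer key, so no fp16 arithmetic or comparison is needed. This is only an illustration in Python, not the C++ code in `sort.cc`; the names `fp16_bits`, `sortable_key` and `argsort_fp16` are made up for this example, and NaN handling is ignored.

```python
import struct


def fp16_bits(x: float) -> int:
    """Round a Python float to IEEE 754 binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def sortable_key(bits: int) -> int:
    """Map an fp16 bit pattern to an unsigned int whose order matches float order.

    Negative values: flip every bit, so larger magnitudes get smaller keys.
    Non-negative values: set the sign bit, so they sort above all negatives.
    NaN is not handled in this sketch.
    """
    if bits & 0x8000:
        return ~bits & 0xFFFF
    return bits | 0x8000


def argsort_fp16(bit_patterns):
    """Return indices that sort the fp16 values in ascending order (stable)."""
    return sorted(range(len(bit_patterns)), key=lambda i: sortable_key(bit_patterns[i]))


# All of these values are exactly representable in fp16.
values = [1.5, -2.0, 0.25, -0.5, 3.0]
bits = [fp16_bits(v) for v in values]
order = argsort_fp16(bits)
print(order)  # [1, 3, 2, 0, 4]
```

The same key mapping would carry over to a C++ comparator working on `uint16_t` storage, assuming the fp16 data can be viewed as raw bits on the CPU side.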