I've hit a nasty issue. On CPU targets, our sorting-related ops are implemented 
in C++ 
https://github.com/apache/tvm/blob/main/src/runtime/contrib/sort/sort.cc#L436, 
and they don't support fp16. So ops like `topk`, `argsort`, `nms`, etc. do not 
work with the fp16 + CPU target combination. We could add all of them to the 
NEVER list, but that would introduce unnecessary casts for GPU targets, because 
sorting on GPU is implemented in TIR and so has no issues with fp16.
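
To make the trade-off concrete, here is a rough sketch of a target-dependent decision, as opposed to a single NEVER list applied to every target. Everything in it is hypothetical: `SORT_OPS`, `CPU_KINDS` and `sort_needs_fp32` are names made up for illustration and are not part of TVM, the op names are just the ones mentioned above, and I am assuming `llvm` and `c` as the CPU target kinds.

```python
# Illustrative only: none of these names exist in TVM.
SORT_OPS = {"topk", "argsort", "nms"}  # the ops named above, as placeholders
CPU_KINDS = {"llvm", "c"}              # assumed CPU target kinds


def sort_needs_fp32(op_name: str, target_kind: str) -> bool:
    """Return True if the op should stay in fp32 for this target.

    Only sorting ops on CPU targets are forced to fp32; on other targets
    (e.g. GPU, where sorting is written in TIR) fp16 is left alone, so no
    extra casts are inserted there.
    """
    return op_name in SORT_OPS and target_kind in CPU_KINDS


if __name__ == "__main__":
    for op, kind in [("topk", "llvm"), ("topk", "cuda"), ("argsort", "c")]:
        print(op, kind, sort_needs_fp32(op, kind))
```

The point is only that the decision depends on both the op and the target, which a plain per-op NEVER list cannot express.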

Maybe we need to add a specialized CPU sort for fp16 or rewrite CPU sort in 
TIR...
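
For the first option, one possible approach (a sketch, not a description of what sort.cc does or would do) is to sort fp16 values by their raw 16-bit patterns. IEEE 754 bit patterns can be mapped to unsigned integers whose order matches the float order, so no fp16 arithmetic is needed. The Python below only illustrates the key mapping; the helper names are made up, and a real version would live in the C++ runtime.

```python
import struct


def fp16_bits(x: float) -> int:
    """Return the IEEE 754 binary16 bit pattern of x as an unsigned int."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def sortable_key(bits: int) -> int:
    """Map an fp16 bit pattern to a uint16 key that sorts like the float.

    Negative values: flip all bits, so larger magnitudes get smaller keys.
    Non-negative values: set the sign bit, so they sort above all negatives.
    NaN handling is not addressed here; NaNs land at the extremes.
    """
    if bits & 0x8000:
        return ~bits & 0xFFFF
    return bits | 0x8000


def argsort_fp16(bit_patterns, descending=False):
    """Argsort fp16 values given as raw bit patterns, using integer keys only."""
    keys = [sortable_key(b) for b in bit_patterns]
    return sorted(range(len(keys)), key=keys.__getitem__, reverse=descending)


if __name__ == "__main__":
    values = [1.5, -2.0, 0.0, 65504.0, -0.5]
    bits = [fp16_bits(v) for v in values]
    print(argsort_fp16(bits))                   # indices in ascending order
    print(argsort_fp16(bits, descending=True))  # indices in descending order
```

With this mapping the comparison is a plain integer compare, so the existing sort routines could in principle be reused on the keys. Whether that is simpler than rewriting the CPU sort in TIR is an open question.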

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/8296#issuecomment-892378745