Some quick comments:

- I think we should use TIR intrinsics (as opposed to a new node, which would add an extra burden to the IR). In general, it might be useful to know that a value is a multiple of something (e.g. 128), so having something like `x * 128` might help.
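To illustrate the kind of divisibility information an expression like `x * 128` could carry, here is a minimal sketch of a "multiple-of" analysis in plain Python. The class name and operators are hypothetical, purely for illustration; TVM's own counterpart is the `arith` modular-set analysis, which tracks values in the form `coeff * x + base`.

```python
from math import gcd

class MultipleOf:
    """Illustrative only: record that a value is coeff * k for some integer k.

    Hypothetical helper, not TVM's actual arith API.
    """

    def __init__(self, coeff):
        self.coeff = coeff

    def __add__(self, other):
        # A multiple of a plus a multiple of b is a multiple of gcd(a, b).
        return MultipleOf(gcd(self.coeff, other.coeff))

    def __mul__(self, other):
        # A multiple of a times a multiple of b is a multiple of a * b.
        return MultipleOf(self.coeff * other.coeff)

# e.g. a scalable vector length written as x * 128: always a multiple of 128,
# so adding two such lengths stays a multiple of 128, and doubling one
# yields a multiple of 256.
vl = MultipleOf(128)
print((vl + vl).coeff)             # 128
print((vl * MultipleOf(2)).coeff)  # 256
```

Keeping this as a property of an ordinary multiply expression means existing simplification and bound-analysis passes can consume it without a new IR node.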
- I would still love us to think about tensorization support in the codegen with some form of loop annotation (without explicit vector dtypes), as that will generalize to more complex operations. One possible way to think about SVE is perhaps to draw inspiration from CUDA programming, where each thread corresponds to one element in the vector lane, and there are ways to distinguish between normal registers (shared across threads) and vector registers (thread-local storage per thread). Having one special SVE vector dtype is a fine compromise in the vector case, since we only need to tell the difference between normal scalar registers and vector registers.

--
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/104#issuecomment-1692138122
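To make the CUDA analogy concrete, here is a small Python sketch (purely illustrative, not a proposed API) of the register distinction: a scalar register holds one value shared by every lane, while a vector register contributes one lane-local value per conceptual thread, with the lane count scaled by a hardware `vscale` factor as in SVE.

```python
def saxpy_lanes(a, x, y, vscale, base_lanes=4):
    """Hypothetical sketch of a CUDA-style per-lane view of an SVE operation.

    'a' behaves like a normal (scalar) register, identical across all lanes;
    x[lane] and y[lane] behave like one element of a vector register, i.e.
    thread-local storage for that lane.
    """
    lanes = vscale * base_lanes       # SVE-style scalable lane count
    out = [0.0] * lanes
    for lane in range(lanes):         # conceptually parallel, like CUDA threads
        out[lane] = a * x[lane] + y[lane]
    return out

# With vscale=1 the vector holds base_lanes elements; a wider machine
# (vscale=2, 4, ...) simply runs more lanes of the same per-lane program.
print(saxpy_lanes(2.0, [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], vscale=1))
```

The point of the analogy is that the per-lane body never names a vector width, so the same lowering could target fixed-width vectors, SVE, or SIMT hardware.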