Some quick comments
- I think we should use TIR intrinsics (as opposed to a new node, which would 
add extra burden to the IR)
- In general, it might be useful to know that a value is a 
multiple of something (e.g. 128), so having something like `x * 128` might help 

- I would still love for us to think about tensorization support in the codegen with 
some form of loop annotation (without explicit vector dtypes), as that will 
generalize to more complex operations. 
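To illustrate the second point, here is a minimal sketch (plain Python, not the actual TVM arith API; `MulOf` and `is_multiple_of` are made-up names) of why representing an extent symbolically as `x * 128` helps: a simplifier can then structurally prove divisibility by 128 and skip generating a tail loop.

```python
# Hypothetical sketch: an extent written as (unknown x) * 128 lets an
# analyzer prove extent % 128 == 0 without knowing x itself.

class MulOf:
    """A value known to be (some unknown integer) * factor."""
    def __init__(self, factor):
        self.factor = factor

def is_multiple_of(value, k):
    """Return True only when divisibility by k is *provable*."""
    if isinstance(value, MulOf):
        # x * 128 is provably a multiple of any divisor of 128.
        return value.factor % k == 0
    return isinstance(value, int) and value % k == 0

extent = MulOf(128)                   # extent = x * 128 for unknown x
assert is_multiple_of(extent, 128)    # safe to vectorize by 128, no tail loop
assert not is_multiple_of(extent, 256)  # cannot be proven, so assume not
```

The same idea generalizes: carrying a "multiple of" property through the IR lets later passes simplify modulo and division expressions that would otherwise block vectorization.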


One possible way to think about SVE is perhaps to draw inspiration from CUDA 
programming, where each thread corresponds to one element in the vector 
lane, and there are ways to distinguish between a normal register (which is shared across 
threads) and a vector register (thread-local storage per thread).
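To make the analogy concrete, here is a small sketch (plain Python, names are illustrative only) of the SPMD view: a scalar register holds one value visible to every lane, while a vector register holds one value per lane, just as a CUDA thread sees shared uniforms plus its own thread-local values.

```python
VL = 4  # vector length (number of lanes / "threads"); SVE leaves this
        # unknown at compile time, so code must be written generically in VL

scale = 3                  # "scalar register": one value, shared across all lanes
data = [10, 20, 30, 40]    # "vector register": one element per lane

# Each lane behaves like a CUDA thread: lane i reads its own slot of the
# vector register, while every lane reads the same scalar register.
result = [data[lane] * scale for lane in range(VL)]
assert result == [30, 60, 90, 120]
```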

Having one special SVE vector dtype is a fine compromise in the vector case, 
since we only need to tell the difference between a normal scalar register and a vector register.
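As a hedged illustration (not TVM's actual dtype encoding; `DType` and `needs_vector_reg` are hypothetical names), one way a single special dtype could carry that scalar-vs-vector distinction:

```python
# Illustrative only: tag values with a dtype so codegen can tell which
# values live in scalar registers vs. (scalable) vector registers.
from dataclasses import dataclass

@dataclass(frozen=True)
class DType:
    base: str                 # e.g. "float32"
    scalable: bool = False    # True -> SVE-style vector register, lane count unknown

x = DType("float32")                  # normal scalar register
v = DType("float32", scalable=True)   # the one special SVE vector dtype

def needs_vector_reg(dt: DType) -> bool:
    return dt.scalable

assert not needs_vector_reg(x)
assert needs_vector_reg(v)
```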

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/104#issuecomment-1692138122