Completely agree with these perspectives. Another observation I have is that
projects developed based on TVM are often not straightforward; they typically
require hacking the underlying TVM code. For example, in the Ladder project
(based on Welder), we added support for MFMA and HIP code generation.
@tqchen, thanks! This is exactly what we are expecting. However, last time I
tried to bring my own tuner into `mlc-llm`, I encountered an issue:
```python
import tvm # upstream
relax_mod = relax_transform(relax_mod)
import welder
relax_mod = welder.tune(relax_mod)
# something bad happened
```
@varunnaw Good point. In my project we use the following approach to retrieve
attributes such as the dynamic shared memory size and the block/grid
configuration, which might be helpful to you:
https://github.com/microsoft/BitBLAS/blob/main/bitblas/builder/wrapper/tir.py#L64-L80
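For reference, here is a minimal sketch of that idea using only upstream TVM APIs (it is not the BitBLAS code itself; the helper name `collect_launch_info` and the assumption of static extents are mine). It walks a lowered `PrimFunc`, collects the `thread_extent` bindings, and sums the `shared.dyn` allocations:

```python
import tvm
from tvm import tir


def collect_launch_info(func: tir.PrimFunc):
    """Sketch: gather grid/block extents and dynamic shared memory usage."""
    grid, block = {}, {}
    dyn_smem_bytes = 0

    def visit(node):
        nonlocal dyn_smem_bytes
        if isinstance(node, tir.AttrStmt) and node.attr_key == "thread_extent":
            iter_var = node.node
            tag = iter_var.thread_tag          # e.g. "blockIdx.x", "threadIdx.x"
            extent = int(node.value)           # assumes a static launch extent
            (grid if tag.startswith("blockIdx") else block)[tag] = extent
        elif isinstance(node, tir.Allocate):
            scope = node.buffer_var.type_annotation.storage_scope
            if scope == "shared.dyn":          # dynamic shared memory allocation
                numel = 1
                for extent in node.extents:
                    numel *= int(extent)       # assumes static allocation shape
                dyn_smem_bytes += numel * tvm.DataType(node.dtype).bits // 8

    tir.stmt_functor.post_order_visit(func.body, visit)
    return grid, block, dyn_smem_bytes
```

You would call it on the lowered device `PrimFunc` before (or instead of) parsing the generated source, e.g. `collect_launch_info(mod["main"])`.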
## Why is this important?
With the merged PR https://github.com/apache/tvm/pull/17278/files, we can now
generate code for `LetStmt`. For example, if we disable let-inlining in the
Simplify pass:
```python
iter_var: T.int32 = T.ceildiv(K, block_K)
for ko in T.serial(iter_var):
    ...
```
The generated CUDA code then keeps `iter_var` as a named local variable instead
of folding the expression into the loop bound.
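A rough sketch of the kind of loop the CUDA backend would emit in that case, assuming `K` and `block_K` end up as plain integer values in the kernel (the exact expression produced for `T.ceildiv` may differ):

```cuda
// Sketch only: with let-inlining disabled, the LetStmt survives lowering, so
// the loop bound is materialized once into a local variable rather than being
// folded into the loop condition.
int iter_var = (K + block_K - 1) / block_K;
for (int ko = 0; ko < iter_var; ++ko) {
  // ... loop body ...
}
```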