It might also be useful to bring some of this discussion to the forums. Here is a 
quick sketch of the related GPU programming model.

```python
for y in range(64):
  for x in range(64):
    C[y, x] = A[y, x] * (B[y] + 1)
```
Say we are interested in the original program above. In usual GPU programming 
terminology, we map the computation over x to "threads", where `tid` is the 
thread index. GPU programming also has different memory scopes (I am 
using CUDA terminology here):
- local: the variable is local to each thread.
- shared: the variable is "shared" across threads; concurrently writing different 
values to the same shared variable is somewhat undefined.
- warp shuffle: sometimes we need to exchange data (e.g. take a sum) across 
threads, which is done through shuffle instructions (like warp.all_reduce).

```python
for y in range(64):
  for x in range(64 // n):
    for tid in T.scalable_vectorized_as_threads(n):
      a0: local = A[y, tid + n * x]
      b0: shared = B[y]
      b1: shared =  b0 + 1
      c0: local = a0 * b1
      C[y, tid + n * x] = c0
```
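
To make the thread mapping concrete, here is a minimal pure-Python simulation of the sketch above, checked against the original loop nest. The function names and the choice `n = 8` are assumptions made for illustration only; nothing here is a TVM or CUDA API, and the `tid` loop runs sequentially as a stand-in for parallel threads.

```python
# Illustrative simulation only: `reference`, `thread_mapped` and `n` are
# hypothetical names, not part of TVM or CUDA.

def reference(A, B, size):
    """Original loop nest: C[y][x] = A[y][x] * (B[y] + 1)."""
    return [[A[y][x] * (B[y] + 1) for x in range(size)] for y in range(size)]


def thread_mapped(A, B, size, n):
    """Split x into blocks of n lanes; each lane plays the role of one thread."""
    assert size % n == 0, "this sketch assumes n divides the row length"
    C = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size // n):
            # "shared" values: computed once, visible to every lane in the block
            b0 = B[y]
            b1 = b0 + 1
            for tid in range(n):  # sequential stand-in for n parallel threads
                # "local" values: one private copy per lane
                a0 = A[y][tid + n * x]
                c0 = a0 * b1
                C[y][tid + n * x] = c0
    return C


size, n = 64, 8
A = [[y * size + x for x in range(size)] for y in range(size)]
B = list(range(size))
assert thread_mapped(A, B, size, n) == reference(A, B, size)
```

Every lane in a block reads the same `b1`, which is what makes it a candidate for the "shared" scope, while `a0` and `c0` differ per lane.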

The code above is a rough sketch of what it might look like. It might 
also be possible to produce a similar, more "vector-view" version using the 
following rule:
- local <=> vector<vscale>
- shared <=> normal register

```python
# note vscale = n
for y in range(64):
  for x in range(64 // n):
    with T.sve_scope(n):
      a0: vector<vscale> = A[y, tid + n * x]
      b0: scalar = B[y]
      b1: scalar = b0 + 1
      c0: vector<vscale> = a0 * b1
      C[y, tid + n * x] = c0
```
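
The same computation can be simulated in the vector view, again as a hedged pure-Python sketch: a Python list slice stands in for a `vector<vscale>` value and a plain integer for a scalar register. The name `vector_view` and the fixed `vscale = 8` are assumptions for illustration; on real scalable-vector hardware `vscale` is only known at run time.

```python
# Illustrative simulation only: `vector_view` is a hypothetical helper and
# list slices stand in for scalable vector registers.

def vector_view(A, B, size, vscale):
    """Process each row in chunks of vscale lanes, one chunk per iteration."""
    assert size % vscale == 0, "this sketch assumes vscale divides the row length"
    C = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size // vscale):
            lo = vscale * x
            a0 = A[y][lo:lo + vscale]    # vector<vscale>: one load of vscale lanes
            b0 = B[y]                    # scalar
            b1 = b0 + 1                  # scalar
            c0 = [a * b1 for a in a0]    # vector<vscale>: scalar broadcast times vector
            C[y][lo:lo + vscale] = c0    # vector store
    return C


size, vscale = 64, 8
A = [[y * size + x for x in range(size)] for y in range(size)]
B = list(range(size))
expected = [[A[y][x] * (B[y] + 1) for x in range(size)] for y in range(size)]
assert vector_view(A, B, size, vscale) == expected
```

Compared with the thread view, the explicit `tid` loop disappears into whole-vector operations, while the scalar values play the role that "shared" variables played before.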

They are not that different. But one thing is true: we do need to be able to 
distinguish the vector dtype from the scalar dtype (or, in the case of 
GPU programming, local from shared). Being able to mark a dtype as 
ScalableVectorMark seems to serve that purpose.
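
As a rough illustration of why such a mark helps, the sketch below attaches a boolean flag to a toy dtype and uses it to decide the result type of a binary operation. The `DType` class, its `scalable` field, and `result_dtype` are hypothetical and are not the TVM data type API or the RFC's actual design; the flag merely stands in for something like ScalableVectorMark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Toy dtype; `scalable` is a hypothetical stand-in for a scalable-vector mark."""
    base: str
    scalable: bool = False

    def __str__(self):
        return f"vector<vscale> of {self.base}" if self.scalable else self.base


def result_dtype(lhs, rhs):
    """A scalar operand is broadcast, so the result is a vector if either side is."""
    if lhs.base != rhs.base:
        raise TypeError(f"mismatched element types: {lhs.base} vs {rhs.base}")
    return DType(lhs.base, scalable=lhs.scalable or rhs.scalable)


scalar = DType("float32")
vector = DType("float32", scalable=True)

assert scalar != vector                        # the mark makes the two distinguishable
assert result_dtype(scalar, scalar) == scalar  # b1 = b0 + 1 stays scalar
assert result_dtype(vector, scalar) == vector  # c0 = a0 * b1 becomes a vector
```

With the mark in place, the types of `b1` and `c0` in the vector-view sketch follow mechanically from their operands.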



