[Apache TVM Discuss] [Development/RFC] [RFC] TensorIR: A schedulable IR for TVM

2020-09-21 Thread Xqdan via Apache TVM Discuss


This is the right way to go. However, I have two concerns:
1) How can we fuse ops as much as possible? Fusion is essentially a 
copy-propagation optimization in compilers, which relies on data-flow analysis, 
but TVM still lacks this kind of program analysis.
2) TE tensorize cannot handle some complex pattern matching (see 
https://github.com/apache/incubator-tvm/pull/1053). Can we do 100% pattern 
matching in TIR?





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/29)
 to respond.



[Apache TVM Discuss] [Development] Strassen Algorithm for Dense

2020-09-21 Thread zj via Apache TVM Discuss


Thank you for your reply. 

Regarding the runtime fluctuations, I didn't explain it clearly.
After autotvm tuning completed, I picked the best record to measure the runtime, 
and the measured time fluctuates significantly. I compute the runtime as the 
difference between the start and end timestamps:
```cpp
struct timeval curTime1;
gettimeofday(&curTime1, NULL);
// tv_sec is in seconds and tv_usec in microseconds, so convert both to microseconds
size_t micro_start = curTime1.tv_sec * 1000000 + curTime1.tv_usec;

tvm::runtime::TVMRetValue ret = f(x, y, z);

struct timeval curTime2;
gettimeofday(&curTime2, NULL);
size_t micro_end = curTime2.tv_sec * 1000000 + curTime2.tv_usec;

size_t run_time = micro_end - micro_start;
```


However, the runtime of the Strassen algorithm does not fluctuate 
significantly. So I am curious whether the fluctuation is related to TVM, or 
whether it is simply caused by CPU load changes (after all, the CPU is not dedicated).





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/strassen-algorithm-for-dense/2661/15) 
to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] Differentiable tensor expression (Create and verify backward op automatically)

2020-09-21 Thread wrongtest via Apache TVM Discuss


As demand for TVM's training support grows, one of the most tedious but 
important tasks is writing backward implementations for operators. It would be 
of great benefit to provide automation tools to help with this process. Such a 
tool can serve two purposes:

- Automatically create backward definition from forward definition
- Check gradient given forward and backward definition

Traditional deep learning frameworks (except perhaps Theano :wink:) perform 
automatic back-propagation at the op-graph level; that is, they have to 
implement one backward op for each forward op. In theory, there must be as many 
backward op definitions as there are forward ops.

For TVM, however, there is an opportunity to perform back-propagation at the 
tensor expression level. The set of tensor expression operations is much 
smaller than the full set of neural network operators, so this would greatly 
reduce the manual work at the higher level (Relay ops).

### Backward tensor expression generator
#### Interface
Since a tensor expression symbolically defines how to compute the output from 
the inputs, we can simply apply the back-propagation rule to it. For example, 
we can provide a utility interface like:
```python
def auto_backprop(inputs: List[Tensor], output: Tensor) -> Tuple[List[Tensor], Tensor]:
    """Given the input tensor list and the output tensor, generate the backward computation.

    - The inputs of the generated computation are a placeholder representing the
      gradient with respect to the original output, plus any other necessary
      original tensors.
    - The outputs are the gradients with respect to each of the original inputs.
    """
    pass
```
Now if we have already defined some forward computation, then we can extract a 
"default" backward computation definition:
```python
x = te.placeholder((n, k))
y = te.placeholder((m, k))
z = te.compute((n, m), ...)
  
((grad_x, grad_y), grad_z_placeholder) = te.auto_backprop((x, y), z)
sched = te.create_schedule(grad_x.op)
# Do schedule and tune backward ops...
```

The transformation should happen before create_schedule(), since the forward 
and backward definitions are generally different and may not share the same 
optimization strategies.

We can wrap this sort of utility in TOPI and Relay, where we try our best to 
provide default backward op definitions automatically, without hand-written 
definitions. Some pros and cons are listed below:

- Pros
  - Avoids hand-written work for at least some portion of operations.
  - Auto-generated definitions may be more robust on boundary behaviors and 
corner cases.
- Cons
  - It is not all-powerful: not all operators can be differentiated automatically.
  - Some optimization hints may be lost (the backward of matmul is also a 
matmul, the backward of conv2d is also a conv2d).
 
#### Transformation logic
Initially we may focus only on `te.compute()`, without support for tensor 
intrinsics / hybrid / extern.
- ```te.compute()```
  - Use a simple matmul as an example:
  ```python
  te.compute((m, n), lambda i, j: tvm.sum(data[i, k] * weight[j, k], axis=k))
  ```
  If we want to compute the gradient with respect to `weight[w1, w2]`, we have 
to know how the output relates to this weight position. Thus we "remap" the 
iteration variables related to `weight`:
  ```python
  j = w1, k = w2
  ```
  Then all iteration variables in the compute expression can be represented in 
terms of `[w1, w2]` via affine transformations.
  ```python
  tvm.sum(data[i, w2] * weight[w1, w2], axis=..)  # for each free i, with j = w1, k = w2
  ```
  `i` is a free variable here; it can be seen that each `weight[w1, w2]` 
contributes to `output[i, w1]` for every feasible `i`. For each `i`, the 
gradient of `tvm.sum(...)` with respect to `weight[w1, w2]` is `data[i, w2]`. 
According to the chain rule, the gradient of the loss with respect to 
`weight[w1, w2]` can be computed as

```python
   tvm.sum(data[i, w2] * grad_output[i, w1], axis=i)
```
- The actual back-propagation logic should carefully handle iteration-variable 
relationships. For each occurrence of the target tensor in the expression, the 
feasible integer set of each free iteration variable is inferred based on the 
iteration-variable remapping. With the free variables fixed, the gradient 
expression of the output with respect to the target tensor position is 
computed. Finally, the chain rule is applied to sum the gradient expression 
over the feasible sets of the free variables. Unsupported cases should be 
detected explicitly. (A concrete sketch of the matmul case is given after this 
list.)

- ```te.scan()``` is also an interesting operation worth supporting for 
back-propagation; with it we could get backward implementations of 
RNN/LSTM/GRU directly.
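
To make the matmul case concrete, below is a minimal sketch (assuming the 
standard `tvm.te` API; the shapes and the `grad_output`/`grad_weight` names are 
illustrative, not part of any existing interface) of the forward definition 
together with the backward definition that `auto_backprop` would be expected to 
produce for `weight`:

```python
from tvm import te

n, m, kdim = 128, 64, 32
data = te.placeholder((n, kdim), name="data")
weight = te.placeholder((m, kdim), name="weight")

# Forward: output[i, j] = sum_k data[i, k] * weight[j, k]
k = te.reduce_axis((0, kdim), name="k")
output = te.compute(
    (n, m), lambda i, j: te.sum(data[i, k] * weight[j, k], axis=k), name="output"
)

# Backward w.r.t. weight, following the derivation above:
# grad_weight[w1, w2] = sum_i data[i, w2] * grad_output[i, w1]
grad_output = te.placeholder((n, m), name="grad_output")
i = te.reduce_axis((0, n), name="i")
grad_weight = te.compute(
    (m, kdim),
    lambda w1, w2: te.sum(data[i, w2] * grad_output[i, w1], axis=i),
    name="grad_weight",
)

sched = te.create_schedule(grad_weight.op)
```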

### Gradient checking between forward and backward ops
Given a forward and backward implementation pair, we can verify correctness 
against numerically approximated gradients. This helps developers detect 
implementation errors in both general and corner cases. One such method is well 
described in 
https://datascience-enthusiast.com/DL/Improving_DeepNeural_Networks_Gradient_Checking.html
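
For illustration, here is a minimal NumPy sketch of the central-difference check 
described in that article; the `check_gradient` helper is hypothetical, not an 
existing TVM utility, and `forward`/`backward` stand for callables wrapping the 
compiled forward and backward ops:

```python
import numpy as np

def check_gradient(forward, backward, x, eps=1e-4, tol=1e-3):
    """forward(x) -> scalar loss; backward(x) -> analytic gradient with x's shape."""
    analytic = backward(x)
    numeric = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps          # loss at x + eps for this element
        loss_plus = forward(x)
        x[idx] = orig - eps          # loss at x - eps for this element
        loss_minus = forward(x)
        x[idx] = orig                # restore the original value
        numeric[idx] = (loss_plus - loss_minus) / (2.0 * eps)
        it.iternext()
    # relative error between the analytic and numerical gradients
    denom = np.linalg.norm(analytic) + np.linalg.norm(numeric) + 1e-12
    rel_err = np.linalg.norm(analytic - numeric) / denom
    return rel_err < tol, rel_err
```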





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-differentiable-tensor-expression-create-and

[Apache TVM Discuss] [Development/RFC] [RFC] TensorIR: A schedulable IR for TVM

2020-09-21 Thread Junru Shao via Apache TVM Discuss


@xqdan Thank you for the valuable feedback! Fusion can be done automatically 
with some analysis provided in Ansor.

Do you have any other kind of analysis in mind that might be potentially useful?





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/30)
 to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] Differentiable tensor expression (Create and verify backward op automatically)

2020-09-21 Thread Junru Shao via Apache TVM Discuss


Hey @wrongtest,

Thank you for the RFC! Just wondering how it compares with the previous AD RFC 
(https://discuss.tvm.apache.org/t/rfc-bring-in-tensor-expression-autodiff/5987)
?

Thanks!





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-differentiable-tensor-expression-create-and-verify-backward-op-automatically/7960/2)
 to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] Rename Hybrid Script

2020-09-21 Thread Tristan Konolige via Apache TVM Discuss


I've put up an initial PR here: 
https://github.com/apache/incubator-tvm/pull/6522.

An issue has come up: what do we name the Python module?

## Option 1
We name the module `tvm.tvmscript`.
Example usage:
```python
import tvm

# Can still use this though
@tvm.script # or tvm.script.tir
def my_func():
  pass

@tvm.script # or tvm.script.module
class Mod:
  def my_func():
    pass

string = my_func.asscript()
assert(string == tvm.tvmscript.parse(string))

# can also do
from tvm import tvmscript

assert(string == tvmscript.parse(string))
```

The disadvantage here is that `tvm.tvmscript` repeats `tvm`. But it does make 
it explicit that the script we are using is TVM script (as opposed to hybrid 
script).

## Option 2
We name the module `tvm.script`. We still refer to this as "TVM Script" in all 
documentation, etc.
```python
import tvm

# Can't use tvm.script as it is a namespace
@tvm.script.tvm # or tvm.script.tir (see option 2a)
def my_func():
  pass

@tvm.script.tvm # or tvm.script.module
class Mod:
  def my_func():
    pass

string = my_func.asscript()
assert(string == tvm.script.parse(string))

# can also do
from tvm import script

assert(string == script.parse(string))
```

If we use `tvm.script` as the module name, we cannot use the `@tvm.script` 
decorator. We have two options for the decorator. **Option 2a**: use 
`@tvm.script.tvm`. **Option 2b**: use `@tvm.script.tir` for functions and 
`@tvm.script.module` for modules.

The disadvantage here is that the name `script` can be confusing when used 
unqualified (with `from` imports). PyTorch uses this approach, but they only 
have a single kind of script in their package.


Let me know which you like best. (Hopefully this isn't too much bikeshedding.)





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-rename-hybrid-script/7915/11) to 
respond.



[Apache TVM Discuss] [Development/RFC] [RFC] Rename Hybrid Script

2020-09-21 Thread Bohan Hou via Apache TVM Discuss


No matter which option we take, do we have to distinguish between functions and 
classes when annotating with the decorator?





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-rename-hybrid-script/7915/12) to 
respond.



[Apache TVM Discuss] [Development/RFC] [RFC] Rename Hybrid Script

2020-09-21 Thread Tristan Konolige via Apache TVM Discuss


Yes and no. Right now we do not need to differentiate. But in the future, 
functions in a module may be either for TIR or for Relay.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-rename-hybrid-script/7915/13) to 
respond.



[Apache TVM Discuss] [Development/RFC] [RFC] TensorIR: A schedulable IR for TVM

2020-09-21 Thread Xqdan via Apache TVM Discuss


Is fusion in Ansor based on TIR?
For other transforms, you may check out the link below; that's what we've done 
in AKG. I can explain some of it if you are interested.
 
https://github.com/mindspore-ai/akg/blob/master/src/codegen/build_module.cc#L439





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/31)
 to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] Differentiable tensor expression (Create and verify backward op automatically)

2020-09-21 Thread wrongtest via Apache TVM Discuss


Glad to see autodiff is already in progress! I think this RFC can be withdrawn, 
since it is exactly what autodiff is doing.

Now I am very curious about the current progress of autodiff, with some questions:
- If I have a common neural network structure such as ResNet-50 at hand, can 
I just use autodiff to get the backward computation graph?
- Is there a description of the common ops that are covered by autodiff?
- Can te.scan() be supported?





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-differentiable-tensor-expression-create-and-verify-backward-op-automatically/7960/3)
 to respond.



[Apache TVM Discuss] [Development] Strassen Algorithm for Dense

2020-09-21 Thread Zhao Wu via Apache TVM Discuss


If you want to measure it more robustly, you should run it multiple times and 
calculate the average time. For example, you could run it 1000 times.
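
For instance, here is a minimal self-contained sketch using TVM's built-in 
`time_evaluator`, which runs the compiled function many times and reports 
per-repeat averages (the small dense kernel below is just a stand-in for your 
tuned one):

```python
import numpy as np
import tvm
from tvm import te

# A small dense/matmul built with TE, standing in for the tuned kernel.
n, m, kdim = 64, 64, 64
X = te.placeholder((n, kdim), name="X")
Y = te.placeholder((m, kdim), name="Y")
k = te.reduce_axis((0, kdim), name="k")
Z = te.compute((n, m), lambda i, j: te.sum(X[i, k] * Y[j, k], axis=k), name="Z")
s = te.create_schedule(Z.op)
lib = tvm.build(s, [X, Y, Z], "llvm", name="dense")

ctx = tvm.cpu(0)
x = tvm.nd.array(np.random.rand(n, kdim).astype("float32"), ctx)
y = tvm.nd.array(np.random.rand(m, kdim).astype("float32"), ctx)
z = tvm.nd.array(np.zeros((n, m), dtype="float32"), ctx)

# time_evaluator runs the function `number` times per repeat and averages.
timer = lib.time_evaluator(lib.entry_name, ctx, number=1000, repeat=3)
prof = timer(x, y, z)
print("mean %.3f ms, std %.3f ms" % (np.mean(prof.results) * 1e3, np.std(prof.results) * 1e3))
```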





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/strassen-algorithm-for-dense/2661/16) 
to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] TensorIR: A schedulable IR for TVM

2020-09-21 Thread Xqdan via Apache TVM Discuss


@junrushao1994 It would be better to know whether loops are vectorizable, 
permutable, or distributable; ISL can provide this information, so we can do 
loop optimization and tensorization/vectorization automatically.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/32)
 to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] TensorIR: A schedulable IR for TVM

2020-09-21 Thread Junru Shao via Apache TVM Discuss


@xqdan In Ansor, fusion analysis is handled in TE with some straightforward 
heuristics, which I believe cover our use cases. CC: @merrymercy @jcf94

Agreed that ISL provides useful information for vectorization, and I believe 
there might be other competitive heuristics too. Tensorization is a more 
general topic that would be super interesting to explore :-)





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/33)
 to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] Differentiable tensor expression (Create and verify backward op automatically)

2020-09-21 Thread Junru Shao via Apache TVM Discuss


CC: @yzhliu, the major contributor of this feature.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-differentiable-tensor-expression-create-and-verify-backward-op-automatically/7960/4)
 to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] TensorIR: A schedulable IR for TVM

2020-09-21 Thread Lianmin Zheng via Apache TVM Discuss


How does the compilation speed compare to the original TE?
In Ansor/AutoTVM, we have to compile a lot of schedules for feature extraction, 
so the speed of schedule transformation matters.

Do you have any benchmark results? Intuitively, I think the original TE will be 
faster because it can do batched bound inference and AST construction. If that 
is true, how can we close this performance gap?





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/34)
 to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] TensorIR: A schedulable IR for TVM

2020-09-21 Thread Junru Shao via Apache TVM Discuss


@merrymercy I didn't get the point about batched bound inference; doesn't Ansor 
use a pool of threads for massive bound inference?





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/35)
 to respond.



[Apache TVM Discuss] [Development/RFC] [RFC] TensorIR: A schedulable IR for TVM

2020-09-21 Thread Chenfan via Apache TVM Discuss


Er... @junrushao1994 I guess @merrymercy's point is that doing analysis in TE 
is quicker than using ISL.

ISL is certainly a powerful tool for loop analysis, but in my understanding we 
would have to lower the schedule to C code first before using ISL, which I 
think is more time consuming.

Currently, Ansor applies some simple but useful analyses based on TE. They may 
not be as accurate as ISL, but they are cheap. We then count on the tuning to 
try lots of uncertain schedules and find the best one by measuring.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/36)
 to respond.
