@jroesch Currently, it is about 500 loc per backend. I am working on 
improvements so it may increase.

@yzhliu 
* simple reduction: reduction ops with no data-reuse opportunity (e.g. 
softmax, argmin)
* complex reduction: reduction ops with a data-reuse opportunity (e.g. matmul, 
conv2d)
* direct compute: broadcast, elementwise, and stencil computation (e.g. relu, add)
* location-tunable compute: the same set of ops as above. The difference is that 
`direct compute` is computed at the root, while `location-tunable compute` can be 
computed at other nodes to increase locality.
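To make the reuse distinction concrete, here is a small illustrative Python sketch (not TVM code; the counting functions are made up for illustration) that counts how often each input element is read by a naive matmul versus a simple row-sum reduction:

```python
from collections import Counter

def count_reads_matmul(n):
    """Naive n x n x n matmul C[i, j] = sum_k A[i, k] * B[k, j]:
    record every read of an input element."""
    reads = Counter()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                reads[("A", i, k)] += 1
                reads[("B", k, j)] += 1
    return reads

def count_reads_rowsum(n):
    """Simple reduction S[i] = sum_k X[i, k]:
    each input element is touched in exactly one output."""
    reads = Counter()
    for i in range(n):
        for k in range(n):
            reads[("X", i, k)] += 1
    return reads

# Every A/B element is read n times (reuse: worth caching/tiling),
# while every X element is read exactly once (no reuse to exploit).
```

For n = 8, the matmul counter reports 8 reads per element of A and B, while the row-sum counter reports 1 read per element of X; that reuse is why matmul/conv2d-like ops benefit from tiling and caching schedules, and softmax/argmin-like reductions do not.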

@tmoreau89 This is doable. The problem with accelerators is that if we want the 
auto-scheduler to take a hardware-independent description as input, then we need a 
special pack pass to transform the layout.
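As a rough illustration of what such a pack pass does (a hedged NumPy sketch under assumed 2-D tiling, not the actual TVM pass; `pack`/`unpack` and the block sizes are invented for the example), blocking a row-major matrix into accelerator-sized tiles so each tile is contiguous looks like:

```python
import numpy as np

def pack(a, bm, bn):
    """Tile a row-major (M, N) matrix into shape (M//bm, N//bn, bm, bn)
    so each (bm, bn) accelerator tile is contiguous in memory."""
    m, n = a.shape
    assert m % bm == 0 and n % bn == 0, "dims must divide the block size"
    # reshape splits each axis into (outer, inner); transpose groups
    # the two outer axes first, then the two inner (tile) axes
    return a.reshape(m // bm, bm, n // bn, bn).transpose(0, 2, 1, 3).copy()

def unpack(p):
    """Inverse transform back to the plain row-major layout."""
    mo, no, bm, bn = p.shape
    return p.transpose(0, 2, 1, 3).reshape(mo * bm, no * bn)
```

The point of running this as a separate pass is that the compute description stays hardware-independent, and only the layout transform encodes the accelerator's tile shape.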



https://github.com/dmlc/tvm/issues/2954#issuecomment-479379983