@jroesch Currently, it is about 500 lines of code per backend. I am working on improvements, so it may increase.
@yzhliu
* simple reduction: reduction ops that have no reuse opportunity (e.g. softmax, argmin)
* complex reduction: reduction ops that have a reuse opportunity (e.g. matmul, conv2d)
* direct compute: broadcast, elemwise, and stencil computation (e.g. relu, add)
* location-tunable compute: the same as above. The difference is that `direct compute` computes at the root, while `location-tunable compute` can compute at other nodes to increase locality.

@tmoreau89 This is doable. The problem with accelerators is that if we want the auto-scheduler to take in a hardware-independent description, then we need a special pack pass to transform the layout.
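To make the root vs. tunable-location distinction concrete, here is a minimal plain-Python/NumPy sketch (not TVM code; the function names are made up for illustration). Computing a producer "at root" materializes its whole intermediate buffer before the consumer reads it, whereas computing it at the consumer's loop nest (as `compute_at` does in a TVM schedule) produces each element right where it is consumed, improving locality:

```python
import numpy as np

def compute_at_root(a):
    # Producer B computed at root: the full intermediate array
    # is materialized, then the consumer C scans it again.
    b = a + 1.0          # B: elemwise producer, whole buffer written
    c = b * 2.0          # C: consumer reads B from memory
    return c

def compute_at_consumer(a):
    # Producer B computed inside C's loop nest (compute_at-style):
    # each B element is produced immediately before its single use,
    # so no full intermediate buffer is needed.
    c = np.empty_like(a)
    for i in range(a.shape[0]):
        b_i = a[i] + 1.0  # B computed at C's loop
        c[i] = b_i * 2.0
    return c
```

Both schedules compute the same result; they differ only in where the producer's computation is placed, which is exactly the knob a location-tunable compute op exposes to the auto-scheduler.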