Re: [dmlc/tvm] [RFC] Tensor Core Support (#4052)

2019-10-03 Thread Siyuan Feng
@tmoreau89 Exactly! For now, we use the NCHWnc layout, the same layout as VTA.
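
A minimal sketch (not code from this RFC) of what an NCHWnc-packed tensor looks like in TVM's tensor expression language; the packing factors below are illustrative, not necessarily the ones used here:

```python
from tvm import te

# Logical NCHW shape, packed as NCHWnc: the batch and channel dimensions are
# tiled, and the tiles (n, c) become the two innermost axes.
N, C, H, W = 16, 64, 56, 56
n, c = 8, 16  # illustrative packing factors
data = te.placeholder((N // n, C // c, H, W, n, c), name="data", dtype="float16")
```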


Re: [dmlc/tvm] [RFC] Tensor Core Support (#4052)

2019-10-03 Thread Siyuan Feng
@yangjunpro Really happy to see another solution for TensorCore.

You are right! I just extended the TVM intrinsics to support it. That does cause some trouble for programmers who write the schedule; it is not easy to write a high-performance one.

I'm really curious about how IR passes can recognize the pattern. Does the compute need to be split into several loops of 16 in the Python code? I would appreciate it if you could show me some details and a simple example.
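
For context, a hedged sketch (not code from this RFC) of what the intrinsic-based approach described above means for the schedule writer: the matmul loops are split into 16x16x16 tiles by hand, and the innermost tile would then be replaced by a hand-written TensorCore intrinsic via tensorize (`wmma_gemm_intrin` below is hypothetical):

```python
import tvm
from tvm import te

M = N = K = 1024
A = te.placeholder((M, K), name="A", dtype="float16")
B = te.placeholder((K, N), name="B", dtype="float16")
k = te.reduce_axis((0, K), name="k")
C = te.compute(
    (M, N),
    lambda i, j: te.sum(A[i, k].astype("float32") * B[k, j].astype("float32"), axis=k),
    name="C",
)

s = te.create_schedule(C.op)
i, j = s[C].op.axis
io, ii = s[C].split(i, factor=16)  # 16x16x16 tiles to match the wmma fragment shape
jo, ji = s[C].split(j, factor=16)
ko, ki = s[C].split(s[C].op.reduce_axis[0], factor=16)
s[C].reorder(io, jo, ko, ii, ji, ki)
# s[C].tensorize(ii, wmma_gemm_intrin)  # hypothetical hand-written 16x16x16 intrinsic
```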


Re: [dmlc/tvm] [RFC] Tensor Core Support (#4052)

2019-10-03 Thread 孙敏敏
@Hzfengsy Sure, we will show the code as well as a sample schedule very soon; it's under internal review now. As you will see, the schedule for TensorCore CodeGen looks no different from a normal matmul schedule for GPU. Everything is done in IR passes, including matrix_a/matrix_b/accumulator recognition, row/col_major recognition as @yangjunpro mentioned, thread-index unification within a warp for TensorCore operations, loop scaling, etc.
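
By contrast, here is a hedged sketch (not the code under internal review) of the kind of plain GPU matmul schedule being referred to; there is nothing TensorCore-specific in it, and the recognition and rewriting steps listed above would be applied afterwards as IR passes (shapes and tiling factors are illustrative):

```python
import tvm
from tvm import te

M = N = K = 1024
A = te.placeholder((M, K), name="A", dtype="float16")
B = te.placeholder((K, N), name="B", dtype="float16")
k = te.reduce_axis((0, K), name="k")
C = te.compute(
    (M, N),
    lambda i, j: te.sum(A[i, k].astype("float32") * B[k, j].astype("float32"), axis=k),
    name="C",
)

s = te.create_schedule(C.op)
i, j = s[C].op.axis
bi, ti = s[C].split(i, factor=16)
bj, tj = s[C].split(j, factor=16)
s[C].reorder(bi, bj, ti, tj)
s[C].bind(bi, te.thread_axis("blockIdx.y"))
s[C].bind(bj, te.thread_axis("blockIdx.x"))
s[C].bind(ti, te.thread_axis("threadIdx.y"))
s[C].bind(tj, te.thread_axis("threadIdx.x"))
# No wmma fragments or tensorize calls here: matrix_a/matrix_b/accumulator
# recognition, warp-level index unification, and loop scaling happen in IR passes.
```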


[dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread 雾雨魔理沙
I am trying to train MNIST using TVM, and I hit an issue:
There are two functions, loss and infer: one calculates the loss and gradient, and the other just does inference.
However, create_executor/aot/vm all take only a single entry point. If there are multiple entry points, the passes will be called multiple times.
Furthermore, I think the notion of 'compile then execute' is not enough for all deep learning workloads. For example, using our partial evaluator to specialize on training/validation/testing data means we can compile only after we have loaded all the data. Even if we accept that, since PE is slow, a client-side optimization would be to pipeline the PEs: only do them per minibatch at epoch 0, then reuse the results. Or have a thread that constantly partially evaluates things and falls back to non-PE mode if it has not yet PE'd that batch. Another example is NAS/AdaNet, where the network structure changes all the time.

I think we can make the passes incremental (they are given a set of new things whenever invoked), and have the module keep track of compiled vs. uncompiled definitions.
@jroesch @tqchen @zhiics @wweic @vinx13 @junrushao1994 what do you guys think?
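
A minimal sketch of the two-entry setup described above, assuming a toy model (the function names and bodies are made up for illustration); with the current APIs each entry is evaluated through its own executor invocation, so the pass pipeline runs once per entry point:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 784), dtype="float32")
w = relay.var("w", shape=(10, 784), dtype="float32")
y = relay.var("y", shape=(1, 10), dtype="float32")

logits = relay.nn.dense(x, w)
infer_fn = relay.Function([x, w], logits)                    # inference entry
diff = relay.subtract(logits, y)
loss_fn = relay.Function([x, w, y], relay.sum(diff * diff))  # training-loss entry

mod = tvm.IRModule(
    {relay.GlobalVar("infer"): infer_fn, relay.GlobalVar("loss"): loss_fn}
)

# Each entry point is currently compiled/evaluated separately:
run_infer = relay.create_executor("debug", mod=mod).evaluate(mod["infer"])
run_loss = relay.create_executor("debug", mod=mod).evaluate(mod["loss"])
```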


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread Junru Shao
Just to clarify:

> There are two functions, loss and infer: one calculates the loss and gradient, and the other just does inference.

Do you mean that there are two modes, training and inference, so that there are multiple entry points in the Relay Module?


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread Yao Wang
Can we have a more detailed example to help clarify this issue?


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread Junru Shao
And also:

> 'compile then execute' is not enough for all deep learning workloads. For example, using our partial evaluator to specialize on training/validation/testing data means we can compile only after we have loaded all the data.

So in DL, the common practice is to specify the input shape in an ad-hoc way. In particular, for MNIST we know that our input has shape `(batch_size, 784)`. For more complicated workloads, like models containing complicated control flow, I don't really think loading all the data would suffice. Probably compilation should happen at the basic-block level if, say, the IR is in CFG form (so you would need a JIT).


Re: [dmlc/tvm] [DEV] TVM v0.6 Roadmap (#2623)

2019-10-03 Thread Haichen Shen
# TVM Monthly - September 2019
https://discuss.tvm.ai/t/tvm-monthly-september-2019


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread 雾雨魔理沙
@junrushao1994 Yes, that is what I mean: inference mode and training mode, each compiled to one function.
We use the partial evaluator primarily to partially evaluate the control flow w.r.t. the data. Other frameworks also do this, but they require manual loop unrolling.


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread Junru Shao
@MarisaKirisame Do you mean you want the training loop itself to be part of 
Relay?


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread 雾雨魔理沙
No. Suppose I have a TreeLSTM. Normally, I cannot do any operator fusion/batching because of the control flow everywhere. Using partial evaluation on each batch of training data individually will solve this problem.
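
A plain-Python illustration (not Relay code) of the point: the recursion below branches on the data, but once one concrete training example is fixed, a partial evaluator can reduce the whole call to straight-line code with no control flow, which is then easy to fuse and batch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def tree_sum(t: Optional["Node"]) -> float:
    # Data-dependent control flow: the branches taken depend on the tree shape,
    # which is what blocks generic operator fusion/batching.
    if t is None:
        return 0.0
    return t.value + tree_sum(t.left) + tree_sum(t.right)

# For this concrete example, partial evaluation can unroll the recursion into
# the fixed, branch-free expression 1.0 + 2.0 + 3.0.
example = Node(1.0, Node(2.0), Node(3.0))
print(tree_sum(example))
```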


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread 雾雨魔理沙
However, that requires the ability to create multiple entries (one entry per batch). If we also want to use Relay for any form of JIT, we must be able to interleave running Relay with adding more definitions to a Relay module.
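
A hedged sketch of the interleaving being asked for, using trivial stand-in functions (the real ones would come from the partial evaluator): per batch, a new entry is added to the module and then run before more definitions are added.

```python
import numpy as np
import tvm
from tvm import relay

mod = tvm.IRModule()
x = relay.var("x", shape=(1, 4), dtype="float32")

for i in range(3):
    # Stand-in for a function specialized by the partial evaluator on batch i.
    gv = relay.GlobalVar("loss_batch_%d" % i)
    mod[gv] = relay.Function([x], x * relay.const(float(i), "float32"))

    # Run the entry that was just added, then go on to add more definitions.
    run = relay.create_executor("debug", mod=mod).evaluate(mod[gv])
    out = run(np.ones((1, 4), dtype="float32"))
```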


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread Junru Shao
I see. So for MNIST there is no such issue; but for TreeLSTM, it is true that we cannot do more optimization if we don't do code replication.


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread Junru Shao
Let's get back to the original topic, which is broader IMO.

First of all, depending on your scenario, incremental compilation may or may not be doable, e.g. on edge devices where there is only room for the TVM runtime, not the compiler.

That said, I am actually in favor of incremental compilation, or some profiling-guided approach. I do think it is inevitable for training, especially as people become increasingly interested in more flexible models.

However, it is not quite clear to me whether we need multiple entries. Can we do better than a single module? Maybe @comaniac can offer some thoughts.

BTW, in the TreeLSTM example, I am curious why we need full training/dev/test-data-dependent code replication. It seems possible to do it partially so that it is more controllable.


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread 雾雨魔理沙
@junrushao1994 We unroll the control flow to be more efficient.
Maybe multiple modules could work, but then there can't be code sharing between them.


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread Junru Shao
My point is that we don't have to do full data-dependent unrolling; we can instead unroll a deterministic 4 or 10 steps so that the result is data-independent.
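
A small Python sketch of what deterministic, data-independent unrolling means here (the step count 4 is arbitrary): the trip count is fixed at compile time, so the unrolled graph is identical for every input sequence.

```python
def unrolled_rnn(cell, init_state, inputs, steps=4):
    # Fixed trip count: the structure of the unrolled computation does not
    # depend on the data, so it can be compiled once and reused.
    state = init_state
    outputs = []
    for t in range(steps):
        state = cell(state, inputs[t])
        outputs.append(state)
    return outputs

# Trivial "cell" for illustration: each step adds the input to the state.
print(unrolled_rnn(lambda s, x: s + x, 0, [1, 2, 3, 4]))  # [1, 3, 6, 10]
```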


Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

2019-10-03 Thread 雾雨魔理沙
@junrushao1994 That is only possible for LSTM. For TreeLSTM, if you do it, the number of unrolled cases blows up exponentially, and lots of time will be spent testing the match against all the cases, unless some tricks are used (for example, a decision tree for pattern matching instead of a linear scan).
Still, this doesn't allow batching for TreeLSTM.
