@tmoreau89 Exactly! For now we use the NCHWnc layout, the same layout as VTA.
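For readers unfamiliar with packed layouts, here is a minimal numpy sketch of what an NCHW{n}c layout means; the block size of 8 and the helper name are illustrative only, not necessarily what VTA uses.

```python
import numpy as np

# Hypothetical helper: pack NCHW into NCHW{bc}c by splitting the channel
# axis into outer blocks and an inner block of bc channels.
def pack_nchw_to_nchwc(x, bc=8):
    n, c, h, w = x.shape
    assert c % bc == 0, "channels must divide evenly into blocks"
    # (N, C, H, W) -> (N, C//bc, bc, H, W) -> (N, C//bc, H, W, bc)
    return x.reshape(n, c // bc, bc, h, w).transpose(0, 1, 3, 4, 2)

x = np.random.rand(1, 16, 28, 28).astype("float32")
print(pack_nchw_to_nchwc(x).shape)  # (1, 2, 28, 28, 8)
```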
--
@yangjunpro Really happy to see another solution for TensorCore.
You are right! I just extended the TVM tensor intrinsics to support it. This does cause some trouble for programmers who write the schedule: it is not easy to write a high-performance one.
I'm really curious about how you use IR passes to recognize the TensorCore computation.
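For context, the tensor-intrinsic route mentioned above builds on TVM's tensorize mechanism. Below is a rough sketch in the 2019-era API of declaring a 16x16x16 fragment-sized matmul intrinsic; the extern call name "wmma_16x16x16" is a placeholder for the real mma emission, not an actual TVM builtin.

```python
import tvm

# Sketch only: declare a matmul tensor intrinsic matching one 16x16x16
# TensorCore fragment, to be used later with s[C].tensorize(...).
def intrin_wmma_gemm():
    n = 16
    A = tvm.placeholder((n, n), name="A", dtype="float16")
    B = tvm.placeholder((n, n), name="B", dtype="float16")
    k = tvm.reduce_axis((0, n), name="k")
    C = tvm.compute(
        (n, n),
        lambda i, j: tvm.sum(A[i, k].astype("float32") *
                             B[k, j].astype("float32"), axis=k),
        name="C")

    def intrin_func(ins, outs):
        ib = tvm.ir_builder.create()
        a, b = ins
        c = outs[0]
        # A real implementation would emit the mma intrinsic here;
        # "wmma_16x16x16" is a hypothetical stand-in.
        ib.emit(tvm.call_extern("int32", "wmma_16x16x16",
                                c.access_ptr("w"),
                                a.access_ptr("r"),
                                b.access_ptr("r")))
        return ib.get()

    return tvm.decl_tensor_intrin(C.op, intrin_func)
```
--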
@Hzfengsy Sure, we will show the code as well as a sample schedule very soon. It is under internal review now. As you will see, the schedule for TensorCore CodeGen looks no different from a normal matmul schedule for GPU. Everything is done in IR passes, including matrix_a/matrix_b/accumulator recognition.
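To make "looks no different from a normal matmul schedule" concrete, here is what such a baseline GPU matmul schedule looks like in the 2019-era TVM API; the shapes and split factors are arbitrary examples, not the code under review.

```python
import tvm

# A plain GPU matmul schedule; nothing here mentions TensorCore.
N = 1024
A = tvm.placeholder((N, N), name="A", dtype="float16")
B = tvm.placeholder((N, N), name="B", dtype="float16")
k = tvm.reduce_axis((0, N), name="k")
C = tvm.compute(
    (N, N),
    lambda i, j: tvm.sum(A[i, k].astype("float32") *
                         B[k, j].astype("float32"), axis=k),
    name="C")

s = tvm.create_schedule(C.op)
i, j = s[C].op.axis
bi, ti = s[C].split(i, factor=32)
bj, tj = s[C].split(j, factor=32)
s[C].bind(bi, tvm.thread_axis("blockIdx.y"))
s[C].bind(bj, tvm.thread_axis("blockIdx.x"))
s[C].bind(ti, tvm.thread_axis("threadIdx.y"))
s[C].bind(tj, tvm.thread_axis("threadIdx.x"))
# The IR passes described above would then rewrite the fragment
# loads/stores and the inner matmul into wmma operations, leaving
# this schedule untouched.
```
--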
I am trying to train MNIST using TVM, and I hit an issue:
There are two functions, loss and infer; one calculates the loss and gradient, and one just does inference.
However, create_executor/aot/vm all take only a single entry point. If there are multiple entry points, the passes will be called multiple times.
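A minimal sketch of the situation, with toy shapes, hypothetical function names, and a placeholder squared-error loss:

```python
from tvm import relay

# Two entry points in one Module: "infer" and "loss".
x = relay.var("x", shape=(1, 784))
w = relay.var("w", shape=(10, 784))
infer = relay.Function([x, w], relay.nn.dense(x, w))

x2 = relay.var("x", shape=(1, 784))
w2 = relay.var("w", shape=(10, 784))
y = relay.var("y", shape=(1, 10))
diff = relay.nn.dense(x2, w2) - y
loss = relay.Function([x2, w2, y], relay.sum(diff * diff))

mod = relay.Module()
mod["infer"] = infer
mod["loss"] = loss

# create_executor compiles and evaluates one entry point at a time;
# there is no single compiled artifact that exposes both functions.
ex = relay.create_executor(mod=mod)
```
--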
Just to clarify:
> There are two functions, loss and infer; one calculates the loss and gradient, and one just does inference.
Do you mean that there are two modes, training and inference, so that there are multiple entry points in the Relay Module?
--
Can we have a more detailed example to help clarify this issue?
--
And also:
> 'compile then execute' is not enough for all deep learning workloads. For
> example, using our partial evaluator to specialize on
> training/validation/testing data means we must compile only after we have
> loaded all the data.
So in DL, the common practice is that we specify the input shapes up front, compile once, and only then feed in the actual data.
--
# TVM Monthly - September 2019
https://discuss.tvm.ai/t/tvm-monthly-september-2019
--
@junrushao1994 Yes, that is what I mean: inference mode and training mode, each mode compiled to one function.
We use the partial evaluator primarily to partially evaluate the control flow with respect to the data. Other frameworks also do this, but they require manual loop unrolling.
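For reference, Relay exposes this as the PartialEvaluate pass. A toy run looks roughly like the sketch below; in the training scenario described here, each batch of data would first be bound into the program as constants so the pass can specialize the control flow against it.

```python
import numpy as np
from tvm import relay

# Bind the data as a constant, then let the partial evaluator
# specialize the function with respect to it at compile time.
data = relay.const(np.ones((2, 2), "float32"))
f = relay.Function([], data + data)
mod = relay.Module()
mod["main"] = f
mod = relay.transform.PartialEvaluate()(mod)
print(mod["main"])
```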
--
@MarisaKirisame Do you mean you want the training loop itself to be part of
Relay?
--
No. Suppose I have a TreeLSTM. Normally I cannot do any operator fusion/batching because of the control flow everywhere. Running the partial evaluator on each batch of training data individually solves this problem.
--
However, that requires the ability to create multiple entries (one entry per batch). If we also want to use Relay for any form of JIT, we must be able to interleave running Relay with adding more definitions to a Relay module.
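A sketch of that interleaving, with hypothetical function names: definitions are added to a single Module over time, and later additions can call earlier ones, which is where the code sharing comes from.

```python
from tvm import relay

mod = relay.Module()

# First round of compilation: define one function.
x = relay.var("x", shape=(10,))
mod["double"] = relay.Function([x], x + x)

# Later, after running some of the program, add a definition
# that reuses the earlier one instead of duplicating it.
double = mod.get_global_var("double")
y = relay.var("y", shape=(10,))
mod["quadruple"] = relay.Function([y], double(double(y)))
```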
--
I see. So for MNIST there is no such issue, but for TreeLSTM it is true that we cannot do further optimization unless we do code replication.
--
Let's get back to the original topic, which is broader IMO.
First of all, depending on the scenario, incremental compilation may or may not be doable; on edge devices, for example, there is only space for the TVM runtime, not the compiler.
Then, I am actually in favor of incremental compilation, or some pr
--
@junrushao1994 We unroll the control flow to be more efficient.
Maybe multiple modules can work, but then there cannot be any code sharing between them.
--
My point is that we don't have to do full data-dependent unrolling; we can unroll a fixed 4 or 10 steps to make the program data independent, as sketched below.
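A toy illustration of the difference, with `cell` standing in for an arbitrary recurrent cell:

```python
# Data-dependent: the trip count depends on the input sequence, so the
# loop structure cannot be specialized ahead of time.
def rnn_data_dependent(cell, state, xs):
    for x in xs:
        state = cell(state, x)
    return state

# Fixed unrolling: exactly four applications of `cell`, independent of
# the data, so each step can be fused and batched statically.
def rnn_unrolled_4(cell, state, x0, x1, x2, x3):
    state = cell(state, x0)
    state = cell(state, x1)
    state = cell(state, x2)
    state = cell(state, x3)
    return state
```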
--
@junrushao1994 That is only possible for an LSTM. For a TreeLSTM, if you do it, the number of cases blows up exponentially, and lots of time will be spent testing the match against all the cases, unless some tricks are used (for example, a decision tree for pattern matching instead of a linear scan).
Still, this doesn't allow