Opened PR #4234 for the re-implementation of our solution based on tensor
intrinsics. Many thanks to @Hzfengsy for his valuable suggestions and close
collaboration with us on this.
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub.
I have a proposal to minimize the invasiveness of changes to TVM while still
fundamentally supporting TensorCore in TVM. It sits between the methodologies
of #4052 and this RFC.
I suppose the current pain point of supporting TensorCore is the data structure
provided by NVIDIA, which introduces non-standard buffers.
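To make that pain point concrete, here is a small numpy sketch (an illustration only, not TVM or NVIDIA code) of the numerical contract of one WMMA tile operation: fixed 16x16x16 fragments, fp16 operands, fp32 accumulation, with the fragment's in-register layout opaque to the programmer — which is what makes the buffer "non-standard" from the compiler's point of view.

```python
import numpy as np

# Emulate D = A @ B + C for a single 16x16x16 WMMA tile (assumption:
# fp16 inputs, fp32 accumulator, as in the common mixed-precision mode).
M = N = K = 16
rng = np.random.default_rng(0)
A = rng.standard_normal((M, K)).astype(np.float16)
B = rng.standard_normal((K, N)).astype(np.float16)
C = np.zeros((M, N), dtype=np.float32)

# Promote the fp16 operands and accumulate in fp32; on the GPU the
# fragment layout holding A/B/C is opaque, here it is just a 2-D array.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.shape)  # (16, 16)
```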
We had a meeting with @Hzfengsy today and discussed the differences and
similarities between our solutions. They differ in the front end: our solution
tries to be as transparent as possible for ease of use, while #4095 gives the
user (the schedule developer) more control.
Thanks @tqchen and @Hzfengsy for your valuable feedback. We are trying out
some of your suggestions and will have further discussions with you after we
have made some evaluations and trials.
> As we know, using TensorCores will decrease precision. So, NVIDIA set up a
> switch to turn TensorCores on and off.
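The precision cost mentioned above can be demonstrated without a GPU. This numpy snippet (an illustration of the trade-off, not NVIDIA's actual switch) rounds the matmul inputs to fp16 and compares the result against the same matmul with fp32 inputs:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((256, 256)).astype(np.float32)
B = rng.standard_normal((256, 256)).astype(np.float32)

# fp64 result as the "exact" reference.
exact = A.astype(np.float64) @ B.astype(np.float64)

# Error from plain fp32 arithmetic.
fp32_err = np.abs(A @ B - exact).max()

# Error when the inputs are first rounded to fp16, as TensorCore
# mixed-precision mode effectively does.
A16 = A.astype(np.float16).astype(np.float32)
B16 = B.astype(np.float16).astype(np.float32)
fp16_err = np.abs(A16 @ B16 - exact).max()

print(fp16_err > fp32_err)  # True: rounding inputs to fp16 costs accuracy
```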
> * It shocks me that your solution is even faster than cuBLAS and cuDNN. I tried
> to reproduce the result but failed. Did you use BatchMatMul and BatchConv? And
> which GPU did you test on? Could you show me the details about the
> performance?
>
Our fp16 TensorCore kernels are tuned on a V100 with
This is really impressive work, congrats!
Thank you for the RFC. It is a complete TensorCore solution. It is nice that
you can support different types and different data layouts, which my solution
currently does not.
## Lower Passes vs Intrinsic
An intrinsic is a tool for describing which instructions are available on
specific hardware.
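A toy sketch of the idea (hypothetical pure Python, not TVM's actual `tensorize` API): the compiler matches an inner tile computation against a declared pattern and swaps in a hand-written body — the "hardware instruction" — while the semantics stay identical to the naive loops.

```python
def naive_inner(a_tile, b_tile):
    # Reference semantics of a 2x2x2 inner tile: c[i][j] += a[i][k] * b[k][j].
    c = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                c[i][j] += a_tile[i][k] * b_tile[k][j]
    return c

def intrinsic_inner(a_tile, b_tile):
    # The "intrinsic" body: same contract, different implementation.
    # Here it is just unrolled by hand; on a GPU it would be one wmma op.
    return [
        [a_tile[0][0] * b_tile[0][0] + a_tile[0][1] * b_tile[1][0],
         a_tile[0][0] * b_tile[0][1] + a_tile[0][1] * b_tile[1][1]],
        [a_tile[1][0] * b_tile[0][0] + a_tile[1][1] * b_tile[1][0],
         a_tile[1][0] * b_tile[0][1] + a_tile[1][1] * b_tile[1][1]],
    ]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
# Substituting the intrinsic must not change the result.
assert naive_inner(a, b) == intrinsic_inner(a, b)
```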
Thanks for the RFC; also cross-linking to https://github.com/dmlc/tvm/issues/4052.
## Non-standard buffer allocation
We are moving toward using special memory scopes to annotate the special
memory (e.g. mma). The use of `new_expr` was convenient, but nevertheless a
bit too close to the low level.
> Awesome solution! Just curious: for shapes that perform worse than cudnn/cublas,
> what kind of tuning is used?
Good point! We have had some internal discussions about whether we should
automatically search the schedule space and choose between TensorCore and
non-TensorCore kernels based on measured performance.
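In its simplest form, such a choice could just measure both kernels and keep the faster one. This sketch (hypothetical helper names, not TVM's tuner; the two "kernels" are stand-ins) shows the shape of that selection:

```python
import time

def pick_fastest(candidates, *args, repeat=3):
    """Return the name of the candidate with the best measured runtime."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        elapsed = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            fn(*args)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

# Stand-ins for a TensorCore kernel and a non-TensorCore fallback.
def tensorcore_kernel(n):
    return sum(range(n))

def fallback_kernel(n):
    total = 0
    for i in range(n):
        total += i
    return total

choice = pick_fastest(
    {"tensorcore": tensorcore_kernel, "fallback": fallback_kernel}, 100_000
)
```

In a real tuner the candidates would be compiled schedules and the measurement would run on the target device, but the decision rule is the same.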
> Awesome solution! Just curious: for shapes that perform worse than cudnn/cublas,
> what kind of tuning is used?
We haven't spent much effort on performance tuning yet. For cases with bad
performance, we plan to profile first to figure out the causes.
Awesome solution! Just curious: for shapes that perform worse than cudnn/cublas,
what kind of tuning is used?
#4052 @Hzfengsy