Re: [dmlc/tvm] [RFC] Auto TensorCore CodeGen (#4105)

2019-10-31, 孙敏敏
Opened PR #4234 for the re-implementation of our solution based on tensor intrinsics. Many thanks to @Hzfengsy for his valuable suggestions and close collaboration with us on this. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on Git

2019-10-31, 孙敏敏
> I have a proposal that minimizes the intrusion into TVM while still fundamentally supporting TensorCore in TVM. It sits between the methodologies of #4052 and this RFC. I suppose the current pain point of supporting TensorCore is the data structure provided by NVIDIA, which introduces non-

2019-10-18, Jian Weng
I have a proposal that minimizes the intrusion into TVM while still fundamentally supporting TensorCore in TVM. It sits between the methodologies of #4052 and this RFC. I suppose the current pain point of supporting TensorCore is the data structure provided by NVIDIA, which introduces non-standard b

2019-10-14, 孙敏敏
We had a meeting with @Hzfengsy today and discussed the differences and similarities between our solutions. They differ in the front end: our solution tries to be as transparent as possible so that it is easy to use, while #4095 gives the user (the schedule developer) more control. They

2019-10-11, 孙敏敏
Thanks @tqchen and @Hzfengsy for your valuable feedback. We are trying out some of your suggestions and will discuss them further with you after we have done some evaluations and trials. > As we know, using TensorCores will decrease precision. So, NVIDIA set up a switch to turn on and off Te
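To make the precision concern quoted above concrete, here is a minimal pure-Python sketch. It assumes a simplified mixed-precision model (inputs rounded to fp16, products accumulated in fp32) that only approximates real TensorCore arithmetic; the helper names `to_fp16`, `to_fp32`, `dot_fp16_in_fp32_acc` and `dot_reference` are hypothetical and not part of TVM or CUDA. Rounding is done with the standard `struct` module's half (`e`) and single (`f`) formats.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_in_fp32_acc(a, b):
    """Dot product under an assumed TensorCore-like model:
    fp16-rounded inputs, products and running sum kept in fp32."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two fp16 values is exact in fp32 (11 + 11 <= 24 significand bits).
        prod = to_fp32(to_fp16(x) * to_fp16(y))
        acc = to_fp32(acc + prod)
    return acc


def dot_reference(a, b):
    """Plain double-precision dot product used as the reference."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    a = [1.0 / 3.0] * 16
    b = [1.0] * 16
    print(dot_fp16_in_fp32_acc(a, b), dot_reference(a, b))
```

With `a = [1/3] * 16` and `b = [1.0] * 16`, the mixed-precision result is 5.33203125 while the double-precision reference is about 5.3333. In this example the fp32 accumulation happens to be exact, so the whole difference of roughly 1.3e-3 comes from rounding 1/3 to fp16.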

2019-10-11, Orion34C
> * It surprises me that your solution is even faster than cuBLAS and cuDNN. I tried to reproduce the result but failed. Did you use BatchMatMul and BatchConv? And which GPU did you test on? Could you show me the details of the performance? Our fp16 TensorCore kernels are tuned on V100 with

2019-10-11, Andrew Tulloch
This is really impressive work, congrats! -- View it on GitHub: https://github.com/dmlc/tvm/issues/4105#issuecomment-541259191

2019-10-11, Siyuan Feng
Thank you for the RFC. It provides complete TensorCore support. It is nice that you support different data types and data layouts, which my solution does not currently support. ## Lower Passes vs Intrinsic Intrinsic is a tool for describing what instructions can be done in specific hardwa
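An intrinsic, as described in the message above, captures one fixed-shape operation the hardware can perform, and code generation then has to express the loop nest as calls to it. The pure-Python sketch below illustrates only that decomposition for a GEMM. It is not TVM's `tensorize` API: `mma_tile` is a hypothetical stand-in for a TensorCore MMA instruction, and the 16x16x16 tile shape is an assumption chosen to match a common WMMA fragment shape.

```python
from typing import List

Matrix = List[List[float]]

TILE = 16  # assumed fragment edge (16x16x16 is one common WMMA shape)


def mma_tile(c: Matrix, a: Matrix, b: Matrix,
             i0: int, j0: int, k0: int, tile: int = TILE) -> None:
    """Hypothetical intrinsic stand-in:
    C[i0:i0+tile, j0:j0+tile] += A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]."""
    for i in range(i0, i0 + tile):
        for j in range(j0, j0 + tile):
            acc = c[i][j]
            for k in range(k0, k0 + tile):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def tiled_matmul(a: Matrix, b: Matrix, tile: int = TILE) -> Matrix:
    """GEMM expressed entirely as calls to the fixed-shape mma_tile 'intrinsic'."""
    m, kdim, n = len(a), len(b), len(b[0])
    if any(len(row) != kdim for row in a):
        raise ValueError("inner dimensions of A and B must match")
    if m % tile or n % tile or kdim % tile:
        # A code generator might pad here or fall back to a non-TensorCore kernel.
        raise ValueError("every dimension must be a multiple of the tile size")
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, kdim, tile):  # reduction over K tiles
                mma_tile(c, a, b, i0, j0, k0, tile)
    return c


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference triple-loop GEMM used to check the tiled version."""
    n, kdim = len(b[0]), len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(kdim)) for j in range(n)]
            for i in range(len(a))]


if __name__ == "__main__":
    a = [[float((i + 2 * k) % 7) for k in range(48)] for i in range(32)]
    b = [[float((3 * k + j) % 5) for j in range(16)] for k in range(48)]
    print(tiled_matmul(a, b) == naive_matmul(a, b))
```

The rigid tile shape is the point of the sketch: shapes that are not multiples of the tile must be handled separately (here they are simply rejected), which is where padding or choosing between TensorCore and non-TensorCore kernels comes in.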

2019-10-11, Tianqi Chen
Thanks for the RFC; also cross-linking https://github.com/dmlc/tvm/issues/4052. ## Non-standard buffer allocation We are moving toward using special memory scopes to annotate special memory (e.g. mma). The use of `new_expr` was convenient, but nevertheless a bit too close to low level

2019-10-11, Jun Yang
> Awesome solution! Just curious: for shapes where performance is worse than cuDNN/cuBLAS, what kind of tuning is used? Good point! We have had some internal discussions about whether we need to automatically search the schedule space based on the performance of TensorCore versus non-TensorCore kernels, sin

2019-10-11, 孙敏敏
> Awesome solution! Just curious: for shapes where performance is worse than cuDNN/cuBLAS, what kind of tuning is used? We haven’t spent much effort on performance tuning yet. For cases with poor performance, we plan to profile first to figure out the causes. One possible optimization is to m

2019-10-11, Bing Xu
Awesome solution! Just curious: for shapes where performance is worse than cuDNN/cuBLAS, what kind of tuning is used? -- View it on GitHub: https://github.com/dmlc/tvm/issues/4105#issuecomment-541014088

2019-10-11, 孙敏敏
#4052 @Hzfengsy -- View it on GitHub: https://github.com/dmlc/tvm/issues/4105#issuecomment-540978699