## Summary

Currently, users can only benefit from Intel oneDNN kernels through the Relay
`nn.dense` op, which is implemented via the oneDNN matmul primitive and enabled
by adding **"-libs=mkldnn"** to the target. Several commonly used kernels, such
as Conv2D and pooling, are still missing. This RFC proposes enriching the x86
op strategy with Conv2D kernels implemented by Intel oneDNN.

## Motivation

TVM is an end-to-end ML compiler stack for CPUs, GPUs, and accelerators, and
users benefit from its performance and convenience when developing ML models.
The AutoScheduler helps search for satisfactory schedules for different
workloads, but the search usually takes a long time, and users can sometimes
achieve better performance by utilizing a third-party kernel library. Assigning
a target flag like "-libs=mkldnn" is the easiest way to guide the Relay op
strategy to map to those third-party kernels, and it is also a flexible way for
silicon vendors to integrate their high-performance kernels.
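
For example, a user could opt in like this (a minimal sketch; `mod` and
`params` are assumed to come from a Relay frontend importer, and the `-mcpu`
value is only illustrative):

```python
import tvm
from tvm import relay

# Build with oneDNN-backed kernels enabled. Ops that have a "-libs=mkldnn"
# branch in the op strategy (nn.dense today, nn.conv2d with this RFC) are
# lowered to oneDNN calls; everything else uses the regular TOPI kernels.
# `mod` and `params` are assumed to come from a frontend importer such as
# relay.frontend.from_onnx.
target = tvm.target.Target("llvm -mcpu=cascadelake -libs=mkldnn")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
```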

Intel oneDNN is an open-source, cross-platform performance library of basic
building blocks for deep learning applications. Using the target flag guides
TVM to use the external library while fully leveraging Relay's graph
optimizations. Additionally, using the plain data format (NHWC) naturally
eliminates the overhead of layout transformation, which is why the plain
data format (NHWC) is already well supported in TF and PyTorch.
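
As an illustration, here is a sketch of converting an imported model to NHWC
up front with the existing `ConvertLayout` pass (`mod` is assumed to be an
imported Relay module), so the oneDNN NHWC kernels run without surrounding
`layout_transform` ops:

```python
import tvm
from tvm import relay

# Convert convolutions (and, transitively, surrounding ops) to NHWC once,
# ahead of time, so no layout_transform ops are inserted around the
# oneDNN NHWC conv2d kernels at runtime.
desired_layouts = {"nn.conv2d": ["NHWC", "default"]}
with tvm.transform.PassContext(opt_level=3):
    mod = relay.transform.ConvertLayout(desired_layouts)(mod)
```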

To demonstrate the performance, we ran trials on several models. We also
observed that the oneDNN Conv2D kernel achieves better performance when the
layout is NHWC, so we added an NHWC implementation to TVM and enabled it in
the Relay op strategy, which currently only maps the `nn.dense` kernel. We
compared it with the AutoScheduler on ResNet and UNet models on an Intel Xeon
Platinum 8352Y; both were run in NHWC format, and the oneDNN kernels appear to
outperform the AutoScheduler across varying numbers of cores.

![image-20211217154012982|690x389, 100%](upload://7Exkco0puIxameGeVDEw6L5Tbxv.png)
oneDNN version: v2.4.0

## Proposal

This proposal mainly focuses on integrating the oneDNN op implementations and
mapping them in the Relay op strategy. We have benchmarked the Conv2D kernel in
NHWC format, and we plan to benchmark more oneDNN kernels, covering different
formats and datatypes, in the future.

* Add the oneDNN Conv2D kernel in NHWC format.
* Add a `-libs=mkldnn` branch in the Relay x86 op strategy for the NHWC Conv2D
kernel (see the sketch after this list).
* Add a oneDNN primitive cache to improve performance, and let oneDNN
adaptively choose the best format for weights. (The current oneDNN version
only supports a persistent primitive cache for the GPU/FPGA engines.)
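
Below is a hypothetical sketch of what the strategy branch could look like,
modeled on the existing `nn.dense` mapping in
`python/tvm/relay/op/strategy/x86.py`; `conv2d_nhwc_mkldnn` and
`schedule_conv2d_nhwc_mkldnn` are the new TOPI entries this RFC would add,
while the wrapper helpers already exist in the strategy code:

```python
from tvm import topi
from tvm.relay.op.strategy.generic import wrap_compute_conv2d, wrap_topi_schedule

# Inside conv2d_strategy_cpu() in python/tvm/relay/op/strategy/x86.py:
# when the user passes -libs=mkldnn and the layout is NHWC, register the
# oneDNN implementation with a higher plevel so it is picked over the
# default TOPI NHWC kernel.
if layout == "NHWC" and "mkldnn" in target.libs:
    strategy.add_implementation(
        wrap_compute_conv2d(topi.x86.conv2d_nhwc_mkldnn),            # proposed TOPI compute
        wrap_topi_schedule(topi.x86.schedule_conv2d_nhwc_mkldnn),    # proposed schedule
        name="conv2d_nhwc_mkldnn.x86",
        plevel=15,
    )
```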
