I apologize for the long delay. 

Thanks @electriclilies and team for the nicely written RFC. I support the idea. 
Reading through the comments, it seems that many of us are in agreement about 
AutoQ and its reliance on the QNN extension. The pain points mentioned mostly 
revolve around:
* The inconsistency of QNN operators.
* The wide variety of choices one can make while quantizing a conv2d.

Therefore, to strengthen the integration of AutoQ, QNN, and BYOC, we need more 
consistency in QNN operators. And our auto-quantization algorithm needs to be 
flexible enough to support different forms of quantization even for the same 
operator (as @AndrewZhaoLuo mentioned).
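To make "different forms for the same operator" concrete, here is a minimal 
numpy sketch (illustrative only, not the QNN API; `fake_quant` and the shapes 
are made up) contrasting two scale granularities for the same weight tensor:

```python
import numpy as np

rng = np.random.default_rng(0)
# Weight tensor whose 32 output channels have very different ranges.
w = (rng.standard_normal((64, 32)) * np.linspace(0.1, 2.0, 32)).astype("float32")

def fake_quant(x, scale):
    # Round to the int8 grid and map back to fp32.
    return np.clip(np.round(x / scale), -128, 127) * scale

# Choice 1: a single scale for the whole tensor.
per_tensor = fake_quant(w, np.abs(w).max() / 127)

# Choice 2: one scale per output channel.
ch_scale = np.abs(w).max(axis=0, keepdims=True) / 127
per_channel = fake_quant(w, ch_scale)

print("per-tensor  error:", np.abs(w - per_tensor).mean())
print("per-channel error:", np.abs(w - per_channel).mean())
```

Per-channel gives a noticeably smaller error here, and scale granularity is 
just one axis of choice (others: symmetric vs. affine, bit width, accumulation 
datatype).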

The QNN operator inconsistency pain point is interesting and eye-opening. I did 
not know that it was so painful from the BYOC perspective. I think it is 
inevitable that PT/TFLite-parsed quantized graphs will have some differences 
because of the differences in how the frameworks support different operators. 
But I agree that we must strive to keep them as consistent as possible. I like 
@masahi's idea to add more QNN operators (perhaps using automatic code 
generation) to support operators like resize, pool, relu, and softmax.


A question for @electriclilies from the RFC:

> 2. Extend qnn.conv2d, qnn.dense, etc. to be used with more datatypes, 
> including fp32. We would also have to add an attribute to QNN specify the 
> accumulation datatype used.

* I am trying to understand why we need `qnn.conv2d*` (where * represents an 
operator along the lines of `qnn.simulated_conv2d`) during calibration. The 
only reason would be if we want to propagate the error from previous operators 
while **calibrating** the current conv2d operator. If we calibrate in a manner 
that does not account for the error introduced by quantizing previous operators 
(common in today's frameworks), then we only need `qnn.simulated_quantize` and 
`qnn.simulated_dequantize` to calculate the quantization error at the current 
operator, as sketched below. Is my understanding correct? (Just trying to 
understand; I will buy the idea that propagating errors during calibration 
might be helpful for aggressive quantization.)
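
Here is a minimal numpy sketch of what I mean, assuming per-tensor int8 
parameters; `sim_quantize`/`sim_dequantize` are stand-ins for the QNN simulated 
ops, and a dense layer stands in for conv2d to keep it short:

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_quantize(x, scale, zp, qmin=-128, qmax=127):
    # Fake-quantize to the int8 grid, values carried in fp32
    # (what qnn.simulated_quantize models).
    return np.clip(np.round(x / scale) + zp, qmin, qmax)

def sim_dequantize(q, scale, zp):
    return (q - zp) * scale

# Dense layer as a stand-in for conv2d; the argument is identical.
x = rng.standard_normal((16, 64)).astype("float32")
w = rng.standard_normal((64, 32)).astype("float32")

# Hypothetical per-tensor parameters picked by calibration.
x_scale, x_zp = np.abs(x).max() / 127, 0.0
w_scale, w_zp = np.abs(w).max() / 127, 0.0

# fp32 reference output of the current operator.
ref = x @ w

# Local quantization error: fake-quantize the inputs, run the operator
# in fp32, and compare. Upstream operators are left untouched in fp32,
# so no error from quantizing them is propagated here.
x_fq = sim_dequantize(sim_quantize(x, x_scale, x_zp), x_scale, x_zp)
w_fq = sim_dequantize(sim_quantize(w, w_scale, w_zp), w_scale, w_zp)
err = np.abs(ref - x_fq @ w_fq).mean()
print("quantization error at this op:", err)
```

No `qnn.conv2d*` is needed for this: the operator itself runs in plain fp32, 
and the simulated quantize/dequantize pair around its inputs is what introduces 
the error we measure.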

-----

@electriclilies @matt-arm This is somewhat tangential, but I wanted to 
understand more. Suppose we extend qnn.conv2d to qnn.conv2d*, which supports 
simulation during calibration. Then we have the pattern 
`qnn.simulated_quantize` -> `qnn.conv2d*` -> `qnn.simulated_dequantize`. What 
are the input scales and zero points of `qnn.conv2d*`? IIUC, they should be 
equal to those of the `qnn.simulated_quantize` operators at the inputs of 
`qnn.conv2d*`, as sketched below. If that is true, once we finish calibration, 
can we use this graph for BYOC?
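
To illustrate why I think the scales and zero points must match, a 
self-contained numpy sketch (again with dense standing in for conv2d*, and 
made-up parameter names; not the actual QNN API):

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_quantize(x, scale, zp, qmin=-128, qmax=127):
    # Fake-quantize to the int8 grid, values carried in fp32.
    return np.clip(np.round(x / scale) + zp, qmin, qmax)

# Dense as a stand-in for conv2d*; the scale/zero-point algebra is the same.
x = rng.standard_normal((16, 64)).astype("float32")
w = rng.standard_normal((64, 32)).astype("float32")
x_scale, x_zp = np.abs(x).max() / 127, 0.0
w_scale, w_zp = np.abs(w).max() / 127, 0.0

# qnn.simulated_quantize on both inputs.
q_x = sim_quantize(x, x_scale, x_zp)
q_w = sim_quantize(w, w_scale, w_zp)

# "conv2d*": integer-style accumulation carried in fp32. Its input
# scales/zero points must be exactly the ones used by the producing
# simulated_quantize ops, otherwise the dequantize below is wrong.
acc = (q_x - x_zp) @ (q_w - w_zp)

# qnn.simulated_dequantize back to fp32.
out = acc * (x_scale * w_scale)

# Reference: fake-quantizing the inputs and computing in fp32 gives the
# same result, which is why the calibrated (scale, zp) pairs could simply
# be copied onto a real int8 qnn.conv2d when lowering the graph for BYOC.
x_fq = (q_x - x_zp) * x_scale
w_fq = (q_w - w_zp) * w_scale
assert np.allclose(out, x_fq @ w_fq)
```

If that equivalence holds, the calibrated graph already carries everything a 
BYOC backend needs, which is what motivates my question.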




