I apologize for the long delay. 

Thanks @electriclilies and team for the nicely written RFC. I support the idea. 
Reading through the comments, it seems that many of us are in agreement about 
AutoQ and its reliance on the QNN extension. The pain points mentioned mostly 
revolve around:
* The inconsistency of QNN operators.
* The wide variety of choices one can make while quantizing a conv2d.

Therefore, to strengthen the integration of AutoQ, QNN, and BYOC, we need more 
consistency in QNN operators. And our auto-quantization algorithm needs to be 
flexible enough to support different forms of quantization even for the same 
operator (as @AndrewZhaoLuo mentioned).
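To make "different forms for the same operator" concrete, here is a minimal 
numpy sketch (illustrative only, not the QNN API; `fake_quant` and the shapes 
are made up) contrasting two scale granularities for the same weight tensor:

```python
import numpy as np

rng = np.random.default_rng(0)
# Weight tensor whose 32 output channels have very different ranges.
w = (rng.standard_normal((64, 32)) * np.linspace(0.1, 2.0, 32)).astype("float32")

def fake_quant(x, scale):
    # Round to the int8 grid and map back to fp32.
    return np.clip(np.round(x / scale), -128, 127) * scale

# Choice 1: a single scale for the whole tensor.
per_tensor = fake_quant(w, np.abs(w).max() / 127)

# Choice 2: one scale per output channel.
ch_scale = np.abs(w).max(axis=0, keepdims=True) / 127
per_channel = fake_quant(w, ch_scale)

print("per-tensor  error:", np.abs(w - per_tensor).mean())
print("per-channel error:", np.abs(w - per_channel).mean())
```

Per-channel gives a noticeably smaller error here, and scale granularity is 
just one axis of choice (others: symmetric vs. affine, bit width, accumulation 
datatype).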

The QNN operator inconsistency pain point is interesting and eye-opening. I did 
not know that it was so painful from the BYOC perspective. I think it is 
inevitable that PT/TFLite-parsed quantized graphs will have some differences 
because of the differences in how the frameworks support different operators. 
But I agree that we must strive to keep them as consistent as possible. I like 
@masahi's idea to add more QNN operators (perhaps using automatic code 
generation) to support operators like resize, pool, relu, and softmax.


A question for @electriclilies from the RFC:

> 2. Extend qnn.conv2d, qnn.dense, etc. to be used with more datatypes, 
> including fp32. We would also have to add an attribute to QNN specify the 
> accumulation datatype used.

* I am trying to understand why we need `qnn.conv2d*` (where * represents an 
operator along the lines of `qnn.simulated_conv2d`) during calibration. The 
only reason would be if we want to propagate the error from previous operators 
while **calibrating** the current conv2d operator. If we calibrate in a manner 
that does not account for the error introduced by quantizing previous operators 
(common in today's frameworks), then we only need `qnn.simulated_quantize` and 
`qnn.simulated_dequantize` to calculate the quantization error at the current 
operator, as sketched below. Is my understanding correct? (Just trying to 
understand; I will buy the idea that propagating errors during calibration 
might be helpful for aggressive quantization.)
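
Here is a minimal numpy sketch of what I mean, assuming per-tensor int8 
parameters; `sim_quantize`/`sim_dequantize` are stand-ins for the QNN simulated 
ops, and a dense layer stands in for conv2d to keep it short:

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_quantize(x, scale, zp, qmin=-128, qmax=127):
    # Fake-quantize to the int8 grid, values carried in fp32
    # (what qnn.simulated_quantize models).
    return np.clip(np.round(x / scale) + zp, qmin, qmax)

def sim_dequantize(q, scale, zp):
    return (q - zp) * scale

# Dense layer as a stand-in for conv2d; the argument is identical.
x = rng.standard_normal((16, 64)).astype("float32")
w = rng.standard_normal((64, 32)).astype("float32")

# Hypothetical per-tensor parameters picked by calibration.
x_scale, x_zp = np.abs(x).max() / 127, 0.0
w_scale, w_zp = np.abs(w).max() / 127, 0.0

# fp32 reference output of the current operator.
ref = x @ w

# Local quantization error: fake-quantize the inputs, run the operator
# in fp32, and compare. Upstream operators are left untouched in fp32,
# so no error from quantizing them is propagated here.
x_fq = sim_dequantize(sim_quantize(x, x_scale, x_zp), x_scale, x_zp)
w_fq = sim_dequantize(sim_quantize(w, w_scale, w_zp), w_scale, w_zp)
err = np.abs(ref - x_fq @ w_fq).mean()
print("quantization error at this op:", err)
```

No `qnn.conv2d*` is needed for this: the operator itself runs in plain fp32, 
and the simulated quantize/dequantize pair around its inputs is what introduces 
the error we measure.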

-----

@electriclilies @matt-arm This is somewhat tangential, but I wanted to 
understand more. Suppose we extend qnn.conv2d to qnn.conv2d*, which supports 
simulation during calibration. Then we have the pattern 
`qnn.simulated_quantize` -> `qnn.conv2d*` -> `qnn.simulated_dequantize`. What 
are the input scales and zero points of `qnn.conv2d*`? IIUC, they should be 
equal to those of the `qnn.simulated_quantize` operators at the inputs of 
`qnn.conv2d*`, as sketched below. If that is true, once we finish calibration, 
can we use this graph for BYOC?
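
To illustrate why I think the scales and zero points must match, a 
self-contained numpy sketch (again with dense standing in for conv2d*, and 
made-up parameter names; not the actual QNN API):

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_quantize(x, scale, zp, qmin=-128, qmax=127):
    # Fake-quantize to the int8 grid, values carried in fp32.
    return np.clip(np.round(x / scale) + zp, qmin, qmax)

# Dense as a stand-in for conv2d*; the scale/zero-point algebra is the same.
x = rng.standard_normal((16, 64)).astype("float32")
w = rng.standard_normal((64, 32)).astype("float32")
x_scale, x_zp = np.abs(x).max() / 127, 0.0
w_scale, w_zp = np.abs(w).max() / 127, 0.0

# qnn.simulated_quantize on both inputs.
q_x = sim_quantize(x, x_scale, x_zp)
q_w = sim_quantize(w, w_scale, w_zp)

# "conv2d*": integer-style accumulation carried in fp32. Its input
# scales/zero points must be exactly the ones used by the producing
# simulated_quantize ops, otherwise the dequantize below is wrong.
acc = (q_x - x_zp) @ (q_w - w_zp)

# qnn.simulated_dequantize back to fp32.
out = acc * (x_scale * w_scale)

# Reference: fake-quantizing the inputs and computing in fp32 gives the
# same result, which is why the calibrated (scale, zp) pairs could simply
# be copied onto a real int8 qnn.conv2d when lowering the graph for BYOC.
x_fq = (q_x - x_zp) * x_scale
w_fq = (q_w - w_zp) * w_scale
assert np.allclose(out, x_fq @ w_fq)
```

If that equivalence holds, the calibrated graph already carries everything a 
BYOC backend needs, which is what motivates my question.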




