I apologize for the long delay.
Thanks @electriclilies and team for the nicely written RFC. I support the idea. Reading through the comments, it seems that many of us are in agreement about AutoQ and its reliance on the QNN extension. The mentioned pain points mostly revolve around:

* The inconsistency of QNN operators.
* The wide variety of choices one can make while quantizing a conv2d.

Therefore, to strengthen the integration of AutoQ, QNN, and BYOC, we need more consistency in QNN operators, and our auto-quantization algorithm needs to be flexible enough to support different forms of quantization even for the same operator (as @AndrewZhaoLuo mentioned).

The QNN operator inconsistency pain point is interesting and eye-opening. I did not know that it was so painful from the BYOC perspective. I think it is inevitable that quantized graphs parsed from PT/TFLite will have some differences, because the frameworks differ in how they support operators. But I agree that we must strive to keep them as consistent as possible. I like @masahi's idea of adding more QNN operators (perhaps using automatic code generation) to support operators like resize, pool, relu, and softmax.

A question for @electriclilies from the RFC:

> 2. Extend qnn.conv2d, qnn.dense, etc. to be used with more datatypes, including fp32. We would also have to add an attribute to QNN specify the accumulation datatype used.

* I am trying to understand why we need `qnn.conv2d*` (where * represents an operator along the lines of `qnn.simulated_conv2d`) during calibration. The only reason would be if you want to propagate the error from previous operators while **calibrating** the current conv2d operator. If we calibrate in a manner that does not account for the error introduced by quantizing previous operators (common in today's frameworks), then we need only `qnn.simulated_quantize` and `qnn.simulated_dequantize` to calculate the quantization error at the current operator (see the first sketch at the end of this post). Is my understanding correct? (Just trying to understand; I will buy the idea that propagating errors during calibration might be helpful for aggressive quantization.)

-----

@electriclilies @matt-arm This is somewhat tangential, but I wanted to understand more. Suppose we extend qnn.conv2d to a qnn.conv2d* that supports simulation during calibration, so we have the pattern `qnn.simulated_quantize` -> `qnn.conv2d*` -> `qnn.simulated_dequantize`. What are the input scales and zero points of `qnn.conv2d*`? IIUC, they should be equal to those of the `qnn.simulated_quantize` operators at the inputs of `qnn.conv2d*` (the second sketch below illustrates this). If that is true, once we finish calibration, can we use this graph for BYOC?
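To make the calibration question concrete, here is a minimal NumPy sketch (not the actual TVM API; all helper names are hypothetical stand-ins for the `qnn.simulated_*` semantics) of computing the quantization error at the current operator without propagating earlier operators' errors: the operator's inputs come from the fp32 graph, get fake-quantized, and the result is compared against the fp32 reference.

```python
import numpy as np

def simulated_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Fake-quantize onto the int8 grid while staying in fp32 ("simulated").
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax)

def simulated_dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

def conv_like(x, w):
    # Stand-in for conv2d; any linear op illustrates the error computation.
    return x @ w

rng = np.random.default_rng(0)
# The activations feeding the current op are taken from the fp32 graph,
# so quantization error from previous operators is NOT propagated.
x_fp32 = rng.standard_normal((8, 16)).astype("float32")
w_fp32 = rng.standard_normal((16, 4)).astype("float32")

# Candidate quantization parameters proposed by the calibration method.
x_scale, x_zp = float(np.abs(x_fp32).max()) / 127.0, 0
w_scale, w_zp = float(np.abs(w_fp32).max()) / 127.0, 0

ref = conv_like(x_fp32, w_fp32)  # fp32 reference output
x_dq = simulated_dequantize(simulated_quantize(x_fp32, x_scale, x_zp), x_scale, x_zp)
w_dq = simulated_dequantize(simulated_quantize(w_fp32, w_scale, w_zp), w_scale, w_zp)
out = conv_like(x_dq, w_dq)      # only this operator's inputs are quantized

print("quantization error at this op:", float(np.mean((ref - out) ** 2)))
```

If instead we did want to propagate error from previous operators, `x_fp32` would be replaced by the fake-quantized output of the previous layer, which is where a `qnn.conv2d*` that runs during calibration would come in.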
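The second sketch, again plain NumPy with hypothetical names, shows the pattern `simulated_quantize` -> `conv2d*` -> `simulated_dequantize`: the scale/zero point handed to `conv2d*` are exactly the ones used by the `simulated_quantize` ops at its inputs, which is the consistency I am asking about above.

```python
import numpy as np

def simulated_quantize(x, scale, zp, qmin=-128, qmax=127):
    return np.clip(np.round(x / scale) + zp, qmin, qmax)

def simulated_dequantize(q, scale, zp):
    return (q - zp) * scale

def conv2d_star(qx, qw, x_scale, x_zp, w_scale, w_zp):
    # Integer-domain matmul stand-in for qnn.conv2d*: int32 accumulation,
    # output scale x_scale * w_scale, output zero point 0.
    acc = (qx - x_zp).astype("int32") @ (qw - w_zp).astype("int32")
    return acc, x_scale * w_scale, 0

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16)).astype("float32")
w = rng.standard_normal((16, 4)).astype("float32")
x_scale, x_zp = float(np.abs(x).max()) / 127.0, 0
w_scale, w_zp = float(np.abs(w).max()) / 127.0, 0

qx = simulated_quantize(x, x_scale, x_zp)
qw = simulated_quantize(w, w_scale, w_zp)
# conv2d_star receives the SAME scale/zp that produced qx and qw; with any
# other values the dequantized output below would be meaningless.
acc, out_scale, out_zp = conv2d_star(qx, qw, x_scale, x_zp, w_scale, w_zp)
print(simulated_dequantize(acc, out_scale, out_zp))  # close to x @ w
```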