[quote="anijain2305, post:20, topic:9775"]
I am trying to understand why we need `qnn.conv2d*` (* represents operator 
along the lines of `qnn.simulated_conv2d`) during calibration. The only reason 
would be if you want to propagate the error from previous operators while 
**calibrating** current conv2d operator
[/quote]

We do want to support propagating error from previous operators while 
calibrating the current conv2d operator. 

Additionally, since `qnn.simulated_quantize` actually moves the data into 
affine space, the pattern `qnn.simulated_quantize -> nn.conv2d -> qnn.simulated_dequantize` 
is incorrect: `nn.conv2d` doesn't take non-zero zero points into 
account. And since we will eventually extend QNN to support multiple dtypes 
anyway, it's not much extra effort to add fp32 as a dtype. 
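To make the zero-point issue concrete, here is a minimal NumPy sketch (not TVM code; the scales and zero points are made-up calibration values) using a dot product as a stand-in for conv2d. Running the plain op directly on affine-space values gives the wrong answer whenever the input zero point is non-zero:

```python
import numpy as np

def quantize(x, scale, zp):
    # real -> affine: q = round(x / scale) + zp, clipped to the int8 range
    return np.clip(np.round(x / scale) + zp, -128, 127).astype(np.int32)

x = np.array([0.5, -0.25, 1.0], dtype=np.float32)
w = np.array([0.2, 0.4, -0.1], dtype=np.float32)
s_x, zp_x = 0.01, 10       # hypothetical input scale / zero point
s_w, zp_w = 0.005, 0       # hypothetical weight scale / zero point

x_q, w_q = quantize(x, s_x, zp_x), quantize(w, s_w, zp_w)

# Correct affine-space computation: subtract zero points before multiplying.
correct = s_x * s_w * np.dot(x_q - zp_x, w_q - zp_w)

# What a plain nn.conv2d/dense would compute on the raw affine values.
naive = s_x * s_w * np.dot(x_q, w_q)

print(correct)  # -0.1, matches np.dot(x, w) (these values quantize exactly)
print(naive)    # -0.05: wrong, because the non-zero input zero point was ignored
```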

[quote="anijain2305, post:20, topic:9775"]
then we need only `qnn.simulated_quantize` and `qnn.simulated_dequantize` to 
calculate the quantization error at the current operator. Is my understanding 
correct?
[/quote]

I'm not sure I understand what you're saying here. Like I said above, if we do 
simulated quantization instead of fake quantization, then we need to take zero 
points into account for every op that's in affine space. Were you thinking we'd 
do something like this:

`qnn.simulated_quantize -> qnn.simulated_dequantize -> nn.conv2d -> 
qnn.simulated_quantize -> qnn.simulated_dequantize`.  

(i.e., we'd use the simulated quantize ops to do fake quantization?)
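For reference, that back-to-back pattern amounts to fake quantization: quantize and immediately dequantize, so the downstream `nn.conv2d` still sees real-space float data that merely carries the rounding and clipping error. A minimal NumPy sketch (not TVM code; the scale and zero point are arbitrary):

```python
import numpy as np

def fake_quantize(x, scale, zp, qmin=-128, qmax=127):
    # quantize into affine space, then immediately dequantize back
    q = np.clip(np.round(x / scale) + zp, qmin, qmax)
    return ((q - zp) * scale).astype(np.float32)

x = np.array([0.123, -0.5, 0.999], dtype=np.float32)
x_fq = fake_quantize(x, scale=0.01, zp=5)
print(x_fq)  # [0.12, -0.5, 1.0]: close to x, but snapped to the quantization grid
```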

[quote="anijain2305, post:20, topic:9775"]
@electriclilies @matt-arm This is somewhat tangential but I wanted to 
understand more. Suppose, we extend the qnn.conv2d to qnn.conv2d* that supports 
simulation during calibration. So, we have a pattern, `qnn.simulated_quantize` 
→ `qnn.conv2d*` → `qnn.simulated_dequantize`. What are the input scales and 
zero points of `qnn.conv2d*`? IIUC, they should be equal to the 
`qnn.simulated_quantize` operator at the inputs of `qnn.conv2d*`. If that is 
true, once we finish calibration, can we use this graph for BYOC?
[/quote]

I think yes, that graph could be used for BYOC if the BYOC implementers want 
it. However, it will still have some ops in real space that they would need to 
transform into affine space, whereas the output of our final rewrite will be 
entirely in affine space. 

It's not clear to me whether it's easier to transform real-space Relay ops into 
affine-space BYOC ops or affine-space Relay ops into affine-space BYOC ops.

---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-quantization-a-new-quantization-framework-in-tvm-initial-rfc-1-4/9775/22)
 to respond.