[quote="anijain2305, post:20, topic:9775"] I am trying to understand why we need `qnn.conv2d*` (* represents operator along the lines of `qnn.simulated_conv2d`) during calibration. The only reason would be if you want to propagate the error from previous operators while **calibrating** current conv2d operator [/quote]
We do want to support propagating error from previous operators while calibrating the current conv2d operator. Additionally, since `qnn.simulated_quantize` actually moves the data into affine space, `qnn.simulated_quantize -> nn.conv2d -> qnn.simulated_dequantize` is incorrect: `nn.conv2d` doesn't take non-zero zero points into account (see the sketch at the end of this post). And since we will eventually extend QNN to support multiple dtypes anyway, it's not much extra effort to add fp32 as a dtype.

[quote="anijain2305, post:20, topic:9775"]
then we need only `qnn.simulated_quantize` and `qnn.simulated_dequantize` to calculate the quantization error at the current operator. Is my understanding correct?
[/quote]

I'm not sure I understand what you're saying here. Like I said above, if we do simulated quantization instead of fake quantization, then we need to take zero points into account for every op that's in affine space. Were you thinking we'd do something like this: `qnn.simulated_quantize -> qnn.simulated_dequantize -> nn.conv2d -> qnn.simulated_quantize -> qnn.simulated_dequantize` (i.e., we'd use the simulated quantize ops to do fake quantization)?

[quote="anijain2305, post:20, topic:9775"]
@electriclilies @matt-arm This is somewhat tangential but I wanted to understand more. Suppose, we extend the qnn.conv2d to qnn.conv2d* that supports simulation during calibration. So, we have a pattern, `qnn.simulated_quantize` → `qnn.conv2d*` → `qnn.simulated_dequantize`. What are the input scales and zero points of `qnn.conv2d*`? IIUC, they should be equal to the `qnn.simulated_quantize` operator at the inputs of `qnn.conv2d*`. If that is true, once we finish calibration, can we use this graph for BYOC?
[/quote]

Yes, I think that graph could be used for BYOC if the BYOC people want. However, that graph will still have some ops in real space that the BYOC people would need to transform into affine space, whereas the output of our final rewrite will be completely in affine space. It's not clear to me whether it's easier to transform real-space Relay ops into affine-space BYOC ops, or affine-space Relay ops into BYOC ops.
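Since the zero-point issue is easy to miss, here is a minimal NumPy sketch of the two points above. It is not TVM code: a dot product stands in for `nn.conv2d`, and all values, scales, and zero points are made up for illustration. It shows why a zero-point-unaware op applied to affine-space data gives the wrong answer, and why the fake-quantization pattern (quantize → dequantize → op in real space) stays correct.

```python
# Minimal NumPy sketch (not TVM code): a dot product stands in for conv2d,
# and the values, scales, and zero points below are made up for illustration.
import numpy as np

def quantize(x, scale, zero_point):
    # real space -> affine space (round and shift by the zero point)
    return np.clip(np.round(x / scale) + zero_point, 0, 255)

def dequantize(q, scale, zero_point):
    # affine space -> real space
    return (q - zero_point) * scale

x = np.array([0.5, -1.0, 2.0])          # activations
w = np.array([1.5, 0.25, -0.75])        # weights
x_scale, x_zp = 0.02, 128               # non-zero zero points, as with uint8
w_scale, w_zp = 0.02, 128

qx = quantize(x, x_scale, x_zp)
qw = quantize(w, w_scale, w_zp)

real = np.dot(x, w)                                              # fp32 reference: -1.0

# "nn.conv2d on affine-space data": ignores zero points, then rescales the output.
wrong = np.dot(qx, qw) * (x_scale * w_scale)                     # ~25.0, way off

# QNN-style computation: subtract zero points before accumulating.
qnn_style = np.dot(qx - x_zp, qw - w_zp) * (x_scale * w_scale)   # ~-1.01

# Fake quantization: go back to real space first, then use the normal op.
fake = np.dot(dequantize(qx, x_scale, x_zp),
              dequantize(qw, w_scale, w_zp))                     # ~-1.01

print(real, wrong, qnn_style, fake)
```

In other words, either the op itself has to be zero-point aware (the `qnn.conv2d*` direction), or the data has to be brought back to real space before the op (the fake-quantization direction).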