[quote="anijain2305, post:20, topic:9775"] I am trying to understand why we need `qnn.conv2d*` (* represents operator along the lines of `qnn.simulated_conv2d`) during calibration. The only reason would be if you want to propagate the error from previous operators while **calibrating** current conv2d operator [/quote]
We do want to support propagating error from previous operators while calibrating the current conv2d operator. Additionally, since `qnn.simulated_quantize` actually moves the data into affine space, `qnn.simulated_quantize -> nn.conv2d -> qnn.simulated_dequantize` is incorrect: `nn.conv2d` doesn't take non-zero zero points into account (see the sketch at the end of this post). And since we will eventually extend QNN to support multiple dtypes anyway, it's not much extra effort to add fp32 as a dtype.

[quote="anijain2305, post:20, topic:9775"]
then we need only `qnn.simulated_quantize` and `qnn.simulated_dequantize` to calculate the quantization error at the current operator. Is my understanding correct?
[/quote]

I'm not sure I understand what you're saying here. Like I said above, if we do simulated quantization instead of fake quantization, then we need to take zero points into account for every op that's in affine space. Were you thinking we'd do something like this: `qnn.simulated_quantize -> qnn.simulated_dequantize -> nn.conv2d -> qnn.simulated_quantize -> qnn.simulated_dequantize` (i.e., we'd use the simulated quantize ops to do fake quantization)?

[quote="anijain2305, post:20, topic:9775"]
@electriclilies @matt-arm This is somewhat tangential but I wanted to understand more. Suppose, we extend the qnn.conv2d to qnn.conv2d* that supports simulation during calibration. So, we have a pattern, `qnn.simulated_quantize` → `qnn.conv2d*` → `qnn.simulated_dequantize`. What are the input scales and zero points of `qnn.conv2d*`? IIUC, they should be equal to the `qnn.simulated_quantize` operator at the inputs of `qnn.conv2d*`. If that is true, once we finish calibration, can we use this graph for BYOC?
[/quote]

Yes, I think that graph could be used for BYOC if the BYOC people want. However, that graph will still have some ops in real space that the BYOC people would need to transform into affine space, whereas the output of our final rewrite will be completely in affine space. It's not clear to me whether it's easier to transform real-space Relay ops into affine-space BYOC ops, or affine-space Relay ops into BYOC ops.
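Since the zero-point issue is easy to miss, here is a minimal NumPy sketch of the two points above. It is not TVM code: a dot product stands in for `nn.conv2d`, and all values, scales, and zero points are made up for illustration. It shows why a zero-point-unaware op applied to affine-space data gives the wrong answer, and why the fake-quantization pattern (quantize → dequantize → op in real space) stays correct.

```python
# Minimal NumPy sketch (not TVM code): a dot product stands in for conv2d,
# and the values, scales, and zero points below are made up for illustration.
import numpy as np

def quantize(x, scale, zero_point):
    # real space -> affine space (round and shift by the zero point)
    return np.clip(np.round(x / scale) + zero_point, 0, 255)

def dequantize(q, scale, zero_point):
    # affine space -> real space
    return (q - zero_point) * scale

x = np.array([0.5, -1.0, 2.0])          # activations
w = np.array([1.5, 0.25, -0.75])        # weights
x_scale, x_zp = 0.02, 128               # non-zero zero points, as with uint8
w_scale, w_zp = 0.02, 128

qx = quantize(x, x_scale, x_zp)
qw = quantize(w, w_scale, w_zp)

real = np.dot(x, w)                                              # fp32 reference: -1.0

# "nn.conv2d on affine-space data": ignores zero points, then rescales the output.
wrong = np.dot(qx, qw) * (x_scale * w_scale)                     # ~25.0, way off

# QNN-style computation: subtract zero points before accumulating.
qnn_style = np.dot(qx - x_zp, qw - w_zp) * (x_scale * w_scale)   # ~-1.01

# Fake quantization: go back to real space first, then use the normal op.
fake = np.dot(dequantize(qx, x_scale, x_zp),
              dequantize(qw, w_scale, w_zp))                     # ~-1.01

print(real, wrong, qnn_style, fake)
```

In other words, either the op itself has to be zero-point aware (the `qnn.conv2d*` direction), or the data has to be brought back to real space before the op (the fake-quantization direction).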