I've been exploring quantization in TVM, and one thing I found is that there is a special compute/schedule for running int8 conv2d on the CPU ([see here](https://github.com/apache/tvm/blob/main/python/tvm/topi/x86/conv2d_int8.py#L132)). From what I can tell, it seems to be pretty much the same as the standard CPU spatial pack convolution.
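To make the setup concrete, here's a minimal sketch of the kind of single-op workload involved, built through Relay. The shapes, dtypes, and target below are placeholders I've picked for illustration, not the ones from my actual test; which implementation gets selected depends on the dtypes and the target's features.

```python
import numpy as np
import tvm
from tvm import relay

# Placeholder workload: a single uint8 x int8 conv2d accumulating into int32,
# which is the dtype combination the specialised x86 schedule targets.
data = relay.var("data", shape=(1, 64, 56, 56), dtype="uint8")
weight = relay.const(np.random.randint(-10, 10, size=(64, 64, 3, 3)).astype("int8"))
conv = relay.nn.conv2d(
    data, weight, kernel_size=(3, 3), padding=(1, 1), channels=64, out_dtype="int32"
)
mod = tvm.IRModule.from_expr(relay.Function([data], conv))

# Adjust -mcpu to your machine; whether the specialised int8 implementation or
# the standard spatial pack one is picked depends on dtypes and target features.
target = "llvm -mcpu=core-avx2"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)
```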
To explore this, I tried disabling this special compute/schedule and letting the quantized model use the standard spatial pack algorithm (just running it quantized). When I do this, I see an expected slowdown compared to the specialised version; however, I also see an unexpected slowdown compared to the `float32` version of the same algorithm. For [a simple example](https://gist.github.com/Wheest/42df546cedf084eaf8a4206c19a273b4) I get the following results:

```
default int8:   7.529054908081889
modified int8:  23.42591354623437
normal float32: 11.465726513415575
```

(Disabling the algorithm is very simple: just comment out the if block that checks for int8 [here](https://github.com/apache/tvm/blob/70884e957aa5c8de9c02c25a14d30563d7300cb9/python/tvm/relay/op/strategy/x86.py#L117).)

My main question is: why am I seeing a slowdown using the standard convolution approach? Surely the operations would be the same, just using integers? And on most CPUs those would take fewer clock cycles. Where would that overhead be coming from?

I would assume that the specialised compute/schedule would better exploit the quantization (e.g. the fact that you can get more values into SIMD registers). However, that still doesn't explain why `modified` is slower than `normal`.
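For completeness, here's a minimal sketch of how per-run timings can be gathered with the graph executor, assuming the `lib` built in the earlier snippet (so the same placeholder shapes); my actual numbers above come from the linked gist.

```python
import numpy as np
import tvm
from tvm.contrib import graph_executor

# Assumes `lib` from the earlier sketch: a single "data" input of shape
# (1, 64, 56, 56) with dtype uint8.
dev = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.randint(0, 127, size=(1, 64, 56, 56)).astype("uint8"))

# time_evaluator runs the whole graph repeatedly and returns per-repeat means.
timer = module.module.time_evaluator("run", dev, number=10, repeat=3)
print("mean runtime (ms):", np.mean(timer().results) * 1e3)
```

(As an aside, I think dumping the generated assembly of the built module, e.g. via its `get_source("asm")`, would be one way to see what the int8 inner loop actually lowers to, though I haven't double-checked the exact call on the factory module.)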