I've been exploring quantization in TVM, and one thing I found is that there is a 
special compute/schedule for running int8 conv2d on the CPU ([see 
here](https://github.com/apache/tvm/blob/main/python/tvm/topi/x86/conv2d_int8.py#L132)).
  From what I can tell, it seems to be pretty much the same as the standard CPU 
spatial pack convolution.

To explore this, I tried disabling the special compute/schedule and letting the 
quantized model use the standard spatial pack algorithm (just running with int8 
data).  When I do this, I see the expected slowdown compared to the specialised 
version; however, I also see an unexpected slowdown compared to the `float32` 
version of the same algorithm.

For [a simple 
example](https://gist.github.com/Wheest/42df546cedf084eaf8a4206c19a273b4) I get 
the following results:

```
default int8: 7.529054908081889
modified int8: 23.42591354623437
normal float32: 11.465726513415575
```

(Disabling the algorithm is very simple: just comment out the if block that 
checks for int8 
[here](https://github.com/apache/tvm/blob/70884e957aa5c8de9c02c25a14d30563d7300cb9/python/tvm/relay/op/strategy/x86.py#L117)).
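For context, the dispatch being disabled looks roughly like this (paraphrased 
and simplified, not the exact source; the real check also verifies that the 
target CPU actually supports fast int8 instructions, not just the dtypes):

```python
# Simplified sketch of the conv2d strategy dispatch in
# python/tvm/relay/op/strategy/x86.py (paraphrased, not verbatim).
if layout == "NCHW":
    if data.dtype in ("int8", "uint8") and kernel.dtype in ("int8", "uint8"):
        # This is the block I comment out, so int8 models fall through
        # to the standard implementation below.
        strategy.add_implementation(
            wrap_compute_conv2d(topi.x86.conv2d_nchw_int8),
            wrap_topi_schedule(topi.x86.schedule_conv2d_nchw_int8),
            name="conv2d_nchw_int8.x86",
        )
    else:
        strategy.add_implementation(
            wrap_compute_conv2d(topi.x86.conv2d_nchw),
            wrap_topi_schedule(topi.x86.schedule_conv2d_nchw),
            name="conv2d_nchw.x86",
        )
```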

My main question is: why am I seeing a slowdown when using the standard 
convolution approach with int8 data?

Surely the operations would be the same, just using integers?  And on most 
CPUs those would take fewer clock cycles, not more.  Where is that overhead 
coming from?

I would assume that the specialised compute/schedule better exploits the 
quantization (e.g. the fact that you can pack more values into a SIMD 
register).  However, that still doesn't explain why `modified` is slower than 
`normal`.
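For concreteness, my mental model of the inner loop in both cases is something 
like the sketch below (a numpy-level illustration of the multiply-accumulate, 
not the actual TOPI compute).  The int8 version has to accumulate into int32 
(that's what `out_dtype="int32"` means for the quantized conv2d), so is that 
extra widening where the cost creeps in, or is it something else?

```python
import numpy as np

# Placeholder reduction length for illustration.
K = 256
a_f32 = np.random.uniform(-1, 1, K).astype("float32")
w_f32 = np.random.uniform(-1, 1, K).astype("float32")
a_i8 = np.random.randint(-128, 128, K).astype("int8")
w_i8 = np.random.randint(-128, 128, K).astype("int8")

# float32: each step can map onto a single fused multiply-add on most CPUs.
acc_f32 = np.float32(0)
for k in range(K):
    acc_f32 += a_f32[k] * w_f32[k]

# int8: each product must be widened to int32 before accumulating, so each
# step is widen + multiply + add, unless the schedule targets a fused int8
# dot-product instruction that does the widening for free.
acc_i32 = np.int32(0)
for k in range(K):
    acc_i32 += np.int32(a_i8[k]) * np.int32(w_i8[k])
```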




