Hi,
I'm trying to use TVM's stack to deploy INT8-quantized Transformer-based models. I tried Relay + Ansor (AutoScheduler) on a one-layer Transformer, and the results weren't great:

|Time (ms)|Original|Quantized|
| --- | --- | --- |
|PyTorch|20|--|
|TVM (Relay, optimized)|130|120|
|TVM (Relay, optimized) + Ansor (20k trials)|17|44|

* Each measurement is averaged over 100 runs.
* The standard deviation was very small.

In your opinion, what would be the best next steps? Could you recommend a good starting point or useful references?

Thanks,

---

[Visit Topic](https://discuss.tvm.apache.org/t/quantized-transformer/11850/1) to respond.
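For reference, the setup described above can be sketched roughly as follows, assuming TVM's post-training quantization (`relay.quantize`) and the auto-scheduler (Ansor) APIs. The function name, log file, and calibration settings are illustrative choices, not details from the original post:

```python
# Hedged sketch: quantize a Relay module to INT8 with TVM's post-training
# quantization, then tune the resulting tasks with the auto-scheduler.
# All names below (function, log file, global_scale value) are assumptions
# made for illustration; imports are kept inside the function so the
# sketch can be read without a TVM installation.

def quantize_and_tune(mod, params, target="llvm", trials=20000,
                      log_file="transformer_int8.json"):
    """Quantize `mod` to INT8 and auto-schedule the quantized tasks."""
    import tvm
    from tvm import relay, auto_scheduler

    # Post-training quantization; global-scale calibration is the simplest
    # mode (a calibration dataset would normally give better accuracy).
    with relay.quantize.qconfig(calibrate_mode="global_scale",
                                global_scale=8.0):
        qmod = relay.quantize.quantize(mod, params)

    # Extract tuning tasks from the quantized module.
    tasks, task_weights = auto_scheduler.extract_tasks(
        qmod["main"], params, target)

    # Tune all tasks jointly; 20k trials matches the experiment above.
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tuner.tune(auto_scheduler.TuningOptions(
        num_measure_trials=trials,
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    ))

    # Compile with the tuned schedules applied.
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
                opt_level=3,
                config={"relay.backend.use_auto_scheduler": True}):
            lib = relay.build(qmod, target=target, params=params)
    return lib
```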