Hi,

I'm trying to use TVM's stack to deploy INT8-quantized Transformer-based models.

I tried Relay + Ansor (auto-scheduler) on a single-layer Transformer, and the results were not encouraging:

|Time (ms)|Original|Quantized|
| --- | --- | --- |
|PyTorch|20|--|
|TVM (Relay, optimized)|130|120|
|TVM (Relay, optimized) + Ansor (20k trials)|17|44|
* Each number is the mean over 100 runs; the standard deviation was very small.

In your opinion, what would be the best next steps?
Could you recommend a good starting point or useful references?


Thanks,

---
[Visit Topic](https://discuss.tvm.apache.org/t/quantized-transformer/11850/1) 
to respond.
