Yeah, the problem with picking defaults is that no single set of defaults works best for every situation. This is especially true here, since gaining speed usually means trading away accuracy, and that trade-off can sometimes become a problem.
For the defaults, I envision that most ops will not accumulate to FP32; for some ops, like the global pools and sums, we might turn it on. Really, the best way to determine the criteria is to do a lot of the kind of work you've been doing: trying out different models in different applications and seeing what needs to be turned on and off.

That being said, this is really designed as a tool that sometimes requires the user to go back and modify the provided defaults, either to get more speed if their model can afford it or more accuracy if they need it. It requires investigation, and I don't think we can hit every case well. A tutorial here would help (which is on my long list of TODOs).

Finally, while decisions are made on a per-op basis, the mixed precision conversion function can look at parts of the Relay call, such as the node's attributes or the input tensor shapes. That means we can be smart about the conversion (e.g. for global pooling, only accumulate in FP32 if the input-to-output reduction is large enough). Again, a tutorial or example would help flesh this out; a rough sketch of that idea is below.
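To make the global-pooling example concrete, here is a minimal, hedged sketch of what such a size-aware rule could look like. It assumes the registration hook `relay.op.register_mixed_precision_conversion`, the `MIXED_PRECISION_*` constants, and the `[conversion category, accumulation dtype, output dtype]` return convention from `python/tvm/relay/transform/mixed_precision.py` as they existed around this discussion; the 1024-element threshold, the NCHW layout assumption, and the level-11 override are arbitrary illustration choices, not shipped defaults.

```python
# Sketch only: a size-aware mixed precision rule for nn.global_avg_pool2d.
# Assumes static NCHW input shapes and that type inference has already run,
# so checked_type is populated on the call's arguments.
from tvm import relay
from tvm.relay.transform.mixed_precision import MIXED_PRECISION_FOLLOW


def global_avg_pool_rule(call_node, mixed_precision_type):
    # For global average pooling the reduction size is the spatial extent H * W.
    in_shape = call_node.args[0].checked_type.shape
    reduction = int(in_shape[2]) * int(in_shape[3])

    # Only pay for an FP32 accumulator when many elements are being summed;
    # the 1024 threshold is an arbitrary example value.
    accumulation_dtype = "float32" if reduction >= 1024 else mixed_precision_type

    # Keep the op in the "follow" category (convert when its inputs are already
    # in the reduced precision type). Return convention:
    # [conversion category, accumulation dtype, output dtype].
    return [MIXED_PRECISION_FOLLOW, accumulation_dtype, mixed_precision_type]


# Register at a higher level than the built-in rule (level 10) so this
# custom rule is the one the pass picks up.
relay.op.register_mixed_precision_conversion(
    "nn.global_avg_pool2d", global_avg_pool_rule, level=11
)

# Then convert as usual, e.g.:
#   mod = relay.transform.InferType()(mod)
#   mod = relay.transform.ToMixedPrecision("float16")(mod)
```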