Yeah, the issue with creating defaults is that we can't create defaults that 
work best for every situation. This is especially true since whenever we want 
speed we trade away accuracy, which can sometimes become a problem.

For the defaults, I envision that for most ops we don't accumulate to FP32. For 
some ops, like the global pools and sums, we might turn it on. Really, the best 
way to determine the criteria is to do a lot of the work you've been doing: 
trying out different models in different applications and seeing what needs to 
be turned on and off.
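
Roughly, I imagine those defaults expressed as per-op conversion rules, something like the sketch below. The specific ops, categories, and accumulation dtypes are just illustrative, and the `register_mixed_precision_conversion` hook / constants may look slightly different depending on your TVM version:

```python
from tvm.relay.op import register_mixed_precision_conversion
from tvm.relay.transform.mixed_precision import (
    MIXED_PRECISION_ALWAYS,
    MIXED_PRECISION_FOLLOW,
)

# Most ops: run and accumulate in the mixed dtype (e.g. fp16) for speed.
# level=11 so this wins over any rule already registered by the stock defaults.
@register_mixed_precision_conversion("nn.conv2d", level=11)
def conv2d_rule(call_node, mixed_precision_type):
    # Return [conversion category, accumulation dtype, output dtype].
    return [MIXED_PRECISION_ALWAYS, mixed_precision_type, mixed_precision_type]

# Reduction-heavy ops like sum: accumulate in fp32 but still output fp16.
@register_mixed_precision_conversion("sum", level=11)
def sum_rule(call_node, mixed_precision_type):
    return [MIXED_PRECISION_FOLLOW, "float32", mixed_precision_type]
```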

That being said, this is really designed to be a tool which sometimes requires 
the user to go back and modify the provided default values, either to get more 
speed if their model can afford it, or accuracy if they need it. It requires 
investigation, and I don't think we can hit all cases well. A tutorial here 
would help (which is on my long list of TODOs).
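
For example, overriding a default for speed might look roughly like this; the `nn.dense` choice, the level number, and the tiny module are all placeholders for illustration, and the `ToMixedPrecision` pass name/signature is from the AMP work as of this writing:

```python
import tvm
from tvm import relay
from tvm.relay.op import register_mixed_precision_conversion
from tvm.relay.transform.mixed_precision import MIXED_PRECISION_ALWAYS

# Hypothetical override: this model tolerates fp16 accumulation in dense
# layers, so trade accuracy for speed by replacing whatever rule is in place.
def fast_dense_rule(call_node, mixed_precision_type):
    return [MIXED_PRECISION_ALWAYS, mixed_precision_type, mixed_precision_type]

# Register above the default level so this rule takes precedence.
register_mixed_precision_conversion("nn.dense", func=fast_dense_rule, level=11)

# Rerun the pass on the model and re-measure accuracy and latency.
x = relay.var("x", shape=(1, 16), dtype="float32")
w = relay.var("w", shape=(8, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x, w], relay.nn.dense(x, w)))
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToMixedPrecision("float16")(mod)
```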

Finally, while things are done on a per-op basis, the actual mixed precision 
function can look at parts of the Relay call, such as the node's attributes or 
the input tensor sizes. Therefore we can be smart about the conversion (e.g. 
for global pooling, only accumulate in fp32 if the input to output reduction is 
large enough). Again, a tutorial or example would help flesh this out.
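
As a sketch of what that could look like for global pooling, assuming NCHW layout and that type inference has populated `checked_type` (the threshold here is completely made up):

```python
import tvm
from tvm.relay.op import register_mixed_precision_conversion
from tvm.relay.transform.mixed_precision import MIXED_PRECISION_FOLLOW

# Hypothetical cutoff: above this many reduced elements, fp16 accumulation
# starts to noticeably lose precision, so pay for fp32 accumulation instead.
REDUCTION_THRESHOLD = 4096

def global_pool_rule(call_node, mixed_precision_type):
    # Assumes NCHW layout: a global pool reduces over the H * W elements.
    in_shape = call_node.args[0].checked_type.shape
    reduction_size = 1
    for dim in list(in_shape)[2:]:
        if not isinstance(dim, tvm.tir.IntImm):
            # Dynamic dimension: be conservative and keep fp32 accumulation.
            reduction_size = REDUCTION_THRESHOLD + 1
            break
        reduction_size *= dim.value

    acc_dtype = "float32" if reduction_size > REDUCTION_THRESHOLD else mixed_precision_type
    # Return [conversion category, accumulation dtype, output dtype].
    return [MIXED_PRECISION_FOLLOW, acc_dtype, mixed_precision_type]

# Registered above the default level so it replaces any stock rule for this op.
register_mixed_precision_conversion("nn.global_avg_pool2d", func=global_pool_rule, level=11)
```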
