Thanks for this much-needed contribution!

Can you elaborate on the design you imagine for

> there needs to be some control over the output data types for converted 
> operations. Some FP16 operations might accumulate results into FP32 for 
> numerical reasons and some might produce an FP16 number. In the rewrite of 
> the graph, we will provide some control over this variable.

In some edge use cases, it is desirable for all parameters to be stored as fp16 
to limit the storage footprint. In that context, consider the following graph:

```
conv2d -> multiply -> bias_add -> relu -> max_pool

Greenlist{conv2d, max_pool}
Graylist{elemwise}
```
If conv2d should accumulate in fp32, but the consecutive elemwise operators 
should run in fp16, how will a user express this? In this case I would expect 
the final graph to be:

```
[fp16] -> conv2d -> [fp32] -> cast(fp16) -> multiply -> bias_add -> relu -> max_pool
```

which, after operator fusion, would become:

```
fused_conv2d_cast_multiply_bias_add_relu -> max_pool
```
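To make the question concrete, here is a toy sketch (plain Python, not the actual TVM pass; the list membership and dtype rules are my assumptions) of how a list-driven rewrite over a simple op chain might insert the cast after an fp32-accumulating conv2d:

```python
# Toy mixed-precision rewrite over a linear chain of ops.
# Green ops are always converted to fp16 compute, gray (elemwise) ops
# follow their inputs, and ops in ACCUM_FP32 accumulate/emit fp32,
# forcing a cast before the next fp16 consumer.
GREEN = {"conv2d", "max_pool"}
GRAY = {"multiply", "bias_add", "relu"}
ACCUM_FP32 = {"conv2d"}  # hypothetical: conv2d accumulates in fp32

def rewrite(ops):
    """ops: op names in chain order. Returns (op, compute_dtype) pairs
    with explicit cast ops inserted where dtypes disagree."""
    out, prev_dtype = [], "float16"
    for op in ops:
        compute = "float16" if op in GREEN or op in GRAY else "float32"
        if prev_dtype != compute:
            out.append(("cast", compute))
        out.append((op, compute))
        # an fp32-accumulating op produces an fp32 result
        prev_dtype = "float32" if op in ACCUM_FP32 else compute
    return out

graph = ["conv2d", "multiply", "bias_add", "relu", "max_pool"]
print(rewrite(graph))
# conv2d runs in fp16 but yields fp32, so a cast(fp16) is inserted
# before multiply, matching the expected graph above.
```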

An alternative option would be to add an accumulator_dtype field, separate from 
output_dtype, to various operators and rewrite based on that field. Both can 
work, but I'd like to hear more on how you envision doing this with the mixed 
precision transform in the above context. Thanks!
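As a rough illustration of that alternative (the field names and cast rule here are assumptions for discussion, not an existing TVM API), each op would declare both dtypes, and the pass would insert a cast only when a producer's declared output dtype differs from what the consumer computes in:

```python
from dataclasses import dataclass

@dataclass
class Op:
    # Hypothetical attributes: accumulator_dtype is the internal
    # accumulation precision, output_dtype is what the op emits.
    name: str
    accumulator_dtype: str = "float16"
    output_dtype: str = "float16"

def needs_cast(producer, consumer):
    # A cast is inserted only when the producer's declared output dtype
    # differs from the consumer's compute (accumulation) dtype.
    return producer.output_dtype != consumer.accumulator_dtype

# conv2d accumulates in fp32 and also emits fp32: a cast is needed
# before the fp16 elemwise multiply.
conv = Op("conv2d", accumulator_dtype="float32", output_dtype="float32")
mul = Op("multiply")
print(needs_cast(conv, mul))  # True

# If conv2d instead downcasts internally (output_dtype fp16), the
# explicit cast disappears from the graph.
conv_fp16_out = Op("conv2d", accumulator_dtype="float32", output_dtype="float16")
print(needs_cast(conv_fp16_out, mul))  # False
```

The appeal of this design is that the accumulate-vs-emit decision lives on the operator itself rather than being inferred from list membership during the rewrite.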





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-relay-fp32-fp16-model-support/9994/2)
 to respond.
