Hey Chris,

The two extensible bits will be done through user-defined callable functions.

For the green list/gray list/red list situation, the user defines a function 
which, given a Call node, returns the color of the operation. For the initial 
implementation we will just do something naive, like placing all conv2d's in 
the green list, all elementwise operators in the gray list, etc.
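
To make that concrete, here is a rough sketch of what such a coloring function could look like. It is plain Python, not the TVM API: `Call` is a hypothetical stand-in that only carries an op name, the names `Color`, `GREEN_OPS`, `GRAY_OPS` and `color_of` are made up for illustration, the op lists are placeholders, and falling back to red for unlisted ops is my assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Color(Enum):
    GREEN = "green"
    GRAY = "gray"
    RED = "red"


@dataclass
class Call:
    """Hypothetical stand-in for a Relay Call node; only the op name is modeled."""
    op_name: str


# Placeholder lists for the naive first pass, not a vetted classification.
GREEN_OPS = {"nn.conv2d"}
GRAY_OPS = {"add", "multiply", "nn.relu"}  # a few elementwise ops


def color_of(call: Call) -> Color:
    """User-supplied callback: map a call to its list color."""
    if call.op_name in GREEN_OPS:
        return Color.GREEN
    if call.op_name in GRAY_OPS:
        return Color.GRAY
    # Assumption: anything not listed is treated as red.
    return Color.RED
```

A user who disagrees with the naive lists would just pass in their own function with the same shape.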

For the accumulation datatype, I imagine a user-defined function which, given a 
Call node, returns the accumulation datatype and the output datatype of the 
operation. The accumulation datatype is self-explanatory and, confusingly, maps 
to the existing "out_dtype" field in existing Relay ops like conv and dense. 
Our "new" output datatype, meanwhile, tells us at what precision other 
operations will ingest the results of the operation:

weight (fp16 or fp32), data (fp16 or fp32) -> conv2d (accumulation_dtype) -> 
cast(output_dtype)

If the accumulation_dtype == output_dtype then we don't need the cast.
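
The cast rule can be sketched roughly like this. Again this is plain Python rather than Relay: `lower_call` is a hypothetical helper that returns a textual description of the ops that would be emitted, and the `accumulate=` notation is only for illustration.

```python
from typing import List


def lower_call(op_name: str, accumulation_dtype: str, output_dtype: str) -> List[str]:
    """Describe the ops emitted for one call (textual stand-in, not real Relay)."""
    # The op itself accumulates in accumulation_dtype.
    steps = [f"{op_name}(accumulate={accumulation_dtype})"]
    # A cast is only needed when the two dtypes differ.
    if accumulation_dtype != output_dtype:
        steps.append(f"cast({output_dtype})")
    return steps


# Accumulate in fp32, hand fp16 to consumers: the op followed by a cast.
print(lower_call("nn.conv2d", "float32", "float16"))
# Same dtype for both: no cast is emitted.
print(lower_call("nn.conv2d", "float16", "float16"))
```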

Finally, to answer your question: in the scenario given, we would simply express 
conv2d as an operator with an accumulation_dtype of fp32 and an output_dtype of 
fp16. This should give the final graph you listed. (I don't know about the 
operator fusion part, to be honest. I'm not sure all the knobs on the fused 
operator are there; if not, I guess I'll have to do something about that too.) 
In a sense we do have separate "accumulation_dtype" and "output_dtype" settings 
that the user can define on a per-operation basis.

I hope that answers your question, and I hope it is sufficient for most 
applications! For the default I am going to do something simple: all operators 
that support an accumulation datatype separate from their input datatypes 
accumulate in fp32 but output fp16. Otherwise, we assume the operator 
accumulates in fp16 and outputs fp16 (e.g. for elementwise operators).
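
As a sketch of that default rule (plain Python, not TVM code): `default_dtypes` and `OPS_WITH_SEPARATE_ACCUMULATION` are hypothetical names, and the set only lists conv and dense as examples of ops whose accumulation dtype can differ from their inputs.

```python
from typing import Tuple

# Assumption: ops whose accumulation dtype can be set independently of their inputs.
OPS_WITH_SEPARATE_ACCUMULATION = {"nn.conv2d", "nn.dense"}


def default_dtypes(op_name: str) -> Tuple[str, str]:
    """Return (accumulation_dtype, output_dtype) for an op under the default rule."""
    if op_name in OPS_WITH_SEPARATE_ACCUMULATION:
        # Accumulate in fp32, but hand fp16 to downstream ops.
        return ("float32", "float16")
    # Everything else (e.g. elementwise ops) stays in fp16 throughout.
    return ("float16", "float16")
```

Users who want something different for a particular op would swap in their own function returning the same pair.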

There are downsides to this simplistic method. One is that the only sort of 
analysis that is easy with this framework is looking at the current operator. 
That is to say, it's kind of cumbersome to look ahead or backward to make 
decisions. There are some other theoretical limitations on the graphs it can 
easily generate, but I think it covers most reasonable scenarios!




