FP16 Model Support

Andrew Zhao Luo via Apache TVM Discuss Fri, 21 May 2021 23:20:25 -0700


Hey Chris,

The two extensible bits will be done through user defined callable functions.

For the green list/gray list/red list situation, we have the user define a
function which, given a Call node, returns the color of the operation. For the
initial implementation we will just do a naive solution like placing all
conv2d ops in the green list, all elementwise ops in the gray list, etc.
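
A minimal sketch of what such a coloring callback could look like. It is plain
Python and does not use TVM: the `Call` class is a hypothetical stand-in for a
Relay Call node, the op lists are illustrative, and sending unlisted ops to the
red list is an assumption, not something decided in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class Color(Enum):
    GREEN = "green"
    GRAY = "gray"
    RED = "red"


@dataclass
class Call:
    """Hypothetical stand-in for a Relay Call node; only the op name is modeled."""
    op_name: str


# Illustrative lists only; the real pass may classify ops differently.
GREEN_OPS = {"nn.conv2d"}
GRAY_OPS = {"add", "multiply", "nn.relu"}  # elementwise ops


def naive_color(call: Call) -> Color:
    """Return the list color for a call by looking only at its op name."""
    if call.op_name in GREEN_OPS:
        return Color.GREEN
    if call.op_name in GRAY_OPS:
        return Color.GRAY
    # Assumption: anything not explicitly listed goes to the red list.
    return Color.RED
```

Because the callback only sees one Call node at a time, swapping in a different
policy is a matter of passing a different function.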

For the accumulation datatype, I imagine a user defined function which, given a
Call node, returns the accumulation datatype and the output datatype of the
operation. The accumulation datatype is self explanatory and, confusingly, maps
to the existing "out_dtype" field in existing relay ops like conv and dense.
Our "new" output datatype, meanwhile, tells us at what precision other
operations will ingest the results of the operation:

weight (fp16 or fp32), data (fp16 or fp32) -> conv2d(accumulation_dtype) ->
cast(output_dtype).

If the accumulation_dtype == output_dtype then we don't need the cast.
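
The cast rule can be sketched as follows. This is an illustration only:
`lower_op` is a hypothetical helper that returns a textual description of the
emitted steps rather than building a real Relay graph.

```python
from typing import List


def lower_op(op_name: str, accumulation_dtype: str, output_dtype: str) -> List[str]:
    """Describe the steps emitted for one op under the two-dtype scheme."""
    # The op itself always runs with the accumulation dtype.
    steps = [f"{op_name}[acc={accumulation_dtype}]"]
    # A cast is only needed when consumers expect a different precision.
    if accumulation_dtype != output_dtype:
        steps.append(f"cast({output_dtype})")
    return steps
```

For example, `lower_op("conv2d", "float32", "float16")` yields the conv2d
followed by a cast, while matching dtypes yield the conv2d alone.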

Finally, to answer your question: in the scenario given, we would simply express
conv2d as an operator with an accumulation_dtype of fp32 and an output_dtype of
fp16. This should give the final graph listed. (I don't know about the operator
fusion part, to be honest. I'm not sure all the knobs on the fused operator are
there; if not, I guess I have to do something about that too.) In a sense we do
have separate "accumulation_dtype" and "output_dtype" settings that the user
can define on a per-operation basis.

I hope that answers your question and I hope it is sufficient for most
applications! For the default I am going to do something simple: all operators
which support an accumulation datatype separate from the input datatypes
accumulate into fp32 but output fp16. Otherwise, we assume the operator
accumulates in fp16 and outputs fp16 (e.g. for elementwise operators).
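
A rough sketch of that default policy, assuming ops are identified by name. The
names `MixedPrecisionDtypes`, `OPS_WITH_ACCUMULATOR`, and `default_dtypes` are
hypothetical, and the set of ops with a separate accumulator is illustrative.

```python
from typing import NamedTuple


class MixedPrecisionDtypes(NamedTuple):
    accumulation_dtype: str
    output_dtype: str


# Illustrative: ops assumed to expose an accumulation dtype separate from
# their input dtypes (conv and dense are the examples named in this post).
OPS_WITH_ACCUMULATOR = {"nn.conv2d", "nn.dense"}


def default_dtypes(op_name: str) -> MixedPrecisionDtypes:
    """Default per-op policy: accumulate in fp32 where possible, output fp16."""
    if op_name in OPS_WITH_ACCUMULATOR:
        return MixedPrecisionDtypes("float32", "float16")
    # No separate accumulator (e.g. elementwise ops): fp16 throughout.
    return MixedPrecisionDtypes("float16", "float16")
```

A user who wants, say, fp16 accumulation for conv2d would supply their own
function with the same shape instead of this default.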

There are downsides to this simplistic method. One is that the only sort of
analysis that is easy with this framework is looking at the current operator.
That is to say, it's kind of cumbersome to look ahead and backward to make
decisions. There are some other theoretical limitations on the graphs it can
easily generate, but I think it covers most reasonable scenarios!

---
[Visit
Topic](https://discuss.tvm.apache.org/t/rfc-relay-fp32-fp16-model-support/9994/3)
to respond.

[Apache TVM Discuss] [Development/RFC] [RFC][Relay] FP32 -> FP16 Model Support
