It looks like transformer-like models have many `softmax` ops with a lot of casting introduced before and after them, e.g.
https://gist.github.com/masahi/0d7d96ae88722b616a906cec2054559e#file-transformer-txt-L137-L143
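
For illustration (this is a made-up repro sketch, not the model from the gist above), here is roughly the pattern I mean: a softmax sandwiched between an upcast and a downcast, which `FuseOps` cannot merge because softmax is opaque:

```python
# Illustrative sketch: fp32 softmax surrounded by fp16 casts, then FuseOps.
# Because nn.softmax is registered kOpaque, the casts stay in separate kernels.
import tvm
from tvm import relay

x = relay.var("x", shape=(8, 128), dtype="float16")
y = relay.cast(x, "float32")       # upcast before softmax
y = relay.nn.softmax(y, axis=-1)   # opaque: nothing fuses into it
y = relay.cast(y, "float16")       # downcast after softmax

mod = tvm.IRModule.from_expr(relay.Function([x], y))
mod = tvm.transform.Sequential(
    [relay.transform.InferType(), relay.transform.FuseOps(fuse_opt_level=2)]
)(mod)
print(mod)  # the two casts and the softmax end up in separate primitive functions
```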

I was surprised that softmax and the following cast to fp16 are not fused. The reason is that the op pattern for softmax is registered as kOpaque:
https://github.com/apache/tvm/blob/66ac4705aae9bec92047920c8a9273693cd48c44/python/tvm/relay/op/nn/_nn.py#L42
The cast overhead is significant when the casts are left unfused, so we are leaving a lot of perf on the table.

@yzhliu Is there a reason the softmax op pattern cannot be `OUT_ELEMWISE_FUSABLE`?
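
For concreteness, the change I have in mind would look roughly like the sketch below (equivalent to editing the registration in `_nn.py`, which currently uses `OpPattern.OPAQUE`); whether this is actually safe for every target/schedule is exactly my question:

```python
# Hypothetical override: mark nn.softmax as OUT_ELEMWISE_FUSABLE so elementwise
# ops such as the following fp16 cast can be fused into its output.
from tvm.relay.op.op import OpPattern, register_pattern

# level=11 so this takes precedence over the existing (level 10) registration.
register_pattern("nn.softmax", OpPattern.OUT_ELEMWISE_FUSABLE, level=11)
```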
