It looks like transformer-like models have many `softmax` ops with a lot of casting before / after them, like https://gist.github.com/masahi/0d7d96ae88722b616a906cec2054559e#file-transformer-txt-L137-L143
The fact that softmax and the following cast to fp16 are not fused surprised me. This is because the op pattern for softmax is kOpaque, https://github.com/apache/tvm/blob/66ac4705aae9bec92047920c8a9273693cd48c44/python/tvm/relay/op/nn/_nn.py#L42. The cast overhead is significant when these ops are not fused, so we are leaving a lot of perf on the table. @yzhliu Is there a reason the softmax op pattern cannot be `OUT_ELEMWISE_FUSABLE`?
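For concreteness, a minimal sketch of what the in-tree change might look like, assuming the softmax TOPI schedules can actually tolerate a fused elementwise epilogue such as the trailing fp16 cast (that assumption is exactly what I'm asking about):

```python
# Hypothetical change in python/tvm/relay/op/nn/_nn.py:
# register softmax as OUT_ELEMWISE_FUSABLE instead of OPAQUE, so that
# FuseOps can pull a following elementwise op (e.g. cast to fp16)
# into the same kernel.
from tvm.relay import op as reg
from tvm.relay.op import OpPattern

reg.register_pattern("nn.softmax", OpPattern.OUT_ELEMWISE_FUSABLE)
```

Whether this is safe presumably depends on each target's softmax schedule being able to inline the fused elementwise stage after the final normalization step.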