Hi Richard,

The basic idea of the patch is to optimize for the (common) situation
where FTZ/DAZ is controlled by a CPU-wide flag. In that case, supporting
the ‘forced FTZ/DAZ semantics’ only requires avoiding the compile-time
optimizations that assume denormal handling is enabled.
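
To illustrate the kind of compile-time optimization that has to be
avoided, here is a minimal sketch (not code from the patch). It models
‘ftz’ semantics in software with a hypothetical helper, ftz_f32, and a
hypothetical mul_ftz that flushes both inputs and the output. Keeping the
sign of the flushed zero is an assumption made for the example.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: software model of flush-to-zero for one value.
   Assumption: the flushed zero keeps the sign of the denormal. */
static float ftz_f32(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Hypothetical model of an 'ftz'-marked multiply: both inputs and the
   output are flushed. */
static float mul_ftz(float a, float b)
{
    return ftz_f32(ftz_f32(a) * ftz_f32(b));
}
```

With X = FLT_MIN / 2.0f (a denormal), mul_ftz(X, 1.0f) yields zero while
X itself is nonzero, so folding X * 1 -> X at compile time would change
the result under these semantics.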

> This suggests only outputs are flushed to zero? OTOH documentation
> for X * 1 -> X suggests otherwise. This simplification also suggests to
> make FTZ operations explicit instead of adding a flag? Thus the BRIG
> FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
> could eventually add a pass optimizing FTZ operations?

Both the inputs and the outputs must be flushed to zero in HSAIL’s ‘ftz’
semantics.

FTZ operations were previously always “explicit” in the BRIG FE output,
as you propose here: builtin calls were injected for all inputs and the
output of ‘ftz’-marked float HSAIL instructions. This is still provided
as a fallback for targets that do not support a CPU mode flag.

The problem with a special FTZ ‘operation’ of some kind in the generic
output is that the basic optimizations get confused by the new operation,
so we would need to add knowledge of it to a bunch of existing optimizer
code. That seems unnecessary for this case, since the optimizations
typically also apply under the ‘FTZ semantics’ when the FTZ/DAZ flag is
on.

Thanks,
Pekka