https://bugs.kde.org/show_bug.cgi?id=433801

--- Comment #5 from Julian Seward <jsew...@acm.org> ---
(In reply to Carl Love from comment #1)
> Created attachment 136291 [details]
> Reduced-Precision - bfloat16 Outer Product &   Format Conversion Operations

+static Float conv_bf16_to_float( UInt input )
+{
..
+     output is 64-bit float.
+     bias +127, exponent 8-bits, fraction 22-bits

Is this comment correct?  1 sign bit + 8 exponent bits + 22 mantissa bits
(31 bits in total) looks much more like a 32-bit float than a 64-bit float.
Note also that an IEEE 754 32-bit float has 23 fraction bits, not 22.
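
For reference, a minimal sketch of the bit layouts in question.  This is
not the code from the patch; the helper name `sketch_bf16_to_f32` is
hypothetical, and it assumes that `float` on the host is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the patch's conv_bf16_to_float.
   bfloat16:          1 sign bit, 8 exponent bits (bias 127),  7 fraction bits
   IEEE 754 binary32: 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits
   The two formats share sign and exponent, so widening only appends
   16 zero fraction bits. */
static float sketch_bf16_to_f32(uint16_t bf16)
{
    uint32_t bits = (uint32_t)bf16 << 16;   /* bf16 becomes the top half */
    float result;
    /* Assumes the host 'float' is binary32; memcpy avoids aliasing issues. */
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

With this layout the result is a 32-bit value, which is why the "64-bit
float" wording in the comment looks suspicious.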

--

Is there an inconsistency in the naming of these functions?  It appears that
in some places a 32-bit float is called `_float` in the name, but in others
it is called `_f32`.  E.g.

+static Float conv_bf16_to_float( UInt input )
vs
+static UInt conv_f32_to_bf16( UInt input )

Can you fix the inconsistencies (if they exist) and/or add a comment at the
top explaining the naming?
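
One possible shape for such a comment, purely as an illustration.  The
convention, the types, and the function bodies below are assumptions, not
what the patch does; in particular the narrowing here simply truncates,
whereas the real conversion may round.

```c
#include <stdint.h>

/* Naming convention (hypothetical):
     bf16 : a bfloat16 bit pattern held in a uint16_t
     f32  : an IEEE 754 binary32 bit pattern held in a uint32_t
   Every conversion is named conv_<source>_to_<destination>, and the
   same tag is used for a given format everywhere in the file. */

/* Widen: the bf16 pattern becomes the upper 16 bits of the f32 pattern. */
static uint32_t conv_bf16_to_f32(uint16_t src)
{
    return (uint32_t)src << 16;
}

/* Narrow by dropping the low 16 fraction bits (truncation, an assumption
   made for this sketch only). */
static uint16_t conv_f32_to_bf16(uint32_t src)
{
    return (uint16_t)(src >> 16);
}
```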

---

+ULong convert_from_f32tobf16_helper( ULong src ) {

In this file, either mark functions as 'static' or add a comment saying they
are called from generated code.
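
A sketch of the two options, with hypothetical names and a simplified body
(truncation only), not the helper from the patch:

```c
#include <stdint.h>

/* Option 1: used only inside this file, so it is marked 'static'. */
static uint16_t sketch_truncate_f32_to_bf16(uint32_t f32_bits)
{
    return (uint16_t)(f32_bits >> 16);
}

/* Option 2: CALLED FROM GENERATED CODE.
   This function has to stay non-static because it is referenced from
   outside this file, and the comment tells the reader why. */
uint64_t sketch_f32_to_bf16_helper(uint64_t src)
{
    /* Only the low 32 bits of src are treated as the f32 pattern here;
       that is an assumption made for this sketch. */
    return (uint64_t)sketch_truncate_f32_to_bf16((uint32_t)src);
}
```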

