On Fri, 8 Jul 2022 06:11:22 GMT, Joe Darcy <da...@openjdk.org> wrote:
> Initial implementation.

src/java.base/share/classes/java/lang/Float.java line 1044:

> 1042:         }
> 1043:
> 1044:         assert -14 <= bin16Exp && bin16Exp <= 15;

    assert -15 < bin16Exp && bin16Exp < 16;

is perhaps more readable because the code above uses -15 and 16: less mental calculation at no runtime cost ;-)

src/java.base/share/classes/java/lang/Float.java line 1056:

> 1054:                   // formats
> 1055:                   (bin16SignifBits << (FloatConsts.SIGNIFICAND_WIDTH - 11)));
> 1056:         return sign * Float.intBitsToFloat(result);

    int result = (floatExpBits |
                  // Shift left difference in the number of
                  // significand bits in the float and binary16
                  // formats
                  (bin16SignifBits << (FloatConsts.SIGNIFICAND_WIDTH - 11)));

avoids a useless `|` operation.

src/java.base/share/classes/java/lang/Float.java line 1090:

> 1088:     public static short floatToBinary16AsShortBits(float f) {
> 1089:         if (Float.isNaN(f)) {
> 1090:             // Arbitrary binary16 NaN value; could try to preserve the

    // Arbitrary binary16 quiet NaN value; could try to preserve the

src/java.base/share/classes/java/lang/Float.java line 1100:

> 1098:
> 1099:         // The overflow threshold is binary16 MAX_VALUE + 1/2 ulp
> 1100:         if (abs_f > (65504.0f + 16.0f) ) {

    if (abs_f >= (65504.0f + 16.0f) ) {

A value exactly halfway must round to infinity.

src/java.base/share/classes/java/lang/Float.java line 1124:

> 1122:             // 2^(-125) -- since (-125 = -149 - (-24)) -- so that
> 1123:             // the trailing bits of a subnormal float represent
> 1124:             // the correct trailing bits of a binary16 subnormal.

I would write intervals (ranges) in the form `[low, high]`, so `[-24, -15]` and `[-149, -140]`.

src/java.base/share/classes/java/lang/Float.java line 1127:

> 1125:             exp = -15; // Subnormal encoding using -E_max.
> 1126:             float f_adjust = abs_f * 0x1.0p-125f;
> 1127:             signif_bits = (short)(Float.floatToRawIntBits(f_adjust) & 0x03ff);

I think the `if` and the `exp++` can be avoided if the `& 0x03ff` is dropped altogether.

    signif_bits = (short)Float.floatToRawIntBits(f_adjust);

The reason is the same as for the normalized case below: a carry will eventually flow into the representation for the exponent.

src/java.base/share/classes/java/lang/Float.java line 1141:

> 1139:
> 1140:         // Significand bits as if using rounding to zero (truncation).
> 1141:         signif_bits = (short)((doppel & 0x0007f_e000) >>

    signif_bits = (short)((doppel & 0x007f_e000) >>

or even

    signif_bits = (short)((doppel & 0x007f_ffff) >>

32-bit hex constants are more readable when they have 8 hex digits.

src/java.base/share/classes/java/lang/Float.java line 1163:

> 1161:         int round  = doppel & 0x00000_1000;
> 1162:         int sticky = doppel & 0x00000_0fff;
> 1163:         int lsb    = doppel & 0x0000_2000;

    int round  = doppel & 0x0000_1000;
    int sticky = doppel & 0x0000_0fff;

As above, these are 32-bit hex constants and should have at most 8 hex digits.

src/java.base/share/classes/java/lang/Float.java line 1166:

> 1164:         if (((lsb == 0) && (round != 0) && (sticky != 0)) ||
> 1165:             ( lsb != 0 && round != 0 ) ) { // sticky not needed
> 1166:             // Due to the representational properties, an

    if (round != 0 && (sticky != 0 || lsb != 0)) {

is more succinct.

src/java.base/share/classes/java/lang/Float.java line 1174:

> 1172:
> 1173:         short result = 0;
> 1174:         result = (short)(((exp + 15) << 10) | signif_bits);

    result = (short)(((exp + 15) << 10) + signif_bits);

The final exponent needs to be incremented when `signif_bits == 0x400`. The `|` is not enough for this to happen.

src/java.base/share/classes/java/lang/Float.java line 1175:

> 1173:         short result = 0;
> 1174:         result = (short)(((exp + 15) << 10) | signif_bits);
> 1175:         return (short)(sign_bit | (0x7fff & result));

    return (short)(sign_bit | result);

because `result <= 0x7fff`.

-------------

PR: https://git.openjdk.org/jdk/pull/9422
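
Two of the comments above (the shorter rounding condition at line 1166 and `+` instead of `|` at line 1174) can be checked in isolation. The following is only a minimal standalone sketch, not code from the PR: the class `RoundingSketch` and its method names are hypothetical, and only the expressions inside them are taken from the review.

```java
// Hypothetical standalone sketch; class and method names are not from the PR.
public class RoundingSketch {

    // Condition as quoted from Float.java lines 1164-1165.
    static boolean originalCondition(int lsb, int round, int sticky) {
        return ((lsb == 0) && (round != 0) && (sticky != 0)) ||
               (lsb != 0 && round != 0);
    }

    // Shorter condition suggested in the review.
    static boolean simplifiedCondition(int lsb, int round, int sticky) {
        return round != 0 && (sticky != 0 || lsb != 0);
    }

    // Compare both conditions over all zero/nonzero combinations of the
    // three masked values (0x2000, 0x1000 and 0x0001 stand in for "nonzero").
    static boolean conditionsAgree() {
        int[] lsbValues = {0, 0x2000};
        int[] roundValues = {0, 0x1000};
        int[] stickyValues = {0, 0x0001};
        for (int lsb : lsbValues) {
            for (int round : roundValues) {
                for (int sticky : stickyValues) {
                    if (originalCondition(lsb, round, sticky)
                            != simplifiedCondition(lsb, round, sticky)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    // Exponent field and significand combined with '|', as in the quoted code.
    static int combineWithOr(int exp, int signifBits) {
        return ((exp + 15) << 10) | signifBits;
    }

    // Exponent field and significand combined with '+', as suggested.
    static int combineWithPlus(int exp, int signifBits) {
        return ((exp + 15) << 10) + signifBits;
    }

    public static void main(String[] args) {
        System.out.println(conditionsAgree());          // prints "true"
        // exp = 0 gives a biased exponent of 15 (odd), so bit 10 is already set
        // and '|' silently drops the carry from signif_bits == 0x400.
        System.out.printf("%04x %04x%n",
                combineWithOr(0, 0x400),                // 3c00
                combineWithPlus(0, 0x400));             // 4000
    }
}
```

With `exp = 0` the `|` form leaves the exponent field unchanged, while the `+` form carries into it. For an even biased exponent the two happen to agree, which is why the difference is easy to miss in testing.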