Dear Thomas,thank your for the excellent approach for solving the problem, finding the closest smaller or greater floating point value to a given one.
We had an descent discussion about the same topic on "LazarusForum.de"
https://www.lazarusforum.de/viewtopic.php?f=10&t=16019&hilit=kleinste&start=15 where your solution comes up.I packed your code into a working unit, so that it may be easier to include into other projects.
While performing an intense testing, I found two minor bugs. The first one is, that the const MinDouble from the unit math.pasis not the smallest Double which could be generated. Thus the code will bypass all values between MinDouble (2.2250738585072014e-308) and 4.9406564584124654E-324
The second one is that the usage of the const MaxSingle, will not work as expected. The compiler will obviously convert MaxSingle into a Double value and subsequently calling the related Double functions, not as expected the Single function.
Example: CopySign (MaxSingle, Start) will call function CopySign (x: Double; y: Double): Double; and not function CopySign (x: Single; y: Single): Single; This will luckily not lead into errors. In the function NextAfter if (toZero) then begin (...) end else begin // we are increasing the magnitude, toward +-Infinity if (Start = 0.0) then Exit (CopySign (MinSingle, Direction)); if (absStart = MaxSingle) then Exit (CopySign (Infinity, Start)); Exit (CopySign (LongBitsToSingle(Succ(SingleToLongBits(absStart))), Start)); end; this conversion leads to an unwanted behaviour in the line: if (absStart = MaxSingle) then Exit (CopySign (Infinity, Start));since the term (absStart = MaxSingle) will never become true, even if NextAfter is called with a value containing MaxSingle. The reason is
that Single(MaxSingle) <> MaxSingleSurprisingly the function is working and delivers the correct result, since the following line
CopySign (LongBitsToSingle(Succ(SingleToLongBits(absStart))),Start)); executed absStart containing MaxSingle, will deliver Infinity as expected. So I decided to remove this line.The unit contains some new defined consts and uses them to bypass the described problems
MinMinDouble = 4.9406564584124654E-324; SingleMaxSingle : Single = Single(MaxSingle); NegSingleMaxSingle : Single = Single(-MaxSingle);I attached the unit, which is also slightly modified due to my different pascal formatting style.
Thank you very much for sharing. Best regards Ekkehard Domning
{ Unit UNextFloat a very close adaption of the Source found at https://www.mail-archive.com/fpc-pascal@lists.freepascal.org/msg55419.html The two mainfunction are NextAfter and ULP NextAfter is an replacement for the nextafter function found in the "msvcrt.dll" https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/nextafter-functions?view=msvc-170 It returns the next closest float (single or double) value to a Start-value to the Direction-value ULP is "unit in the last place or unit of least precision (ulp) is the spacing between two consecutive floating-point numbers, i.e., the value the least significant digit (rightmost digit) represents if it is 1." https://en.wikipedia.org/wiki/Unit_in_the_last_place -=-=-=-Start of Page excerpt =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= [fpc-pascal] Functions for dealing with floating-point precision Thomas Kurz via fpc-pascal Sun, 19 Jun 2022 13:38:48 -0700 Hello, in my program I have need for checking floating-point precision. I'm internally using floating-points for calculations, but in the end I have to use integer numbers. I cannot use Round() because I have to check for thresholds. I.e. that I wish to accept a value of 1.9999.... as being ">=2". As you can see, I cannot use Trunc() either, so I decided to check for the difference in ULPs (unit in the last place). I couldn't find the corresponding functions in the RTL, but I think they should be included. I wrote the functions listed below. I would appreciate if the maintainers of FPC decide to add them to the Math unit. I skipped the functions for the Extended type, but if it's being desired, I can add hand them in later. Kind regards, Thomas -=-=-=-End of Page excerpt =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Version Date Who Change 0.0.1 2024-09-05 EDo First light 0.0.2 2024-09-05 EDo Bugfix MinDouble, introducing MinMinDouble = 4.9406564584124654E-324 0.0.3 2024-09-06 EDo Bugfix in usage of MaxSingle (see comments below) Introducing two typed consts SingleMaxSingle and NegSingleMaxSingle to ease the usage. } unit unextfloat; interface uses Math; // CAUTION NextAfter for single values is not compiled into the code if e.g // NextAfter(MaxSingle,-MaxSingle) // is written. In this case the compiler uses the overloaded Double version! // To ensure the usage of the single parameter function // always cast MaxSingle to Single (Single(MaxSingle), Single(-MaxSingle)) // or use the below defined SingleMaxSingle and NegSingleMaxSingle constants. function NextAfter(const Start: Single; const Direction: Single): Single; function ULP(Argument: Single): Single; {$ifdef FPC_HAS_TYPE_DOUBLE} function NextAfter(const Start: Double; const Direction: Double): Double; function ULP(Argument: Double): Double; {$endif FPC_HAS_TYPE_DOUBLE} // Other functions, probably only needed internal function SingleToLongBits(const Argument: Single): UInt32; function LongBitsToSingle(const Argument: UInt32): Single; // returns the first floating-point argument with the sign of the second floating-point argument function CopySign(const x: Single; const y: Single): Single; function IsSameSign(const x: Single; const y: Single): Boolean; function FloatPrior(const Argument: Single): Single; function FloatNext(const Argument: Single): Single; {$ifdef FPC_HAS_TYPE_DOUBLE} function DoubleToLongBits(const Argument: Double): UInt64; function LongBitsToDouble(Argument: UInt64): Double; // returns the first floating-point argument with the sign of the second floating-point argument function CopySign(const x: Double; const y: Double): Double; function IsSameSign(const x: Double; const y: Double): Boolean; function FloatPrior(const Argument: Double): Double; function FloatNext(Argument: Double): Double; {$endif FPC_HAS_TYPE_DOUBLE} const MinMinDouble = 4.9406564584124654E-324; MinMinSingle = 1.40129846E-45; // MaxSingle failed in equal comparations! // singlevalue := MaxSingle; // equal := (singlevalue = MaxSingle); // will give equal = false !! // This is caused obviously by converting MaxSingle into a double value, // by the compiler, prior the comparism // coding // equal := (singlevalue = Single(MaxSingle)); // will give equal = true!! // To bypass the casting two const are introduced to be used // instead MaxSingle and -MaxSingle SingleMaxSingle : Single = Single(MaxSingle); NegSingleMaxSingle : Single = Single(-MaxSingle); implementation const MAX_ULP_SINGLE = 2.028240960E+31; const MAX_ULP_DOUBLE = 1.9958403095347198E+292; function SingleToLongBits(const Argument: Single): UInt32; var argResult : Cardinal absolute Argument; begin Result := ArgResult; end; {$ifdef FPC_HAS_TYPE_DOUBLE} function DoubleToLongBits(const Argument: Double): UInt64; var argResult : UInt64 absolute Argument; begin Result := argResult; end; {$endif FPC_HAS_TYPE_DOUBLE} function LongBitsToSingle(const Argument: UInt32): Single; var argResult : Single absolute Argument; begin Result := argResult; end; {$ifdef FPC_HAS_TYPE_DOUBLE} function LongBitsToDouble(Argument: UInt64): Double; var argResult : Double absolute Argument; begin Result := argResult; end; {$endif FPC_HAS_TYPE_DOUBLE} // returns the first floating-point argument with the sign of the second floating-point argument function CopySign(const x: Single; const y: Single): Single; begin Result := (Abs(x) * Sign(y)); end; {$ifdef FPC_HAS_TYPE_DOUBLE} // returns the first floating-point argument with the sign of the second floating-point argument function CopySign(const x: Double; const y: Double): Double; begin Result := (Abs(x) * Sign(y)); end; {$endif FPC_HAS_TYPE_DOUBLE} function IsSameSign(const x: Single; const y: Single): Boolean; begin Result := (CopySign(x, y) = x); end; {$ifdef FPC_HAS_TYPE_DOUBLE} function IsSameSign(const x: Double; const y: Double): Boolean; begin Result := (CopySign(x, y) = x); end; {$endif FPC_HAS_TYPE_DOUBLE} function NextAfter(const Start: Single; const Direction: Single): Single; var absStart: Single; absDir: Single; toZero: Boolean; begin // If either argument is a NaN, then NaN is returned. if IsNan(Start) or IsNan(Direction) then Exit(NaN); // If both arguments compare as equal the second argument is returned. if (Start = Direction) then Exit(Direction); absStart := Abs(Start); absDir := Abs(Direction); toZero := not IsSameSign(Start, Direction) or (AbsDir < absStart); if (toZero) then begin // we are reducing the magnitude, going toward zero. if (absStart = MinMinSingle) then Exit(CopySign(0.0, Start)); if IsInfinite(absStart) then Exit(CopySign(SingleMaxSingle, Start)); Exit(CopySign(LongBitsToSingle(Pred(SingleToLongBits(absStart))), Start)); end else begin // we are increasing the magnitude, toward +-Infinity if (Start = 0.0) then Exit(CopySign(MinMinSingle, Direction)); { // This comparism does not work // if (absStart = MaxSingle) then // to get it work it must be coded if (absStart = SingleMaxSingle) then Exit(CopySign(Infinity, Start)); // But since the next Value after MaxSingle is Infinity, the whole comparism is not needed! } Exit(CopySign(LongBitsToSingle(Succ(SingleToLongBits(absStart))), Start)); end; end; {$ifdef FPC_HAS_TYPE_DOUBLE} function NextAfter(const Start: Double; const Direction: Double): Double; var absStart: Double; absDir: Double; toZero: Boolean; begin // If either argument is a NaN, then NaN is returned. if IsNan(Start) or IsNan(Direction) then Exit(NaN); // If both arguments compare as equal the second argument is returned. if (Start = Direction) then Exit(Direction); absStart := Abs(Start); absDir := Abs(Direction); toZero := not IsSameSign(Start, Direction) or (AbsDir < AbsStart); if (toZero) then begin // we are reducing the magnitude, going toward zero. if (absStart = MinMinDouble) then Exit(CopySign(0.0, Start)); if IsInfinite(absStart) then Exit(CopySign(MaxDouble, Start)); Exit(CopySign(LongBitsToDouble(Pred(DoubleToLongBits(absStart))),Start)); end else begin // we are increasing the magnitude, toward +-Infinity if (Start = 0.0) then Exit(CopySign(MinMinDouble, Direction)); if (AbsStart = MaxDouble) then Exit(CopySign(Infinity, Start)); Exit(CopySign(LongBitsToDouble(Succ(DoubleToLongBits(absStart))), Start)); end; end; {$endif FPC_HAS_TYPE_DOUBLE} function FloatPrior(const Argument: Single): Single; begin Result := NextAfter(Argument, NegInfinity); end; function FloatNext(const Argument: Single): Single; begin Result := NextAfter(Argument, Infinity); end; {$ifdef FPC_HAS_TYPE_DOUBLE} function FloatPrior(const Argument: Double): Double; begin Result := NextAfter(Argument, NegInfinity); end; {$endif FPC_HAS_TYPE_DOUBLE} {$ifdef FPC_HAS_TYPE_DOUBLE} function FloatNext(Argument: Double): Double; begin Result := NextAfter(Argument, Infinity); end; {$endif FPC_HAS_TYPE_DOUBLE} function ULP(Argument: Single): Single; begin // If the argument is NaN, then the result is NaN. if IsNaN(Argument) then Exit(NaN); // If the argument is positive or negative infinity, then the result is positive infinity. if IsInfinite(Argument) then Exit(Infinity); // If the argument is positive or negative zero, then the result is Single.MIN_VALUE. if (Argument = 0.0) then Exit(MinMinSingle); Argument := Abs(Argument); // If the argument is ±Single.MAX_VALUE, then the result is equal to 2^104. if (Argument = SingleMaxSingle) then Exit(MAX_ULP_SINGLE); Result := FloatNext(Argument) - Argument; end; {$ifdef FPC_HAS_TYPE_DOUBLE} function ULP(Argument: Double): Double; begin // If the argument is NaN, then the result is NaN. if IsNaN(Argument) then Exit(NaN); // If the argument is positive or negative infinity, then the result is positive infinity. if IsInfinite(Argument) then Exit(Infinity); // If the argument is positive or negative zero, then the result is Double.MIN_VALUE. if (Argument = 0.0) then Exit(MinMinDouble); Argument := Abs(Argument); // If the argument is ±Double.MAX_VALUE, then the result is equal to 2^971. if (Argument = MaxDouble) then Exit(MAX_ULP_DOUBLE); Result := FloatNext(Argument) - Argument; end; {$endif FPC_HAS_TYPE_DOUBLE} end.
OpenPGP_signature.asc
Description: OpenPGP digital signature
_______________________________________________ fpc-pascal maillist - fpc-pascal@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal