Re: [fpc-pascal] math with infinity and NaN
This is compatible with IEEE-754. Section 7.2 says:

"7.2 Invalid operation
For operations producing results in floating-point format, the default result of an operation that signals the invalid operation exception shall be a quiet NaN that should provide some diagnostic information (see 6.2). These operations are:
...
b) multiplication: multiplication(0, ∞) or multiplication(∞, 0)
...
d) addition or subtraction or fusedMultiplyAdd: magnitude subtraction of infinities, such as: addition(+∞, −∞)
e) division: division(0, 0) or division(∞, ∞)
...
g) squareRoot if the operand is less than zero"

You get the expected quiet NaN results if you add the line

  SetExceptionMask([exInvalidOp, exDenormalized, exZeroDivide, exOverflow, exUnderflow, exPrecision]);

to your test programs.
___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
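[Editorial note: for readers without an FPC setup handy, the masked-exception default the standard describes can be illustrated in any IEEE-754 environment. A minimal sketch in CPython, whose floats are IEEE doubles with the invalid-operation exception masked by default:]

```python
import math

# With the invalid-operation exception masked (Python's default),
# each IEEE-754 "invalid operation" yields a quiet NaN instead of a trap.
inf = math.inf

results = [
    inf - inf,    # d) magnitude subtraction of infinities
    inf * 0.0,    # b) multiplication(inf, 0)
]

for r in results:
    print(r, math.isnan(r))   # nan True, for each case
```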
Re: [fpc-pascal] math with infinity and NaN
Zitat von C Western: My first reaction is that SetExceptionMask([exDenormalized, exZeroDivide, exOverflow, exUnderflow, exPrecision]) would be required (I can't remember what the default is), but it doesn't stop the error (x86_64/linux). I suspect a bug, though I am slightly surprised it hasn't surfaced before.

You forgot the most important item: exInvalidOp! At least on Win7 the code works as expected.
Re: [fpc-pascal] math with infinity and NaN
Quoting James Richters: SetExceptionMask(GetExceptionMask + [exInvalidOp]); works! Thank you for the help! I'm curious why things like Sqrt(-1) just produce NaN without needing to change the exception mask while (+inf) - (+inf) does not behave the same way. Both are invalid; why treat one method of generating an invalid number one way, and another method of getting a just-as-invalid number another way? If there is flexibility in the standard, then why not choose to always be consistent? I have a lot of examples of things that do produce NaN without needing to change the exception mask, like Ln(-1) or 0/0. Why do some cause the exception and others do not?

The case Sqrt(-1) seems to be handled by the compiler, just like the definition NaN = 0/0 in the Math unit. If you have an expression which is not handled at compile time, you get the EInvalidOp exception. Try:

  variable2 := Pi - 4;
  variable1 := Sqrt(variable2);
  Writeln(variable1);
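[Editorial note: the asymmetry James asks about — some invalid operations trap while others quietly produce NaN — is not unique to FPC. A sketch of a similar split in CPython, used here only as an illustration: plain arithmetic goes through the hardware with the invalid-operation exception masked, while library functions validate arguments and raise:]

```python
import math

# inf - inf is evaluated by hardware arithmetic with the invalid-operation
# exception masked, so it quietly produces NaN...
quiet = math.inf - math.inf
print(math.isnan(quiet))   # True

# ...while math.sqrt checks its argument at the library level and raises,
# loosely analogous to FPC's EInvalidOp for an unmasked invalid operation.
try:
    math.sqrt(-1.0)
except ValueError as e:
    print("raised:", e)
```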
Re: [fpc-pascal] math with infinity and NaN
Zitat von Adriaan van Os: Even with masked exceptions, runtime errors are produced in the Math unit. That is not conformant to the standard.

Even with masked exInvalidOp? Can you give an example?
Re: [fpc-pascal] math with infinity and NaN
Zitat von James Richters: "For operations producing results in floating-point format, the default result of an operation that signals the invalid operation exception shall be a quiet NaN that should provide some diagnostic information (see 6.2)." If it shall be a quiet NaN, doesn't that mean it would never cause the runtime error? To my understanding a signaling NaN raises the exception and a quiet NaN does not; it just quietly sets the variable to NaN and continues by default. It states that the DEFAULT result shall be this quiet NaN, so if that's the case then setting SetExceptionMask([exInvalidOp]); should not be required to prevent the runtime error. The default behavior should be that it's a quiet NaN.

As far as I understand: if exInvalidOp is masked, the default result is a quiet NaN. If exInvalidOp is not masked, the exception is thrown. (I do not know if a result is set to NaN in that case and how this could be exploited.)
Re: [fpc-pascal] math with infinity and NaN
Zitat von James Richters: The fact that it raises the exception at all makes it a signaling NaN, not a quiet NaN, but they are supposed to be quiet NaNs, which never throw the exception according to the specification, which clearly states they are quiet NaNs, not signaling NaNs. Suppressing the exception makes them act like quiet NaNs, but the fact that they need suppressing of the exception makes them signaling NaNs.

This is not correct. Signaling and quiet NaNs are precisely defined. Quote from https://en.wikipedia.org/wiki/IEEE_floating_point#Interchange_formats:

"For NaNs, quiet NaNs and signaling NaNs are distinguished by using the most significant bit of the trailing significand field exclusively (the standard recommends 0 for signaling NaNs, 1 for quiet NaNs, so that a signaling NaN can be quieted by changing only this bit to 1, while the reverse could yield the encoding of an infinity), and the payload is carried in the remaining bits."

In the IEEE-754 standard this is "3.4 Binary interchange format encodings". You can test that the NaN is a quiet NaN with:

  Uses math, sysutils;
  var
    variable1, variable2: double;
    iv1: int64 absolute variable1;
  Begin
    SetExceptionMask([exInvalidOp, exDenormalized, exZeroDivide, exOverflow, exUnderflow, exPrecision]);
    variable1 := sqrt(-1);
    variable2 := Pi - 4;
    variable1 := sqrt(variable2);
    Writeln(variable1, ' ', IntToHex(iv1, 16));
  End.

  C:\TMP>fpc64304 xx1.pas
  C:\TMP>D:\FPC304\bin\i386-win32\ppcrossx64.exe xx1.pas
  Free Pascal Compiler version 3.0.4 [2017/10/06] for x86_64
  Copyright (c) 1993-2017 by Florian Klaempfl and others
  Target OS: Win64 for x64
  Compiling xx1.pas
  Linking xx1.exe
  13 lines compiled, 0.2 sec, 71824 bytes code, 4964 bytes data

  C:\TMP>xx1.exe
  Nan FFF8

The 8 nibble shows that the highest bit of the mantissa is 1, and therefore you have a quiet NaN.
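[Editorial note: the same quiet-bit check can be sketched outside Pascal. A minimal Python version, with the assumption that the platform produces the usual default quiet NaN for an invalid operation (the exact sign/payload bits are platform-dependent, so only the quiet bit is asserted):]

```python
import math
import struct

def is_quiet_nan(x: float) -> bool:
    """Check the IEEE-754 binary64 quiet bit: for a NaN, the most
    significant bit of the trailing significand field (bit 51) is 1."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    return math.isnan(x) and bool(bits & (1 << 51))

nan = math.inf - math.inf   # invalid operation -> default quiet NaN
print(hex(struct.unpack('>Q', struct.pack('>d', nan))[0]))
print(is_quiet_nan(nan))    # True
```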
Re: [fpc-pascal] Loss of precision when using math.Max()
Zitat von Vojtěch Čihák: ... will not give any warning even if the correct result is 2.5. It is simply absurd, because it is not about shooting your own foot. The compiler is not a crystal ball, it does what you tell it. If you need floating point, use floating point types and floating point division (my example), and if you need signed results, use ...

Really? There are other sources of loss of precision (in the trunk version), and the compiler does not do what I tell it. Example:

  const
    d1: double = 1.0/3.0;
    d2: double = 1/3;
    x1: extended = 1.0/3.0;
    x2: extended = 1/3;
    s1: single = 1.0/3.0;
    s2: single = 1/3;
  begin
    writeln(s1:30:10, s2:30:10);
    writeln(d1:30:16, d2:30:16);
    writeln(x1:30:16, x2:30:16);
  end.

and the result

  0.333433 0.333433
  0.3334326744080. 0.3334326744080.

The single result is expected. But the double and extended constants d1, x1 are evaluated as single, even though I told the compiler to use the floating point quotient 1.0/3.0. If I use the integer quotient, the values are as expected. This is against intuition and gives no warning. And even if I can adapt to this and live with this quirk: Is there a definite statement that it will remain so? (Delphi works as expected.)
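[Editorial note: the precision gap complained about above — a constant evaluated in single precision and then widened — can be reproduced in any language. A sketch in Python, round-tripping through IEEE-754 single precision via struct:]

```python
import struct

def as_single(x: float) -> float:
    """Round a double to the nearest IEEE-754 single, then widen back,
    mimicking a constant that was evaluated in single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

third_double = 1.0 / 3.0                 # evaluated in double precision
third_single = as_single(third_double)   # what a single-precision constant stores

print(f"{third_single:.16f}")            # 0.3333333432674408
print(f"{third_double:.16f}")            # 0.3333333333333333
print(third_single == third_double)      # False: the widened single lost bits
```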
Re: [fpc-pascal] Loss of precision when using math.Max()
Zitat von Florian Klämpfl: In Pascal, the rule applies that *the resulting type of an operation does not depend on the type a value is assigned to*. So: 1.0 fits perfectly into a single, 3.0 as well; they are single constants (you didn't tell the compiler otherwise). Nobody wants unnecessarily large types to be used. So for the compiler this is a single division, and later on the result is converted into extended. The result for integer is indeed a little bit strange; here probably the default rule applies that any /-operand is converted to the best available real type if it is not a real type, thus the result differs. The question is whether the compiler should look at the operand types and choose the types more carefully, but I tend to say no.

In any case there are two facts: 1. The constants are evaluated at compile time, so there is no speed penalty. 2. This is a regression from 3.0.4 (here the 32-bit version works as expected for both double and extended, and the same for 64-bit except that here extended=double) to 3.1.1 (both under Win7/64).

Is there a definite statement that it will remain so?

Insert type casts for the constants to get proper results on all archs, see below.

This is problematic for portable code because not all compilers support type casts here.

(Delphi works as expected.)

The reason why Delphi behaves differently is probably due to its roots in being x87 only: x87 does all calculations with extended precision.

This is also as expected for 64-bit Delphis using SSE.
Re: [fpc-pascal] Loss of precision when using math.Max()
Zitat von Florian Klämpfl: So you want float constants to be evaluated always with full precision (which would be required for consistency), causing any floating point expression containing a constant to be evaluated with full precision as well?

Yes, at least as the default or selectable per option (like FASTMATH etc.), and AFAIK it is the default for all compilers I know except 3.1.1.

2. This is a regression from 3.0.4 (here the 32-bit version works as expected for both double and extended, and the same for 64-bit except that here extended=double) to 3.1.1 (both under Win7/64).

This was probably changed for consistency. I tried to find the commit which changed this, but I currently cannot find it.

Thank you for your effort.
Re: [fpc-pascal] Loss of precision when using math.Max()
Zitat von Florian Klämpfl: But actually, I just found out that we have had something like this already for years: {$minfpconstprec 64}

OK, then two questions remain: Why does it occur/apply only for (newer?) 3.1.1 versions? And why is there no option to select the maximum precision for the target FPU? E.g. if extended has 10 bytes (and is not merely an alias for double with 64 bits), it should be possible to select {$minfpconstprec 80} or {$minfpconstprec max}. But thanks anyhow.
Re: [fpc-pascal] Loss of precision when using math.Max()
Zitat von Martok: To make sure this works, one has to manually make the const expression be of the type required: const e: Extended = Extended(1.0) / 3.0; I seem to remember that this was once documented somewhere for Delphi. Can't seem to find it though, so maybe it was a 3rd-party book? There is no hint of it in the FPC documentation, anyway.

As already noted, it is not necessary in Delphi:

  D:\Work\TMP>D:\DMX\M18\DCC32 -b exttest.dpr
  Embarcadero Delphi for Win32 compiler version 25.0
  Copyright (c) 1983,2013 Embarcadero Technologies, Inc.
  exttest.dpr(14) 15 lines, 0.00 seconds, 20816 bytes code, 13864 bytes data.

  D:\Work\TMP>exttest
  0.333433 0.333433
  0.0. 0.0.

And with Delphi Tokyo 10.2.3 Starter:

  G:\TMP>bds -b exttest.dpr
  G:\TMP>exttest.exe
  0.333433 0.333433
  0.0. 0.0.

And I also wrote that the explicit type cast can only be used with very new Delphi versions; e.g. compiler version 25 gives for

  {$apptype console}
  const
    x1: extended = extended(1.0)/3.0;
  begin
    writeln(x1:30:16);
  end.

the error

  D:\Work\TMP>b18 exttest2.dpr
  D:\Work\TMP>D:\DMX\M18\DCC32 -b exttest2.dpr
  Embarcadero Delphi for Win32 compiler version 25.0
  Copyright (c) 1983,2013 Embarcadero Technologies, Inc.
  exttest2.dpr(3) Error: E2089 Invalid typecast
  exttest2.dpr(7)

Tokyo compiles without error; I do not know which was the first version that allows the type cast.
Re: [fpc-pascal] Loss of precision when using math.Max()
Zitat von Florian Klaempfl: I am happy to implement this in Delphi mode, but it requires some documentation references on how Delphi evaluates such expressions (which precision is used when and why) and how they are handled in expressions.

These links may be of interest:

http://docwiki.embarcadero.com/RADStudio/Tokyo/en/About_Floating-Point_Arithmetic#Understand_the_Data_Flow

and

http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Floating_point_precision_control_%28Delphi_for_x64%29
Re: [fpc-pascal] Order of Precedence: FPC/Delphi vs Java
This is one of the most useless collections of floating point myths I have seen in a long time.

I don't know why you want to compare two floats, but you'd better use the currency type. Float is for calculus, but comparing float1=float2 (or float1>float2) is rolling the dice. Obviously, the more precision, the fewer errors. But an accurate, intuitive result is not guaranteed.

With rolling a dice you mean that the comparisons are only randomly correct, or what? The floating-point numbers are well-defined and exact (yes, they are; truncation/rounding errors are the results of former computations and/or the rounding of non-representable numbers). All operations are predictable, so there is no chance for randomness.

People who play with maths may use something like

  function equal(const f1, f2: Extended; const Error: extended = 1E-6): boolean;
  begin
    Result := (abs(f1 - f2) < Error);
  end;

This function is nonsense and I doubt that 'math people' will use it. First: it uses only absolute errors, so 1e-7 and 1e-4000 are equal. Second: a tolerance of 1e-6 is ridiculous, given that the machine epsilon for 80-bit extended is 1.0842e-19.

What does Java do? I don't know. Perhaps it just rounds the output; try System.out.println(ans==0.0). Perhaps it uses a high precision that *in this case* always gets 0. As already said, you can get the same values as Java if you use the same data types (double) and the same precision (53-bit) with Free Pascal (even with the x87 FPU):

  5.10099001E+004 5.10099001E+004 5.10099001E+004
  ---
  0.E+000 0.E+000 0.E+000 0.E+000 0.E+000

But the question is that floating point representation can't store many numbers accurately. To begin with, 0.1 in base 10 is a periodic number in binary, 0.000110011001100..., so it has to be truncated.

No, this is only true if you use the truncate rounding mode, which is not the default (the default is round-to-nearest/even).

... and when you truncate, the result depends on the order of operations, no matter the language or precision.
So, it is a matter of probability to get 1. or 1.0001 or 0.

Yes, but it is predictable and deterministic, and no matter of probability.

To conclude: The reason for the differences is the usage of intermediate elevated precision (and this can also occur e.g. for C, where it is likewise allowed to use intermediate higher precision). For Delphi/Free Pascal this phenomenon does not occur if you always compute with the precision appropriate to the data type, as e.g. with 64-bit/SSE.

Regards, Wolfgang
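[Editorial note: since the reply above criticizes the absolute-tolerance equal() for ignoring magnitude, here is a sketch of the usual fix, a relative tolerance, in Python; math.isclose uses a relative tolerance of 1e-9 by default:]

```python
import math

# Absolute tolerance, like the criticized equal() with Error = 1e-6:
def equal_abs(f1: float, f2: float, error: float = 1e-6) -> bool:
    return abs(f1 - f2) < error

# 1e-7 and 1e-40 differ by 33 orders of magnitude, yet compare "equal":
print(equal_abs(1e-7, 1e-40))        # True -- exactly the flaw pointed out

# A relative tolerance scales with the magnitudes being compared:
print(math.isclose(1e-7, 1e-40))     # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: the rounding error is relatively tiny
```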