Infinity vs Error for division by zero

2019-03-01 Thread Matt Pulver
Hello,

PostgreSQL FLOAT appears to support +/-Infinity and NaN per the IEEE 754
standard, with expressions such as CAST('NaN' AS FLOAT) and CAST('Infinity'
AS FLOAT) and even supports ordering columns of floats that contain NaN.

However the query "SELECT 1.0/0.0;" produces an exception:

ERROR:  division by zero


Question: If Infinity and NaN are supported, then why throw an exception
here, instead of returning Infinity? Is it purely for historical reasons,
or if it could all be done again, would an exception still be preferred?

For purely integer arithmetic, I can see how an exception would make sense.
However for FLOAT, I would expect/prefer Infinity to be returned.

Best regards,
Matt


Re: Infinity vs Error for division by zero

2019-03-01 Thread Matt Pulver
On Fri, Mar 1, 2019 at 12:59 PM Andrew Gierth 
wrote:

> >>>>> "Matt" == Matt Pulver  writes:
>
>  Matt> ERROR:  division by zero
>
>  Matt> Question: If Infinity and NaN are supported, then why throw an
>  Matt> exception here, instead of returning Infinity?
>
> Spec says so:
>
>   4) The dyadic arithmetic operators , ,
>  , and  (+, -, *, and /, respectively) specify
>  addition, subtraction, multiplication, and division, respectively.
>  If the value of a divisor is zero, then an exception condition is
>  raised: data exception -- division by zero.


Thank you, that is what I was looking for. In case anyone else is looking
for source documentation on the standard, there is a link from
https://en.wikipedia.org/wiki/SQL:2003#Documentation_availability to a zip
file of the SQL 2003 draft http://www.wiscorp.com/sql_2003_standard.zip
where one can confirm this (page 242 of 5WD-02-Foundation-2003-09.pdf).


On Fri, Mar 1, 2019 at 2:26 PM David G. Johnston 
wrote:

> On Friday, March 1, 2019, Chapman Flack  wrote:
>
>>
>> But if someone wanted to write a user-defined division function or
>> operator that would return Inf for (anything > 0) / 0 and for
>> (anything < 0) / -0, and -Inf for (anything < 0) / 0 and for
>> (anything > 0) / -0, and NaN for (either zero) / (either zero), I think
>> that function or operator would be fully in keeping with IEEE 754.
>>
>
> Upon further reading you are correct - IEEE 754 has chosen to treat n/0
> differently for n=0 and n<>0 cases.  I'm sure they have their reasons but
> within the scope of this database, and the core arithmetic functions it
> provides, those distinctions don't seeming meaningful and having to add
> query logic to deal with both cases would just be annoying.  I don't use,
> or have time for the distraction, to understand why such a decision was
> made and how it could be useful.  Going from an exception to NaN makes
> sense to me, going instead to infinity - outside of limit expressions which
> aren't applicable here - does not.
>
> For my part in the queries I have that encounter divide-by-zero I end up
> transforming the result to zero which is considerably easier to
> present/absorb along side other valid fractions in a table or chart.
>

In heavy financial/scientific calculations with tables of data, using inf
and nan are very useful, much more so than alternatives such as throwing an
exception (which row(s) included the error?), or replacing them with NULL
or 0. There are many intermediate values where using inf makes sense and
results in finite outcomes at the appropriate limit: atan(1.0/0)=pi/2,
erf(1.0/0)=1, exp(-1.0/0)=0, etc.

In contrast, nan represents a mathematically indeterminate form, in which
the appropriate limit could not be ascertained. E.g. 0.0/0, inf-inf,
0.0*inf, etc. In many applications, I would much rather see calculations
carried out via IEEE 754 all the way to the end, with nans and infs, which
provides much more useful diagnostic information than an exception that
doesn't return any rows at all. As Andres Freund pointed out, it is also
more expensive to do the intermediate checks. Just let IEEE 754 do its
thing! (More directed at the SQL standard than to PostgreSQL.)

Best regards,
Matt


Re: Infinity vs Error for division by zero

2019-03-01 Thread Matt Pulver
On Fri, Mar 1, 2019 at 4:51 PM Chapman Flack  wrote:

> On 3/1/19 3:49 PM, Matt Pulver wrote:
>
> > In many applications, I would much rather see calculations carried out
> > via IEEE 754 all the way to the end, with nans and infs, which
> > provides much more useful diagnostic information than an exception that
> > doesn't return any rows at all. As Andres Freund pointed out, it is also
> > more expensive to do the intermediate checks. Just let IEEE 754 do its
> > thing! (More directed at the SQL standard than to PostgreSQL.)
>
> I wanted to try this out a little before assuming it would work,
> and there seems to be no trouble creating a trivial domain over
> float8 (say, CREATE DOMAIN ieeedouble AS float8), and then creating
> operators whose operand types are the domain type.
>
> So it seems an extension could easily do that, and supply happily
> inf-returning and NaN-returning versions of the operators and
> functions, and those will be used whenever operands have the domain
> type.
>
> It might even be useful and relatively elegant, while leaving the
> SQL-specified base types to have the SQL-specified behavior.
>

That would be very useful. I've been wanting this for years, and I'm sure
the data users I work with will appreciate it (but don't directly
understand this to be the solution).

There are issues relating to ordering and aggregation that perhaps are
already transparent to you, but I'll mention anyway for the record.
Conceptually, there would be different contexts of ordering:

   1. When writing mathematical functions, <, =, and > are all false when
   comparing to NaN (NaN != NaN is true.)
   2. In SQL when sorting or aggregating, NaN=NaN. Consider that there are
   2^53-2 different double precision representations of NaN at the bit level.
   Under the same floating point ordering logic used for finite numbers, when
   applied to inf and nan, we get the following ordering: -nan < -inf < (all
   finite numbers) < inf < nan. When the bit patterns are taken into
   consideration, an efficient sort algorithm can be implemented. (Forgive me
   for stating the obvious, but just mentioning this for whoever is going to
   take this on.)

I would be most interested to hear of and discuss any other unforeseen
complications or side-effects.

Best regards,
Matt