Recently someone opened ARROW-2145 <https://issues.apache.org/jira/projects/ARROW/issues/ARROW-2145> asking for support for non-finite values, such as NaN and infinity. It may seem like a “no-brainer” to implement this, but there’s no real consistency in how other systems handle it, or *even in whether they handle it at all*:
- Java BigDecimal: the BigDecimal(double) constructor throws a NumberFormatException for NaN or infinity, as per the docs <https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#BigDecimal-double->
- Boost.Multiprecision: supports it, but only in cpp_bin_float/cpp_dec_float, which are arbitrary-precision floating-point types, not fixed-precision decimal numbers
- Python: supports it using flags and special string exponents (and it supports both signaling and quiet NaNs)
- Impala: doesn’t support it (it returns NULL when you try to perform CAST(CAST('NaN' AS DOUBLE) AS DECIMAL))
- Postgres: supports NaN (but not infinity) in its numeric <https://www.postgresql.org/docs/10/static/datatype-numeric.html> type by using the sign member of the C struct backing numeric values <https://github.com/postgres/postgres/blob/c7b8998ebbf310a156aa38022555a24d98fdbfb4/src/interfaces/ecpg/include/pgtypes_numeric.h#L16-L25>
- MySQL: doesn’t support NaN/inf at all!

The lack of support for these values across systems likely stems from the fact that fixed-precision arithmetic, by definition, operates on finite values. NaN and infinity are not finite, so they are not supported.

We could go down this rabbit hole in the name of supporting Python’s decimal.Decimal(<non-finite value>), but I’m not sure how useful it would be. No system other than in-memory C++ Arrow arrays would be able to operate on these values (though I suppose we could add a wrapper around Java’s BigDecimal that has the desired behavior). For example, writing Arrow arrays containing Decimal128 values with NaNs or infinities to a Parquet file seems untenable.

Additionally, if we decided to implement it, we’d likely have to take something like the flag approach. That would require a change to the metadata (not necessarily a bad thing) adding two bitmaps to Arrow Decimal arrays: one indicating NaN-ness and one indicating inf-ness. IMO that’s a ton of overhead, given that most values are likely to always be finite. I’m skeptical about whether we should support this.
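To make the Python and flag-bitmap points concrete, here is a minimal Python sketch. It shows how decimal.Decimal tags special values through the exponent returned by as_tuple() ('n' for quiet NaN, 'N' for signaling NaN, 'F' for infinity). It also shows a hypothetical encoder that writes unscaled 128-bit slots alongside a validity bitmap plus the two extra NaN and infinity bitmaps. The names special_kind, pack_bits and encode_decimal128 are illustrative only, not Arrow APIs. The sketch assumes every finite value already fits the target scale and the 128-bit range, and it uses little-endian two's complement slots and LSB-first bitmaps, as Arrow does on little-endian platforms.

```python
from decimal import Decimal

# Python's decimal module marks special values with a string exponent.
SPECIAL_EXPONENTS = {"n": "quiet NaN", "N": "signaling NaN", "F": "infinity"}


def special_kind(value: Decimal) -> str:
    """Classify a Decimal using the exponent field of as_tuple()."""
    exponent = value.as_tuple().exponent
    if isinstance(exponent, str):
        return SPECIAL_EXPONENTS[exponent]
    return "finite"


def pack_bits(flags):
    """Pack booleans into bytes, least-significant bit first (Arrow's bitmap order)."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def encode_decimal128(values, scale):
    """Hypothetical encoder for the "flag approach": 16-byte unscaled slots plus
    validity, NaN and infinity bitmaps. Assumes finite values fit `scale`."""
    data = bytearray()
    validity, nan_flags, inf_flags = [], [], []
    for v in values:
        is_valid = v is not None
        is_nan = is_valid and v.is_nan()  # True for quiet and signaling NaNs
        is_inf = is_valid and v.is_infinite()  # True for +inf and -inf
        validity.append(is_valid)
        nan_flags.append(is_nan)
        inf_flags.append(is_inf)
        # Null and non-finite slots hold zeros; the bitmaps carry their meaning.
        unscaled = int(v.scaleb(scale)) if is_valid and not (is_nan or is_inf) else 0
        data += unscaled.to_bytes(16, "little", signed=True)
    return bytes(data), pack_bits(validity), pack_bits(nan_flags), pack_bits(inf_flags)


values = [Decimal("1.25"), Decimal("NaN"), None, Decimal("-Infinity"), Decimal("sNaN")]
print([special_kind(v) for v in values if v is not None])
# ['finite', 'quiet NaN', 'infinity', 'signaling NaN']
data, validity, nan_bits, inf_bits = encode_decimal128(values, scale=2)
print(len(data), validity, nan_bits, inf_bits)
# 80 b'\x1b' b'\x12' b'\x08'
```

In this sketch each extra bitmap costs one bit per slot (before Arrow’s buffer padding) on top of the 16 bytes per Decimal128 slot, and the slot bytes of NaN, infinity and null entries carry no information of their own.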
Thoughts?