I believe there are many overlapping problems here, the chief one being that
each DBMS has its own semantics.

In order to emulate a particular DBMS, we need to understand its semantics,
document those semantics by writing tests, and then follow those semantics
(when Calcite is in a mode that should follow them).
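
For example, here is a minimal sketch of what such a documenting test might
look like as a plain-JDBC JUnit test (the class, field and method names are
hypothetical; the connection is assumed to point at a MySQL instance recent
enough to support CAST(... AS DOUBLE), i.e. 8.0.17 or later):

import static org.junit.jupiter.api.Assertions.assertTrue;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.jupiter.api.Test;

/** Documents MySQL's lossy BIGINT-to-DOUBLE comparison semantics. */
class MySqlDoubleSemanticsTest {
  Connection connection; // assumed to be opened against MySQL elsewhere

  @Test void castToDoubleIsLossy() throws Exception {
    try (Statement s = connection.createStatement();
         ResultSet r = s.executeQuery("select"
             + " cast('8145308033243873281' as double)"
             + " = cast('8145308033243873280' as double)")) {
      assertTrue(r.next());
      // Two distinct integers compare equal once cast to DOUBLE,
      // because a DOUBLE has only 53 significand bits.
      assertTrue(r.getBoolean(1));
    }
  }
}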

In my opinion, some DBMSs overuse binary floating-point types (which are often
lossy when converting integer or decimal literals). Ideally Calcite would never
convert to binary floating point unless specifically asked to do so. But if we
are emulating another DBMS, we may need to.
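
To see that lossiness concretely, here is a small self-contained Java example
(the values are taken from the TiDB session quoted below):

public class DoubleLossiness {
  public static void main(String[] args) {
    long a = 8145308033243873280L;
    long b = 8145308033243873281L;
    // A double has a 53-bit significand; near 2^63, adjacent doubles are
    // 1,024 apart, so both longs round to the same double.
    System.out.println((double) a == (double) b); // true
    System.out.println((double) a);               // 8.145308033243873E18
  }
}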

By default, Calcite’s numeric literals are arbitrary-precision decimal values.
But some recent threads/issues have convinced me that we might need
arbitrary-precision binary numeric literals in some cases (e.g. after applying
constant reduction).
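
For instance, a java.math.BigDecimal literal preserves a value exactly where a
double cannot:

import java.math.BigDecimal;

public class DecimalLiterals {
  public static void main(String[] args) {
    BigDecimal d = new BigDecimal("8145308033243873281");
    System.out.println(d);               // 8145308033243873281 (exact)
    System.out.println(d.doubleValue()); // 8.145308033243873E18 (rounded)
    // The exact comparison distinguishes values that the double collapses:
    System.out.println(
        d.compareTo(new BigDecimal("8145308033243873280")) != 0); // true
  }
}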



> On Jun 9, 2024, at 12:21 PM, Cancai Cai <caic68...@gmail.com> wrote:
> 
> Hello Calcite community,
> I would like to ask a question. Actually, I am not sure whether it is
> appropriate to ask this question in the Calcite community. It is mainly
> about how databases handle numeric types.
> 
> [10:53:45]TiDB root:test> explain select id = 8145308033243873280 from test;
> +---------------------+----------+-----------+--------------------------+------------------------------------------------------------------------+
> | id                  | estRows  | task      | access object            | operator info                                                          |
> +---------------------+----------+-----------+--------------------------+------------------------------------------------------------------------+
> | Projection_3        | 10000.00 | root      |                          | eq(cast(test.test.id, double BINARY), 8.145308033243873e+18)->Column#3 |
> | └─IndexReader_7     | 10000.00 | root      |                          | index:IndexFullScan_6                                                  |
> |   └─IndexFullScan_6 | 10000.00 | cop[tikv] | table:test, index:id(id) | keep order:false, stats:pseudo                                         |
> +---------------------+----------+-----------+--------------------------+------------------------------------------------------------------------+
> 3 rows in set
> Time: 0.004s
> [10:54:08]TiDB root:test> select cast('8145308033243873281' as double) ;
> +---------------------------------------+
> | cast('8145308033243873281' as double) |
> +---------------------------------------+
> | 8.145308033243873e+18                 |
> +---------------------------------------+
> 1 row in set
> Time: 0.008s
> [10:54:15]TiDB root:test> select cast('8145308033243873280' as double) ;
> +---------------------------------------+
> | cast('8145308033243873280' as double) |
> +---------------------------------------+
> | 8.145308033243873e+18                 |
> +---------------------------------------+
> 1 row in set
> Time: 0.003s
> 
> As shown above, the underlying database converts some numeric values to
> the double type by default, but double is not compatible with the
> bigdecimal and decimal types. Why is this?
> 
> This behavior exists in both MySQL
> <https://dev.mysql.com/doc/refman/8.0/en/type-conversion.html> (I ran the
> above example in MySQL and it reproduces there) and Spark. For example,
> Spark's map_contains_key function also converts numeric types to double
> for comparison.
> 
> I am not a database developer, so I am curious where the difficulty lies
> here. If someone can explain it, I will be very grateful and happy to
> have learned something new.
> 
> Best wishes,
> Cancai Cai
