coderfender commented on PR #3056:
URL: 
https://github.com/apache/datafusion-comet/pull/3056#issuecomment-3745541691

   @andygrove  I removed existing cast documentation . Please take a look 
whenever you get a chance 
   
   New document for ref :
   
   ```
   <!---
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at
   
   http://www.apache.org/licenses/LICENSE-2.0
   
   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.
   -->
   
   # Compatibility Guide
   
   Comet aims to provide consistent results with the version of Apache Spark 
that is being used.
   
   This guide offers information about areas of functionality where there are 
known differences.
   
   ## Parquet
   
   Comet has the following limitations when reading Parquet files:
   
   - Comet does not support reading decimals encoded in binary format.
   - No support for default values that are nested types (e.g., maps, arrays, 
structs). Literal default values are supported.
   
   ## ANSI Mode
   
   Comet will fall back to Spark for the following expressions when ANSI mode 
is enabled. These expressions can be enabled by setting
   `spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` 
is the Spark expression class name. See
   the [Comet Supported Expressions Guide](expressions.md) for more information 
on this configuration setting.
   
   - Average (supports all numeric inputs except decimal types)
   - Cast (in some cases)
   
   There is an [epic](https://github.com/apache/datafusion-comet/issues/313) 
where we are tracking the work to fully implement ANSI support.
   
   ## Floating-point Number Comparison
   
   Spark normalizes NaN and zero for floating point numbers for several cases. 
See `NormalizeFloatingNumbers` optimization rule in Spark.
   However, one exception is comparison. Spark does not normalize NaN and zero 
when comparing values
   because they are handled well in Spark (e.g., 
`SQLOrderingUtil.compareFloats`). But the comparison
   functions of arrow-rs used by DataFusion do not normalize NaN and zero 
(e.g., 
[arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)).
   So Comet adds additional normalization expression of NaN and zero for 
comparisons, and may still have differences
   to Spark in some cases, especially when the data contains both positive and 
negative zero. This is likely an edge
   case that is not of concern for many users. If it is a concern, setting 
`spark.comet.exec.strictFloatingPoint=true`
   will make relevant operations fall back to Spark.
   
   ## Incompatible Expressions
   
   Expressions that are not 100% Spark-compatible will fall back to Spark by 
default and can be enabled by setting
   `spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` 
is the Spark expression class name. See
   the [Comet Supported Expressions Guide](expressions.md) for more information 
on this configuration setting.
   
   ## Regular Expressions
   
   Comet uses the Rust regexp crate for evaluating regular expressions, and 
this has different behavior from Java's
   regular expression engine. Comet will fall back to Spark for patterns that 
are known to produce different results, but
   this can be overridden by setting 
`spark.comet.regexp.allowIncompatible=true`.
   
   ## Window Functions
   
   Comet's support for window functions is incomplete and known to be 
incorrect. It is disabled by default and
   should not be used in production. The feature will be enabled in a future 
release. Tracking issue: 
[#2721](https://github.com/apache/datafusion-comet/issues/2721).
   
   ## Cast
   
   Cast operations in Comet fall into three levels of support:
   
   - **C (Compatible)**: The results match Apache Spark
   - **I (Incompatible)**: The results may match Apache Spark for some inputs, 
but there are known issues where some inputs
     will result in incorrect results or exceptions. The query stage will fall 
back to Spark by default. Setting
     `spark.comet.expression.Cast.allowIncompatible=true` will allow all 
incompatible casts to run natively in Comet, but this is not
     recommended for production use.
   - **U (Unsupported)**: Comet does not provide a native version of this cast 
expression and the query stage will fall back to
     Spark.
   - **N/A**: Spark does not support this cast.
   
   ### Legacy Mode
   
   <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS 
-->
   
   <!--BEGIN:CAST_LEGACY_TABLE-->
   <!-- prettier-ignore-start -->
   | | binary | boolean | byte | date | decimal | double | float | integer | 
long | short | string | timestamp |
   |---|---|---|---|---|---|---|---|---|---|---|---|---|
   | binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A 
|
   | boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
   | byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
   | date | N/A | U | U | - | U | U | U | U | U | U | C | U |
   | decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
   | double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
   | float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
   | integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
   | long | U | C | C | N/A | C | C | C | C | - | C | C | U |
   | short | U | C | C | N/A | C | C | C | C | C | - | C | U |
   | string | C | C | C | C | I | C | C | C | C | C | - | I |
   | timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
   <!-- prettier-ignore-end -->
   
   **Notes:**
   
   - **decimal -> string**: There can be formatting differences in some case 
due to Spark using scientific notation where Comet does not
   - **double -> decimal**: There can be rounding differences
   - **double -> string**: There can be differences in precision. For example, 
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
   - **float -> decimal**: There can be rounding differences
   - **float -> string**: There can be differences in precision. For example, 
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
   - **string -> date**: Only supports years between 262143 BC and 262142 AD
   - **string -> decimal**: Does not support fullwidth unicode digits (e.g 
\\uFF10)
     or strings containing null bytes (e.g \\u0000)
   - **string -> timestamp**: Not all valid formats are supported
   <!--END:CAST_LEGACY_TABLE-->
   
   ### Try Mode
   
   <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS 
-->
   
   <!--BEGIN:CAST_TRY_TABLE-->
   <!-- prettier-ignore-start -->
   | | binary | boolean | byte | date | decimal | double | float | integer | 
long | short | string | timestamp |
   |---|---|---|---|---|---|---|---|---|---|---|---|---|
   | binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A 
|
   | boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
   | byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
   | date | N/A | U | U | - | U | U | U | U | U | U | C | U |
   | decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
   | double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
   | float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
   | integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
   | long | U | C | C | N/A | C | C | C | C | - | C | C | U |
   | short | U | C | C | N/A | C | C | C | C | C | - | C | U |
   | string | C | C | C | C | I | C | C | C | C | C | - | I |
   | timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
   <!-- prettier-ignore-end -->
   
   **Notes:**
   
   - **decimal -> string**: There can be formatting differences in some case 
due to Spark using scientific notation where Comet does not
   - **double -> decimal**: There can be rounding differences
   - **double -> string**: There can be differences in precision. For example, 
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
   - **float -> decimal**: There can be rounding differences
   - **float -> string**: There can be differences in precision. For example, 
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
   - **string -> date**: Only supports years between 262143 BC and 262142 AD
   - **string -> decimal**: Does not support fullwidth unicode digits (e.g 
\\uFF10)
     or strings containing null bytes (e.g \\u0000)
   - **string -> timestamp**: Not all valid formats are supported
   <!--END:CAST_TRY_TABLE-->
   
   ### ANSI Mode
   
   <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS 
-->
   
   <!--BEGIN:CAST_ANSI_TABLE-->
   <!-- prettier-ignore-start -->
   | | binary | boolean | byte | date | decimal | double | float | integer | 
long | short | string | timestamp |
   |---|---|---|---|---|---|---|---|---|---|---|---|---|
   | binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A 
|
   | boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
   | byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
   | date | N/A | U | U | - | U | U | U | U | U | U | C | U |
   | decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
   | double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
   | float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
   | integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
   | long | U | C | C | N/A | C | C | C | C | - | C | C | U |
   | short | U | C | C | N/A | C | C | C | C | C | - | C | U |
   | string | C | C | C | C | I | C | C | C | C | C | - | I |
   | timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
   <!-- prettier-ignore-end -->
   
   **Notes:**
   
   - **decimal -> string**: There can be formatting differences in some case 
due to Spark using scientific notation where Comet does not
   - **double -> decimal**: There can be rounding differences
   - **double -> string**: There can be differences in precision. For example, 
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
   - **float -> decimal**: There can be rounding differences
   - **float -> string**: There can be differences in precision. For example, 
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
   - **string -> date**: Only supports years between 262143 BC and 262142 AD
   - **string -> decimal**: Does not support fullwidth unicode digits (e.g 
\\uFF10)
     or strings containing null bytes (e.g \\u0000)
   - **string -> timestamp**: ANSI mode not supported
   <!--END:CAST_ANSI_TABLE-->
   
   See the [tracking 
issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
   ``` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to