MartinKolbAtWork opened a new issue, #12226:
URL: https://github.com/apache/datafusion/issues/12226

   ### Describe the bug
   
   The "to_date" function 
(https://github.com/apache/datafusion/blob/main/datafusion/functions/src/datetime/to_date.rs)
 fails to process dates that are later than year 2262.
   This is caused by the implementation detail that the conversion process uses 
nano-seconds based on epoch.
   The Arrow datatype for Date32 and Date64 support much larger values. The 
statements in some areas in the code state that the usage of nanoseconds is 
imposed by the Date types of Arrow. This is simply wrong. The Date32 type 
stores the number of days since epoch. The Date64 type stores the milliseconds 
(NOT nanoseconds) since epoch. Both Date32 and Date64 can therefore massively 
exceed the year 2262.
   See: https://arrow.apache.org/docs/cpp/api/datatype.html
   
   NOTE:
   Processing dates later than 2262 is **not a theoretical issue**. In widely 
used business software systems, **unbounded dates** (e.g. an "expiry_date" that 
is set to never expire) are set to the 31st of December of the year 9999 (i.e. 
"9999-12-31").
   Processing such data with **datafusion will fail** if the current "to_date" 
implementation touches this data.
   
   
   ### To Reproduce
   
   Add these unit tests to 
https://github.com/apache/datafusion/blob/main/datafusion/functions/src/datetime/to_date.rs
   ```
   mod tests {
       use arrow::{compute::kernels::cast_utils::Parser, datatypes::Date32Type};
       use datafusion_common::ScalarValue;
       use datafusion_expr::{ColumnarValue, ScalarUDFImpl};
   
       use super::ToDateFunc;
   
       #[test]
       fn test_year_9999() {
           let date_str = "9999-12-31";
           let date_scalar = ScalarValue::Utf8(Some(date_str.to_string()));
   
           let res = 
ToDateFunc::new().invoke(&[ColumnarValue::Scalar(date_scalar)]);
   
           match res {
               Ok(ColumnarValue::Scalar(ScalarValue::Date32(date_val))) => {
                   let expected = Date32Type::parse("9999-12-31");
                   assert_eq!(date_val, expected, "to_date created wrong 
value");
               }
               _ => panic!("Could not convert '{}' to Date", date_str),
           }
       }
   
       #[test]
       fn test_year_9999_formatted() {
           let date_str = "99991231";
           let format_str = "%Y%m%d";
           let date_scalar = ScalarValue::Utf8(Some(date_str.to_string()));
           let format_scalar = ScalarValue::Utf8(Some(format_str.to_string()));
   
           let res = ToDateFunc::new().invoke(&[
               ColumnarValue::Scalar(date_scalar),
               ColumnarValue::Scalar(format_scalar),
           ]);
   
           match res {
               Ok(ColumnarValue::Scalar(ScalarValue::Date32(date_val))) => {
                   let expected = Date32Type::parse("9999-12-31");
                   assert_eq!(date_val, expected, "to_date created wrong 
value");
               }
               _ => panic!(
                   "Could not convert '{}' with format string '{}'to Date",
                   date_str, format_str
               ),
           }
       }
   }
   
   ```
   
   ### Expected behavior
   
   "to_date" must be able to process date values in the whole range of values 
that are supported by Arrow's Date32 and Date64, e.g. the date "9999-12-31"
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to