rkwagner opened a new issue, #13233:
URL: https://github.com/apache/hudi/issues/13233

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   When upgrading our stack from Hudi 0.14.0 to 0.15.0, writes on new tables with columns defined as `logicalType: timestamp-millis` are written into Parquet as a plain `timestamp`, which Spark defaults to microsecond precision. This means data read as the Long 1720631224939 (meant to represent 2024-07-10) is instead written back to Parquet as a date in January 1970.
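   A quick back-of-the-envelope sketch (plain Python, outside Hudi/Spark, purely to illustrate the precision mismatch) of why a millisecond epoch value interpreted as microseconds lands in January 1970:

   ```python
   from datetime import datetime, timezone

   raw = 1720631224939  # epoch value produced in milliseconds

   # Interpreted correctly as milliseconds (timestamp-millis):
   as_millis = datetime.fromtimestamp(raw / 1_000, tz=timezone.utc)

   # Interpreted as microseconds (Spark's default timestamp precision):
   as_micros = datetime.fromtimestamp(raw / 1_000_000, tz=timezone.utc)

   print(as_millis)  # 2024-07-10 17:07:04 UTC
   print(as_micros)  # a timestamp on 1970-01-20 (~20 days after the epoch)
   ```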
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Start up a clean Spark session with no tables defined, since we want the DeltaStreamer to auto-generate the schema from a schema provider.
   2. Using the provider, set the source schema to read from a column that represents a Long and contains the value above.
   3. Set the target schema to use type Long with `logicalType: timestamp-millis`.
   4. Have the generated dataframe do no casting of this column, so that it translates directly from a Long to a timestamp, then check the generated Parquet data.
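   For reference, the target-schema field from steps 2–3 would look roughly like this in Avro terms (the field name `event_ts` is illustrative, not our actual schema):

   ```json
   {
     "name": "event_ts",
     "type": {
       "type": "long",
       "logicalType": "timestamp-millis"
     }
   }
   ```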
   
   **Expected behavior**
   
   The Parquet data should read back as some analog of Wednesday, July 10, 2024 5:07:04 PM (time zones might shift this by up to a day).
   
   **Environment Description**
   
   * Hudi version : 0.15.0
   
   * Spark version : 3.5.2 (3.4.1 produces the same result)
   
   * Hive version :
   
   * Hadoop version : common config v 3.0
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : local & EKS same result
   
   
   **Additional context**
   
   I noticed that in the data object, the source and destination schemas were present as expected. I just couldn't find anywhere to inspect the actual data to confirm whether the schemas from the provider actually got passed along into the write.
   One of us noticed this earlier: after upgrading to Hudi 0.15.0, the writer always seems to use Spark's default timestamp type (microsecond representation) instead of whatever we were passing in.
   
   
   **Stacktrace**
   
   N/A
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
