Jonathan Vexler created HUDI-9172:
-------------------------------------

Summary: Timestamp millis logical type is being read wrong from log files
Key: HUDI-9172
URL: https://issues.apache.org/jira/browse/HUDI-9172
Project: Apache Hudi
Issue Type: Bug
Components: reader-core, spark, spark-sql
Affects Versions: 1.0.1, 1.0.0
Reporter: Jonathan Vexler
Partial schema:
{code:java}
{
  "name": "timestamp_millis_nullable_field",
  "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}],
  "default": null
},
{
  "name": "timestamp_micros_nullable_field",
  "type": ["null", {"type": "long", "logicalType": "timestamp-micros"}],
  "default": null
},
{
  "name": "timestamp_local_millis_nullable_field",
  "type": ["null", {"type": "long", "logicalType": "local-timestamp-millis"}],
  "default": null
},
{
  "name": "timestamp_local_micros_nullable_field",
  "type": ["null", {"type": "long", "logicalType": "local-timestamp-micros"}],
  "default": null
}
{code}

Here is the same record read with the Spark datasource before and after compaction.

Before compaction:
{code:java}
{"_hoodie_commit_time":"20250312194153518","_hoodie_commit_seqno":"20250312194153518_2_83","_hoodie_record_key":"0252824c-7b64-41a0-81a4-b8f5e2271b14","_hoodie_partition_path":"WARN","key":"0252824c-7b64-41a0-81a4-b8f5e2271b14","ts":1741808513729,"severity":null,"double_field":0.7353261619385916,"float_field":0.7660646,"int_field":2010860559,"long_field":8103800306916814465,"boolean_field":true,"string_field":"JMdZXEImEEXXScOivldhirRdMxmdbXxuQMyMfHpQynWkTDNBoOkoyOdVZNPgvxNQZOColrHsbrLJASmWSHKOEsKXnVUYsZhckRjEHLCSrUBIeeCEftWvmtxoExNcOPCxVhNZrQgRqxAWbssnYiPqzMFfmZXrtMfkihFfvWfgMZQkTIKpDdpBOREWPrqYBNwRmVtpMXItwCIsgvpWmUiiTQCkxsegiauMpMgGiOTQUPkJppnjrloeOBpTMjkbNefyXNfsyRlqsfIVnnAfgxuwJdbBKFMYZnjJCPqPmCWZUVetPLiVUWvTrhUFjyLlxsjvyfrOIktzabVyiPnIzZkUUTJoIkIktqVzWeWbVWSivYrOCbRboPYbmTtfPIYaUcMQrlHaYEKwtFXpWZBeIHcOkTpCueBPqJAcdxRsfkwIwTRIExGqXlMaCLoUtaNrccViRqLnfjjjguskqWZncyOUtYeZjFQvFFcYsuhbrpUSFTiCFYrtSpdvBnKCnjINoUijyYSLhvNNaggCnEGShkrgBeWguHyFnFhNWbVWXUjrACTzLSFyZVWRGfEvBzzlKlEymyXXeRvnoMxxfhcEDBOpQBXEGFZMLEdmdqhKmNvafARRuHJGrzjWxwTfPTFqtLjSGnxqdZBIOqjuignkWIFpzbHWnWtYfCRIqRBICdnNzKvNVjtYgIsBXjZLRdkzdBvsNeMRhbDzYjxbxDyiEIdBHabzoTlWgguFLkStQvkYhMrPhcioDmiusCgyuuVzlqStzLMsRksajVDRxFEKmZKZLKApeuRCLKDoVOSkuMXBowizUdEe","bytes_field":"dFloYXhsRGFIUw==","decimal_field":7608.44,"nested_record":null,"nullable_map_field":{"MmbfD":{"nested_int":-560277617,"level":"ERROR"}},"array_field":[{"nested_int":2020017221,"level":"WARN"},{"nested_int":370699254,"level":"ERROR"}],"enum_field":"FIRST","date_nullable_field":"2025-03-11","timestamp_millis_nullable_field":"1970-01-21T03:47:59.388Z","timestamp_micros_nullable_field":"2025-03-11T04:40:28.143Z","timestamp_local_millis_nullable_field":1741715142032,"timestamp_local_micros_nullable_field":1741675805756000,"level":"WARN"}
{code}

After compaction:
{code:java}
{"_hoodie_commit_time":"20250312194153518","_hoodie_commit_seqno":"20250312194153518_2_83","_hoodie_record_key":"0252824c-7b64-41a0-81a4-b8f5e2271b14","_hoodie_partition_path":"WARN","key":"0252824c-7b64-41a0-81a4-b8f5e2271b14","ts":1741808513729,"severity":null,"double_field":0.7353261619385916,"float_field":0.7660646,"int_field":2010860559,"long_field":8103800306916814465,"boolean_field":true,"string_field":"JMdZXEImEEXXScOivldhirRdMxmdbXxuQMyMfHpQynWkTDNBoOkoyOdVZNPgvxNQZOColrHsbrLJASmWSHKOEsKXnVUYsZhckRjEHLCSrUBIeeCEftWvmtxoExNcOPCxVhNZrQgRqxAWbssnYiPqzMFfmZXrtMfkihFfvWfgMZQkTIKpDdpBOREWPrqYBNwRmVtpMXItwCIsgvpWmUiiTQCkxsegiauMpMgGiOTQUPkJppnjrloeOBpTMjkbNefyXNfsyRlqsfIVnnAfgxuwJdbBKFMYZnjJCPqPmCWZUVetPLiVUWvTrhUFjyLlxsjvyfrOIktzabVyiPnIzZkUUTJoIkIktqVzWeWbVWSivYrOCbRboPYbmTtfPIYaUcMQrlHaYEKwtFXpWZBeIHcOkTpCueBPqJAcdxRsfkwIwTRIExGqXlMaCLoUtaNrccViRqLnfjjjguskqWZncyOUtYeZjFQvFFcYsuhbrpUSFTiCFYrtSpdvBnKCnjINoUijyYSLhvNNaggCnEGShkrgBeWguHyFnFhNWbVWXUjrACTzLSFyZVWRGfEvBzzlKlEymyXXeRvnoMxxfhcEDBOpQBXEGFZMLEdmdqhKmNvafARRuHJGrzjWxwTfPTFqtLjSGnxqdZBIOqjuignkWIFpzbHWnWtYfCRIqRBICdnNzKvNVjtYgIsBXjZLRdkzdBvsNeMRhbDzYjxbxDyiEIdBHabzoTlWgguFLkStQvkYhMrPhcioDmiusCgyuuVzlqStzLMsRksajVDRxFEKmZKZLKApeuRCLKDoVOSkuMXBowizUdEe","bytes_field":"dFloYXhsRGFIUw==","decimal_field":7608.44,"nested_record":null,"nullable_map_field":{"MmbfD":{"nested_int":-560277617,"level":"ERROR"}},"array_field":[{"nested_int":2020017221,"level":"WARN"},{"nested_int":370699254,"level":"ERROR"}],"enum_field":"FIRST","date_nullable_field":"2025-03-11","timestamp_millis_nullable_field":"2025-03-11T07:49:48.590Z","timestamp_micros_nullable_field":"2025-03-11T04:40:28.143Z","timestamp_local_millis_nullable_field":1741715142032,"timestamp_local_micros_nullable_field":1741675805756000,"level":"WARN"}
{code}

All fields are identical except timestamp_millis_nullable_field. My guess is that this comes from the Avro-to-InternalRow conversion in the file group reader. HUDI-9142 may be related.
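The two reads of timestamp_millis_nullable_field differ by exactly a factor of 1000: 2025-03-11T07:49:48.590Z is 1741679388590 ms since the epoch, while 1970-01-21T03:47:59.388Z is 1741679388 ms, the same digits shifted by three places. Avro's timestamp-millis stores milliseconds, whereas Spark's TimestampType counts microseconds, so this pattern would fit the conversion guess if the millisecond long is copied into the microsecond slot without being multiplied by 1000. The Java sketch below only reproduces that arithmetic with java.time; it assumes the post-compaction value is the correct one, and the class and method names (TimestampUnitCheck, millisToSparkMicros, buggyMillisToSparkMicros, microsToInstant) are hypothetical, not Hudi or Spark code.

```java
import java.time.Instant;

/**
 * Reproduces the arithmetic behind the wrong timestamp-millis value.
 * All names here are hypothetical illustrations, not Hudi or Spark APIs.
 */
class TimestampUnitCheck {
    // Value read after compaction, assumed to be correct: 1741679388590 ms.
    static final long EXPECTED_MILLIS =
            Instant.parse("2025-03-11T07:49:48.590Z").toEpochMilli();

    // Avro timestamp-millis holds milliseconds since the epoch; Spark's
    // TimestampType holds microseconds, so a correct conversion multiplies by 1000.
    static long millisToSparkMicros(long millis) {
        return Math.multiplyExact(millis, 1_000L);
    }

    // Assumed faulty path: the millisecond value is copied into the
    // microsecond slot unchanged, shrinking the instant by a factor of 1000.
    static long buggyMillisToSparkMicros(long millis) {
        return millis;
    }

    // Interprets a microseconds-since-epoch value as an Instant.
    static Instant microsToInstant(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args) {
        long millis = EXPECTED_MILLIS;
        // Prints: correct: 2025-03-11T07:49:48.590Z
        System.out.println("correct: " + microsToInstant(millisToSparkMicros(millis)));
        // Prints: buggy:   1970-01-21T03:47:59.388590Z
        // At millisecond precision this matches the value read before compaction.
        System.out.println("buggy:   " + microsToInstant(buggyMillisToSparkMicros(millis)));
    }
}
```

If this is the cause, timestamp_micros_nullable_field would be unaffected because its unit already matches Spark's, which agrees with that field reading the same before and after compaction.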