Jonathan Vexler created HUDI-9172:
-------------------------------------

             Summary: Timestamp millis logical type is being read wrong from log files
                 Key: HUDI-9172
                 URL: https://issues.apache.org/jira/browse/HUDI-9172
             Project: Apache Hudi
          Issue Type: Bug
          Components: reader-core, spark, spark-sql
    Affects Versions: 1.0.1, 1.0.0
            Reporter: Jonathan Vexler


Partial schema:
{code:java}
    {
      "name": "timestamp_millis_nullable_field",
      "type": [
        "null",
        {
          "type": "long",
          "logicalType": "timestamp-millis"
        }
      ],
      "default": null
    },
    {
      "name": "timestamp_micros_nullable_field",
      "type": [
        "null",
        {
          "type": "long",
          "logicalType": "timestamp-micros"
        }
      ],
      "default": null
    },
    {
      "name": "timestamp_local_millis_nullable_field",
      "type": [
        "null",
        {
          "type": "long",
          "logicalType": "local-timestamp-millis"
        }
      ],
      "default": null
    },
    {
      "name": "timestamp_local_micros_nullable_field",
      "type": [
        "null",
        {
          "type": "long",
          "logicalType": "local-timestamp-micros"
        }
      ],
      "default": null
    }{code}

Here is the same record read with the Spark datasource before compaction (first row) and after compaction (second row):
{code:java}
{"_hoodie_commit_time":"20250312194153518","_hoodie_commit_seqno":"20250312194153518_2_83","_hoodie_record_key":"0252824c-7b64-41a0-81a4-b8f5e2271b14","_hoodie_partition_path":"WARN","key":"0252824c-7b64-41a0-81a4-b8f5e2271b14","ts":1741808513729,"severity":null,"double_field":0.7353261619385916,"float_field":0.7660646,"int_field":2010860559,"long_field":8103800306916814465,"boolean_field":true,"string_field":"JMdZXEImEEXXScOivldhirRdMxmdbXxuQMyMfHpQynWkTDNBoOkoyOdVZNPgvxNQZOColrHsbrLJASmWSHKOEsKXnVUYsZhckRjEHLCSrUBIeeCEftWvmtxoExNcOPCxVhNZrQgRqxAWbssnYiPqzMFfmZXrtMfkihFfvWfgMZQkTIKpDdpBOREWPrqYBNwRmVtpMXItwCIsgvpWmUiiTQCkxsegiauMpMgGiOTQUPkJppnjrloeOBpTMjkbNefyXNfsyRlqsfIVnnAfgxuwJdbBKFMYZnjJCPqPmCWZUVetPLiVUWvTrhUFjyLlxsjvyfrOIktzabVyiPnIzZkUUTJoIkIktqVzWeWbVWSivYrOCbRboPYbmTtfPIYaUcMQrlHaYEKwtFXpWZBeIHcOkTpCueBPqJAcdxRsfkwIwTRIExGqXlMaCLoUtaNrccViRqLnfjjjguskqWZncyOUtYeZjFQvFFcYsuhbrpUSFTiCFYrtSpdvBnKCnjINoUijyYSLhvNNaggCnEGShkrgBeWguHyFnFhNWbVWXUjrACTzLSFyZVWRGfEvBzzlKlEymyXXeRvnoMxxfhcEDBOpQBXEGFZMLEdmdqhKmNvafARRuHJGrzjWxwTfPTFqtLjSGnxqdZBIOqjuignkWIFpzbHWnWtYfCRIqRBICdnNzKvNVjtYgIsBXjZLRdkzdBvsNeMRhbDzYjxbxDyiEIdBHabzoTlWgguFLkStQvkYhMrPhcioDmiusCgyuuVzlqStzLMsRksajVDRxFEKmZKZLKApeuRCLKDoVOSkuMXBowizUdEe","bytes_field":"dFloYXhsRGFIUw==","decimal_field":7608.44,"nested_record":null,"nullable_map_field":{"MmbfD":{"nested_int":-560277617,"level":"ERROR"}},"array_field":[{"nested_int":2020017221,"level":"WARN"},{"nested_int":370699254,"level":"ERROR"}],"enum_field":"FIRST","date_nullable_field":"2025-03-11","timestamp_millis_nullable_field":"1970-01-21T03:47:59.388Z","timestamp_micros_nullable_field":"2025-03-11T04:40:28.143Z","timestamp_local_millis_nullable_field":1741715142032,"timestamp_local_micros_nullable_field":1741675805756000,"level":"WARN"}
{"_hoodie_commit_time":"20250312194153518","_hoodie_commit_seqno":"20250312194153518_2_83","_hoodie_record_key":"0252824c-7b64-41a0-81a4-b8f5e2271b14","_hoodie_partition_path":"WARN","key":"0252824c-7b64-41a0-81a4-b8f5e2271b14","ts":1741808513729,"severity":null,"double_field":0.7353261619385916,"float_field":0.7660646,"int_field":2010860559,"long_field":8103800306916814465,"boolean_field":true,"string_field":"JMdZXEImEEXXScOivldhirRdMxmdbXxuQMyMfHpQynWkTDNBoOkoyOdVZNPgvxNQZOColrHsbrLJASmWSHKOEsKXnVUYsZhckRjEHLCSrUBIeeCEftWvmtxoExNcOPCxVhNZrQgRqxAWbssnYiPqzMFfmZXrtMfkihFfvWfgMZQkTIKpDdpBOREWPrqYBNwRmVtpMXItwCIsgvpWmUiiTQCkxsegiauMpMgGiOTQUPkJppnjrloeOBpTMjkbNefyXNfsyRlqsfIVnnAfgxuwJdbBKFMYZnjJCPqPmCWZUVetPLiVUWvTrhUFjyLlxsjvyfrOIktzabVyiPnIzZkUUTJoIkIktqVzWeWbVWSivYrOCbRboPYbmTtfPIYaUcMQrlHaYEKwtFXpWZBeIHcOkTpCueBPqJAcdxRsfkwIwTRIExGqXlMaCLoUtaNrccViRqLnfjjjguskqWZncyOUtYeZjFQvFFcYsuhbrpUSFTiCFYrtSpdvBnKCnjINoUijyYSLhvNNaggCnEGShkrgBeWguHyFnFhNWbVWXUjrACTzLSFyZVWRGfEvBzzlKlEymyXXeRvnoMxxfhcEDBOpQBXEGFZMLEdmdqhKmNvafARRuHJGrzjWxwTfPTFqtLjSGnxqdZBIOqjuignkWIFpzbHWnWtYfCRIqRBICdnNzKvNVjtYgIsBXjZLRdkzdBvsNeMRhbDzYjxbxDyiEIdBHabzoTlWgguFLkStQvkYhMrPhcioDmiusCgyuuVzlqStzLMsRksajVDRxFEKmZKZLKApeuRCLKDoVOSkuMXBowizUdEe","bytes_field":"dFloYXhsRGFIUw==","decimal_field":7608.44,"nested_record":null,"nullable_map_field":{"MmbfD":{"nested_int":-560277617,"level":"ERROR"}},"array_field":[{"nested_int":2020017221,"level":"WARN"},{"nested_int":370699254,"level":"ERROR"}],"enum_field":"FIRST","date_nullable_field":"2025-03-11","timestamp_millis_nullable_field":"2025-03-11T07:49:48.590Z","timestamp_micros_nullable_field":"2025-03-11T04:40:28.143Z","timestamp_local_millis_nullable_field":1741715142032,"timestamp_local_micros_nullable_field":1741675805756000,"level":"WARN"}
 {code}
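All fields are identical except timestamp_millis_nullable_field, which reads as
"1970-01-21T03:47:59.388Z" before compaction and "2025-03-11T07:49:48.590Z"
after compaction. My guess is that this is caused by the Avro -> InternalRow
conversion in the file group reader.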
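
The two values appear to differ by exactly a factor of 1000 in the time unit. The following is a minimal sketch of that arithmetic, not Hudi code: it takes the post-compaction string from the output above, derives the epoch-millis long it would correspond to (that long is an assumption, since the raw stored value is not shown in this report), and then interprets the same long as microseconds. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_iso_millis(text: str) -> datetime:
    """Parse a UTC timestamp such as 2025-03-11T07:49:48.590Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def format_iso_millis(dt: datetime) -> str:
    """Format with millisecond precision, truncating any finer digits."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


# Value read after compaction, copied from the output above.
after_compaction = "2025-03-11T07:49:48.590Z"

# Assumed raw long for the timestamp-millis field (derived, not taken from the report).
stored_millis = (parse_iso_millis(after_compaction) - EPOCH) // timedelta(milliseconds=1)

# Interpreting the long as milliseconds reproduces the post-compaction value.
correct = format_iso_millis(EPOCH + timedelta(milliseconds=stored_millis))

# Interpreting the same long as microseconds lands in January 1970.
misread = format_iso_millis(EPOCH + timedelta(microseconds=stored_millis))

print(stored_millis)  # 1741679388590
print(correct)        # 2025-03-11T07:49:48.590Z
print(misread)        # 1970-01-21T03:47:59.388Z
```

The misread value matches the pre-compaction output, which is consistent with the timestamp-millis long being treated as timestamp-micros somewhere on the log file read path. This only narrows down the symptom; it does not confirm where the conversion goes wrong.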

HUDI-9142 may be related.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
