[ https://issues.apache.org/jira/browse/HUDI-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937209#comment-17937209 ]
sivabalan narayanan commented on HUDI-8820:
-------------------------------------------

Tried with the latest master:
{code:java}
scala> spark.sql("SELECT * FROM default.lliangyu_table_mor").show(20, false)
25/03/20 12:40:53 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/03/20 12:40:53 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
25/03/20 12:40:55 WARN ConfigUtils: The configuration key 'hoodie.compaction.record.merger.strategy' has been deprecated and may be removed in the future. Please use the new key 'hoodie.record.merge.strategy.id' instead.
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed.]
+-------------------+---------------------+----------------------------------+----------------------+------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                |_hoodie_partition_path|_hoodie_file_name                                                       |event_id|event_date|event_name    |event_ts                   |event_type|
+-------------------+---------------------+----------------------------------+----------------------+------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
|20250320123547228  |20250320123547228_0_0|event_id:103,event_date:2015-01-01|event_type=type4      |fbfe3829-49e6-4a9c-bc6b-e65bef1ef7a7-0_0-4-4_20250320123813410.parquet  |103     |2015-01-01|event_name_234|2015-01-01T13:51:40.519832Z|type4     |
|20250320123813410  |20250320123813410_0_1|event_id:107,event_date:2015-01-01|event_type=type4      |fbfe3829-49e6-4a9c-bc6b-e65bef1ef7a7-0_0-4-4_20250320123813410.parquet  |107     |2015-01-01|event_name_944|2015-01-01T13:51:45.019544Z|type4     |
|20250320123507990  |20250320123507990_0_0|event_id:102,event_date:2015-01-01|event_type=type3      |93908a56-bb69-4a9d-9d9f-4e6648bb3475-0_0-4-4_20250320123744436.parquet  |102     |2015-01-01|event_name_345|2015-01-01T13:51:40.417052Z|type3     |
|20250320123744436  |20250320123744436_0_1|event_id:106,event_date:2015-01-01|event_type=type3      |93908a56-bb69-4a9d-9d9f-4e6648bb3475-0_0-4-4_20250320123744436.parquet  |106     |2015-01-01|event_name_890|2015-01-01T13:51:44.735360Z|type3     |
|20250320123427991  |20250320123427991_0_0|event_id:101,event_date:2015-01-01|event_type=type2      |1b0b76e9-e17b-4469-b3bd-4bc9937a5137-0_0-4-4_20250320123954268.parquet  |101     |2015-01-01|event_name_546|2015-01-01T12:14:58.597216Z|type2     |
|20250320123704952  |20250320123704952_0_1|event_id:105,event_date:2015-01-01|event_type=type2      |1b0b76e9-e17b-4469-b3bd-4bc9937a5137-0_0-4-4_20250320123954268.parquet  |105     |2015-01-01|event_name_678|2015-01-01T13:51:42.248818Z|type2     |
|20250320123954268  |20250320123954268_0_2|event_id:109,event_date:2015-01-01|event_type=type2      |1b0b76e9-e17b-4469-b3bd-4bc9937a5137-0_0-4-4_20250320123954268.parquet  |109     |2015-01-01|event_name_567|2015-01-01T13:51:45.369689Z|type2     |
|20250320122512801  |20250320122512801_0_0|event_id:100,event_date:2015-01-01|event_type=type1      |822deec4-7772-469b-bf6c-a5ad3c1ee52a-0_0-10-17_20250320123848146.parquet|100     |2015-01-01|event_name_900|2015-01-01T13:51:39.340396Z|type1     |
|20250320123635123  |20250320123635123_0_1|event_id:104,event_date:2015-01-01|event_type=type1      |822deec4-7772-469b-bf6c-a5ad3c1ee52a-0_0-10-17_20250320123848146.parquet|104     |2015-01-01|event_name_123|2015-01-01T12:15:00.512679Z|type1     |
|20250320123848146  |20250320123848146_0_2|event_id:108,event_date:2015-01-01|event_type=type1      |822deec4-7772-469b-bf6c-a5ad3c1ee52a-0_0-10-17_20250320123848146.parquet|108     |2015-01-01|event_name_456|2015-01-01T13:51:45.208007Z|type1     |
+-------------------+---------------------+----------------------------------+----------------------+------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
{code}

> Hudi 1.0 Spark SQL failed to query all rows written by Backward Writer
> ----------------------------------------------------------------------
>
>                 Key: HUDI-8820
>                 URL: https://issues.apache.org/jira/browse/HUDI-8820
>             Project: Apache Hudi
>          Issue Type: Sub-task
>    Affects Versions: 1.0.0
>            Reporter: Leon Lin
>            Assignee: Lokesh Jain
>            Priority: Blocker
>             Fix For: 1.0.2
>
>         Attachments: 8820 investigation-2025011316183193.pdf, image-2025-01-10-16-48-17-766.png
>
>          Time Spent: 3h
>  Remaining Estimate: 2h
>
> Hudi 1.0 fails to read all of the data in a table written by the backward writer; reading the same table using Hudi 0.14.0 returns the correct results.
> *Reproduction steps:*
> {code:java}
> 1. Create a table using Hudi 0.14.0 / Spark 3.5.0
> spark-shell --jars /usr/lib/hudi/hudi-spark-bundle.jar \
>   --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
>   --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
>   --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension"
>
> spark.sql(
>   """
>     |CREATE TABLE lliangyu_table_mor (
>     |  event_id INT,
>     |  event_date STRING,
>     |  event_name STRING,
>     |  event_ts STRING,
>     |  event_type STRING
>     |) USING hudi
>     |OPTIONS(
>     |  type = 'mor',
>     |  primaryKey = 'event_id,event_date',
>     |  preCombineField = 'event_ts',
>     |  hoodie.write.table.version = 6,
>     |  hoodie.compact.inline = 'true',
>     |  hoodie.compact.inline.max.delta.commits = 2
>     |)
>     |PARTITIONED BY (event_type)
>     |LOCATION 's3://[bucketname]/warehouse/hudi/lliangyu_table_mor';
>   """.stripMargin){code}
> {code:java}
> 2. Insert records using the Hudi 1.0 backward writer
> spark-shell --jars /usr/lib/hudi/hudi-spark3-bundle_2.12-1.0.0.jar \
>   --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
>   --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
>   --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension"
>
> spark.sql("set hoodie.write.table.version=6")
> spark.sql("set hoodie.compact.inline='true'")
> spark.sql("set hoodie.compact.inline.max.delta.commits=2")
>
> val insertStatements = Seq(
>   "INSERT INTO lliangyu_table_mor VALUES (100, '2015-01-01', 'event_name_900', '2015-01-01T13:51:39.340396Z', 'type1');",
>   "INSERT INTO lliangyu_table_mor VALUES (101, '2015-01-01', 'event_name_546', '2015-01-01T12:14:58.597216Z', 'type2');",
>   "INSERT INTO lliangyu_table_mor VALUES (102, '2015-01-01', 'event_name_345', '2015-01-01T13:51:40.417052Z', 'type3');",
>   "INSERT INTO lliangyu_table_mor VALUES (103, '2015-01-01', 'event_name_234', '2015-01-01T13:51:40.519832Z', 'type4');",
>   "INSERT INTO lliangyu_table_mor VALUES (104, '2015-01-01', 'event_name_123', '2015-01-01T12:15:00.512679Z', 'type1');",
>   "INSERT INTO lliangyu_table_mor VALUES (105, '2015-01-01', 'event_name_678', '2015-01-01T13:51:42.248818Z', 'type2');",
>   "INSERT INTO lliangyu_table_mor VALUES (106, '2015-01-01', 'event_name_890', '2015-01-01T13:51:44.735360Z', 'type3');",
>   "INSERT INTO lliangyu_table_mor VALUES (107, '2015-01-01', 'event_name_944', '2015-01-01T13:51:45.019544Z', 'type4');",
>   "INSERT INTO lliangyu_table_mor VALUES (108, '2015-01-01', 'event_name_456', '2015-01-01T13:51:45.208007Z', 'type1');",
>   "INSERT INTO lliangyu_table_mor VALUES (109, '2015-01-01', 'event_name_567', '2015-01-01T13:51:45.369689Z', 'type2');",
>   "INSERT INTO lliangyu_table_mor VALUES (110, '2015-01-01', 'event_name_789', '2015-01-01T12:15:05.664947Z', 'type3');"
> )
> insertStatements.foreach { query => spark.sql(query) }
>
> spark.sql("SELECT * FROM default.lliangyu_table_mor").show(false);
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                |_hoodie_partition_path|_hoodie_file_name                                                          |event_id|event_date|event_name    |event_ts                   |event_type|
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> |20250103202545187  |20250103202545187_0_0|event_id:110,event_date:2015-01-01|event_type=type3      |0c873425-55d1-42bd-886f-230726276f3d-0_0-166-4765_20250103202545187.parquet|110     |2015-01-01|event_name_789|2015-01-01T12:15:05.664947Z|type3     |
> |20250103202501108  |20250103202501108_0_0|event_id:108,event_date:2015-01-01|event_type=type1      |b935d179-56b3-4f81-81e4-8bb0cf97c873-0_0-131-4218_20250103202501108.parquet|108     |2015-01-01|event_name_456|2015-01-01T13:51:45.208007Z|type1     |
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
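A side note for reading the `_hoodie_file_name` column in the outputs above: Hudi base-file names follow the `<fileId>_<writeToken>_<instantTime>.<extension>` pattern, so rows that share a file name belong to one file group written at the same instant (here, the compaction commits). A minimal sketch of splitting such a name into its parts; the helper name is hypothetical and it assumes the file id itself contains no underscores, which holds for the UUID-based ids in these tables:

```python
def parse_base_file_name(name: str):
    """Split a Hudi base-file name into (file_id, write_token, instant_time).

    Assumes the pattern <fileId>_<writeToken>_<instantTime>.<extension>,
    where the file id contains no underscores.
    """
    stem, _, _extension = name.rpartition(".")
    # Bounded split: everything before the first '_' is the file id,
    # the middle token is the write token, the tail is the instant time.
    file_id, write_token, instant_time = stem.split("_", 2)
    return file_id, write_token, instant_time

# The three type2 rows in the first table share one file group; the instant
# embedded in the file name is the compaction commit 20250320123954268.
print(parse_base_file_name(
    "1b0b76e9-e17b-4469-b3bd-4bc9937a5137-0_0-4-4_20250320123954268.parquet"))
# -> ('1b0b76e9-e17b-4469-b3bd-4bc9937a5137-0', '0-4-4', '20250320123954268')
```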