[ https://issues.apache.org/jira/browse/HUDI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yann Byron updated HUDI-3213:
-----------------------------
Description:
In `TestMORDataSource.testCount`, after the sixth operation (which inserts two records and runs `compaction`), `hudiIncDF6.count()` returns 152. The result includes the 150 records that were just compacted (100 records updated in the second and third operations, plus 50 records updated in the fifth operation), as well as the 2 records inserted in the sixth operation.
The correct answer is 2. The 150 compacted records should not be counted.
The reason is that `compaction` changes the commit time of records that were updated later and stored in log files.
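The root cause can be illustrated without Spark. The following is a minimal, hypothetical Scala model, not Hudi code: `Rec`, `compact`, `incremental`, `run` and the `preserveCommitTime` flag are illustrative names, instant times are plain integers, all 150 keys are assumed to start in a base file written at instant 1, and the compaction is assumed to run at the sixth instant. It also assumes an incremental query returns the records whose commit time falls in (begin, end], which is a simplification of how Hudi serves incremental reads.

```scala
// Hypothetical, simplified model of a merge-on-read table; not Hudi's API.
object CommitTimeSketch {
  // A record version tagged with the instant (commit time) that wrote it.
  final case class Rec(key: Int, commitTime: Int)

  // Compaction merges base and log versions and keeps the latest version per key.
  // With preserveCommitTime = false, every merged record is re-stamped with the
  // compaction instant, which is the behavior reported in this issue.
  def compact(base: Seq[Rec], log: Seq[Rec], compactionTime: Int,
              preserveCommitTime: Boolean): Seq[Rec] = {
    val latest = (base ++ log).groupBy(_.key).values.map(_.maxBy(_.commitTime)).toSeq
    if (preserveCommitTime) latest
    else latest.map(_.copy(commitTime = compactionTime))
  }

  // Incremental read: count records whose commit time lies in (begin, end].
  def incremental(recs: Seq[Rec], begin: Int, end: Int): Int =
    recs.count(r => r.commitTime > begin && r.commitTime <= end)

  def run(preserveCommitTime: Boolean): Int = {
    val commit5 = 5
    val commit6 = 6
    // Assumed layout: 150 keys in a base file written at instant 1.
    val base = (0 until 150).map(Rec(_, 1))
    // Log updates: 100 keys last updated at instant 3, 50 keys at instant 5.
    val log = (0 until 100).map(Rec(_, 3)) ++ (100 until 150).map(Rec(_, 5))
    // Sixth operation: compaction (assumed to run at instant 6) plus 2 inserts.
    val compacted = compact(base, log, commit6, preserveCommitTime)
    val inserts = Seq(Rec(150, commit6), Rec(151, commit6))
    incremental(compacted ++ inserts, commit5, commit6)
  }

  def main(args: Array[String]): Unit = {
    println(run(preserveCommitTime = false)) // 152
    println(run(preserveCommitTime = true))  // 2
  }
}
```

When compaction keeps each record's own commit time, the 150 compacted records stay at instants 3 and 5, both outside (5, 6], so only the two inserted records are returned.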
{code:java}
import org.apache.hudi.DataSourceReadOptions
import org.junit.jupiter.api.Assertions.assertEquals

// spark, basePath, commit5Time and commit6Time are defined earlier in the test
val hudiIncDF6 = spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, commit5Time)
  .option(DataSourceReadOptions.END_INSTANTTIME.key, commit6Time)
  .load(basePath)
// compaction updated 150 rows + inserted 2 new rows
// (incorrect result: only the 2 inserted rows should be returned)
assertEquals(152, hudiIncDF6.count())
{code}

> compaction should not change the commit time
> --------------------------------------------
>
>                 Key: HUDI-3213
>                 URL: https://issues.apache.org/jira/browse/HUDI-3213
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration, Writer Core
>            Reporter: Yann Byron
>            Assignee: Yann Byron
>            Priority: Major
>             Fix For: 0.11.0

--
This message was sent by Atlassian Jira
(v8.20.1#820001)