[ https://issues.apache.org/jira/browse/KUDU-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3619:
--------------------------------
Description:
The functionality introduced with [ad920e69f|https://github.com/apache/kudu/commit/ad920e69fcd67ceefa25ea81a38a10a27d9e3afc] doesn't handle the case where a scheduled major delta compaction produces an empty rowset. Once such a compaction has run its course, this leads to errors like the one below:
{noformat}
W20240906 10:59:01.768857 189660 tablet_mm_ops.cc:364] T 64144a1d4b864aa080e6cc53056546a5 P 574954b3b13a415c83a1660e7f51ee4e: Major delta compaction failed on 64144a1d4b864aa080e6cc53056546a5: Corruption: Failed major delta compaction on RowSet(1675): No min key found: CFile base data in RowSet(1675)
{noformat}

Similarly, {{mt-tablet-test}} fails sporadically due to the same issue when the test workload happens to create a similar situation with rowsets in which all rows have been deleted:
{noformat}
MultiThreadedHybridClockTabletTest/5.UpdateNoMergeCompaction:
src/kudu/tablet/mt-tablet-test.cc:489: Failure
Failed
Bad status: Corruption: Failed major delta compaction on RowSet(1): No min key found: CFile base data in RowSet(1)
{noformat}

There is a simple test scenario that triggers the issue: [https://gerrit.cloudera.org/#/c/21809/|https://gerrit.cloudera.org/#/c/21809/].

As a workaround, it's possible to set the {{\-\-all_delete_op_delta_file_cnt_for_compaction}} flag to a very high value, e.g. 1000000.

To address the issue properly, the major delta compaction code needs to handle situations where the resulting rowset is completely empty. In theory, swapping out the resulting rowset with an empty one should be enough: for example, see how it's done in [changelist 705954872|https://github.com/apache/kudu/commit/705954872dc86238556456abed0a879bb1462e51].
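The failure mode and the proposed fix described above (replacing an all-deleted compaction result with an empty rowset instead of looking up its key bounds) can be sketched with a toy model in C++. Everything in it is hypothetical: {{Row}}, {{RowSet}}, {{ApplyDeletes()}}, {{NaiveMajorDeltaCompaction()}} and {{FixedMajorDeltaCompaction()}} are illustrative stand-ins, not Kudu's classes or functions, and the real code path (including how the empty rowset gets swapped into the tablet) is considerably more involved.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are NOT Kudu's classes.
struct Row {
  std::string key;
  bool deleted = false;  // true once a DELETE from the deltas applies to it
};

struct RowSet {
  std::vector<Row> base;  // base data rows, assumed sorted by key

  bool empty() const { return base.empty(); }

  // An empty base has no min key; this is the analogue of the
  // "No min key found" error in the real code.
  std::optional<std::string> MinKey() const {
    if (base.empty()) return std::nullopt;
    return base.front().key;
  }
};

enum class Outcome { kRewritten, kReplacedWithEmpty, kCorruption };

// Hypothetical helper: fold DELETE deltas into the base data.
RowSet ApplyDeletes(const RowSet& input) {
  RowSet result;
  for (const Row& r : input.base) {
    if (!r.deleted) result.base.push_back(r);
  }
  return result;
}

// Models the failing behavior: assumes the compacted rowset is never empty,
// so an all-deleted input ends up in a "corruption" outcome.
Outcome NaiveMajorDeltaCompaction(const RowSet& input, RowSet* out) {
  RowSet result = ApplyDeletes(input);
  if (!result.MinKey()) return Outcome::kCorruption;
  *out = result;
  return Outcome::kRewritten;
}

// Models the suggested fix: an all-deleted result is swapped for an empty
// rowset before any key bounds are looked up.
Outcome FixedMajorDeltaCompaction(const RowSet& input, RowSet* out) {
  RowSet result = ApplyDeletes(input);
  if (result.empty()) {
    *out = RowSet{};
    return Outcome::kReplacedWithEmpty;
  }
  *out = result;
  return Outcome::kRewritten;
}
```

For example, given a rowset whose rows are all marked deleted, the naive variant returns {{Outcome::kCorruption}} and leaves {{*out}} untouched, while the fixed variant returns {{Outcome::kReplacedWithEmpty}} and leaves {{*out}} empty; a rowset with at least one live row is rewritten as before by either variant.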
was:
The functionality introduced with [ad920e69f|https://github.com/apache/kudu/commit/ad920e69fcd67ceefa25ea81a38a10a27d9e3afc] doesn't handle the case where a scheduled major delta compaction produces an empty rowset. Once such a compaction has run its course, this leads to errors like the one below:
{noformat}
W20240906 10:59:01.768857 189660 tablet_mm_ops.cc:364] T 64144a1d4b864aa080e6cc53056546a5 P 574954b3b13a415c83a1660e7f51ee4e: Major delta compaction failed on 64144a1d4b864aa080e6cc53056546a5: Corruption: Failed major delta compaction on RowSet(1675): No min key found: CFile base data in RowSet(1675)
{noformat}

Similarly, {{mt-tablet-test}} fails sporadically due to the same issue when the test workload happens to create a similar situation with rowsets in which all rows have been deleted:
{noformat}
MultiThreadedHybridClockTabletTest/5.UpdateNoMergeCompaction:
src/kudu/tablet/mt-tablet-test.cc:489: Failure
Failed
Bad status: Corruption: Failed major delta compaction on RowSet(1): No min key found: CFile base data in RowSet(1)
{noformat}

There is a simple test scenario that triggers the issue: [https://gerrit.cloudera.org/#/c/21809/|https://gerrit.cloudera.org/#/c/21809/].

As a workaround, it's possible to set the {{\-\-all_delete_op_delta_file_cnt_for_compaction}} flag to a very high value, e.g. 1000000.

To address the issue properly, the major delta compaction code needs to handle situations where the resulting rowset is completely empty. In theory, swapping the rowset with an empty one should be enough: for example, see how it's done in [changelist 705954872|https://github.com/apache/kudu/commit/705954872dc86238556456abed0a879bb1462e51].
> The 'supplement to GC algorithm' breaks major delta compaction
> --------------------------------------------------------------
>
> Key: KUDU-3619
> URL: https://issues.apache.org/jira/browse/KUDU-3619
> Project: Kudu
> Issue Type: Bug
> Components: compaction, tserver
> Affects Versions: 1.17.0
> Reporter: Alexey Serbin
> Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)