[
https://issues.apache.org/jira/browse/KUDU-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Serbin updated KUDU-3619:
--------------------------------
Description:
The functionality introduced with
[ad920e69f|https://github.com/apache/kudu/commit/ad920e69fcd67ceefa25ea81a38a10a27d9e3afc]
doesn't handle the appearance of an empty rowset as the result of a scheduled
major delta compaction, and that leads to errors like the one below once the
compaction has run its course:
{noformat}
W20240906 10:59:01.768857 189660 tablet_mm_ops.cc:364] T
64144a1d4b864aa080e6cc53056546a5 P 574954b3b13a415c83a1660e7f51ee4e: Major
delta compaction failed on 64144a1d4b864aa080e6cc53056546a5: Corruption: Failed
major delta compaction on RowSet(1675): No min key found: CFile base data in
RowSet(1675)
{noformat}
Similarly, {{mt-tablet-test}} is sporadically failing due to the same issue
when the test workload happens to create a similar situation with
all-the-rows-deleted rowsets:
{noformat}
MultiThreadedHybridClockTabletTest/5.UpdateNoMergeCompaction:
src/kudu/tablet/mt-tablet-test.cc:489: Failure
Failed
Bad status: Corruption: Failed major delta compaction on RowSet(1): No min key
found: CFile base data in RowSet(1)
{noformat}
There is a simple test scenario that triggers the issue:
[https://gerrit.cloudera.org/#/c/21809/|https://gerrit.cloudera.org/#/c/21809/].
As a workaround, it's possible to set the
{{\-\-all_delete_op_delta_file_cnt_for_compaction}} flag to a very high value,
e.g. 1000000.
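For illustration only, the workaround might look like the following line in a
tablet server flag file. This is a sketch that assumes the flag is passed to
the tablet server through a gflags-style flag file; the value is just the
example one mentioned above, not a tuned recommendation:
{noformat}
--all_delete_op_delta_file_cnt_for_compaction=1000000
{noformat}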
To address the issue properly, it's necessary to update the major delta
compaction code to handle situations where the result rowset is completely
empty. In theory, swapping out the result rowset with an empty one should be
enough: for example, see how it's done in [changelist
705954872|https://github.com/apache/kudu/commit/705954872dc86238556456abed0a879bb1462e51].
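To make the idea more concrete, below is a minimal, self-contained C++ sketch
of the "swap in an empty rowset" approach. It is a toy model and not Kudu code:
{{ToyRow}}, {{ToyRowSet}}, {{CompactionOutcome}}, and {{MajorCompactToy}} are
hypothetical names made up for this illustration, and an actual fix would have
to go through the real rowset-swapping code path in the tablet.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a base-data row with its accumulated delete state.
struct ToyRow {
  std::string key;
  bool deleted;
};

// Hypothetical stand-in for a rowset: only the sorted keys of live rows.
struct ToyRowSet {
  std::vector<std::string> live_keys;

  bool empty() const { return live_keys.empty(); }

  // Returns false when there is no min key, i.e. the rowset has no rows.
  // This models the "No min key found" condition from the error above.
  bool MinKey(std::string* out) const {
    if (live_keys.empty()) return false;
    *out = live_keys.front();
    return true;
  }
};

enum class CompactionOutcome {
  kSwappedWithCompacted,  // the result has rows: install it as usual
  kSwappedWithEmpty,      // all rows were deleted: install an empty rowset
};

// Toy "major delta compaction": applies deletes to the base data.
// 'rows' is assumed to be sorted by key. If nothing survives, the input
// rowset is replaced by an empty one instead of failing on the missing
// min key.
CompactionOutcome MajorCompactToy(const std::vector<ToyRow>& rows,
                                  ToyRowSet* result) {
  ToyRowSet out;
  for (const ToyRow& r : rows) {
    if (!r.deleted) out.live_keys.push_back(r.key);
  }
  if (out.empty()) {
    // The key step: do not try to read bounds of the compacted data here.
    *result = ToyRowSet();
    return CompactionOutcome::kSwappedWithEmpty;
  }
  std::string min_key;
  const bool has_min = out.MinKey(&min_key);
  assert(has_min);  // guaranteed by the emptiness check above
  (void)has_min;    // avoid an unused-variable warning with NDEBUG
  *result = std::move(out);
  return CompactionOutcome::kSwappedWithCompacted;
}
```

In this sketch the emptiness check comes before any attempt to read the min
key, so the all-the-rows-deleted case takes the empty-rowset branch instead of
surfacing as a corruption error.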
was:
The functionality introduced with
[ad920e69f|https://github.com/apache/kudu/commit/ad920e69fcd67ceefa25ea81a38a10a27d9e3afc]
doesn't handle the appearance of an empty rowset as the result of a scheduled
major delta compaction, and that leads to errors like the one below once the
compaction has run its course:
{noformat}
W20240906 10:59:01.768857 189660 tablet_mm_ops.cc:364] T
64144a1d4b864aa080e6cc53056546a5 P 574954b3b13a415c83a1660e7f51ee4e: Major
delta compaction failed on 64144a1d4b864aa080e6cc53056546a5: Corruption: Failed
major delta compaction on RowSet(1675): No min key found: CFile base data in
RowSet(1675)
{noformat}
Similarly, {{mt-tablet-test}} is sporadically failing due to the same issue
when the test workload happens to create a similar situation with
all-the-rows-deleted rowsets:
{noformat}
MultiThreadedHybridClockTabletTest/5.UpdateNoMergeCompaction:
src/kudu/tablet/mt-tablet-test.cc:489: Failure
Failed
Bad status: Corruption: Failed major delta compaction on RowSet(1): No min key
found: CFile base data in RowSet(1)
{noformat}
There is a simple test scenario that triggers the issue:
[https://gerrit.cloudera.org/#/c/21809/|https://gerrit.cloudera.org/#/c/21809/].
As a workaround, it's possible to set the
{{\-\-all_delete_op_delta_file_cnt_for_compaction}} flag to a very high value,
e.g. 1000000.
To address the issue properly, it's necessary to update the major delta
compaction code to handle situations where the result rowset is completely
empty. In theory, swapping the rowset with an empty one should be enough: for
example, see how it's done in [changelist
705954872|https://github.com/apache/kudu/commit/705954872dc86238556456abed0a879bb1462e51].
> The 'supplement to GC algorithm' breaks major delta compaction
> --------------------------------------------------------------
>
> Key: KUDU-3619
> URL: https://issues.apache.org/jira/browse/KUDU-3619
> Project: Kudu
> Issue Type: Bug
> Components: compaction, tserver
> Affects Versions: 1.17.0
> Reporter: Alexey Serbin
> Priority: Major
>
> The functionality introduced with
> [ad920e69f|https://github.com/apache/kudu/commit/ad920e69fcd67ceefa25ea81a38a10a27d9e3afc]
> doesn't handle the appearance of an empty rowset as the result of a scheduled
> major delta compaction, and that leads to errors like the one below once the
> compaction has run its course:
> {noformat}
> W20240906 10:59:01.768857 189660 tablet_mm_ops.cc:364] T
> 64144a1d4b864aa080e6cc53056546a5 P 574954b3b13a415c83a1660e7f51ee4e: Major
> delta compaction failed on 64144a1d4b864aa080e6cc53056546a5: Corruption:
> Failed major delta compaction on RowSet(1675): No min key found: CFile base
> data in RowSet(1675)
> {noformat}
> Similarly, {{mt-tablet-test}} is sporadically failing due to the same
> issue when the test workload happens to create a similar situation with
> all-the-rows-deleted rowsets:
> {noformat}
> MultiThreadedHybridClockTabletTest/5.UpdateNoMergeCompaction:
> src/kudu/tablet/mt-tablet-test.cc:489: Failure
> Failed
> Bad status: Corruption: Failed major delta compaction on RowSet(1): No min
> key found: CFile base data in RowSet(1)
> {noformat}
> There is a simple test scenario that triggers the issue:
> [https://gerrit.cloudera.org/#/c/21809/|https://gerrit.cloudera.org/#/c/21809/].
> As a workaround, it's possible to set the
> {{\-\-all_delete_op_delta_file_cnt_for_compaction}} flag to a very high
> value, e.g. 1000000.
> To address the issue properly, it's necessary to update the major delta
> compaction code to handle situations where the result rowset is completely
> empty. In theory, swapping out the result rowset with an empty one should be
> enough: for example, see how it's done in [changelist
> 705954872|https://github.com/apache/kudu/commit/705954872dc86238556456abed0a879bb1462e51].