[ https://issues.apache.org/jira/browse/KUDU-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884781#comment-17884781 ]

ASF subversion and git services commented on KUDU-1625:
-------------------------------------------------------

Commit 05043e6aba6ab45c1b77de9f0762de3dfc5a54c0 in kudu's branch 
refs/heads/branch-1.17.x from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=05043e6ab ]

KUDU-3619 disable KUDU-3367 behavior by default

As it turned out, KUDU-3367 introduced a regression due to
a deficiency in its implementation: once the new behavior kicked in,
major compactions would fail with errors like the one below:

  Corruption: Failed major delta compaction on RowSet(1): No min key found: CFile base data in RowSet(1)

Since KUDU-3367 isn't particularly relevant in Kudu versions 1.12.0 and
newer when working with data that supports live row count (see
KUDU-1625), a quick-and-dirty fix is to set the default value of the
corresponding flag, --all_delete_op_delta_file_cnt_for_compaction,
to a value that effectively disables the KUDU-3367 behavior.
This patch does exactly that.

Change-Id: Iec0719462e379b7a0fb05ca011bb9cdd991a58ef
Reviewed-on: http://gerrit.cloudera.org:8080/21848
Reviewed-by: KeDeng <kdeng...@gmail.com>
Tested-by: Alexey Serbin <ale...@apache.org>
(cherry picked from commit 3666d2026d48adb5ff636321ef22320a8af5facb)
  Conflicts:
    src/kudu/tablet/delta_tracker.cc
Reviewed-on: http://gerrit.cloudera.org:8080/21855
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>


> Schedule compaction on rowsets with high percentage of deleted data
> -------------------------------------------------------------------
>
>                 Key: KUDU-1625
>                 URL: https://issues.apache.org/jira/browse/KUDU-1625
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tablet
>    Affects Versions: 1.0.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Although with KUDU-236 we can now remove rows that were deleted prior to the 
> ancient history mark, we don't actively schedule compactions based on deleted 
> rows. So if, for example, we have a fully compacted table and issue a DELETE 
> for every row, the data size does not actually change, because no compactions 
> are triggered.
> We need some way to notice that the ratio of deleted rows to total rows is 
> high and decide to compact those rowsets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
