[ https://issues.apache.org/jira/browse/KUDU-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin reassigned KUDU-3429:
-----------------------------------

Assignee: Alexey Serbin

> Refactor CompactRowSetsOp to run on a pre-determined memory budget
> -------------------------------------------------------------------
>
>                 Key: KUDU-3429
>                 URL: https://issues.apache.org/jira/browse/KUDU-3429
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> [KUDU-3406|https://issues.apache.org/jira/browse/KUDU-3406] added memory
> budgeting for running CompactRowSetsOp maintenance operations. By its
> nature, that is an interim approach: it adds memory budgeting on top of the
> current CompactRowSetsOp implementation as-is.
> Ideally, the implementation of CompactRowSetsOp should be refactored to merge
> the deltas in the participating rowsets sequentially, chunk by chunk,
> persisting the intermediate results and allocating memory only for the small
> batch of deltas currently being processed, rather than loading all the
> deltas at once.
> This JIRA item is to track the work in the context outlined above.
> Key points to address in this scope:
> * even though it is a merge-like operation by its nature, the current
> implementation of CompactRowSetsOp allocates all the memory necessary to load
> the UNDO deltas at once, and it also keeps all the preliminary results in
> memory before persisting the result data to disk
> * the current implementation of CompactRowSetsOp loads all the UNDO deltas
> from the rowsets selected for compaction regardless of whether they are
> ancient or not; it discards the data sourced from the ancient deltas only at
> the very end, before persisting the result data
> Also, while keeping memory usage within a predetermined budget, the new
> implementation of CompactRowSetsOp should strive to avoid I/O multiplication
> as much as possible.
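
A minimal sketch of the kind of chunked, memory-budgeted merge described above, assuming a k-way merge over pre-sorted per-rowset delta streams. All types and names here (Delta, BudgetedDeltaMerger, ancient_history_mark, etc.) are hypothetical illustrations, not actual Kudu APIs; the real DeltaCompaction code paths differ.

{code:cpp}
#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for an UNDO delta record.
struct Delta {
  int64_t row_key;
  int64_t timestamp;    // used to decide whether the delta is "ancient"
  std::string payload;
};

// Merges pre-sorted per-rowset delta streams chunk by chunk, never holding
// more than roughly 'memory_budget_bytes' of pending output in memory, and
// skipping ancient deltas at read time instead of discarding them at the end.
class BudgetedDeltaMerger {
 public:
  BudgetedDeltaMerger(size_t memory_budget_bytes, int64_t ancient_history_mark)
      : budget_(memory_budget_bytes), ahm_(ancient_history_mark) {}

  void Merge(const std::vector<std::vector<Delta>>& inputs,
             std::vector<Delta>* out) {
    // Min-heap cursor for the k-way merge: (row_key, input index, position).
    using Cursor = std::tuple<int64_t, size_t, size_t>;
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;
    for (size_t i = 0; i < inputs.size(); ++i) {
      if (!inputs[i].empty()) heap.emplace(inputs[i][0].row_key, i, size_t{0});
    }

    std::vector<Delta> chunk;
    size_t chunk_bytes = 0;
    while (!heap.empty()) {
      Cursor cur = heap.top();
      heap.pop();
      size_t input_idx = std::get<1>(cur);
      size_t pos = std::get<2>(cur);
      const Delta& d = inputs[input_idx][pos];
      // Drop ancient deltas up front rather than carrying them along.
      if (d.timestamp >= ahm_) {
        chunk_bytes += sizeof(Delta) + d.payload.size();
        chunk.push_back(d);
      }
      if (pos + 1 < inputs[input_idx].size()) {
        heap.emplace(inputs[input_idx][pos + 1].row_key, input_idx, pos + 1);
      }
      // Persist the current chunk as soon as the memory budget is exhausted.
      if (chunk_bytes >= budget_) Flush(&chunk, &chunk_bytes, out);
    }
    Flush(&chunk, &chunk_bytes, out);  // flush the tail
  }

 private:
  // Stands in for writing a chunk of merged deltas to disk.
  void Flush(std::vector<Delta>* chunk, size_t* bytes,
             std::vector<Delta>* out) {
    out->insert(out->end(), chunk->begin(), chunk->end());
    chunk->clear();
    *bytes = 0;
  }

  size_t budget_;
  int64_t ahm_;
};

int main() {
  // Two "rowsets" with pre-sorted deltas; timestamps below 1 count as ancient.
  std::vector<std::vector<Delta>> inputs = {
      {{1, 10, "a"}, {3, 0, "stale"}, {5, 12, "c"}},
      {{2, 11, "b"}, {4, 13, "d"}}};
  std::vector<Delta> out;
  BudgetedDeltaMerger merger(/*memory_budget_bytes=*/64,
                             /*ancient_history_mark=*/1);
  merger.Merge(inputs, &out);
  for (const auto& d : out) {
    std::cout << d.row_key << " " << d.payload << "\n";
  }
  return 0;
}
{code}

The point of the sketch is the two properties called out in the description: peak memory stays bounded because the merger flushes whenever the budget is reached, and ancient deltas are filtered at read time, so neither memory nor I/O is spent on data that would be thrown away right before persisting the results.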