[ https://issues.apache.org/jira/browse/KUDU-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3429:
--------------------------------
Description:

[KUDU-3406|https://issues.apache.org/jira/browse/KUDU-3406] added memory budgeting for running CompactRowSetsOp maintenance operations. By its nature, that is an interim approach: it adds memory budgeting on top of the current CompactRowSetsOp implementation as-is.

Ideally, the implementation of CompactRowSetsOp should be refactored to merge the deltas in the participating rowsets sequentially, chunk by chunk, persisting intermediate results and allocating memory only for a small batch of deltas at a time instead of loading all the deltas at once.

This JIRA item is to track the work in the context outlined above. Key points to address in this scope (a sketch illustrating both points follows the list):

* even though it is a merge-like operation by its nature, the current implementation of CompactRowSetsOp allocates all the memory necessary to load the UNDO deltas at once, and it also keeps all the preliminary results in memory before persisting the result data to disk
* the current implementation of CompactRowSetsOp loads all the UNDO deltas from the rowsets selected for compaction regardless of whether they are ancient or not; it discards the data sourced from the ancient deltas only at the very end, before persisting the result data

Also, while keeping memory usage within a predetermined budget, the new implementation of CompactRowSetsOp should strive to avoid I/O amplification as much as possible.
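Below is a minimal, self-contained C++ sketch of the two points above. It is not the actual Kudu code: UndoDelta, DeltaStream, FlushChunk, MergeUndoDeltas, and the ancient_history_mark parameter are illustrative stand-ins for a simplified model of UNDO deltas. The sketch shows a k-way merge that flushes partial results whenever a fixed memory budget would be exceeded and filters out ancient deltas before they are ever buffered.

{code:cpp}
// Hypothetical, simplified sketch -- not the actual Kudu API. Each UNDO delta
// is modeled as a timestamped record; the merge (a) never holds more than
// memory_budget_bytes of deltas in memory at once and (b) drops ancient
// deltas up front instead of discarding them after the merge.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <queue>
#include <utility>
#include <vector>

struct UndoDelta {
  int64_t row_key;
  int64_t timestamp;   // commit timestamp of the delta
  size_t size_bytes;   // approximate in-memory footprint
};

// One sorted stream of UNDO deltas per input rowset; Next() yields deltas in
// row-key order, mirroring how a merge-style compaction would read them.
class DeltaStream {
 public:
  explicit DeltaStream(std::vector<UndoDelta> deltas)
      : deltas_(std::move(deltas)) {}
  bool HasNext() const { return pos_ < deltas_.size(); }
  const UndoDelta& Peek() const { return deltas_[pos_]; }
  UndoDelta Next() { return deltas_[pos_++]; }
 private:
  std::vector<UndoDelta> deltas_;
  size_t pos_ = 0;
};

// Stand-in for persisting a chunk of merged deltas to disk.
void FlushChunk(std::vector<UndoDelta>* chunk, size_t* in_memory_bytes) {
  std::printf("flushing chunk of %zu deltas\n", chunk->size());
  chunk->clear();
  *in_memory_bytes = 0;
}

// Merge UNDO deltas from all streams in row-key order, flushing whenever the
// accumulated chunk would exceed the memory budget, and skipping deltas older
// than the ancient-history mark before they are ever buffered.
void MergeUndoDeltas(std::vector<DeltaStream>* streams,
                     size_t memory_budget_bytes,
                     int64_t ancient_history_mark) {
  using Entry = std::pair<int64_t, DeltaStream*>;  // (row_key, source stream)
  auto cmp = [](const Entry& a, const Entry& b) { return a.first > b.first; };
  std::priority_queue<Entry, std::vector<Entry>, decltype(cmp)> heap(cmp);
  for (auto& s : *streams) {
    if (s.HasNext()) heap.emplace(s.Peek().row_key, &s);
  }

  std::vector<UndoDelta> chunk;
  size_t in_memory_bytes = 0;
  while (!heap.empty()) {
    DeltaStream* s = heap.top().second;
    heap.pop();
    UndoDelta d = s->Next();
    if (s->HasNext()) heap.emplace(s->Peek().row_key, s);

    // Key point #2: discard ancient deltas up front rather than carrying
    // them through the whole merge and dropping them at the end.
    if (d.timestamp < ancient_history_mark) continue;

    // Key point #1: stay within the budget by persisting partial results
    // instead of keeping the entire merge output in memory.
    if (in_memory_bytes + d.size_bytes > memory_budget_bytes &&
        !chunk.empty()) {
      FlushChunk(&chunk, &in_memory_bytes);
    }
    in_memory_bytes += d.size_bytes;
    chunk.push_back(d);
  }
  if (!chunk.empty()) FlushChunk(&chunk, &in_memory_bytes);
}

int main() {
  std::vector<DeltaStream> streams;
  streams.emplace_back(std::vector<UndoDelta>{{1, 110, 64}, {3, 120, 64}});
  streams.emplace_back(
      std::vector<UndoDelta>{{2, 50, 64}, {4, 130, 64}, {5, 140, 64}});
  // A 128-byte budget flushes after every two 64-byte deltas; the delta with
  // timestamp 50 is below the ancient-history mark and dropped immediately.
  MergeUndoDeltas(&streams, /*memory_budget_bytes=*/128,
                  /*ancient_history_mark=*/100);
  return 0;
}
{code}

The flush-on-budget check is what bounds peak memory, while the early ancient-delta filter avoids both the memory and the I/O of carrying soon-to-be-discarded data through the merge.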
> Refactor CompactRowSetsOp to run on a pre-determined memory budget
> -------------------------------------------------------------------
>
>                 Key: KUDU-3429
>                 URL: https://issues.apache.org/jira/browse/KUDU-3429
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Alexey Serbin
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)