[ https://issues.apache.org/jira/browse/KUDU-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin reassigned KUDU-3429:
-----------------------------------

Assignee: Alexey Serbin

> Refactor CompactRowSetsOp to run on a pre-determined memory budget
> -------------------------------------------------------------------
>
>                 Key: KUDU-3429
>                 URL: https://issues.apache.org/jira/browse/KUDU-3429
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> [KUDU-3406|https://issues.apache.org/jira/browse/KUDU-3406] added memory
> budgeting for running CompactRowSetsOp maintenance operations. By its
> nature, that is an interim approach: it adds memory budgeting on top of the
> current CompactRowSetsOp implementation as-is.
> Ideally, the implementation of CompactRowSetsOp should be refactored to merge
> the deltas in the participating rowsets sequentially, chunk by chunk,
> persisting the intermediate results and allocating memory only for the small
> batch of deltas currently being processed, rather than loading all the
> deltas at once.
> This JIRA item is to track the work in the context outlined above.
> Key points to address in this scope:
> * even though it is a merge-like operation by its nature, the current
> implementation of CompactRowSetsOp allocates all the memory necessary to load
> the UNDO deltas at once, and it also keeps all the preliminary results in
> memory before persisting the result data to disk
> * the current implementation of CompactRowSetsOp loads all the UNDO deltas
> from the rowsets selected for compaction regardless of whether they are
> ancient or not; it discards the data sourced from the ancient deltas only at
> the very end, before persisting the result data
> Also, while keeping memory usage within a predetermined budget, the new
> implementation of CompactRowSetsOp should strive to avoid I/O multiplication
> as much as possible.
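
A minimal sketch of the kind of chunked, memory-budgeted merge described above, assuming a k-way merge over pre-sorted per-rowset delta streams. All types and names here (Delta, BudgetedDeltaMerger, ancient_history_mark, etc.) are hypothetical illustrations, not actual Kudu APIs; the real DeltaCompaction code paths differ.

{code:cpp}
#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for an UNDO delta record.
struct Delta {
  int64_t row_key;
  int64_t timestamp;    // used to decide whether the delta is "ancient"
  std::string payload;
};

// Merges pre-sorted per-rowset delta streams chunk by chunk, never holding
// more than roughly 'memory_budget_bytes' of pending output in memory, and
// skipping ancient deltas at read time instead of discarding them at the end.
class BudgetedDeltaMerger {
 public:
  BudgetedDeltaMerger(size_t memory_budget_bytes, int64_t ancient_history_mark)
      : budget_(memory_budget_bytes), ahm_(ancient_history_mark) {}

  void Merge(const std::vector<std::vector<Delta>>& inputs,
             std::vector<Delta>* out) {
    // Min-heap cursor for the k-way merge: (row_key, input index, position).
    using Cursor = std::tuple<int64_t, size_t, size_t>;
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;
    for (size_t i = 0; i < inputs.size(); ++i) {
      if (!inputs[i].empty()) heap.emplace(inputs[i][0].row_key, i, size_t{0});
    }

    std::vector<Delta> chunk;
    size_t chunk_bytes = 0;
    while (!heap.empty()) {
      Cursor cur = heap.top();
      heap.pop();
      size_t input_idx = std::get<1>(cur);
      size_t pos = std::get<2>(cur);
      const Delta& d = inputs[input_idx][pos];
      // Drop ancient deltas up front rather than carrying them along.
      if (d.timestamp >= ahm_) {
        chunk_bytes += sizeof(Delta) + d.payload.size();
        chunk.push_back(d);
      }
      if (pos + 1 < inputs[input_idx].size()) {
        heap.emplace(inputs[input_idx][pos + 1].row_key, input_idx, pos + 1);
      }
      // Persist the current chunk as soon as the memory budget is exhausted.
      if (chunk_bytes >= budget_) Flush(&chunk, &chunk_bytes, out);
    }
    Flush(&chunk, &chunk_bytes, out);  // flush the tail
  }

 private:
  // Stands in for writing a chunk of merged deltas to disk.
  void Flush(std::vector<Delta>* chunk, size_t* bytes,
             std::vector<Delta>* out) {
    out->insert(out->end(), chunk->begin(), chunk->end());
    chunk->clear();
    *bytes = 0;
  }

  size_t budget_;
  int64_t ahm_;
};

int main() {
  // Two "rowsets" with pre-sorted deltas; timestamps below 1 count as ancient.
  std::vector<std::vector<Delta>> inputs = {
      {{1, 10, "a"}, {3, 0, "stale"}, {5, 12, "c"}},
      {{2, 11, "b"}, {4, 13, "d"}}};
  std::vector<Delta> out;
  BudgetedDeltaMerger merger(/*memory_budget_bytes=*/64,
                             /*ancient_history_mark=*/1);
  merger.Merge(inputs, &out);
  for (const auto& d : out) {
    std::cout << d.row_key << " " << d.payload << "\n";
  }
  return 0;
}
{code}

The point of the sketch is the two properties called out in the description: peak memory stays bounded because the merger flushes whenever the budget is reached, and ancient deltas are filtered at read time, so neither memory nor I/O is spent on data that would be thrown away right before persisting the results.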