[ https://issues.apache.org/jira/browse/KUDU-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3429:
--------------------------------
Description:

[KUDU-3406|https://issues.apache.org/jira/browse/KUDU-3406] added memory budgeting for running CompactRowSetsOp maintenance operations. By its nature, that is an interim approach: it adds memory budgeting on top of the current CompactRowSetsOp implementation as-is.

Ideally, the implementation of CompactRowSetsOp should be refactored to merge the deltas in the participating rowsets sequentially, chunk by chunk, persisting intermediate results and allocating memory only for a small batch of deltas at a time instead of loading all the deltas at once.

This JIRA item is to track the work in the context outlined above. Key points to address in this scope (a sketch illustrating both points follows the list):

* even though it is a merge-like operation by its nature, the current implementation of CompactRowSetsOp allocates all the memory necessary to load the UNDO deltas at once, and it also keeps all the preliminary results in memory before persisting the result data to disk
* the current implementation of CompactRowSetsOp loads all the UNDO deltas from the rowsets selected for compaction regardless of whether they are ancient or not; it discards the data sourced from the ancient deltas only at the very end, before persisting the result data

Also, while keeping memory usage within a predetermined budget, the new implementation of CompactRowSetsOp should strive to avoid I/O amplification as much as possible.
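Below is a minimal, self-contained C++ sketch of the two points above. It is not the actual Kudu code: UndoDelta, DeltaStream, FlushChunk, MergeUndoDeltas, and the ancient_history_mark parameter are illustrative stand-ins for a simplified model of UNDO deltas. The sketch shows a k-way merge that flushes partial results whenever a fixed memory budget would be exceeded and filters out ancient deltas before they are ever buffered.

{code:cpp}
// Hypothetical, simplified sketch -- not the actual Kudu API. Each UNDO delta
// is modeled as a timestamped record; the merge (a) never holds more than
// memory_budget_bytes of deltas in memory at once and (b) drops ancient
// deltas up front instead of discarding them after the merge.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <queue>
#include <utility>
#include <vector>

struct UndoDelta {
  int64_t row_key;
  int64_t timestamp;   // commit timestamp of the delta
  size_t size_bytes;   // approximate in-memory footprint
};

// One sorted stream of UNDO deltas per input rowset; Next() yields deltas in
// row-key order, mirroring how a merge-style compaction would read them.
class DeltaStream {
 public:
  explicit DeltaStream(std::vector<UndoDelta> deltas)
      : deltas_(std::move(deltas)) {}
  bool HasNext() const { return pos_ < deltas_.size(); }
  const UndoDelta& Peek() const { return deltas_[pos_]; }
  UndoDelta Next() { return deltas_[pos_++]; }
 private:
  std::vector<UndoDelta> deltas_;
  size_t pos_ = 0;
};

// Stand-in for persisting a chunk of merged deltas to disk.
void FlushChunk(std::vector<UndoDelta>* chunk, size_t* in_memory_bytes) {
  std::printf("flushing chunk of %zu deltas\n", chunk->size());
  chunk->clear();
  *in_memory_bytes = 0;
}

// Merge UNDO deltas from all streams in row-key order, flushing whenever the
// accumulated chunk would exceed the memory budget, and skipping deltas older
// than the ancient-history mark before they are ever buffered.
void MergeUndoDeltas(std::vector<DeltaStream>* streams,
                     size_t memory_budget_bytes,
                     int64_t ancient_history_mark) {
  using Entry = std::pair<int64_t, DeltaStream*>;  // (row_key, source stream)
  auto cmp = [](const Entry& a, const Entry& b) { return a.first > b.first; };
  std::priority_queue<Entry, std::vector<Entry>, decltype(cmp)> heap(cmp);
  for (auto& s : *streams) {
    if (s.HasNext()) heap.emplace(s.Peek().row_key, &s);
  }

  std::vector<UndoDelta> chunk;
  size_t in_memory_bytes = 0;
  while (!heap.empty()) {
    DeltaStream* s = heap.top().second;
    heap.pop();
    UndoDelta d = s->Next();
    if (s->HasNext()) heap.emplace(s->Peek().row_key, s);

    // Key point #2: discard ancient deltas up front rather than carrying
    // them through the whole merge and dropping them at the end.
    if (d.timestamp < ancient_history_mark) continue;

    // Key point #1: stay within the budget by persisting partial results
    // instead of keeping the entire merge output in memory.
    if (in_memory_bytes + d.size_bytes > memory_budget_bytes &&
        !chunk.empty()) {
      FlushChunk(&chunk, &in_memory_bytes);
    }
    in_memory_bytes += d.size_bytes;
    chunk.push_back(d);
  }
  if (!chunk.empty()) FlushChunk(&chunk, &in_memory_bytes);
}

int main() {
  std::vector<DeltaStream> streams;
  streams.emplace_back(std::vector<UndoDelta>{{1, 110, 64}, {3, 120, 64}});
  streams.emplace_back(
      std::vector<UndoDelta>{{2, 50, 64}, {4, 130, 64}, {5, 140, 64}});
  // A 128-byte budget flushes after every two 64-byte deltas; the delta with
  // timestamp 50 is below the ancient-history mark and dropped immediately.
  MergeUndoDeltas(&streams, /*memory_budget_bytes=*/128,
                  /*ancient_history_mark=*/100);
  return 0;
}
{code}

The flush-on-budget check is what bounds peak memory, while the early ancient-delta filter avoids both the memory and the I/O of carrying soon-to-be-discarded data through the merge.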
> Refactor CompactRowSetsOp to run on a pre-determined memory budget
> -------------------------------------------------------------------
>
>                 Key: KUDU-3429
>                 URL: https://issues.apache.org/jira/browse/KUDU-3429
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Alexey Serbin
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)