Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2240660130 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,449 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3133626834 EPIC! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscr

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-28 Thread via GitHub
2010YOUY01 commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3130543883 > Amazing work! Aside from updating to `main` what do users of DataFusion need to do to use this improvement? Nothing else needed, it will be triggered automatically and it

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-28 Thread via GitHub
adriangb commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3130446693 Amazing work! Aside from updating to `main` what do users of DataFusion need to do to use this improvement? -- This is an automated message from the Apache Git Service. To respond

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-28 Thread via GitHub
2010YOUY01 commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3130380274 Thanks everyone @rluvaton @ding-young and others who have helped along the way 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-28 Thread via GitHub
2010YOUY01 merged PR #15700: URL: https://github.com/apache/datafusion/pull/15700 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-25 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230996830 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,449 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-25 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230995605 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,449 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-25 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230992741 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -17,11 +17,10 @@ //! Define the `SpillManager` struct, which is responsible for reading and writ

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230075680 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,449 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub
Copilot commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230055548 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -17,11 +17,10 @@ //! Define the `SpillManager` struct, which is responsible for reading and writi

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2227706228 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -140,3 +210,19 @@ impl SpillManager { Ok(spawn_buffered(stream, self.batch_read_buffer_capa

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2227706228 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -140,3 +210,19 @@ impl SpillManager { Ok(spawn_buffered(stream, self.batch_read_buffer_capa

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-23 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2227210285 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -140,3 +210,19 @@ impl SpillManager { Ok(spawn_buffered(stream, self.batch_read_buffer_ca

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-23 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3109894011 can you please re-review I don't believe there are any actionable comments from my side to review -- This is an automated message from the Apache Git Service. To respond to the me

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-23 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2226469366 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-23 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2226461212 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-23 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2226402212 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2221120019 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2221120019 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2221118843 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219474642 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219148029 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -125,6 +133,156 @@ impl SpillManager { self.spill_record_batch_and_finish(&batches, request

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219057789 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219143884 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219141893 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219098075 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -125,6 +133,156 @@ impl SpillManager { self.spill_record_batch_and_finish(&batches, request

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219057789 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-20 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218214013 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -125,6 +133,156 @@ impl SpillManager { self.spill_record_batch_and_finish(&batches, reque

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-20 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218105116 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-20 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218105116 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-20 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218105116 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-19 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2217240477 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-19 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2217220186 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3084017814 @2010YOUY01 I've updated based on your comments and commented back on some -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213305286 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213294534 ## datafusion/physical-plan/src/sorts/streaming_merge.rs: ## @@ -131,14 +168,42 @@ impl<'a> StreamingMergeBuilder<'a> { enable_round_robin_tie_breake

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213289055 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream { sort_batch(&batch, &expr, No

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213223988 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213137522 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213137522 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213137522 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213099661 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213099661 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213099661 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212846002 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212828103 ## datafusion/core/tests/fuzz_cases/sort_fuzz.rs: ## @@ -377,3 +388,335 @@ fn make_staggered_i32_utf8_batches(len: usize) -> Vec { batches } + Review Co

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212827518 ## datafusion/core/tests/fuzz_cases/sort_fuzz.rs: ## @@ -377,3 +388,335 @@ fn make_staggered_i32_utf8_batches(len: usize) -> Vec { batches } + +#[tokio::

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212846002 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-08 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191875936 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream { sort_batch(&batch, &expr,

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-08 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191760246 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream { sort_batch(&batch, &expr, No

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-07 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191376010 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream { sort_batch(&batch, &expr,

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-07 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191389061 ## datafusion/physical-plan/src/sorts/streaming_merge.rs: ## @@ -131,14 +168,42 @@ impl<'a> StreamingMergeBuilder<'a> { enable_round_robin_tie_brea

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-07 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191383325 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-06 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3043682960 > > I would appreciate it, it would greatly help me > > @rluvaton I opened a pr on your fork. Would you take a look when you have some time? I **really** appriciate the

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-06 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3043202973 > I would appreciate it, it would greatly help me @rluvaton I opened a pr on your fork. Would you take a look when you have some time? -- This is an automated message fr

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036807094 I would appreciate it, it would greatly help me -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036770691 @rluvaton If you’d like, I can send a PR to your (fork's) branch that resolve merge conflicts since I already have one. Anyway there were only minor diffs to handle when I rebased

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036724762 So should I fix this PR conflicts? It seems like this pr has a change to be merged -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
2010YOUY01 commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3034804482 https://github.com/apache/datafusion/pull/15700#discussion_r2041372025 I have a idea to fix this concern: adding a max merge degree configuration, if either a. SPM's est

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-03 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3031592289 I've rebased this branch on the latest main and tested whether estimated size changes after we load `RecordBatch` which was compressed with `lz4_frame` into memory. The result of

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-06-30 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3021762356 Hi @adriangb, thanks for raising this point. I'm currently reviewing both this PR and the other cascading merge sort PR (https://github.com/apache/datafusion/pull/15610). I'm not

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-06-30 Thread via GitHub
adriangb commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3021699053 In the interest of this valuable work not being lost, is there any way that https://github.com/apache/datafusion/pull/15700#discussion_r2041372025 could be addressed by a method tha

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-24 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906947109 > > @2010YOUY01 and @ding-young I wonder if you can review this PR again to help @rluvaton get it merged? > > > > > > Specifically if it needs more tests perhaps you c

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-24 Thread via GitHub
2010YOUY01 commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906805485 > @2010YOUY01 and @ding-young I wonder if you can review this PR again to help @rluvaton get it merged? > > Specifically if it needs more tests perhaps you can help identify

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-24 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906803554 @alamb Sure! I may not be able to provide a detailed review right away, but I can definitely help by running the tests added in the PR locally and looking into memory accounting f

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-24 Thread via GitHub
alamb commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906787792 @2010YOUY01 and @ding-young I wonder if you can review this PR again to help @rluvaton get it merged? Specifically if it needs more tests perhaps you can help identify which are

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-05-23 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2104892196 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_batch

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-15 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2046041614 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_bat

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-15 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2045582560 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_batch

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041878079 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_batch

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-14 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041629246 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_bat

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041544882 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_batch

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041544882 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_batch

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041544882 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_batch

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-13 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041372025 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_bat

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-13 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041222937 ## datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs: ## @@ -753,3 +765,226 @@ async fn test_single_mode_aggregate_with_spill() -> Result<()> { Ok(()) }

[PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-13 Thread via GitHub
rluvaton opened a new pull request, #15700: URL: https://github.com/apache/datafusion/pull/15700 ## Which issue does this PR close? - Closes #14692. ## Rationale for this change We need merge sort that does not fail with out of memory ## What changes are included in th