alamb commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2240660130
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,449 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contribut
alamb commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3133626834
EPIC!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscr
2010YOUY01 commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3130543883
> Amazing work! Aside from updating to `main` what do users of DataFusion
need to do to use this improvement?
Nothing else needed, it will be triggered automatically and it
adriangb commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3130446693
Amazing work! Aside from updating to `main` what do users of DataFusion need
to do to use this improvement?
--
This is an automated message from the Apache Git Service.
To respond
2010YOUY01 commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3130380274
Thanks everyone @rluvaton @ding-young and others who have helped along the
way 🚀
--
This is an automated message from the Apache Git Service.
To respond to the message, please
2010YOUY01 merged PR #15700:
URL: https://github.com/apache/datafusion/pull/15700
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@dat
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230996830
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,449 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230995605
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,449 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230992741
##
datafusion/physical-plan/src/spill/spill_manager.rs:
##
@@ -17,11 +17,10 @@
//! Define the `SpillManager` struct, which is responsible for reading and
writ
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230075680
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,449 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more cont
Copilot commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230055548
##
datafusion/physical-plan/src/spill/spill_manager.rs:
##
@@ -17,11 +17,10 @@
//! Define the `SpillManager` struct, which is responsible for reading and
writi
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2227706228
##
datafusion/physical-plan/src/spill/spill_manager.rs:
##
@@ -140,3 +210,19 @@ impl SpillManager {
Ok(spawn_buffered(stream, self.batch_read_buffer_capa
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2227706228
##
datafusion/physical-plan/src/spill/spill_manager.rs:
##
@@ -140,3 +210,19 @@ impl SpillManager {
Ok(spawn_buffered(stream, self.batch_read_buffer_capa
ding-young commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2227210285
##
datafusion/physical-plan/src/spill/spill_manager.rs:
##
@@ -140,3 +210,19 @@ impl SpillManager {
Ok(spawn_buffered(stream, self.batch_read_buffer_ca
rluvaton commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3109894011
can you please re-review I don't believe there are any actionable comments
from my side to review
--
This is an automated message from the Apache Git Service.
To respond to the me
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2226469366
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2226461212
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2226402212
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2221120019
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2221120019
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
ding-young commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2221118843
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219474642
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219148029
##
datafusion/physical-plan/src/spill/spill_manager.rs:
##
@@ -125,6 +133,156 @@ impl SpillManager {
self.spill_record_batch_and_finish(&batches, request
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219057789
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,345 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219143884
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,345 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219141893
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,345 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219098075
##
datafusion/physical-plan/src/spill/spill_manager.rs:
##
@@ -125,6 +133,156 @@ impl SpillManager {
self.spill_record_batch_and_finish(&batches, request
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219057789
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,345 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
ding-young commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218214013
##
datafusion/physical-plan/src/spill/spill_manager.rs:
##
@@ -125,6 +133,156 @@ impl SpillManager {
self.spill_record_batch_and_finish(&batches, reque
ding-young commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218105116
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
ding-young commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218105116
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
ding-young commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218105116
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2217240477
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more cont
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2217220186
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,345 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more cont
rluvaton commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3084017814
@2010YOUY01 I've updated based on your comments and commented back on some
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213305286
##
datafusion/physical-plan/src/spill/get_size.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213294534
##
datafusion/physical-plan/src/sorts/streaming_merge.rs:
##
@@ -131,14 +168,42 @@ impl<'a> StreamingMergeBuilder<'a> {
enable_round_robin_tie_breake
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213289055
##
datafusion/physical-plan/src/aggregates/row_hash.rs:
##
@@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream {
sort_batch(&batch, &expr, No
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213223988
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213137522
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213137522
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213137522
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213099661
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213099661
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213099661
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212846002
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212828103
##
datafusion/core/tests/fuzz_cases/sort_fuzz.rs:
##
@@ -377,3 +388,335 @@ fn make_staggered_i32_utf8_batches(len: usize) ->
Vec {
batches
}
+
Review Co
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212827518
##
datafusion/core/tests/fuzz_cases/sort_fuzz.rs:
##
@@ -377,3 +388,335 @@ fn make_staggered_i32_utf8_batches(len: usize) ->
Vec {
batches
}
+
+#[tokio::
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212846002
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contri
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191875936
##
datafusion/physical-plan/src/aggregates/row_hash.rs:
##
@@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream {
sort_batch(&batch, &expr,
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191760246
##
datafusion/physical-plan/src/aggregates/row_hash.rs:
##
@@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream {
sort_batch(&batch, &expr, No
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191376010
##
datafusion/physical-plan/src/aggregates/row_hash.rs:
##
@@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream {
sort_batch(&batch, &expr,
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191389061
##
datafusion/physical-plan/src/sorts/streaming_merge.rs:
##
@@ -131,14 +168,42 @@ impl<'a> StreamingMergeBuilder<'a> {
enable_round_robin_tie_brea
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2191383325
##
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##
@@ -0,0 +1,342 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more cont
rluvaton commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3043682960
> > I would appreciate it, it would greatly help me
>
> @rluvaton I opened a pr on your fork. Would you take a look when you have
some time?
I **really** appriciate the
ding-young commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3043202973
> I would appreciate it, it would greatly help me
@rluvaton I opened a pr on your fork. Would you take a look when you have
some time?
--
This is an automated message fr
rluvaton commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036807094
I would appreciate it, it would greatly help me
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abov
ding-young commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036770691
@rluvaton If you’d like, I can send a PR to your (fork's) branch that
resolve merge conflicts since I already have one. Anyway there were only minor
diffs to handle when I rebased
rluvaton commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036724762
So should I fix this PR conflicts? It seems like this pr has a change to be
merged
--
This is an automated message from the Apache Git Service.
To respond to the message, please
2010YOUY01 commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3034804482
https://github.com/apache/datafusion/pull/15700#discussion_r2041372025
I have a idea to fix this concern: adding a max merge degree configuration,
if either
a. SPM's est
ding-young commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3031592289
I've rebased this branch on the latest main and tested whether estimated
size changes after we load `RecordBatch` which was compressed with `lz4_frame`
into memory. The result of
ding-young commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3021762356
Hi @adriangb, thanks for raising this point. I'm currently reviewing both
this PR and the other cascading merge sort PR
(https://github.com/apache/datafusion/pull/15610). I'm not
adriangb commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3021699053
In the interest of this valuable work not being lost, is there any way that
https://github.com/apache/datafusion/pull/15700#discussion_r2041372025 could be
addressed by a method tha
rluvaton commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906947109
> > @2010YOUY01 and @ding-young I wonder if you can review this PR again to
help @rluvaton get it merged?
>
> >
>
> > Specifically if it needs more tests perhaps you c
2010YOUY01 commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906805485
> @2010YOUY01 and @ding-young I wonder if you can review this PR again to
help @rluvaton get it merged?
>
> Specifically if it needs more tests perhaps you can help identify
ding-young commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906803554
@alamb Sure! I may not be able to provide a detailed review right away, but
I can definitely help by running the tests added in the PR locally and looking
into memory accounting f
alamb commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-2906787792
@2010YOUY01 and @ding-young I wonder if you can review this PR again to help
@rluvaton get it merged?
Specifically if it needs more tests perhaps you can help identify which are
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2104892196
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_batch
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2046041614
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_bat
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2045582560
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_batch
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041878079
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_batch
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041629246
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_bat
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041544882
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_batch
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041544882
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_batch
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041544882
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_batch
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041372025
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_bat
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041222937
##
datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs:
##
@@ -753,3 +765,226 @@ async fn test_single_mode_aggregate_with_spill() ->
Result<()> {
Ok(())
}
rluvaton opened a new pull request, #15700:
URL: https://github.com/apache/datafusion/pull/15700
## Which issue does this PR close?
- Closes #14692.
## Rationale for this change
We need merge sort that does not fail with out of memory
## What changes are included in th
78 matches
Mail list logo