alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3044919809
As a random aside, this is the kind of micro optimization that I would have
been terrified of making in C/C++ land without extreme care to avoid concurrent
access.
With Rust I
alamb commented on code in PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#discussion_r2187217486
##
datafusion/physical-plan/src/sorts/stream.rs:
##
@@ -76,8 +76,40 @@ impl FusedStreams {
}
}
+/// A pair of `Arc` that can be reused
+#[derive(Debug)]
+str
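The doc comment in the hunk above mentions a pair of `Arc` that can be reused. As a rough illustration of that general idea only (not the PR's actual code), the sketch below alternates between two `Arc` slots and reuses a slot's allocation when `Arc::get_mut` shows that no other handle is still alive. The names `ReusablePair`, `Buffer`, and `fill` are hypothetical, and `Vec<u8>` is an assumed stand-in for whatever buffer the real code shares.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a buffer that is expensive to reallocate.
type Buffer = Vec<u8>;

/// Two `Arc` slots that are filled alternately (illustrative only).
#[derive(Debug, Default)]
struct ReusablePair {
    slots: [Option<Arc<Buffer>>; 2],
    next: usize,
}

impl ReusablePair {
    /// Writes `data` into the next slot and returns a shared handle to it,
    /// plus a flag telling whether the old allocation was reused.
    fn fill(&mut self, data: &[u8]) -> (Arc<Buffer>, bool) {
        let idx = self.next;
        self.next = (self.next + 1) % 2;

        // `Arc::get_mut` returns `Some` only if this is the sole handle,
        // so mutating the buffer cannot race with another reader.
        let reused = match self.slots[idx].as_mut().and_then(|arc| Arc::get_mut(arc)) {
            Some(buf) => {
                buf.clear();
                buf.extend_from_slice(data);
                true
            }
            None => false,
        };
        if !reused {
            // Slot is empty or still shared: fall back to a fresh allocation.
            self.slots[idx] = Some(Arc::new(data.to_vec()));
        }
        let handle = Arc::clone(self.slots[idx].as_ref().expect("slot was just filled"));
        (handle, reused)
    }
}

fn main() {
    let mut pair = ReusablePair::default();
    let (first, _) = pair.fill(b"batch 1");
    let (second, _) = pair.fill(b"batch 2");
    drop(first); // the consumer is done with the first buffer
    let (third, reused) = pair.fill(b"batch 3");
    println!("third = {:?}, reused = {}", third, reused);
    println!("second is still valid: {:?}", second);
}
```

The fallback to a fresh allocation is what keeps this safe without `unsafe`: if a consumer still holds a handle, the old buffer is left alone rather than overwritten.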
Dandandan merged PR #16647:
URL: https://github.com/apache/datafusion/pull/16647
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@data
comphead commented on code in PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#discussion_r2183175335
##
datafusion/physical-plan/src/sorts/stream.rs:
##
@@ -105,26 +110,57 @@ impl RowCursorStream {
})
.collect::<Result<Vec<_>>>()?;
-let stre
zhuqi-lucas commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032768991
> > 🤖: Benchmark completed
> > Details
> > ```
> > Comparing HEAD and reuse_rows
> >
> > Benchmark sort_tpch10.json
> >
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032747385
> 🤖: Benchmark completed
>
> Details
>
> ```
> Comparing HEAD and reuse_rows
>
> Benchmark sort_tpch10.json
>
>
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032716133
🤖: Benchmark completed
Details
```
Comparing HEAD and reuse_rows
Benchmark sort_tpch1.json
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032712753
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032712630
🤖: Benchmark completed
Details
```
Comparing HEAD and reuse_rows
Benchmark sort_tpch10.json
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032659676
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032659548
🤖: Benchmark completed
Details
```
Comparing HEAD and reuse_rows
Benchmark clickbench_extended.json
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032511251
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032383269
> @alamb could you maybe run it on `sort_tpch10`. Perhaps the difference is
different when using more cores (16 vs 10 I believe?).
I have queued this up and it should run in a fe
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3032327248
```
Benchmark sort_tpch10.json
┃ Query ┃ main ┃ r
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3031974821
> Sometimes I have found that sort_tpch10 gives more accurate or better results
when we optimize the merge part, because our in_mem sort buffer is 1MB, so the
sort_tpch will have less c
zhuqi-lucas commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3031932686
Sometimes I have found that sort_tpch10 gives more accurate or better results
when we optimize the merge part, because our in_mem sort buffer is 1MB, so the
sort_tpch will have less c
Dandandan commented on code in PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#discussion_r2180819618
##
datafusion/physical-plan/src/sorts/stream.rs:
##
@@ -105,26 +110,53 @@ impl RowCursorStream {
})
.collect::<Result<Vec<_>>>()?;
-let str
Dandandan commented on code in PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#discussion_r2180844635
##
datafusion/physical-plan/src/sorts/stream.rs:
##
@@ -105,26 +110,53 @@ impl RowCursorStream {
})
.collect::<Result<Vec<_>>>()?;
-let str
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3029069816
> This is quite cool @Dandandan
>
> My only real concern is that this code will be tricky to maintain and
could easily get reverted / regressed as part of a follow on change
alamb commented on code in PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#discussion_r2180789490
##
datafusion/physical-plan/src/sorts/stream.rs:
##
@@ -88,6 +90,9 @@ pub struct RowCursorStream {
streams: FusedStreams,
/// Tracks the memory used by `co
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028914931
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028969141
🤖: Benchmark completed
Details
```
Comparing HEAD and reuse_rows
Benchmark sort_tpch.json
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028965756
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028965671
🤖: Benchmark completed
Details
```
Comparing HEAD and reuse_rows
Benchmark clickbench_extended.json
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028828620
> 🤖: Benchmark completed
>
> Details
>
> ```
> Comparing HEAD and reuse_rows
>
> Benchmark sort_tpch.json
>
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028800817
🤖: Benchmark completed
Details
```
Comparing HEAD and reuse_rows
Benchmark sort_tpch.json
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028797204
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028797121
🤖: Benchmark completed
Details
```
Comparing HEAD and reuse_rows
Benchmark clickbench_extended.json
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028641060
> @alamb may I request some benchmark run?
I really need to figure out how to script this automatically. I will see if
I can get claude to do something for me
alamb commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028639856
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028611122
@alamb may I request some benchmark run?
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3028609059
> I believe we can increase the in-place memory for the sorting benchmark here;
the default is 1MB.
>
> The result will be largely affected by the in place sort memory buffer f
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3027535311
Ah ok I see you changed `MemoryStream` to allow for this
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3027527429
> I made the following changes to allow the test to support returning 8192
rows each time:
 ins
acking-you commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3027354373
> I had a look at this benchmark. It seems it only is testing a single 1M
batch per partition/column?
You mean streaming back RecordBatch (e.g., batch of 8192 rows) instead
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3027311548
> ```shell
> bench_merge_sorted_preserving
> ```
I had a look at this benchmark. It seems it only is testing a single 1M
batch per partition/column?
If so, it wouldn
acking-you commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3027160766
This is the benchmark scenario with the default, unmodified test data
(multi large string):
```sh
Benchmarking
bench_merge_sorted_preserving/multiple_large_stri
acking-you commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3027041977
LGTM, I will try running the benchmark for
`datafusion/physical-plan/benches/sort_preserving_merge.rs` to see if there is
any improvement
zhuqi-lucas commented on code in PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#discussion_r2179323995
##
datafusion/physical-plan/src/sorts/stream.rs:
##
@@ -105,26 +110,53 @@ impl RowCursorStream {
})
.collect::<Result<Vec<_>>>()?;
-let s
Dandandan commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3026648547
FYI @zhuqi-lucas @acking-you