AdamGS commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-3146689387
Ready here - https://github.com/apache/datafusion/pull/17020, I think I
didn't mess up the conflict resolution too much, I'll probably do another pass
to make sure.
I don't ful
adriangb commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-3146547049
I think so! I haven't fully grokked what this PR means but I will say one of
the pieces of DataFusion we customize is partitioning / concurrency so I'm
interested to see what impact
AdamGS commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-3146537884
I can take the work to rebase this PR and fix the conflicts, is there
interest?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on
github-actions[bot] closed pull request #14411: feat: Support On-Demand
Repartition
URL: https://github.com/apache/datafusion/pull/14411
github-actions[bot] commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-3111723405
Thank you for your contribution. Unfortunately, this pull request is stale
because it has been open 60 days with no activity. Please remove the stale
label or comment or
Dandandan commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2904301788
The conflicts seem very minimal @adriangb, so if someone can fix the
conflicts, it should be in a reviewable state again
adriangb commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2888717383
I'm sad to see this go stale 😢. Sadly, I also don't have the bandwidth to
push it forward
github-actions[bot] commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2888711945
Thank you for your contribution. Unfortunately, this pull request is stale
because it has been open 60 days with no activity. Please remove the stale
label or comment or
adriangb commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2729232834
This is exciting!
berkaysynnada commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2692276290
I got a bit off-topic, but we will focus on this promising work to drive it
to completion in the coming week, together with @mertak-synnada.
Weijun-H commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2692083297
I actually anticipate higher memory consumption, particularly in systems
where the upstream portion of RepartitionExec generates results faster than the
downstream component process
Dandandan commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2690915249
I still see higher memory usage compared to main. Is this something you
plan to improve, or do you want it reviewed as is, @Weijun-H?
Dandandan commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2661425398
> > I ran some tests yesterday and I can confirm the runtime improvements. I
do get some high memory usage however especially with some queries (TPC-H Query
18 I believe) than when
Weijun-H commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2660797386
> I ran some tests yesterday and I can confirm the runtime improvements. I
do get some high memory usage however especially with some queries (TPC-H Query
18 I believe) than when us
Weijun-H commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1957051847
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1589 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or
Dandandan commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2659343979
I ran some tests yesterday and I can confirm the runtime improvements.
I do get some high memory usage however especially with some queries (TPC-H
Query 18 I believe) than with t
Dandandan commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1956072347
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1589 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or
Weijun-H commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2658410453
> > > > I wonder why tpch_mem_sf10 is slower for some queries? Might it be
possible the created memtable is not created evenly because of the new round
robin (that might be fixable
Dandandan commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2654601057
Specifically, I think we can try [this
approach](https://github.com/apache/datafusion/pull/13707) together with
on-demand repartition 🤔
Dandandan commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2654136409
> > > I wonder why tpch_mem_sf10 is slower for some queries? Might it be
possible the created memtable is not created evenly because of the new round
robin (that might be fixable e
Weijun-H commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2654051392
> > I wonder why tpch_mem_sf10 is slower for some queries? Might it be
possible the created memtable is not created evenly because of the new round
robin (that might be fixable e.g.
berkaysynnada commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2652041558
> I wonder why tpch_mem_sf10 is slower for some queries? Might it be
possible the created memtable is not created evenly because of the new round
robin (that might be fixable e
Dandandan commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2651627436
I wonder why tpch_mem_sf10 is slower for some queries? Might it be possible
the created memtable is not created evenly because of the new round robin (that
might be fixable).
Weijun-H commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646358808
I believe we cannot use the customized channel `DistributionReceiver`
currently, as `OnDemandRepartitionExec` can prevent channels from filling up
endlessly. Additionally, I noticed
berkaysynnada commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646294361
> The point is: I think there should be no memory-bloat issue in
TPCH/clickbench queries caused by `RepartitionExec`, just wondering do you have
any bad query can reproduce the
2010YOUY01 commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646218873
> Hi @2010YOUY01. I'd like to thank you firstly for this investigation. I
actually expect higher memory consumption—especially in systems where the
upstream part of the Repartitio
berkaysynnada commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646154037
> * No performance regression (benchmarks already showed)
> * Reduce memory footprint, for queries which batch can accumulate in
`RepartitionExec` (as the origin issue said)
Weijun-H commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646145904
> Impressive work! I got a suggestion and a high-level question:
>
> ### Suggestion
> I think to justify this change, we have to make sure:
>
> * No performance regre
2010YOUY01 commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646119135
Impressive work! I got a suggestion and a high-level question:
### Suggestion
I think to justify this change, we have to make sure:
- No performance regression (benchma
mertak-synnada commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1946078413
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1362 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+
Weijun-H commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2641916193
I updated the latest benchmark results. It seems the `OnDemandRepartition`
improved performance on `clickbench_partitioned` and large datasets like
`tpch_sf50`. For `tpch_sf1` and `
berkaysynnada commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640985434
> Maybe I am missing something, but the benchmark numbers reported above
don't really show much of an improvement
this might be a silly question but, did you set the conf
ozankabak commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640838240
@Weijun-H did [some
benchmarks](https://github.com/synnada-ai/datafusion-upstream/pull/60) a while
back and the approach seemed promising in TPCH/SF50.
@mertak-synnada will
alamb commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640829028
> This is still in somewhat early stages, and there is work to do. But it
might be good to get feedback early on from the community as the performance of
this code is somewhat sensitiv
mertak-synnada commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1944579206
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1320 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+
Weijun-H commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1942878131
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1320 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or
Weijun-H commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1941260256
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1320 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or
Weijun-H commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1941261097
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1320 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or
Weijun-H commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1941257144
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1320 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or
Weijun-H commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1941255907
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1320 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or
Weijun-H commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1941254980
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1320 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or
mertak-synnada commented on code in PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#discussion_r1938964242
##
datafusion/physical-plan/src/repartition/on_demand_repartition.rs:
##
@@ -0,0 +1,1320 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+
ozankabak commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2630289724
This is still in somewhat early stages, and there is work to do. But it
might be good to get feedback early on from the community as the performance of
this code is somewhat sensit
ozankabak commented on PR #14411:
URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2630036258
@Weijun-H has been working on this with the Synnada team for a while. The
initial benchmark results were promising, so we decided to continue development
while receiving community