ctsk commented on code in PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051758338
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -298,25 +299,15 @@ impl BatchPartitioner {
.into_iter()
.e
ctsk opened a new pull request, #15783:
URL: https://github.com/apache/datafusion/pull/15783
## Which issue does this PR close?
- Closes #15620.
## What changes are included in this PR?
The hash join seed is hard-coded to a different value that the
RepartitionExec seed.
codecov-commenter commented on PR #1663:
URL:
https://github.com/apache/datafusion-comet/pull/1663#issuecomment-2817252576
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1663?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
ctsk commented on code in PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051758338
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -298,25 +299,15 @@ impl BatchPartitioner {
.into_iter()
.e
ctsk commented on PR #15757:
URL: https://github.com/apache/datafusion/pull/15757#issuecomment-2817276675
The implementation looks good to me. Two points:
1. The testing seems quite verbose - I briefly browsed around and think
`DFSchema::matches_arrow_schema` can help you shorten it.
2
ctsk commented on PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#issuecomment-2817240670
Does the `batches_until_yield` threshold need to be adjusted?
https://github.com/apache/datafusion/blob/91870b626b184a901c713fada3a1adb0eda2de7c/datafusion/physical-plan/src/repar
Rachelint commented on issue #15383:
URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2817566333
> However, I found the performance won't be better for Clickbench queries 4
and 7.
I think it may be possible that the test queries can't reflect the
improvement well.
2010YOUY01 commented on issue #15784:
URL: https://github.com/apache/datafusion/issues/15784#issuecomment-2817567103
It's a good idea. However, I think if the build side has a large HT and the
probe side has many `NULL`s, the existing implementation is likely to be
faster. Some statistics m
2010YOUY01 commented on PR #15782:
URL: https://github.com/apache/datafusion/pull/15782#issuecomment-2817587293
Thanks, I think it's good to go after adding some simple docs for such
configurations, in:
https://github.com/apache/datafusion/tree/main/benchmarks
and also the output of `.
chenkovsky commented on code in PR #15773:
URL: https://github.com/apache/datafusion/pull/15773#discussion_r2051868053
##
benchmarks/queries/clickbench/README.md:
##
@@ -155,7 +155,7 @@ WHERE
THEN split_part(split_part("URL", 'resolution=', 2), '&', 1)::INT
EL
guojidan commented on issue #15756:
URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2817457411
```
use std::sync::Arc;
use bytes::Bytes;
use datafusion::prelude::{CsvReadOptions, SessionContext};
use object_store::{memory::InMemory, path::Path, ObjectStore}
ctsk opened a new pull request, #15782:
URL: https://github.com/apache/datafusion/pull/15782
## Which issue does this PR close?
- Closes #15684
## Rationale for this change
When benchmarking, it is convenient to modify some datafusion settings
between runs and compare t
Dandandan commented on code in PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051763714
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -298,25 +299,15 @@ impl BatchPartitioner {
.into_iter()
ctsk commented on code in PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051758338
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -298,25 +299,15 @@ impl BatchPartitioner {
.into_iter()
.e
ctsk opened a new issue, #15784:
URL: https://github.com/apache/datafusion/issues/15784
### Is your feature request related to a problem or challenge?
The join implementation currently removes nulls during equality checking - I
believe it would be faster / more efficient if we did so
EmilyMatt opened a new issue, #1659:
URL: https://github.com/apache/datafusion-comet/issues/1659
This is a nitpick, but I've noticed that in the new interleaved shuffle,
when copying the data into the output data file, first the in-memory data is
written to the file, and only then is the co
EmilyMatt opened a new pull request, #1660:
URL: https://github.com/apache/datafusion-comet/pull/1660
This is very much a nitpick and not a necessary change, but I believe that
shuffle batches should be FIFO.
## Which issue does this PR close?
Closes #1659 .
## R
chenkovsky commented on issue #15755:
URL: https://github.com/apache/datafusion/issues/15755#issuecomment-2817103271
@jayzhan211 could you please give me an example sql? I tried to construct
test sql. but
https://github.com/apache/datafusion/blob/8a193c22f6c3f771bff16cb8f9608d764d2
jayzhan211 commented on issue #15755:
URL: https://github.com/apache/datafusion/issues/15755#issuecomment-2817104010
> @jayzhan211 could you please give me an example sql? I tried to construct
test sql. but
>
>
https://github.com/apache/datafusion/blob/8a193c22f6c3f771bff16cb8f9608
ion-elgreco closed pull request #908: refactor: rename to `with_column_renamed`
to `rename`
URL: https://github.com/apache/datafusion-python/pull/908
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to
goldmedal commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2051721176
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -320,6 +336,68 @@ impl BatchPartitioner {
Ok((partition, batch))
ctsk commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2817057508
Sorry for the silence. I've been extensively benchmarking this PR and the
results have been fairly mixed. I've also tried different thresholds for
coalescing. I plan on generating table
ctsk commented on code in PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051665380
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -298,25 +299,15 @@ impl BatchPartitioner {
.into_iter()
.e
rluvaton commented on code in PR #15022:
URL: https://github.com/apache/datafusion/pull/15022#discussion_r2051696202
##
datafusion/expr-common/src/groups_accumulator.rs:
##
@@ -106,6 +107,44 @@ impl EmitTo {
/// [`Accumulator`]: crate::accumulator::Accumulator
/// [Aggregating
rluvaton commented on code in PR #15022:
URL: https://github.com/apache/datafusion/pull/15022#discussion_r2051696277
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs:
##
@@ -299,6 +420,12 @@ impl GroupsAccumulatorAdapter {
}
impl GroupsAccumulator
goldmedal commented on issue #15383:
URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2817168799
I have another implementation for this issue
https://github.com/goldmedal/datafusion/pull/4
The concept is that getting the row according to indices in the selection
vecto
EmilyMatt opened a new pull request, #1662:
URL: https://github.com/apache/datafusion-comet/pull/1662
## Which issue does this PR close?
Closes #1661 .
## Rationale for this change
As described in the issue, I just think the pools logic is becoming complex
codecov-commenter commented on PR #1660:
URL:
https://github.com/apache/datafusion-comet/pull/1660#issuecomment-2817176336
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1660?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
kumarlokesh commented on PR #15644:
URL: https://github.com/apache/datafusion/pull/15644#issuecomment-2817176100
> > thanks @kumarlokesh The code now looks much more aligned.
> > perhaps we can factor out
> > ```
> > self.parser
> >
Dandandan commented on code in PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051731884
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -298,25 +299,15 @@ impl BatchPartitioner {
.into_iter()
EmilyMatt opened a new issue, #1661:
URL: https://github.com/apache/datafusion-comet/issues/1661
### What is the problem the feature request solves?
As there is now more than one memory Comet pool, and it stands to reason
there will be more in the future, as well as with the increasin
Dandandan commented on code in PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#discussion_r2051728746
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -320,6 +336,68 @@ impl BatchPartitioner {
Ok((partition, batch))
andygrove opened a new issue, #15781:
URL: https://github.com/apache/datafusion/issues/15781
### Describe the bug
`datasource` needs to be published before `catalog`:
```
Packaging datafusion-catalog v47.0.0
(/Users/andy/git/apache/datafusion/datafusion/catalog)
Up
YanivKunda commented on PR #1563:
URL:
https://github.com/apache/datafusion-comet/pull/1563#issuecomment-2817207648
> I'd like to merge this one and then switch to the official release next
week
@andygrove here's the switch:
https://github.com/apache/datafusion-comet/pull/1663
-
ctsk commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2817209539
[benchmarks/hashagg-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/hashagg-results.md)
[benchmarks/join-results.md](https:/
YanivKunda opened a new pull request, #1663:
URL: https://github.com/apache/datafusion-comet/pull/1663
## Which issue does this PR close?
Closes #1634
## Rationale for this change
Upgraded from 47.0.0-rc1 post
https://github.com/apache/datafusion-comet/pull/1563
#
36 matches
Mail list logo