Dandandan commented on PR #1267:
URL:
https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926944302
> hey @Dandandan it does not, the change is focused more on aligning with
spark, which to my knowledge does not have multi scheduler setup.
>
> I'm considering using uli
milenkovicm commented on PR #1267:
URL:
https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926963201
when going through logs, for example, it makes it easier to reason about.
--
This is an automated message from the Apache Git Service.
To respond to the message, please
iffyio commented on code in PR #1867:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1867#discussion_r2118991085
##
src/parser/mod.rs:
##
@@ -10045,7 +10055,7 @@ impl<'a> Parser<'a> {
}
if self.parse_keywords(&[Keyword::GROUPING, Keyword::
Dandandan commented on PR #1216:
URL:
https://github.com/apache/datafusion-ballista/pull/1216#issuecomment-2926981538
As far as I can see, we don't have to validate the IPC files:
* Ballista has control over writing the output
* In a power down scenario where the file is being writ
alamb commented on PR #16216:
URL: https://github.com/apache/datafusion/pull/16216#issuecomment-2926983181
> I remember back then in Oracle days, there was VARCHAR and VARCHAR2 data
types, just thinking aloud if it can be reused like VARCHAR is UTF8 ArrowType,
VARCHAR2 is UTF8View
Th
milenkovicm commented on PR #1267:
URL:
https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926982414
There is an issue with having the job id tied to a physical directory, which may
make a mess when the scheduler is restarted without restarting executors, making
possibility to overlap
Dandandan commented on PR #1267:
URL:
https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926982786
Hm yeah. Not anything against it, but just my thoughts :). I think `ULID`
would be preferable over an atomic id.
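For readers unfamiliar with the format, here is a minimal sketch of how a ULID is laid out: a 48-bit millisecond timestamp followed by 80 bits of randomness, encoded as 26 Crockford base32 characters. The function names (`ulid_bits`, `encode`, `new_job_id`) are hypothetical and not taken from Ballista, and the caller is assumed to supply the random bits, since the standard library has no random number generator.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Crockford base32 alphabet used by the ULID spec (no I, L, O or U).
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Pack a 48-bit millisecond timestamp and 80 random bits into 128 bits.
/// Hypothetical helper: the timestamp occupies the most significant bits.
fn ulid_bits(timestamp_ms: u64, random: u128) -> u128 {
    let ts = (timestamp_ms as u128) & ((1u128 << 48) - 1);
    let rnd = random & ((1u128 << 80) - 1);
    (ts << 80) | rnd
}

/// Encode 128 bits as 26 base32 characters, most significant first.
fn encode(bits: u128) -> String {
    let mut out = [b'0'; 26];
    let mut rest = bits;
    // Fill from the last character backwards, 5 bits at a time.
    for slot in out.iter_mut().rev() {
        *slot = ALPHABET[(rest & 0x1f) as usize];
        rest >>= 5;
    }
    String::from_utf8(out.to_vec()).expect("alphabet is ASCII")
}

/// Hypothetical job id generator: current wall-clock time plus
/// caller-supplied randomness (assumed to come from a real RNG).
fn new_job_id(random: u128) -> String {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_millis() as u64;
    encode(ulid_bits(now_ms, random))
}
```

Because the timestamp sits in the most significant bits and the alphabet is in ASCII order, ids sort lexicographically by creation time, which is what makes them easy to follow in logs. Unlike an atomic counter, they are also unlikely to repeat after a scheduler restart.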
alamb opened a new pull request, #16222:
URL: https://github.com/apache/datafusion/pull/16222
This PR is for testing DataFusion with the code in the following PR
- https://github.com/apache/arrow-rs/pull/7513
This is the second of 2 experiments:
1. Is `Does ClickBench` performan
alamb commented on PR #16222:
URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927009365
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun
alamb commented on PR #16222:
URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927065804
🤖: Benchmark completed
Details
```
Comparing HEAD and alamb_test_actual_pushdown
Benchmark clickbench_extended.json
---
zhuqi-lucas commented on PR #16222:
URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927075657
> 🤖: Benchmark completed
>
> Details
>
> ```
> Comparing HEAD and alamb_test_actual_pushdown
>
> Benchmark clickbench_extended.json
>
alamb commented on PR #16208:
URL: https://github.com/apache/datafusion/pull/16208#issuecomment-2927076431
I ran q24 locally and did see a small slowdown and did some profiling
As expected, filtering is about 30% of the overall execution time. Of the
filtering time, about 1/2 goes to c
Dandandan opened a new pull request, #16223:
URL: https://github.com/apache/datafusion/pull/16223
## Which issue does this PR close?
- Closes #.
## Rationale for this change
Recently, I found `interleave_batches` to be faster than the existing code.
that takes in
alamb commented on code in PR #16221:
URL: https://github.com/apache/datafusion/pull/16221#discussion_r2119245089
##
datafusion/expr/src/logical_plan/builder.rs:
##
@@ -1626,12 +1626,19 @@ pub fn build_join_schema(
join_type,
left.fields().len(),
);
-l
alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2927412038
I plan to merge this tomorrow so it can be included in DataFusion 48.0.0
alamb commented on PR #16195:
URL: https://github.com/apache/datafusion/pull/16195#issuecomment-2927412259
@gabotechs / @LiaCastaneda please ping me when you think this PR is ready
for a review / merge
Thank you for the help getting it ready
chenkovsky commented on code in PR #16221:
URL: https://github.com/apache/datafusion/pull/16221#discussion_r2119256983
##
datafusion/expr/src/logical_plan/builder.rs:
##
@@ -1626,12 +1626,19 @@ pub fn build_join_schema(
join_type,
left.fields().len(),
);
-
Dandandan closed pull request #16223: Concatenate inside hash repartition
URL: https://github.com/apache/datafusion/pull/16223
Dandandan opened a new pull request, #15768:
URL: https://github.com/apache/datafusion/pull/15768
## Which issue does this PR close?
Addresses https://github.com/apache/datafusion/issues/7957,
https://github.com/apache/datafusion/issues/6822,
https://github.com/apache/datafus
kazantsev-maksim commented on code in PR #1825:
URL: https://github.com/apache/datafusion-comet/pull/1825#discussion_r2119396171
##
native/proto/src/proto/expr.proto:
##
@@ -72,18 +72,17 @@ message Expr {
NormalizeNaNAndZero normalize_nan_and_zero = 45;
TruncDate trunc
hozan23 opened a new issue, #16224:
URL: https://github.com/apache/datafusion/issues/16224
Following the PR #11896, I'm going to open another PR for supporting
compound identifiers when parsing tuples
hozan23 opened a new pull request, #16225:
URL: https://github.com/apache/datafusion/pull/16225
## Which issue does this PR close?
- Closes #16224
## Rationale for this change
We would like to support adding table name qualifiers to columns inside
tuples. Currently, this
parthchandra commented on code in PR #1817:
URL: https://github.com/apache/datafusion-comet/pull/1817#discussion_r2119749301
##
spark/src/test/scala/org/apache/comet/parquet/ParquetReadFromS3Suite.scala:
##
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
renato2099 commented on PR #1132:
URL:
https://github.com/apache/datafusion-python/pull/1132#issuecomment-2927926602
As a user, these docs seem great! Looking forward to having them merged!
nu commented on issue #1861:
URL:
https://github.com/apache/datafusion-sqlparser-rs/issues/1861#issuecomment-2927948906
I'm trying to take a look into that, and I'm trying to evaluate if a new
type shall be declared inside `src/ast/data_type.rs` or an existing one shall
be reused.
jonathanc-n commented on issue #16226:
URL: https://github.com/apache/datafusion/issues/16226#issuecomment-2928034844
take
jonathanc-n opened a new issue, #16226:
URL: https://github.com/apache/datafusion/issues/16226
### Is your feature request related to a problem or challenge?
Mentioned as a TODO statement here:
https://github.com/apache/datafusion/pull/16083.
### Describe the solution you'd lik
parthchandra commented on PR #1817:
URL:
https://github.com/apache/datafusion-comet/pull/1817#issuecomment-2928171165
> > I have another thought on this. Any number of users have developed
custom `AWSCredentialsProvider`s in Java but we would not have corresponding
implementations in Rust
jonathanc-n commented on code in PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#discussion_r2119831572
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -1126,6 +1153,28 @@ where
.collect()
}
+pub(crate) fn get_mark_indices(
+range: &Range,
+
adriangb commented on issue #16200:
URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2928411444
Sadly I doubt there's a correct answer. It might be the opposite for a local
SSD vs object storage.
jonathanc-n commented on PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2928463577
> Regarding the lack of support for RightMark joins in some join operators,
I believe it would be best to return an error in the constructor of those
operators if they do not sup
parthchandra commented on PR #1817:
URL:
https://github.com/apache/datafusion-comet/pull/1817#issuecomment-2928173195
One more thought. Would you be able to write some documentation on
configuring/using this?
jonathanc-n commented on code in PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#discussion_r2119682441
##
datafusion/sql/src/unparser/plan.rs:
##
@@ -738,21 +739,38 @@ impl Unparser<'_> {
let negated = match join.join_type {
etseidl commented on issue #16200:
URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2928051534
> One challenge / tradeoff that would be interesting/required is that doing
another async load to read more of the metadata will be very bad if that has to
actually go to object
coderfender commented on PR #1736:
URL:
https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2928666274
Working on this
coderfender commented on PR #1736:
URL:
https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2928669883
Working on this
brayanjuls commented on issue #16158:
URL: https://github.com/apache/datafusion/issues/16158#issuecomment-2928912517
I am sorry for the delay on a resolution to this issue. I have been busy at
work and the workload will remain the same at least for the next 3 weeks so I
prefer to release th
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2928856563
> > Only those streams that call poll_next themselves in a loop, and as a
consequence may block for an extended period of time, would need to do this.
Are there that many of thos
ozankabak commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2926864588
> Only those streams that call poll_next themselves in a loop, and as a
consequence may block for an extended period of time, would need to do this.
Are there that many of those?
zhuqi-lucas commented on issue #16200:
URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2926814928
Created an arrow-rs issue, we can implement the interface first.
chenkovsky opened a new pull request, #16221:
URL: https://github.com/apache/datafusion/pull/16221
## Which issue does this PR close?
- Closes #15754.
## Rationale for this change
Some optimization rules will swap the left and right plans. Then the metadata of
optimized p
milenkovicm merged PR #1226:
URL: https://github.com/apache/datafusion-ballista/pull/1226
milenkovicm closed issue #1220: `JobStateEventStream` does not emit events
related to Session
URL: https://github.com/apache/datafusion-ballista/issues/1220
milenkovicm merged PR #1250:
URL: https://github.com/apache/datafusion-ballista/pull/1250
shehabgamin commented on issue #15914:
URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2926895014
> [@shehabgamin](https://github.com/shehabgamin)
[@alamb](https://github.com/alamb) I created an epic in Comet for implementing
our current expressions as `ScalarUDFImpl` ra
Dandandan closed pull request #15768: Use `interleave` to speed up hash
repartitioning
URL: https://github.com/apache/datafusion/pull/15768
andygrove commented on code in PR #1825:
URL: https://github.com/apache/datafusion-comet/pull/1825#discussion_r2119347211
##
native/proto/src/proto/expr.proto:
##
@@ -72,18 +72,17 @@ message Expr {
NormalizeNaNAndZero normalize_nan_and_zero = 45;
TruncDate truncDate =
codecov-commenter commented on PR #1825:
URL:
https://github.com/apache/datafusion-comet/pull/1825#issuecomment-2927584168
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1825?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
Dandandan commented on PR #16223:
URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927495956
FYI @alamb this relates to your quest to remove `CoalesceBatches` (this
doesn't yet remove `concat` but it shows the potential for optimization).
codecov-commenter commented on PR #1826:
URL:
https://github.com/apache/datafusion-comet/pull/1826#issuecomment-2927586200
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1826?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
alamb commented on PR #16223:
URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927590363
🤖: Benchmark completed
Details
```
Comparing HEAD and concat_in_repartition
Benchmark clickbench_extended.json
Dandandan commented on PR #16223:
URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927599722
> 🤖: Benchmark completed
>
> Details
>
> ```
> Comparing HEAD and concat_in_repartition
>
> Benchmark clickbench_extended.json
> --
alamb commented on PR #16216:
URL: https://github.com/apache/datafusion/pull/16216#issuecomment-2927523703
Thanks @comphead and @zhuqi-lucas
Dandandan commented on PR #16223:
URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927612968
One commit was missing, but not sure that explains the difference between my
result and this one.
zhuqi-lucas commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2926779798
> So it might actually be the case that the changed code is a bit slower for
this case. In the query there is only little data to copy (so concat batches ->
concat sort keys does
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2926803211
> Isn't that somewhat unavoidable when you're dealing with a cooperative
scheduler? There's no way for tokio to preempt.
I agree, so I added the most urgent and possible ca
zhuqi-lucas commented on PR #16216:
URL: https://github.com/apache/datafusion/pull/16216#issuecomment-2926807164
> lgtm thanks @alamb
>
> I remember back then in Oracle days, there was VARCHAR and VARCHAR2 data
types, just thinking aloud if it can be reused like VARCHAR is UTF8 ArrowT
cj-zhukov opened a new pull request, #16220:
URL: https://github.com/apache/datafusion/pull/16220
## Which issue does this PR close?
- Closes #16218.
## Rationale for this change
## What changes are included in this PR?
## Are these changes
Dandandan closed pull request #16223: Concatenate inside hash repartition
URL: https://github.com/apache/datafusion/pull/16223
alamb commented on PR #16222:
URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927088025
> The clickbench only has several cases with a real regression > 20%, and I
believe those cases can be improved by combining with adaptive; I think we are
in a good state.
I agree
kazantsev-maksim opened a new pull request, #1826:
URL: https://github.com/apache/datafusion-comet/pull/1826
## Which issue does this PR close?
Part of https://github.com/apache/datafusion-comet/issues/1819
## Rationale for this change
See https://github.com/apache/datafu
zhuqi-lucas commented on PR #16222:
URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927122992
> #16208 (comment)
https://github.com/apache/arrow-rs/pull/7524#issuecomment-2888412242
Thank you @alamb , from previous result, it will help Q14 Q24 Q30 Q31 ,
which
ctsk commented on PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2927498848
Alrighty!
Regarding the lack of support for RightMark joins in some join operators, I
believe it would be best to return an error in the constructor of those
operators if they do
ctsk commented on code in PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#discussion_r2119313686
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -1126,6 +1153,28 @@ where
.collect()
}
+pub(crate) fn get_mark_indices(
+range: &Range,
+inp
alamb merged PR #16216:
URL: https://github.com/apache/datafusion/pull/16216
alamb commented on PR #16216:
URL: https://github.com/apache/datafusion/pull/16216#issuecomment-2927523426
Had to merge this so we had at least one commit today
alamb commented on PR #16223:
URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927525203
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun
Dandandan commented on PR #16223:
URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927734943
Let me try some other approach later: buffering inputs for each output
partition until it reaches the target batch size (just like `CoalesceBatches`).
Perhaps the extra copy for s
Dandandan closed pull request #16223: Concatenate inside hash repartition
URL: https://github.com/apache/datafusion/pull/16223
renato2099 commented on issue #1091:
URL:
https://github.com/apache/datafusion-python/issues/1091#issuecomment-2927812643
Hi @kevinjqliu , @timsaucer ,
Here is an initial PR for this
https://github.com/apache/datafusion-python/pull/1137. Let me know if there is
anything else to be d
renato2099 opened a new pull request, #1137:
URL: https://github.com/apache/datafusion-python/pull/1137
# Which issue does this PR close?
Closes #1091
# Rationale for this change
Similar to exposing FFI for TableProviders, this PR exposes the capability
for exposing Catalo