ozankabak commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2924371687
Seems like folks are on the way to finding a better solution -- let's see
where the discussion goes.
--
This is an automated message from the Apache Git Service.
To respond to th
2010YOUY01 commented on code in PR #16182:
URL: https://github.com/apache/datafusion/pull/16182#discussion_r2117240352
##
benchmarks/src/util/run.rs:
##
@@ -138,6 +144,13 @@ impl BenchmarkRun {
}
}
+/// Mark current query
+pub fn mark_failed(&mut self) {
2010YOUY01 commented on issue #16054:
URL: https://github.com/apache/datafusion/issues/16054#issuecomment-2924301139
> I've had a closer look, we simply wrap `- {foo}` in parentheses regardless
of whether that's strictly necessary. Following the same principle for
`BinaryExpr` won't look as
2010YOUY01 merged PR #16215:
URL: https://github.com/apache/datafusion/pull/16215
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@dat
Copilot commented on code in PR #16182:
URL: https://github.com/apache/datafusion/pull/16182#discussion_r2117256270
##
benchmarks/compare.py:
##
@@ -156,8 +173,8 @@ def compare(
console.print(table)
# Calculate averages
-avg_baseline_time = total_baseline_time /
l1t1 closed issue #1131: union all +aggregate function in the recursive cte
results an infinite loop
URL: https://github.com/apache/datafusion-python/issues/1131
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abo
2010YOUY01 commented on PR #16210:
URL: https://github.com/apache/datafusion/pull/16210#issuecomment-2924177927
This feature is intended purely for performance, right? Not because there
are cases that can only be run with NLJ.
If that's the case, I suggest to do some benchmarks upfront to
comphead commented on code in PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#discussion_r2116905907
##
datafusion/physical-plan/src/joins/sort_merge_join.rs:
##
@@ -221,9 +221,11 @@ impl SortMergeJoinExec {
// When output schema contains only the right
corwinjoy closed pull request #16219: Parquet encryption
URL: https://github.com/apache/datafusion/pull/16219
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-
corwinjoy opened a new pull request, #16219:
URL: https://github.com/apache/datafusion/pull/16219
Draft PR to add encryption to datafusion
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to
jonathanc-n commented on issue #16206:
URL: https://github.com/apache/datafusion/issues/16206#issuecomment-2923567825
I'll try to give that a try, thanks @Dandandan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
Dandandan commented on issue #16206:
URL: https://github.com/apache/datafusion/issues/16206#issuecomment-2923538632
This might help here as well I think?
https://github.com/apache/arrow-rs/pull/6427/files
--
This is an automated message from the Apache Git Service.
To respond to the
Dandandan commented on issue #16206:
URL: https://github.com/apache/datafusion/issues/16206#issuecomment-2923526973
Yeah - one of the options to optimize join is by avoiding the `take` +
`concat` (which happens when generating fewer rows per input batch in join, so
I think that could use a
Dandandan commented on code in PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#discussion_r2116661763
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -1126,6 +1153,28 @@ where
.collect()
}
+pub(crate) fn get_mark_indices(
+range: &Range,
+
Dandandan commented on code in PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#discussion_r2116652162
##
datafusion/sql/src/unparser/plan.rs:
##
@@ -738,21 +739,38 @@ impl Unparser<'_> {
let negated = match join.join_type {
ctsk commented on PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2923436003
After updating `JoinType::supports_swap` to include LeftMark/RightMark join,
the join_selection rule *should* already plan right joins where appropriate.
Subsequently running the sqllog
codecov-commenter commented on PR #1822:
URL:
https://github.com/apache/datafusion-comet/pull/1822#issuecomment-2923397979
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1822?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
ctsk commented on PR #16083:
URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2923375501
Really cool work! I'll look over it in more detail over the weekend =)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub a
andygrove opened a new issue, #1824:
URL: https://github.com/apache/datafusion-comet/issues/1824
### Describe the bug
```
2025-05-30T18:23:30.5178844Z
2025-05-30T18:23:30.6051928Z [0m[[0m[0minfo[0m] [0m[0m[31m-
SPARK-32038: NormalizeFloatingNumbers should work on dis
andygrove commented on PR #1736:
URL:
https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2923366806
There are now many more test failures than when this PR was first created.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log
andygrove opened a new issue, #1823:
URL: https://github.com/apache/datafusion-comet/issues/1823
### Describe the bug
In PR https://github.com/apache/datafusion-comet/pull/1736 there is a panic
during Comet shuffle when building structs.
```
2025-05-30T18:36:16.9739600Z [0m
andygrove opened a new pull request, #1822:
URL: https://github.com/apache/datafusion-comet/pull/1822
## Which issue does this PR close?
N/A
## Rationale for this change
Reduce verbose logging in CI
## What changes are included in this PR?
alamb commented on issue #14993:
URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2923352013
@adriangb and I spoke about this -- we think having some sort of motivating
example where the difference between schemas in the table and the file leads to
bad performance will he
kazuyukitanimura commented on PR #1602:
URL:
https://github.com/apache/datafusion-comet/pull/1602#issuecomment-2923351378
Merged Thank you @kazantsev-maksim @parthchandra @mbutrovich
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to
kazuyukitanimura merged PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsub
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922200568
> > For example, FileStream could count how many batches it sends
back-to-back without yielding, and after a certain threshold, it yields. WDYT?
>
> Perhaps DataSourceExec
Dandandan commented on code in PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#discussion_r2116490428
##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
/// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
AggregateFunction(Aggre
codecov-commenter commented on PR #1821:
URL:
https://github.com/apache/datafusion-comet/pull/1821#issuecomment-2922995270
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1821?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
andygrove commented on code in PR #1821:
URL: https://github.com/apache/datafusion-comet/pull/1821#discussion_r2116291551
##
native/core/src/execution/planner.rs:
##
@@ -146,15 +146,7 @@ pub struct PhysicalPlanner {
impl Default for PhysicalPlanner {
fn default() -> Self
andygrove opened a new pull request, #1821:
URL: https://github.com/apache/datafusion-comet/pull/1821
## Which issue does this PR close?
N/A
## Rationale for this change
As we add more expressions from `datafusion-spark` we would need to update
two identi
andygrove opened a new issue, #1820:
URL: https://github.com/apache/datafusion-comet/issues/1820
### What is the problem the feature request solves?
The `datafusion-spark` crate now provides a sha2 implementation
(https://github.com/apache/datafusion/pull/16168), so we should be able
alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922908904
I'll leave this open for a few days to gather any remaining feedback
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub an
kazantsev-maksim commented on PR #1602:
URL:
https://github.com/apache/datafusion-comet/pull/1602#issuecomment-2922907576
Thank you for the feedback! @kazuyukitanimura @parthchandra
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to G
alamb commented on PR #16215:
URL: https://github.com/apache/datafusion/pull/16215#issuecomment-2922905121
> hope downstream users didn't use it
If they need to write data to IPC files, I think either of the upstream
arrow-rs choices is better:
1. https://docs.rs/arrow-ipc/55.1.0/a
alamb commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922901439
@ozankabak
> Pipeline-breaking operators creating cancellation issues is a universal
problem -- this is not an `AggregateExec` issue. Therefore, I firmly believe
that the solut
parthchandra commented on issue #1812:
URL:
https://github.com/apache/datafusion-comet/issues/1812#issuecomment-2922892698
Spark's instructions for using sbt have mostly be applicable to Comet for me
https://spark.apache.org/docs/latest/building-spark.html#building-with-sbt
--
This is
alamb commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922888436
I also agree we could make a query cancellation token work -- the potential
challenges of that approach are:
1. We have to find some way to trigger the cancellation token when the
s
alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922659103
๐ค: Benchmark completed
Details
```
group main
reduce-expr-struct-size
-
dependabot[bot] opened a new pull request, #16212:
URL: https://github.com/apache/datafusion/pull/16212
Bumps
[testcontainers-modules](https://github.com/testcontainers/testcontainers-rs-modules-community)
from 0.12.0 to 0.12.1.
Release notes
Sourced from https://github.com/testco
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922813081
Drop off for today, i will start the work tomorrow. Thank you @pepijnve for
good discussion, i think it deserve we have the cancellation solution.
--
This is an automated messa
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922801898
> Would this require any changes to `SessionContext`? Cancellation is per
query, so I think you would want to add the `cancelled: AtomicBoolean`, `fn
cancel()` and `fn is_cancell
parthchandra commented on PR #1817:
URL:
https://github.com/apache/datafusion-comet/pull/1817#issuecomment-2922781217
I have another thought on this. Any number of users have developed custom
`AWSCredentialsProvider`s in Java but we would not have corresponding
implementations in Rust (the
pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922773141
Would this require any changes to `SessionContext`? Cancellation is per
query, so I think you would want to add the `cancelled: AtomicBoolean`, `fn
cancel()` and `fn is_cancelled()
jonathanc-n commented on PR #16210:
URL: https://github.com/apache/datafusion/pull/16210#issuecomment-2922765656
I believe a follow up pull request can be made to actually have this be used
in the physical plan based on some statistics. @comphead Does this look fine
for now?
--
This is a
parthchandra commented on code in PR #1817:
URL: https://github.com/apache/datafusion-comet/pull/1817#discussion_r2116175913
##
native/Cargo.lock:
##
@@ -375,6 +375,353 @@ version = "1.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index";
checksum = "ace50bade
alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922755179
๐ค: Benchmark completed
Details
```
Comparing HEAD and reduce-expr-struct-size
Benchmark clickbench_extended.json
--
mbutrovich commented on issue #1789:
URL:
https://github.com/apache/datafusion-comet/issues/1789#issuecomment-2922749343
The investigation seems to be diverging from `map_values` and seems more
focused on map value nullability, and in particular struct field nullability
and how that's enco
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922745280
> After my previous comment I started thinking about how our Java code
handles this kind of thing. Typically when we do a `Future#cancel(mayInterrupt:
true)` we have `Thread#isIn
comphead commented on PR #16215:
URL: https://github.com/apache/datafusion/pull/16215#issuecomment-2922744744
hope downstream users didn't use it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to t
comphead merged PR #16213:
URL: https://github.com/apache/datafusion/pull/16213
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@dataf
alamb opened a new pull request, #16216:
URL: https://github.com/apache/datafusion/pull/16216
## Which issue does this PR close?
- Related to https://github.com/apache/datafusion/issues/15096
- Follow on to https://github.com/apache/datafusion/pull/16142
## Rationale for
parthchandra commented on PR #1815:
URL:
https://github.com/apache/datafusion-comet/pull/1815#issuecomment-2922733351
Changing this to draft while I look into the CI failures
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub an
pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922714438
After my previous comment I started thinking about how our Java code handles
this kind of thing. Typically when we do a `Future#cancel(mayInterrupt: true)`
we have `Thread#isInterru
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922689219
Thank you @pepijnve , i may got it now, it means:
# Per-Batch Overhead Comparison
## 1. Counter + compare
```rust
this.batches_processed += 1; // I
Blizzara commented on issue #11201:
URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2922682982
Just fwiw, we already use Comet's Cast in our use of DF, through a rewrite
rule:
```
impl FunctionRewrite for OurCastRewrite {
fn name(&self) -> &str {
"
alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922659200
๐ค `./gh_compare_branch.sh` [Benchmark
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun
pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922625826
> Pipeline-breaking operators creating cancellation issues is a universal
problem -- this is not an AggregateExec issue.
@ozankabak It's indeed a universal problem, but since
ozankabak commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922512158
Pipeline-breaking operators creating cancellation issues is a universal
problem -- this is not an `AggregateExec` issue. Therefore, I firmly believe
that the solution has to be uni
andygrove commented on issue #15914:
URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2922618245
@shehabgamin @alamb I created an epic in Comet for implementing our current
expressions as `ScalarUDFImpl` rather than `PhysicalExpr` as a first step
towards contributing them
codecov-commenter commented on PR #1818:
URL:
https://github.com/apache/datafusion-comet/pull/1818#issuecomment-2922612812
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1818?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
andygrove opened a new issue, #1819:
URL: https://github.com/apache/datafusion-comet/issues/1819
### What is the problem the feature request solves?
Many of Comet's Spark-compatible DataFusion expressions are currently
implemented as `PhysicalExpr`. We would like to update them to imp
alamb commented on code in PR #16216:
URL: https://github.com/apache/datafusion/pull/16216#discussion_r2116071996
##
docs/source/library-user-guide/upgrading.md:
##
@@ -21,6 +21,55 @@
## DataFusion `48.0.0`
+### `VARCHAR` SQL type changed to map to `Utf8View` Arrow type
+
+
andygrove opened a new pull request, #1818:
URL: https://github.com/apache/datafusion-comet/pull/1818
## Which issue does this PR close?
Closes #.
## Rationale for this change
## What changes are included in this PR?
## How are these changes
berkaysynnada commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922128094
> > For example, FileStream could count how many batches it sends
back-to-back without yielding, and after a certain threshold, it yields. WDYT?
>
> Perhaps DataSourceExe
zhuqi-lucas commented on code in PR #16216:
URL: https://github.com/apache/datafusion/pull/16216#discussion_r2116036299
##
docs/source/library-user-guide/upgrading.md:
##
@@ -21,6 +21,55 @@
## DataFusion `48.0.0`
+### `VARCHAR` SQL type changed to map to `Utf8View` Arrow ty
ozankabak commented on PR #16217:
URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2922519861
Sounds good -- let's go with the first choice (moving fast with the release
and merging this right after) unless someone else brings a good argument to the
contrary.
--
This is
alamb opened a new pull request, #16213:
URL: https://github.com/apache/datafusion/pull/16213
## Which issue does this PR close?
- Follow on to https://github.com/apache/datafusion/pull/16139
## Rationale for this change
Implement suggestions from @xudong963:
https://git
ozankabak commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922549725
Makes sense, thanks @zhuqi-lucas. Both @berkaysynnada and I will be happy to
help with brainstorming/searching as you search for a good solution to this
problem ๐
--
This is an
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922500062
> I think the issue is for streams like AggregateExec that can always make
progress. So while this PR wraps the input to the AggregateExecStream, I think
logically the yielding i
zhuqi-lucas commented on issue #16149:
URL: https://github.com/apache/datafusion/issues/16149#issuecomment-2922543426
> > Do some changes in arrow-rs clickbench benchmark:
>
> Do I understand that the changes you report are due to simply rewriting
the parquet files to have a page inde
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922532031
> > Even if I couldn't get consensus on that direction, this version could
still be improved. For example, by inserting the YieldStream only when itโs
truly needed, such as in th
alamb commented on code in PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#discussion_r2116011961
##
datafusion/expr/src/expr.rs:
##
@@ -330,7 +331,7 @@ pub enum Expr {
/// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
AggregateFunction(Aggregate
alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922513729
I kicked off some benchmarks for this PR -- thank you @hendrikmakait ๐
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on t
alamb commented on PR #16207:
URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2922512365
๐ค `./gh_compare_branch_bench.sh` [Benchmark
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh)
Running
Linux aal-dev 6.11.0-1013-gcp #13~
ozankabak commented on PR #16217:
URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2922481568
Let's definitely time the merge and DF48 release in a cooperative way. We
have two competing realities at play: (1) The PR attracts a lot of conflicts
and it is not easy to resolve
alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2115997415
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap> =
HashMap::new();
self.a
alamb commented on issue #16149:
URL: https://github.com/apache/datafusion/issues/16149#issuecomment-2922493420
> Do some changes in arrow-rs clickbench benchmark:
Do I understand that the changes you report are due to simply rewriting the
parquet files to have a page index and be unc
alamb commented on PR #16217:
URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2922489274
> Move fast with the release while this gets reviews, which has the
advantage of making DF48 a "checkpoint" before this PR. The downside is that we
will need to do extensive testing bo
alamb commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922479564
> Even if I couldn't get consensus on that direction, this version could
still be improved. For example, by inserting the YieldStream only when itโs
truly needed, such as in the empty
alamb commented on code in PR #16185:
URL: https://github.com/apache/datafusion/pull/16185#discussion_r2115978901
##
datafusion/physical-expr/src/equivalence/class.rs:
##
@@ -422,6 +423,60 @@ impl EquivalenceGroup {
self.bridge_classes()
}
+/// Returns a set
pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922419073
> However, the same example can also consume the pending return generated by
Aggregate, so the guarantee cannot be upheld in our model.
Indeed, which is why I've introduced co
chenkovsky commented on code in PR #16185:
URL: https://github.com/apache/datafusion/pull/16185#discussion_r2115970133
##
datafusion/physical-expr/src/equivalence/class.rs:
##
@@ -422,6 +423,60 @@ impl EquivalenceGroup {
self.bridge_classes()
}
+/// Returns a
chenkovsky closed pull request #16211: feat: improve with_constants performance
URL: https://github.com/apache/datafusion/pull/16211
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comme
xudong963 commented on PR #16217:
URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2922442995
One thing on my mind is that maybe it's better to hold up the PR until DF48
release, then we can have enough time to check and verify the PR after it
merges.
--
This is an autom
cj-zhukov commented on issue #16218:
URL: https://github.com/apache/datafusion/issues/16218#issuecomment-2922435442
take
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To
cj-zhukov opened a new issue, #16218:
URL: https://github.com/apache/datafusion/issues/16218
### Is your feature request related to a problem or challenge?
Following the discussion in [PR
#14227](https://github.com/apache/datafusion/pull/14227) about dict_ordered,
Iโm going to open a
ozankabak commented on PR #16217:
URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2922392615
Thanks for the quick response @alamb -- I added some reviewing tips and
tricks to the PR body to help you guys out.
--
This is an automated message from the Apache Git Service.
T
alamb commented on PR #16217:
URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2922389594
@wiedld @xudong963 and @suremarc perhaps you are also interested in this
PR given your past interest in the equivalence code. Also cc @chenkovsky
--
This is an automated message f
alamb commented on PR #16217:
URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2922387217
๐ฎ
I will put this on my list to review asap
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abo
alamb commented on issue #16202:
URL: https://github.com/apache/datafusion/issues/16202#issuecomment-2922385651
Thanks @kosiew -- here is a PR:
- https://github.com/apache/datafusion/pull/16214
--
This is an automated message from the Apache Git Service.
To respond to the message, plea
alamb commented on issue #16206:
URL: https://github.com/apache/datafusion/issues/16206#issuecomment-2922383864
I have noticed something similar -- that the pattern of "take / concat"
results in potentially non trival overhead.
I have been recently working on some related code (the sa
berkaysynnada commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922380421
@pepijnve thanks for the example. However, the same example can also consume
the pending return generated by Aggregate, so the guarantee cannot be upheld in
our model.
I
alamb commented on PR #16142:
URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2922372187
Thanks again @zhuqi-lucas and @xudong963
I have also made a PR to add a mention of this change to the upgrade guide:
- https://github.com/apache/datafusion/pull/16216
--
Th
ozankabak opened a new pull request, #16217:
URL: https://github.com/apache/datafusion/pull/16217
## Which issue does this PR close?
- Closes #13748.
## Rationale for this change
DF's theoretical approach to representing orderings, equivalences and
constants is sound, bu
Vedin opened a new pull request, #1867:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1867
`GROUPING SETS` currently accepts only tuples. It can be single elements as
well.
At least, I can see this behaviour in Hive and Snowflake:
https://hive.apache.org/docs/latest/3015
alamb merged PR #16148:
URL: https://github.com/apache/datafusion/pull/16148
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusi
alamb commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922322640
I agree with @pepijnve that adding a yield to the producer is not clearly a
better choice -- I think the issue is for streams like AggregateExec that can
always make progress
S
alamb commented on PR #1864:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1864#issuecomment-2922335147
The code just keeps churning along in this repo. Thank you @iffyio and
@dmzmk
--
This is an automated message from the Apache Git Service.
To respond to the message, pl
xudong963 merged PR #16212:
URL: https://github.com/apache/datafusion/pull/16212
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@data
xudong963 commented on code in PR #16139:
URL: https://github.com/apache/datafusion/pull/16139#discussion_r2115539220
##
datafusion/common/src/pruning.rs:
##
@@ -122,3 +127,984 @@ pub trait PruningStatistics {
values: &HashSet,
) -> Option;
}
+
+/// Prune files ba
1 - 100 of 139 matches
Mail list logo