DerGut commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2040700866
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -765,6 +765,25 @@ impl ExternalSorter {
Ok(())
}
+
+/// Reserves memory to be able to accom
timsaucer commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798792530
I spoke too soon - I'm getting one error in our unit tests on `last_value`.
I'm trying to investigate this morning.
--
This is an automated message from the Apache Git Servi
2010YOUY01 commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2798787828
Benchmark results:
(I think there is no significant regression for an extra round of re-spill,
if it's running on a machine with fast SSDs)
### Environment
MacBook Pro
2010YOUY01 commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2798789296
I have made the following updates:
- Addressed review comments
- Introduced a new configuration option for max merge degree
- Added tests
It's ready for another look.
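As a rough illustration of what a "max merge degree" limit means for a spilling sort: when there are more sorted runs than the limit allows, they are merged in several passes, and each pass merges groups of at most that many runs. The sketch below is plain Python and is not DataFusion's implementation; the function name `merge_with_max_degree` and its parameters are hypothetical, and in-memory lists stand in for spill files.

```python
import heapq


def merge_with_max_degree(runs, max_merge_degree):
    """Merge sorted runs, combining at most `max_merge_degree` runs per step.

    Returns the fully merged list and the number of merge passes used.
    """
    if max_merge_degree < 2:
        raise ValueError("max_merge_degree must be at least 2")
    runs = [list(r) for r in runs]
    passes = 0
    while len(runs) > 1:
        # One pass: merge the runs in groups of at most `max_merge_degree`.
        # In an external sort, each group result would be re-spilled to disk.
        runs = [
            list(heapq.merge(*runs[i:i + max_merge_degree]))
            for i in range(0, len(runs), max_merge_degree)
        ]
        passes += 1
    return (runs[0] if runs else []), passes
```

With ten runs and a limit of 4, the first pass leaves three runs and the second pass leaves one, so the lower limit costs one extra round of merging compared with merging all ten at once.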
alamb merged PR #15639:
URL: https://github.com/apache/datafusion/pull/15639
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusi
alamb closed issue #15605: Support window definitions with aliases
URL: https://github.com/apache/datafusion/issues/15605
alamb commented on PR #15589:
URL: https://github.com/apache/datafusion/pull/15589#issuecomment-2798807271
Thanks again @ding-young 🙏
alamb closed issue #15387: Trivial WHERE filter not eliminated when combined with CTE
URL: https://github.com/apache/datafusion/issues/15387
alamb merged PR #15589:
URL: https://github.com/apache/datafusion/pull/15589
alamb closed issue #15323: Reduce number of tokio blocking threads in SortExec spill
URL: https://github.com/apache/datafusion/issues/15323
alamb commented on PR #15639:
URL: https://github.com/apache/datafusion/pull/15639#issuecomment-2798807532
Thanks @chenkovsky and @comphead
alamb commented on issue #15686:
URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2798808874
> Coercing the LHS to Binary is less performant. Unconditionally parsing
literals as FixedSizeBytes would be a breaking change.
What about coercing the RHS to FixedSizeBinar
alamb commented on PR #15687:
URL: https://github.com/apache/datafusion/pull/15687#issuecomment-2798809026
I have an alternate suggestion here:
https://github.com/apache/datafusion/issues/15686#issuecomment-2798808874
timsaucer commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798806510
I did a little investigating, but I don't have time for a couple of days to
dive in deeper. This appears to be related to
https://github.com/apache/datafusion/pull/15542 @UBar
timsaucer merged PR #15581:
URL: https://github.com/apache/datafusion/pull/15581
alamb merged PR #15654:
URL: https://github.com/apache/datafusion/pull/15654
alamb commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798810667
> I did a little investigating, but I don't have time for a couple of days
to dive in deeper. This appears to be related to
[#15542](https://github.com/apache/datafusion/pull/1554
alamb commented on PR #15581:
URL: https://github.com/apache/datafusion/pull/15581#issuecomment-2798809598
Very exciting -- thanks @timsaucer
alamb commented on PR #15654:
URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2798807598
🚀
alamb merged PR #15682:
URL: https://github.com/apache/datafusion/pull/15682
timsaucer opened a new issue, #12376:
URL: https://github.com/apache/datafusion/issues/12376
### Describe the bug
Less of a bug per se, but it would be nice to have identical function
signatures between first_value and last_value
### To Reproduce
_No response_
###
acking-you commented on issue #15631:
URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2798742956
> @acking-you the code needs to be extended to support nulls (you can take a
look at the true_count implementation in arrow-rs to do this efficiently).
I have an idea f
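For context on the quoted suggestion: one common way to count true values in a boolean array that has nulls is to AND the value bitmap with the validity bitmap and count the set bits. The sketch below shows that idea with plain Python integers as bitmaps. The function name and the bitmap encoding are illustrative assumptions, not the arrow-rs API, and whether this matches the arrow-rs code in detail is also an assumption.

```python
def true_count(values, validity=None):
    """Count entries that are both valid (non-null) and true.

    `values` and `validity` are Python ints used as bitmaps: bit i of
    `values` is the boolean at index i, and bit i of `validity` is 1 when
    index i is non-null. `validity=None` means the array has no nulls.
    """
    if validity is not None:
        # A null slot may hold an arbitrary value bit, so mask it out first.
        values &= validity
    return bin(values).count("1")
```

For example, with `values = 0b1011` and `validity = 0b0110`, only index 1 is both valid and true, so the count is 1; without a validity bitmap the same values give 3.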
timsaucer commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798821873
> > I did a little investigating, but I don't have time for a couple of days
to dive in deeper. This appears to be related to
[#15542](https://github.com/apache/datafusion/pul
milenkovicm closed issue #1235: executor can't read s3 config in push-staged mode.
URL: https://github.com/apache/datafusion-ballista/issues/1235
milenkovicm merged PR #1236:
URL: https://github.com/apache/datafusion-ballista/pull/1236
LucaCappelletti94 commented on code in PR #1806:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1806#discussion_r2040654851
##
src/parser/mod.rs:
##
@@ -7070,13 +7071,22 @@ impl<'a> Parser<'a> {
}
}
-/// Parse configuration like partitioning, cl
LucaCappelletti94 commented on code in PR #1806:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1806#discussion_r2040655023
##
tests/sqlparser_postgres.rs:
##
@@ -2733,6 +2733,39 @@ fn parse_create_brin() {
}
}
+#[test]
+fn parse_create_table_with_inherits(
andygrove opened a new pull request, #1642:
URL: https://github.com/apache/datafusion-comet/pull/1642
## Which issue does this PR close?
Closes #.
## Rationale for this change
## What changes are included in this PR?
## How are these changes
kumarlokesh commented on code in PR #15594:
URL: https://github.com/apache/datafusion/pull/15594#discussion_r2040707194
##
datafusion/core/tests/sql/runtime_config.rs:
##
@@ -0,0 +1,73 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lice
Dandandan commented on issue #15375:
URL: https://github.com/apache/datafusion/issues/15375#issuecomment-2798960792
@2010YOUY01 that sounds like a very promising future direction. I might try
experimenting with this soon if no one beats me to it.
dependabot[bot] opened a new pull request, #1105:
URL: https://github.com/apache/datafusion-python/pull/1105
Bumps [mimalloc](https://github.com/purpleprotocol/mimalloc_rust) from
0.1.43 to 0.1.46.
Release notes
Sourced from https://github.com/purpleprotocol/mimalloc_rust/releases
dependabot[bot] closed pull request #1095: build(deps): bump mimalloc from 0.1.43 to 0.1.45
URL: https://github.com/apache/datafusion-python/pull/1095
dependabot[bot] commented on PR #1095:
URL: https://github.com/apache/datafusion-python/pull/1095#issuecomment-2798993546
Superseded by #1105.
acking-you commented on PR #15694:
URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2799010340
The error in `cargo test` is caused by an incorrect calculation of the
pre-selection. The correct steps for calculating the pre-selection are as
follows:
1. Compute the boo
leoyvens commented on issue #15686:
URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2799013192
@alamb thank you for looking at this. Avoiding a config flag would be nice.
But I'm skeptical of the proposed coercion.
If we coerce `binary` to `fixed(N)` when encounter
iffyio merged PR #1806:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1806
DerGut commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2040698187
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -529,6 +523,12 @@ impl ExternalSorter {
/// Sorts the in-memory batches and merges them into a single sort
duongcongtoai commented on issue #14554:
URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2798943345
I think we can break down this story into multiple steps:
1. unify the optimizer for correlated queries, regardless of the query type
(exists query, scalar query, etc.)
2. su
Dandandan commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798955035
Thanks for sharing the results @zhuqi-lucas this is really interesting!
I think it mainly shows that we should probably try to use more efficient
in-memory sorting (e.g. an
iffyio commented on code in PR #1808:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2040586609
##
src/ast/mod.rs:
##
@@ -8368,6 +8387,22 @@ pub enum CreateFunctionBody {
///
/// [BigQuery]:
https://cloud.google.com/bigquery/docs/referen
2010YOUY01 commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2798782989
I haven't implemented the parallel merge optimization for now. My major concern
is that this optimization requires one extra configuration, and users have to learn
and correctly set 2 co
qstommyshu commented on code in PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#discussion_r2040791232
##
datafusion/core/tests/memory_limit/mod.rs:
##
@@ -615,6 +616,104 @@ async fn test_disk_spill_limit_not_reached() ->
Result<()> {
Ok(())
}
+// Test c
qstommyshu commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2799447585
> ### Intended optimization
> If the memory pool is enough to hold more batches at a time (while
`spill_max_spill_merge_degree` is still limited to 4, in case the merge-degree
qstommyshu commented on code in PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#discussion_r2040846556
##
docs/source/user-guide/configs.md:
##
@@ -84,6 +84,7 @@ Environment variables are read during `SessionConfig`
initialisation so they mus
| datafusion.execu
djellemah commented on PR #15597:
URL: https://github.com/apache/datafusion/pull/15597#issuecomment-2799539091
Seems to me there's a confluence of several factors here:
- testing this kind of functionality is not simple.
- it's user facing, so if it breaks somebody will notice a
jayzhan211 opened a new pull request, #15695:
URL: https://github.com/apache/datafusion/pull/15695
## Which issue does this PR close?
- Closes #12376.
## Rationale for this change
## What changes are included in this PR?
## Are these changes
jayzhan211 commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2799545774
> My only remaining question is if we want to upgrade arrow in this release
as well
+1 for upgrading all the dependencies
Dandandan commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798712702
I think this is already looking quite nice. What do you need to finalize
this, @zhuqi-lucas?
iffyio commented on code in PR #1809:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2040595052
##
src/parser/mod.rs:
##
@@ -617,6 +623,9 @@ impl<'a> Parser<'a> {
}
// `COMMENT` is snowflake specific
https://docs.
Dandandan commented on issue #15375:
URL: https://github.com/apache/datafusion/issues/15375#issuecomment-2798726588
Really nice observation! I think we should drive this further.
Some further observations I saw when looking at the current implementation
on master for the in memory mer
DerGut opened a new pull request, #15692:
URL: https://github.com/apache/datafusion/pull/15692
## Which issue does this PR close?
Closes https://github.com/apache/datafusion/issues/15675
## Rationale for this change
I noticed an internal error in the sort implementation.
zhuqi-lucas commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798857034
> I think this is already looking quite nice. What do you need to finalize
this, @zhuqi-lucas?
Thank you @Dandandan for the review. I think we just need to add the benchmark
res
chenkovsky opened a new pull request, #15693:
URL: https://github.com/apache/datafusion/pull/15693
## Which issue does this PR close?
- Closes #15688.
## Rationale for this change
There are two issues.
1. projection in scan table is dropped in
try_transform_to_sim
zhuqi-lucas commented on code in PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#discussion_r2040673419
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -645,7 +645,36 @@ impl ExternalSorter {
return self.sort_batch_stream(batch, metrics, reserva
zhuqi-lucas commented on issue #15375:
URL: https://github.com/apache/datafusion/issues/15375#issuecomment-2798866385
Thank you @Dandandan, I have addressed your comments. We can treat this as the
first version, and in the future we may be able to improve it as described by @2010YOUY01:
https://github.c
zhuqi-lucas commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798868399
@alamb Do we have the CI benchmark running now? If not, I need your help to
run it... Thanks a lot!
Also, for sort-tpch itself, I was running it for the improvement result,
andygrove merged PR #1641:
URL: https://github.com/apache/datafusion-comet/pull/1641
andygrove closed issue #1640: Standardize Spark diff hash length
URL: https://github.com/apache/datafusion-comet/issues/1640
acking-you opened a new pull request, #15694:
URL: https://github.com/apache/datafusion/pull/15694
## Which issue does this PR close?
- Closes #15636
## Rationale for this change
Many thanks to @kosiew for doing a tremendous amount of work. Based on his
P
acking-you commented on issue #15631:
URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2798871646
> > [@acking-you](https://github.com/acking-you) the code needs to be
extended to support nulls (you can take a look at the true_count implementation
in arrow-rs to do this e
delamarch3 commented on PR #15160:
URL: https://github.com/apache/datafusion/pull/15160#issuecomment-2798766441
Thanks for the reviews!
ctsk commented on PR #15532:
URL: https://github.com/apache/datafusion/pull/15532#issuecomment-2798754505
Heads up: `SUM(DISTINCT (x + 5))` is **not** equivalent to `SUM(DISTINCT x)
+ 5 * COUNT(DISTINCT x)`
alamb commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2798878082
> > Do we want to hold DF 47 release for the arrow upgrade too?
> > I think it is possible (arrow will hopefully be released at the end of
this week -- and we could make the DF
zhuqi-lucas commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2798879716
Latest results based on the current code:
```text
Benchmark sort_tpch1.json
┏━━┳━━┳━
```
alamb commented on issue #10451:
URL: https://github.com/apache/datafusion/issues/10451#issuecomment-2798883874
Hi @marvelshan -- I also took a look at the docs. It actually looks like
the format options are largely documented, but the documentation could be
improved.
I suggest:
2010YOUY01 commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2040878250
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -529,6 +523,12 @@ impl ExternalSorter {
/// Sorts the in-memory batches and merges them into a single
changsun20 opened a new pull request, #15696:
URL: https://github.com/apache/datafusion/pull/15696
## Which issue does this PR close?
Closes #14434
## Rationale for this change
This PR addresses a common SQL anti-pattern where users accidentally use `=
NULL` instead of `
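To make the anti-pattern concrete: in standard SQL, comparing a column to NULL with `=` yields NULL (unknown) rather than true, so the filter matches no rows, while `IS NULL` is the predicate that tests for NULL. The sketch below demonstrates this with Python's built-in `sqlite3` module rather than DataFusion; the table `t` and column `x` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (None,)])

# `x = NULL` evaluates to NULL (unknown) for every row, so no row passes.
eq_null = conn.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchone()[0]

# `x IS NULL` is the predicate that actually tests for NULL.
is_null = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]

print(eq_null, is_null)
```

Because the `= NULL` query runs without error and simply returns nothing, the mistake is easy to miss, which is why a diagnostic for it can be useful.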
changsun20 commented on PR #15696:
URL: https://github.com/apache/datafusion/pull/15696#issuecomment-2799781696
Hi @eliaperantoni,
Thank you for your patience and guidance throughout this issue. I've
implemented the core functionality per our discussions, but would like to
confirm a
andygrove commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2799561705
I am also +1 for upgrading the dependencies (for selfish reasons; we are
waiting on an arrow feature to help with INT96 timestamps in Parquet)