dependabot[bot] opened a new pull request, #15331:
URL: https://github.com/apache/datafusion/pull/15331
Bumps [blake3](https://github.com/BLAKE3-team/BLAKE3) from 1.6.1 to 1.7.0.
Release notes
Sourced from [blake3's releases](https://github.com/BLAKE3-team/BLAKE3/releases).
1
dependabot[bot] opened a new pull request, #15333:
URL: https://github.com/apache/datafusion/pull/15333
Bumps [indexmap](https://github.com/indexmap-rs/indexmap) from 2.7.1 to
2.8.0.
Changelog
Sourced from [indexmap's](https://github.com/indexmap-rs/indexmap/blob/main/RELEASES.md)
joroKr21 commented on PR #15149:
URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2739679149
> I recommend following whatever DuckDB (or Postgres) does -- there is not
much value in DataFusion having different semantics from other systems
* DuckDB doesn't have union for
aectaan commented on issue #15291:
URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2739694259
OK, it's probably related to the `Analyzer`. After disabling it, the
optimisations are OK.
--
This is an automated message from the Apache Git Service.
To respond to the message, please
alamb commented on code in PR #15168:
URL: https://github.com/apache/datafusion/pull/15168#discussion_r2004518126
##
datafusion/spark/src/function/math/expm1.rs:
##
@@ -0,0 +1,169 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license a
korowa commented on PR #15324:
URL: https://github.com/apache/datafusion/pull/15324#issuecomment-2739393073
Thank you @waynexia, I'm planning to check it out by tomorrow at the latest.
I have a question in advance, before reviewing: have you considered
implementing a groups accumulator fo
comphead commented on issue #1553:
URL: https://github.com/apache/datafusion-comet/issues/1553#issuecomment-2737306709
Interesting that the panic is non-unwinding; by default, a `panic` in a Rust
release build should unwind.
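
For readers unfamiliar with the distinction, here is a minimal sketch of what "unwinding" means in practice. It is not taken from the Comet code base; the helper name `panic_is_caught` is made up for illustration. With Rust's default `panic = "unwind"` strategy, `std::panic::catch_unwind` can intercept a panic, whereas a build with `panic = "abort"` in its Cargo profile would terminate the process instead of returning.

```rust
use std::panic;

/// Hypothetical helper: returns true when a panic raised inside the closure
/// is caught, which is only possible with the unwinding panic strategy.
fn panic_is_caught() -> bool {
    panic::catch_unwind(|| {
        // The default panic hook still prints the message to stderr.
        panic!("boom");
    })
    .is_err()
}

fn main() {
    // Under `panic = "abort"` the process would terminate inside
    // `panic_is_caught` and this line would never run.
    println!("panic caught: {}", panic_is_caught());
}
```

This only illustrates the default behaviour the comment refers to; it does not say why the panic in the issue was reported as non-unwinding.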
eliaperantoni commented on issue #15276:
URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2739463296
Hey @jsai28, that looks good! I have some points that I'd like to hear your
opinion on:
1. I think `FnCallSpans.args` should have exactly as many elements as the
ar
xudong963 commented on issue #15271:
URL: https://github.com/apache/datafusion/issues/15271#issuecomment-2739420835
@alamb Thank you for summarizing, I'm also interested in this topic and may
have more time to join the game in May, but I will keep an eye on the progress.
xudong963 commented on code in PR #15330:
URL: https://github.com/apache/datafusion/pull/15330#discussion_r2005013876
##
datafusion/datasource-parquet/src/file_format.rs:
##
@@ -797,10 +797,34 @@ pub async fn fetch_statistics(
statistics_from_parquet_meta_calc(&metadata, ta
xudong963 opened a new pull request, #15330:
URL: https://github.com/apache/datafusion/pull/15330
## Which issue does this PR close?
- Part of https://github.com/apache/datafusion/pull/15289
## Rationale for this change
I'm refactoring the method `statistics_from
andygrove commented on code in PR #1525:
URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2004547775
##
docs/source/user-guide/tuning.md:
##
@@ -17,18 +17,96 @@ specific language governing permissions and limitations
under the License.
-->
-# Tuning Guid
PokIsemaine commented on code in PR #15183:
URL: https://github.com/apache/datafusion/pull/15183#discussion_r2005632036
##
datafusion/sql/src/planner.rs:
##
@@ -560,11 +558,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
SQLDataType::SmallInt(_) | SQLDataType::
xudong963 merged PR #15332:
URL: https://github.com/apache/datafusion/pull/15332
wForget commented on code in PR #1555:
URL: https://github.com/apache/datafusion-comet/pull/1555#discussion_r2005775054
##
native/core/src/parquet/mod.rs:
##
@@ -641,6 +640,8 @@ pub unsafe extern "system" fn
Java_org_apache_comet_parquet_Native_initRecordBat
session_timezo
westhide commented on code in PR #15311:
URL: https://github.com/apache/datafusion/pull/15311#discussion_r2005735108
##
datafusion/proto/src/physical_plan/mod.rs:
##
@@ -247,6 +247,15 @@ impl AsExecutionPlan for protobuf::PhysicalPlanNode {
.with_file_compressio
robtandy opened a new pull request, #86:
URL: https://github.com/apache/datafusion-ray/pull/86
This PR is long but it does not affect the core functionality of DataFusion
for Ray, and does not differ from `0.1.0rc1` which has been extensively used by
me in benchmarking from `test.pypi`.
robtandy commented on PR #86:
URL: https://github.com/apache/datafusion-ray/pull/86#issuecomment-2740336109
@andygrove Here is the PR I mentioned to you that I would submit with
benchmarking code and results.
I have some good graphs of the results, but I'll submit them in a subseque
deanm commented on issue #1064:
URL: https://github.com/apache/datafusion-python/issues/1064#issuecomment-2740490227
Is this something a PR would be accepted for or no?
alan910127 commented on code in PR #15110:
URL: https://github.com/apache/datafusion/pull/15110#discussion_r2005666997
##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -290,19 +290,72 @@ impl<'a> TypeCoercionRewriter<'a> {
right: Expr,
right_schema:
alan910127 commented on code in PR #15110:
URL: https://github.com/apache/datafusion/pull/15110#discussion_r2005677168
##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -290,19 +290,72 @@ impl<'a> TypeCoercionRewriter<'a> {
right: Expr,
right_schema:
Dandandan commented on code in PR #15324:
URL: https://github.com/apache/datafusion/pull/15324#discussion_r2005787338
##
datafusion/functions-aggregate/src/count.rs:
##
@@ -752,10 +761,245 @@ impl Accumulator for DistinctCountAccumulator {
}
}
+/// GroupsAccumulator for
wForget commented on PR #1554:
URL: https://github.com/apache/datafusion-comet/pull/1554#issuecomment-2740681759
> I just wanted to see in what condition the NativeBatchReader can be called
after close has been called.
The scenario I encountered was not NativeBatchReader called afte
westhide commented on code in PR #15335:
URL: https://github.com/apache/datafusion/pull/15335#discussion_r2005989730
##
datafusion/proto/proto/datafusion.proto:
##
@@ -997,6 +997,7 @@ message FileScanExecConf {
reserved 10;
datafusion_common.Constraints constraints = 11;
alamb commented on issue #15177:
URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740980237
> Thanks for checking [@alamb](https://github.com/alamb) !
>
> I think a large portion is spent in the hash join (repartitioning the
right side input) - I think because it r
alamb commented on PR #15316:
URL: https://github.com/apache/datafusion/pull/15316#issuecomment-2741010424
> Where does Memtable belong, datasource or catalog? It is a TableProvider
implementation, so I thought it was going to be in catalog, but I'm not so sure
anymore as it has a dependency on d
alamb commented on PR #15313:
URL: https://github.com/apache/datafusion/pull/15313#issuecomment-2740902341
FYI @blaginin
logan-keede commented on PR #15316:
URL: https://github.com/apache/datafusion/pull/15316#issuecomment-2740991793
Where does Memtable belong, datasource or catalog? It is a TableProvider
implementation, so I thought it was going to be in catalog, but I'm not so sure
anymore as it has a dependency
andygrove commented on code in PR #1561:
URL: https://github.com/apache/datafusion-comet/pull/1561#discussion_r2005998677
##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -1334,26 +1334,46 @@ object CometSparkSessionExtensions extends Logging {
logan-keede commented on PR #15316:
URL: https://github.com/apache/datafusion/pull/15316#issuecomment-2741075889
I thought it mattered because `datasource` has a dependency on `catalog`,
but on a second look it is only `Session`. Any plans on pulling `Session` out?
also corresponding `
alamb commented on PR #60:
URL: https://github.com/apache/datafusion-site/pull/60#issuecomment-2741065799
And it is live:
https://datafusion.apache.org/blog/2025/03/20/parquet-pruning/
adriangb commented on PR #15261:
URL: https://github.com/apache/datafusion/pull/15261#issuecomment-2741174091
Marking as ready for review. The main TODO is an API for transmitting
statistics information for generated columns before they get generated, but
that can even be a followup PR.
Omega359 commented on code in PR #61:
URL: https://github.com/apache/datafusion-site/pull/61#discussion_r2006154315
##
content/blog/2025-03-21-parquet-pushdown.md:
##
@@ -0,0 +1,259 @@
+---
+layout: post
+title: Efficient Filter Pushdown in Parquet
+date: 2025-03-21
+author: Xia
XiangpengHao commented on PR #62:
URL: https://github.com/apache/datafusion-site/pull/62#issuecomment-2741287848
Thank you @kevinjqliu
arpity22 commented on issue #102:
URL: https://github.com/apache/datafusion/issues/102#issuecomment-2741307866
Since this issue was opened a while ago, has it been resolved but not
updated here?
alamb merged PR #15253:
URL: https://github.com/apache/datafusion/pull/15253
Dandandan commented on issue #15177:
URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740739642
> Note that late materialization (the join / semi join rewrite) needs join
operator support that DataFusion doesn't yet have (we could add it but it will
take non-trivial effo
blaginin commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2004184344
##
datafusion/core/tests/parquet/custom_reader.rs:
##
@@ -96,17 +97,15 @@ async fn
route_data_access_ops_to_parquet_file_reader_factory() {
let task_ctx = s
alamb commented on PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2740745436
Thank you @irenjj @jayzhan211 and @xudong963 🙏
alamb closed issue #15252: [tree explain] Simplify display format of
`AggregateFunctionExpr`
URL: https://github.com/apache/datafusion/issues/15252
findepi commented on issue #13814:
URL: https://github.com/apache/datafusion/issues/13814#issuecomment-2740776452
> I was wondering recently if the problem could be related to all the
re-exports (`pub use `) we do in DataFusion. Maybe we could see if reducing
them helped 🤔
i
alamb merged PR #14547:
URL: https://github.com/apache/datafusion/pull/14547
alamb commented on PR #14547:
URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2740784949
I also added this feature to the list of things we should document with the
next release
- https://github.com/apache/datafusion/issues/15072
Thanks again @geoffreyclaude and @
goldmedal opened a new pull request, #15334:
URL: https://github.com/apache/datafusion/pull/15334
## Which issue does this PR close?
- Closes #13486
## Rationale for this change
When working on unparsing the plan optimized by `ScalarSubqueryToJoin`, I
notice
alamb commented on code in PR #15335:
URL: https://github.com/apache/datafusion/pull/15335#discussion_r2005949069
##
datafusion/proto/proto/datafusion.proto:
##
@@ -997,6 +997,7 @@ message FileScanExecConf {
reserved 10;
datafusion_common.Constraints constraints = 11;
+
alamb closed issue #1209: Unsupported NdJsonExec plan and extension codec
URL: https://github.com/apache/datafusion-ballista/issues/1209
alamb commented on issue #15037:
URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2740932019
Thanks @adriangb -- I will try and review it asap (hopefully tomorrow
afternoon or tomorrow)
Dandandan commented on issue #15177:
URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740936826
Thanks for checking @alamb !
I think a large portion is spent in the hash join (repartitioning the right
input) - I think because it runs as `Partitioned` hash join, instea
alamb merged PR #15311:
URL: https://github.com/apache/datafusion/pull/15311
alamb commented on issue #15177:
URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740888007
I tried the rewrite into a Semi join and indeed it is over 2x slower (5.3sec
vs 12sec)
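
As background for the rewrite discussed in this thread, the general idea of late materialization can be sketched in plain Rust, independent of DataFusion. Everything here is an assumption made for illustration: the `Table` layout, the column names, and the presence of a row limit are hypothetical and do not come from the issue. Phase 1 scans only the narrow filter and sort columns to pick row ids; phase 2 fetches the wide columns for just those rows.

```rust
// Hypothetical in-memory columnar table; column names are illustrative only.
struct Table {
    url: Vec<String>,
    event_time: Vec<i64>,
    payload: Vec<String>, // stands in for the many wide columns of `SELECT *`
}

/// Phase 1: scan only the narrow columns and return the row ids of the
/// `limit` matching rows with the smallest `event_time`.
fn top_matching_row_ids(t: &Table, needle: &str, limit: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..t.url.len())
        .filter(|&i| t.url[i].contains(needle))
        .collect();
    ids.sort_by_key(|&i| t.event_time[i]);
    ids.truncate(limit);
    ids
}

/// Phase 2: materialize the wide columns only for the surviving row ids.
fn materialize<'a>(t: &'a Table, ids: &[usize]) -> Vec<(&'a str, i64, &'a str)> {
    ids.iter()
        .map(|&i| (t.url[i].as_str(), t.event_time[i], t.payload[i].as_str()))
        .collect()
}

fn main() {
    let t = Table {
        url: vec!["google.com/a".into(), "example.com".into(), "maps.google.com".into()],
        event_time: vec![30, 10, 20],
        payload: vec!["wide-a".into(), "wide-b".into(), "wide-c".into()],
    };
    let ids = top_matching_row_ids(&t, "google", 2);
    for row in materialize(&t, &ids) {
        println!("{:?}", row);
    }
}
```

In a query engine the second phase is typically expressed as a join or semi join back to the base table, which is why the cost of that join matters for whether the rewrite pays off.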
```sql
> SELECT * from 'hits_partitioned' WHERE "URL" LIKE '%google%' ORDER BY
"Ev
alamb commented on issue #15177:
URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740900315
I am not really sure where the time is going 🤔
output of explain analyze:
[explain.txt](https://github.com/user-attachments/files/19370532/explain.txt)
alamb commented on code in PR #15327:
URL: https://github.com/apache/datafusion/pull/15327#discussion_r2005971323
##
datafusion/physical-expr/src/expressions/binary.rs:
##
@@ -793,8 +793,10 @@ impl BinaryExpr {
BitwiseShiftRight => bitwise_shift_right_dyn(left, righ
alamb commented on code in PR #15316:
URL: https://github.com/apache/datafusion/pull/15316#discussion_r2005982815
##
datafusion/physical-expr/src/physical_expr.rs:
##
@@ -146,6 +148,38 @@ pub fn create_ordering(
Ok(all_sort_orders)
}
+/// Create a physical sort expressio
andygrove commented on code in PR #1561:
URL: https://github.com/apache/datafusion-comet/pull/1561#discussion_r2006287257
##
spark/src/main/scala/org/apache/comet/CometExecIterator.scala:
##
@@ -63,9 +64,28 @@ class CometExecIterator(
}.toArray
private val plan = {
va
kazuyukitanimura commented on code in PR #1556:
URL: https://github.com/apache/datafusion-comet/pull/1556#discussion_r2006238491
##
spark/src/test/scala/org/apache/comet/WithHdfsCluster.scala:
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under on
Dandandan opened a new pull request, #15339:
URL: https://github.com/apache/datafusion/pull/15339
## Which issue does this PR close?
- Closes #.
## Rationale for this change
## What changes are included in this PR?
## Are these changes teste