Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-06 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3043682960 > > I would appreciate it, it would greatly help me > > @rluvaton I opened a pr on your fork. Would you take a look when you have some time? I **really** appreciate the

Re: [I] Statistics: Migrate to `Distribution` from `Precision` [datafusion]

2025-07-06 Thread via GitHub
cj-zhukov commented on issue #14896: URL: https://github.com/apache/datafusion/issues/14896#issuecomment-3043537397 unassign me -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [I] Statistics: Migrate to `Distribution` from `Precision` [datafusion]

2025-07-06 Thread via GitHub
cj-zhukov commented on issue #14896: URL: https://github.com/apache/datafusion/issues/14896#issuecomment-3043536847 I’m unassigning myself from this issue for now as I haven’t been able to make progress.

Re: [PR] Per file filter evaluation [datafusion]

2025-07-06 Thread via GitHub
adriangb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2188925012 ## datafusion-examples/examples/variant_shredding.rs: ## @@ -0,0 +1,398 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Overflow happened on: `Long.MinValue div -1` [datafusion-comet]

2025-07-06 Thread via GitHub
wForget commented on issue #1477: URL: https://github.com/apache/datafusion-comet/issues/1477#issuecomment-3043375532 > I can actually get a successful output ( in parity with spark) if I remove the `ORDER BY ` clause . However, the query fails when I do add the `ORDER BY` . [@wForget](htt

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-07-06 Thread via GitHub
kosiew closed pull request #15295: Enhance Schema adapter to accommodate evolving struct URL: https://github.com/apache/datafusion/pull/15295

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-07-06 Thread via GitHub
kosiew commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-3043339058 Reimplementation completed.

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub
kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188857055 ## datafusion/expr/src/test/mod.rs: ## @@ -16,3 +16,5 @@ // under the License. pub mod function_stub; +#[cfg(test)] +pub mod udf_equals; Review Comment: Goo

[PR] feat: Add JNI-based Hadoop FileSystem support for S3 and other Hadoop-compatible stores [datafusion-comet]

2025-07-06 Thread via GitHub
drexler-sky opened a new pull request, #1992: URL: https://github.com/apache/datafusion-comet/pull/1992 ## Which issue does this PR close? Closes #. ## Rationale for this change This PR adds support for Approach 2 (JNI-based Hadoop FileSystem access) to e

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub
kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188833299 ## datafusion/expr/src/udf.rs: ## @@ -696,16 +696,81 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { /// Return true if this scalar UDF is equal to the oth

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub
kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188826793 ## docs/source/library-user-guide/upgrading.md: ## @@ -62,6 +62,36 @@ DataFusionError::SchemaError( [#16652]: https://github.com/apache/datafusion/issues/16652

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub
kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188817382 ## datafusion/expr/src/udf.rs: ## @@ -696,16 +696,81 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { /// Return true if this scalar UDF is equal to the oth

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub
kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188804048 ## datafusion/expr/src/test/udf_equals.rs: ## @@ -0,0 +1,187 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agree

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub
kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188802671 ## datafusion/expr/src/test/udf_equals.rs: ## @@ -0,0 +1,187 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agree

Re: [I] Blog post for the DataFusion 47, 48, and 49 releases [datafusion]

2025-07-06 Thread via GitHub
Omega359 commented on issue #16347: URL: https://github.com/apache/datafusion/issues/16347#issuecomment-3043214803 https://github.com/apache/datafusion-site/pull/83 https://github.com/apache/datafusion-site/pull/84

[PR] DF 48 blog post [datafusion-site]

2025-07-06 Thread via GitHub
Omega359 opened a new pull request, #84: URL: https://github.com/apache/datafusion-site/pull/84 First cut at a DF 48 blog post as mentioned in https://github.com/apache/datafusion/issues/15072. Please let me know of anything you wish to add/modify

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-06 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3043202973 > I would appreciate it, it would greatly help me @rluvaton I opened a pr on your fork. Would you take a look when you have some time?

Re: [PR] chore: extract CreateArray from QueryPlanSerde [datafusion-comet]

2025-07-06 Thread via GitHub
codecov-commenter commented on PR #1991: URL: https://github.com/apache/datafusion-comet/pull/1991#issuecomment-3043001484 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1991?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] Add support for Snowflake identifier function [datafusion-sqlparser-rs]

2025-07-06 Thread via GitHub
yoavcloud opened a new pull request, #1929: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1929 Expand the support for the `IDENTIFIER` function in Snowflake, which is used to generate identifiers dynamically. See here: https://docs.snowflake.com/en/sql-reference/identifier-lit

[PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-06 Thread via GitHub
Omega359 opened a new pull request, #83: URL: https://github.com/apache/datafusion-site/pull/83 First cut at a DF 47 blog post as mentioned in https://github.com/apache/datafusion/issues/15072. Please let me know of anything you wish to add/modify

[I] Unnested fields are not filterable when using subqueries. [datafusion]

2025-07-06 Thread via GitHub
hmadison opened a new issue, #16695: URL: https://github.com/apache/datafusion/issues/16695 ### Describe the bug When attempting to evaluate a logical plan which mixes un-nesting and sub-queries, an unfiltered call to show will return the full set of columns, but attempting to filter

[PR] Fix test running compatibility [datafusion]

2025-07-06 Thread via GitHub
mjgarton opened a new pull request, #16694: URL: https://github.com/apache/datafusion/pull/16694 sqllogictests.rs already has various parts of its Options struct that provide compatibility with the standard test running options. Extend this to include the standard `--test-threads` op

[I] Running tests with `--test-threads` option fails. [datafusion]

2025-07-06 Thread via GitHub
mjgarton opened a new issue, #16693: URL: https://github.com/apache/datafusion/issues/16693 ### Describe the bug Running `cargo test -- --test-threads 1` in the project root fails, complaining about unrecognised option. ### To Reproduce Run `cargo test -- --test-threads

Re: [PR] Add the missing equivalence info for filter pushdown [datafusion]

2025-07-06 Thread via GitHub
liamzwbao commented on code in PR #16686: URL: https://github.com/apache/datafusion/pull/16686#discussion_r2188415579 ## datafusion/datasource/src/source.rs: ## @@ -325,6 +328,9 @@ impl ExecutionPlan for DataSourceExec { new_node.data_source = data_source;

Re: [PR] Remove unused AggregateUDF struct [datafusion]

2025-07-06 Thread via GitHub
xudong963 merged PR #16683: URL: https://github.com/apache/datafusion/pull/16683

Re: [I] Release DataFusion `48.0.1` [datafusion]

2025-07-06 Thread via GitHub
xudong963 commented on issue #16486: URL: https://github.com/apache/datafusion/issues/16486#issuecomment-3041906556 > [@xudong963](https://github.com/xudong963) do you have additional suggestions for issues that should be backported? Sorry, just come back, lgtm.

Re: [PR] Add the missing equivalence info for filter pushdown [datafusion]

2025-07-06 Thread via GitHub
xudong963 commented on PR #16686: URL: https://github.com/apache/datafusion/pull/16686#issuecomment-3041900763 I'll verify the PR in our fork tomorrow, thanks @liamzwbao

Re: [PR] Add the missing equivalence info for filter pushdown [datafusion]

2025-07-06 Thread via GitHub
xudong963 commented on code in PR #16686: URL: https://github.com/apache/datafusion/pull/16686#discussion_r2188374284 ## datafusion/datasource/src/source.rs: ## @@ -372,6 +378,20 @@ impl DataSourceExec { self } +/// Add filters' equivalence info +fn add_f

[I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-06 Thread via GitHub
timsaucer opened a new issue, #1181: URL: https://github.com/apache/datafusion-python/issues/1181 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** `PyLogicalPlan` can currently only serialize or deserialize built in functions a

Re: [PR] Add the missing equivalence info for filter pushdown [datafusion]

2025-07-06 Thread via GitHub
liamzwbao commented on code in PR #16686: URL: https://github.com/apache/datafusion/pull/16686#discussion_r2188365661 ## datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt: ## @@ -239,6 +349,23 @@ physical_plan 05)RepartitionExec: partitioning=RoundRobinBatc

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-07-06 Thread via GitHub
timsaucer closed issue #1032: Google Summer of Code - Ideas and Coordination URL: https://github.com/apache/datafusion-python/issues/1032

Re: [I] Add CatalogProvider API [datafusion-python]

2025-07-06 Thread via GitHub
timsaucer commented on issue #1103: URL: https://github.com/apache/datafusion-python/issues/1103#issuecomment-3041858894 Closed by https://github.com/apache/datafusion-python/pull/1156

Re: [I] Add CatalogProvider API [datafusion-python]

2025-07-06 Thread via GitHub
timsaucer closed issue #1103: Add CatalogProvider API URL: https://github.com/apache/datafusion-python/issues/1103

[PR] chore: extract CreateArray from QueryPlanSerde [datafusion-comet]

2025-07-06 Thread via GitHub
tglanz opened a new pull request, #1991: URL: https://github.com/apache/datafusion-comet/pull/1991 ## Which issue does this PR close? Closes #1990 . ## Rationale for this change Organization. ## What changes are included in this PR? Move the relevant mat

[I] chore: extract CreateArray from QueryPlanSerde [datafusion-comet]

2025-07-06 Thread via GitHub
tglanz opened a new issue, #1990: URL: https://github.com/apache/datafusion-comet/issues/1990 Should use `CometExpressionSerde` and extract `CreateArray` from `QueryPlanSerde`

Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]

2025-07-06 Thread via GitHub
JigaoLuo commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041731900 Have my final pass. It looks very nice :fire:

Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]

2025-07-06 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041575868 > I pushed some non trivial changes to this blog: > > 1. Added @JigaoLuo as an author (hope this is ok @zhuqi-lucas ) > 2. Added a section with a high level overview of a

Re: [PR] Blog : Extending Apache Parquet with User Defined Indexes to Accelerate Query Processing with DataFusion [datafusion-site]

2025-07-06 Thread via GitHub
alamb commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2188284374 ## content/blog/2025-07-07-user-defined-parquet-indexes.md: ## @@ -0,0 +1,542 @@ +--- +layout: post +title: Extending Apache Parquet with User Defined Indexes to Acce

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-06 Thread via GitHub
alamb commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041532117 I pushed some non trivial changes to this blog: 1. Added @JigaoLuo as an author (hope this is ok @zhuqi-lucas ) 2. Added a section with a high level overview of adding user define

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-06 Thread via GitHub
alamb commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041511194 Thank you -- I have spent a while this morning adding additional content -- I will push an update soon

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-06 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041421377 > zhuqi-lucas#1 Thank you @JigaoLuo , merged your changes! > Two small nitpicks I came across today: > > * "Footer" vs. "Metadata" ?: Apologies for bein

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-06 Thread via GitHub
JigaoLuo commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041396574 And the outlook section I did is in this PR: https://github.com/zhuqi-lucas/datafusion-site/pull/1

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-06 Thread via GitHub
JigaoLuo commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3041395147 Two small nitpicks I came across today: - "Footer" vs. "Metadata" ?: Apologies for being pedantic, but I think we’re consistently referring to metadata here, not just the footer. X

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-06 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3041327170 Thank you @alamb , a minor topic is i may pick up this: http://github.com/apache/datafusion/pull/13933 To use this user-defined index or parquet SortColumn metad

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-06 Thread via GitHub
alamb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3041322628 > User-Defined Index. I think this is a really good term -- I will update the blog post in https://github.com/apache/datafusion-site/pull/79 to use that

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-06 Thread via GitHub
alamb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3041320800 > Thank you [@alamb](https://github.com/alamb) [@JigaoLuo](https://github.com/JigaoLuo) [@adriangb](https://github.com/adriangb) , i agree current example is the start, we can fu

[PR] Clarify the generality of the embedded parquet index [datafusion]

2025-07-06 Thread via GitHub
alamb opened a new pull request, #16692: URL: https://github.com/apache/datafusion/pull/16692 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/16395 - Related to https://github.com/apache/datafusion/issues/16374 ## Rationale

Re: [PR] Make `GenericDialect` support trailing commas in projections [datafusion-sqlparser-rs]

2025-07-06 Thread via GitHub
alamb commented on PR #1921: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1921#issuecomment-3041302746 Thank you -- I think this will be really nice

Re: [PR] chore: refactor `BuildProbeJoinMetrics` to use `BaselineMetrics` [datafusion]

2025-07-06 Thread via GitHub
Samyak2 commented on PR #16500: URL: https://github.com/apache/datafusion/pull/16500#issuecomment-3041233637 Fixed the formatting issues

Re: [I] [substrait] [sqllogictest] Cannot convert to Substrait [datafusion]

2025-07-06 Thread via GitHub
ViggoC commented on issue #16281: URL: https://github.com/apache/datafusion/issues/16281#issuecomment-3041228438 take

[PR] implement handle_scalar_subquery [datafusion]

2025-07-06 Thread via GitHub
ViggoC opened a new pull request, #16691: URL: https://github.com/apache/datafusion/pull/16691 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

[PR] Fix for Postgres regex and like binary operators [datafusion-sqlparser-rs]

2025-07-06 Thread via GitHub
solontsev opened a new pull request, #1928: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1928 Closes #1776

Re: [I] Upgrade to sqlparser 0.56.0 [datafusion]

2025-07-06 Thread via GitHub
Standing-Man commented on issue #16405: URL: https://github.com/apache/datafusion/issues/16405#issuecomment-3041178377 take

Re: [I] 1000x slowdown opening parquet file due to partitions [datafusion]

2025-07-06 Thread via GitHub
asayers commented on issue #16676: URL: https://github.com/apache/datafusion/issues/16676#issuecomment-3041069977 I _think_ the correct solution was for me to enable the metadata cache (I haven't confirmed this). So perhaps the "bug" (if there is one) is just that the metadata cache is off

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-06 Thread via GitHub
zhuqi-lucas commented on code in PR #16630: URL: https://github.com/apache/datafusion/pull/16630#discussion_r2188072028 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -288,6 +288,64 @@ impl CursorArray for StringViewArray { } } +/// Todo use arrow-rs side api aft

Re: [PR] Clickhouse: support empty parenthesized options [datafusion-sqlparser-rs]

2025-07-06 Thread via GitHub
iffyio merged PR #1925: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1925

Re: [I] Clickhouse CREATE TABLE .... Engine = MergeTree() should be supported [datafusion-sqlparser-rs]

2025-07-06 Thread via GitHub
iffyio closed issue #1853: Clickhouse CREATE TABLE Engine = MergeTree() should be supported URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1853