Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
berkaysynnada merged PR #15973: URL: https://github.com/apache/datafusion/pull/15973 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

[PR] Box field to reduce DatafusionError size [datafusion]

2025-05-08 Thread via GitHub
ctsk opened a new pull request, #15990: URL: https://github.com/apache/datafusion/pull/15990 The `field: Column` field in SchemaError::AmbiguousReference grew the size of DatafusionError from 72 bytes to 112 bytes. Putting it on the heap fixes that. ## Are there any user-facing chang
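The size effect described in #15990 can be reproduced in isolation. The block below is a minimal sketch and not DataFusion code: `BigPayload`, `InlineError`, `BoxedError`, and `sizes` are hypothetical names chosen for illustration, and the byte counts will differ from the 72 and 112 bytes quoted above. It only shows that an enum is at least as large as its largest variant, so moving a large field behind a `Box` shrinks every value of the enum.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large inline payload (assumption: not the real `Column` type).
#[allow(dead_code)]
pub struct BigPayload {
    relation: Option<String>,
    name: String,
    extra: [u64; 4],
}

// The payload is stored inline, so every value of this enum pays for it,
// even values of the small variant.
#[allow(dead_code)]
pub enum InlineError {
    Small(u8),
    Ambiguous { field: BigPayload },
}

// The payload lives on the heap; the enum only stores a pointer to it.
#[allow(dead_code)]
pub enum BoxedError {
    Small(u8),
    Ambiguous { field: Box<BigPayload> },
}

// Returns (size of the inline enum, size of the boxed enum) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<InlineError>(), size_of::<BoxedError>())
}
```

Calling `sizes()` and comparing the two numbers shows the trade-off: the boxed variant costs one heap allocation when the error is constructed, in exchange for a smaller error type on every `Result` that carries it.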

[PR] refactor: remove deprecated `CsvExec` [datafusion]

2025-05-08 Thread via GitHub
miroim opened a new pull request, #15991: URL: https://github.com/apache/datafusion/pull/15991 ## Which issue does this PR close? Part of #15950 . ## Rationale for this change The `CsvExec` structure was deprecated in DataFusion 46 and is scheduled for removal. Deve

Re: [PR] fix: Allow ORDER BY aggregates not present in SELECT list [datafusion]

2025-05-08 Thread via GitHub
UBarney commented on code in PR #15876: URL: https://github.com/apache/datafusion/pull/15876#discussion_r2079325033 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -797,26 +807,146 @@ impl LogicalPlanBuilder { } // remove pushed down sort columns -

Re: [PR] feat: add macros for DataFusionError variants [datafusion]

2025-05-08 Thread via GitHub
Chen-Yuan-Lai commented on PR #15946: URL: https://github.com/apache/datafusion/pull/15946#issuecomment-2862197349 @comphead Thank you for your review. You're right - my current implementation doesn't preserve backtraces. To preserve backtraces, I think there are two options: 1. **

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
berkaysynnada commented on code in PR #15973: URL: https://github.com/apache/datafusion/pull/15973#discussion_r2079129222 ## datafusion/datasource-parquet/src/mod.rs: ## @@ -32,511 +30,18 @@ mod row_group_filter; pub mod source; mod writer; -use std::any::Any; -use std::fmt:

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
berkaysynnada commented on PR #15973: URL: https://github.com/apache/datafusion/pull/15973#issuecomment-2862140172 Thank you @miroim. There are also other deprecated source execs (json, csv, arrow, avro, memory). You are welcome to remove them too

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
miroim commented on PR #15973: URL: https://github.com/apache/datafusion/pull/15973#issuecomment-2862177085 > There are also other deprecated source execs (json, csv, arrow, avro, memory). You are welcome to remove them too I'll take a look at them

Re: [I] Implement method to apply scalar or aggregate function to Array elements [datafusion]

2025-05-08 Thread via GitHub
KR-bluejay commented on issue #15882: URL: https://github.com/apache/datafusion/issues/15882#issuecomment-2862498585 @alamb Following your comment about making array functions more general, I suggest we create a common directory for array operations to reduce code duplication:

[I] Add flag to control reordering metrics for display [datafusion]

2025-05-08 Thread via GitHub
niebayes opened a new issue, #15992: URL: https://github.com/apache/datafusion/issues/15992 `DisplayableExecution::indent` always reorders metrics. However, one might expect the order of the metrics to conform to the registration order. I propose to add a flag to control the reordering b

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15973: URL: https://github.com/apache/datafusion/pull/15973#issuecomment-2862608755 Thank you again @miroim and @berkaysynnada

[PR] chore(deps): bump insta from 1.42.2 to 1.43.1 [datafusion]

2025-05-08 Thread via GitHub
dependabot[bot] opened a new pull request, #15988: URL: https://github.com/apache/datafusion/pull/15988 Bumps [insta](https://github.com/mitsuhiko/insta) from 1.42.2 to 1.43.1. Release notes Sourced from insta's releases (https://github.com/mitsuhiko/insta/releases). 1.43.1

Re: [PR] Fix: after repartitioning, the `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed [datafusion]

2025-05-08 Thread via GitHub
xudong963 commented on PR #15539: URL: https://github.com/apache/datafusion/pull/15539#issuecomment-2862820995 Let's close this one; I'll open a new PR to solve this, because we have made many changes to how stats are merged

Re: [PR] chore(deps): bump insta from 1.42.2 to 1.43.1 [datafusion]

2025-05-08 Thread via GitHub
xudong963 merged PR #15988: URL: https://github.com/apache/datafusion/pull/15988

Re: [PR] Fix: after repartitioning, the `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed [datafusion]

2025-05-08 Thread via GitHub
xudong963 closed pull request #15539: Fix: after repartitioning, the `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed URL: https://github.com/apache/datafusion/pull/15539

Re: [PR] Re-Add CodeCov [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2862874818 Starting with manual triggering sounds like a good idea to me

Re: [PR] fix: Allow ORDER BY aggregates not present in SELECT list [datafusion]

2025-05-08 Thread via GitHub
jonahgao commented on code in PR #15876: URL: https://github.com/apache/datafusion/pull/15876#discussion_r2079814968 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -797,26 +807,146 @@ impl LogicalPlanBuilder { } // remove pushed down sort columns -

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-08 Thread via GitHub
vaibhawvipul commented on PR #15994: URL: https://github.com/apache/datafusion/pull/15994#issuecomment-2863263757 Thank you for the mention! ❤️

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove commented on PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#issuecomment-2863268668 Thanks @huaxingao. The implementation changes LGTM, but I would like to understand how this will be tested.

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-08 Thread via GitHub
andygrove commented on PR #15994: URL: https://github.com/apache/datafusion/pull/15994#issuecomment-2863253288 @shehabgamin @alamb fyi

Re: [I] feat: bucketed scan for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
mbutrovich commented on issue #1719: URL: https://github.com/apache/datafusion-comet/issues/1719#issuecomment-2863289065 Interestingly, native_datafusion currently passes the "bucketed table" Comet test. I suspect the Spark SQL tests are doing some bucket pruning, which is where we end up

[PR] Fix: `build_predicate_expression` method doesn't process `false` expr correctly [datafusion]

2025-05-08 Thread via GitHub
xudong963 opened a new pull request, #15995: URL: https://github.com/apache/datafusion/pull/15995 ## Which issue does this PR close? - Closes #. ## Rationale for this change `build_predicate_expression` method doesn't process `false` expr correctly, it'll

[PR] Add configuration for eliminating sort in subquery [datafusion]

2025-05-08 Thread via GitHub
irenjj opened a new pull request, #15993: URL: https://github.com/apache/datafusion/pull/15993 ## Which issue does this PR close? - Part of #15886 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub
kczimm commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080056995 ## datafusion/expr/src/expr.rs: ## @@ -1775,6 +1775,27 @@ impl Expr { | Expr::SimilarTo(Like { expr, pattern, .. }) => { rewri

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub
kczimm commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080118522 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -1494,6 +1494,14 @@ impl LogicalPlan { let mut param_types: HashMap> = HashMap::new(); self.

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub
kczimm commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080123467 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -1507,6 +1515,9 @@ impl LogicalPlan { (_, Some(dt)) => {

[PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-08 Thread via GitHub
andygrove opened a new pull request, #15994: URL: https://github.com/apache/datafusion/pull/15994 ## Which issue does this PR close? N/A ## Rationale for this change Add another Spark expression. This expression was originally contributed to Comet by @vai

[PR] chore(deps): bump testcontainers from 0.23.3 to 0.24.0 [datafusion]

2025-05-08 Thread via GitHub
dependabot[bot] opened a new pull request, #15989: URL: https://github.com/apache/datafusion/pull/15989 Bumps [testcontainers](https://github.com/testcontainers/testcontainers-rs) from 0.23.3 to 0.24.0. Release notes Sourced from https://github.com/testcontainers/testcontainers-rs/

Re: [PR] Update extending-operators.md [datafusion]

2025-05-08 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2863750241 > > > Hey @xudong963, I think I might be missing something: I added the imports but it keeps failing. Could you please help out? > > > > > >

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-08 Thread via GitHub
andygrove commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2079751198 ## datafusion/spark/src/function/math/ceil_floor.rs: ## @@ -0,0 +1,720 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] minor: Warn if memory pool is dropped with bytes still reserved [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove merged PR #1721: URL: https://github.com/apache/datafusion-comet/pull/1721

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-08 Thread via GitHub
ctsk commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2863005079 Since partition does not appear to be a limiting factor in aggregations, I wonder if it makes sense to investigate a lower-quality pre-aggregation (i.e. let more tuples pass to the fina

Re: [PR] fix: Bucketed scan fallback for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
mbutrovich commented on PR #1720: URL: https://github.com/apache/datafusion-comet/pull/1720#issuecomment-2863012148 Interestingly, we were passing the Comet version of bucketed scan before: https://github.com/apache/datafusion-comet/actions/runs/14893567863/job/41846178932?pr=1720#st

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
huaxingao commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2079865065 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1217,34 +1217,6 @@ abstract class ParquetReadSuite extends CometTestBase {

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-08 Thread via GitHub
andygrove commented on PR #15947: URL: https://github.com/apache/datafusion/pull/15947#issuecomment-2863214007 I agree with @alamb that we should go ahead and merge this, and then I can update Comet to use it so we have confidence that this approach is working (I think it is). Thanks for th

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-08 Thread via GitHub
andygrove merged PR #15947: URL: https://github.com/apache/datafusion/pull/15947

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-08 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2863584067 The current benchmark results on my local 16-core machine: - target_partitions = 8 (amazing 1.71x faster) ``` // main Query 7 iteration 0 took 4256.6 ms and returned 10 rows

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-08 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2863608739 BTW, I also simplified the code, although that does not help performance. For example, for `NullState`, I found we don't actually need to introduce the blocked approach (it would even lead to a slight

Re: [PR] Add support for parsing with semicolons optional [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
aharpervc commented on code in PR #1843: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1843#discussion_r2080184176 ## tests/sqlparser_common.rs: ## @@ -666,6 +666,23 @@ fn parse_select_with_table_alias() { ); } +#[test] +fn parse_consecutive_queries() { R

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-08 Thread via GitHub
Dandandan commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2863265862 I think something like that is done already in the "convert to state" logic - it will dynamically decide to skip aggregating once it sees that the group vs input rows ratio is smal

[PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
mbutrovich opened a new pull request, #1724: URL: https://github.com/apache/datafusion-comet/pull/1724 ## Which issue does this PR close? Closes #. ## Rationale for this change DataSourceExec (and by extension, underlying arrow-rs Parquet reader) does not

[PR] feat(proto): udf decoding fallback [datafusion]

2025-05-08 Thread via GitHub
leoyvens opened a new pull request, #15997: URL: https://github.com/apache/datafusion/pull/15997 ## Which issue does this PR close? - Closes #15996. ## Rationale for this change We need this for more flexible logical plan decoding. ## Are these changes tested?

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
aharpervc commented on code in PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834#discussion_r2078041514 ## tests/sqlparser_mssql.rs: ## @@ -100,48 +100,52 @@ fn parse_mssql_delimited_identifiers() { #[test] fn parse_create_procedure() { -let sql

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2863794206 Thanks @Spaarsh -- good luck with your exams

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-08 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2080285112 ## .github/workflows/iceberg_spark_test.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] Implement RightSemi join for SortMergeJoin [datafusion]

2025-05-08 Thread via GitHub
irenjj commented on PR #15972: URL: https://github.com/apache/datafusion/pull/15972#issuecomment-2864850900 > for _ in 0..1000 { Have already run it over 1000 times locally, and there were no errors. ``` running 1 test test fuzz_cases::join_fuzz::test_right_semi_join_1k_fi

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-08 Thread via GitHub
kosiew commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2864869245 > there's 10 v1 files, 5 v2 files, 3 v3 files and 1 v4 files. Ideally ListingTableConfig could just derive the mapping from each. Is that possible with your abstraction? # Here

Re: [PR] style: simplify some strings for readability [datafusion]

2025-05-08 Thread via GitHub
kosiew commented on code in PR #15999: URL: https://github.com/apache/datafusion/pull/15999#discussion_r2080846902 ## datafusion/ffi/src/plan_properties.rs: ## @@ -321,7 +321,7 @@ mod tests { let foreign_props: PlanProperties = local_props_ptr.try_into()?; -

Re: [PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove merged PR #1724: URL: https://github.com/apache/datafusion-comet/pull/1724

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-08 Thread via GitHub
tlm365 commented on code in PR #15994: URL: https://github.com/apache/datafusion/pull/15994#discussion_r2080913084 ## datafusion/spark/src/function/string/char.rs: ## @@ -0,0 +1,130 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Handle dicts for distinct count [datafusion]

2025-05-08 Thread via GitHub
blaginin commented on PR #15871: URL: https://github.com/apache/datafusion/pull/15871#issuecomment-2864380303 ``` group main pr -

[PR] style: simplify some strings for readability [datafusion]

2025-05-08 Thread via GitHub
hamirmahal opened a new pull request, #15999: URL: https://github.com/apache/datafusion/pull/15999 ## Which issue does this PR close? - Closes #15998 ## Rationale for this change The goal of this pull request is to improve code readability and maintainability.

Re: [PR] fix: Bucketed scan fallback for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove merged PR #1720: URL: https://github.com/apache/datafusion-comet/pull/1720

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
hsiang-c commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2080419849 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -28,33 +28,33 @@ public class Utils { - /** This method is called from Apache Iceberg

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
hsiang-c commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2080426070 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -33,7 +33,7 @@ import org.apache.spark.sql.execution.datasources.v2.parquet.Parquet

Re: [PR] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-08 Thread via GitHub
alamb merged PR #15869: URL: https://github.com/apache/datafusion/pull/15869

Re: [PR] chore(deps-dev): bump org.apache.parquet:parquet-avro from 1.13.1 to 1.15.2 [datafusion-comet]

2025-05-08 Thread via GitHub
hsiang-c commented on PR #1717: URL: https://github.com/apache/datafusion-comet/pull/1717#issuecomment-2864261088 Spark SQL tests might have issues downloading JARs? One failed with ``` [info] - SPARK-21617: ALTER TABLE for non-compatible DataSource tables *** FAILED *** (3

Re: [I] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-08 Thread via GitHub
alamb closed issue #15868: Substrait: Handle inner map fields in schema renaming URL: https://github.com/apache/datafusion/issues/15868

Re: [PR] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15869: URL: https://github.com/apache/datafusion/pull/15869#issuecomment-2864264598 Thanks again everyone!

Re: [PR] refactor: remove deprecated `CsvExec` [datafusion]

2025-05-08 Thread via GitHub
alamb merged PR #15991: URL: https://github.com/apache/datafusion/pull/15991

Re: [PR] refactor: remove deprecated `CsvExec` [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15991: URL: https://github.com/apache/datafusion/pull/15991#issuecomment-2864266158 Thank you @miroim and @xudong963

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2864278552 I think this is a draft, so I am marking it as such to make it clearer which PRs are waiting on review

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-08 Thread via GitHub
blaginin commented on code in PR #15984: URL: https://github.com/apache/datafusion/pull/15984#discussion_r2080589228 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -1569,37 +1810,60 @@ mod test { let empty = empty_with_type(DataType::Boolean); let

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080599583 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,25 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-08 Thread via GitHub
Copilot commented on code in PR #15984: URL: https://github.com/apache/datafusion/pull/15984#discussion_r2080588006 ## datafusion/optimizer/src/test/mod.rs: ## @@ -99,29 +99,20 @@ pub fn get_tpch_table_schema(table: &str) -> Schema { } } Review Comment: [nitpick] Con

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080601400 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,25 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

[I] Pass `PartitionedFile` into `FileSource` for late file stats based pruning [datafusion]

2025-05-08 Thread via GitHub
adriangb opened a new issue, #16000: URL: https://github.com/apache/datafusion/issues/16000 ### Is your feature request related to a problem or challenge? As we continue to make progress landing dynamic filters it opens up the opportunity for new optimizations. This one deals w

Re: [I] Pass `PartitionedFile` into `FileSource` for late file stats based pruning [datafusion]

2025-05-08 Thread via GitHub
adriangb commented on issue #16000: URL: https://github.com/apache/datafusion/issues/16000#issuecomment-2865140566 cc @berkaysynnada @alamb

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-08 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2865192764 @kosiew thanks I tried that but am getting this error: ``` Error fetching table metadata: Failed to collect data frame results: Shared(ArrowError(ExternalError(Executio

Re: [PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
codecov-commenter commented on PR #1724: URL: https://github.com/apache/datafusion-comet/pull/1724#issuecomment-2864336010 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1724?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
kczimm commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080478955 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4695,34 +4698,37 @@ fn test_infer_types_from_between_predicate() { // replace params with values let

Re: [PR] refactor: remove deprecated `AvroExec` [datafusion]

2025-05-08 Thread via GitHub
comphead merged PR #15987: URL: https://github.com/apache/datafusion/pull/15987

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2080640522 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_str(), N

Re: [PR] Add support for the MATCH and REGEXP binary operators [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio merged PR #1840: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1840

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-08 Thread via GitHub
shehabgamin commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2080822728 ## datafusion/sqllogictest/test_files/spark/math/ceil.slt: ## @@ -0,0 +1,141 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contri

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080604387 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,25 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] Add configuration for eliminating sort in subquery [datafusion]

2025-05-08 Thread via GitHub
kosiew commented on code in PR #15993: URL: https://github.com/apache/datafusion/pull/15993#discussion_r2080888952 ## datafusion/sql/src/relation/mod.rs: ## @@ -241,7 +246,7 @@ fn optimize_subquery_sort(plan: LogicalPlan) -> Result> } match c { Lo

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
codecov-commenter commented on PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#issuecomment-2864777078 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1723?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-08 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2865358754 > EnforceSorting and EnforceDistribution are initially designed to operate orthogonally, and there are many tests for that. But it seems somehow we broke it. Maybe worth conf

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2080647857 ## src/ast/data_type.rs: ## @@ -716,7 +724,15 @@ impl fmt::Display for DataType { DataType::Unspecified => Ok(()), DataType::T

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-05-08 Thread via GitHub
Spaarsh commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2863419930 @alamb sure. I have my uni exams going on right now. That, along with some other commitments, means it might take me a while to port these functions. If that is alright, then I'll be happ

Re: [PR] Add `PrimitiveDistinctCountGroupsAccumulator` [datafusion]

2025-05-08 Thread via GitHub
Dandandan commented on PR #15985: URL: https://github.com/apache/datafusion/pull/15985#issuecomment-2863220782 This gets a small performance boost on clickbench query 9 (~9% on my end). I am actually wondering if we can go further. I think we could store something like HashSet<(T::Na

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2079832390 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1217,34 +1217,6 @@ abstract class ParquetReadSuite extends CometTestBase {

Re: [PR] Fix: `build_predicate_expression` method doesn't process `false` expr correctly [datafusion]

2025-05-08 Thread via GitHub
xudong963 commented on code in PR #15995: URL: https://github.com/apache/datafusion/pull/15995#discussion_r2080006481 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -3585,12 +3605,10 @@ mod tests { prune_with_expr( // false -// constan

Re: [PR] Handle dicts for distinct count [datafusion]

2025-05-08 Thread via GitHub
blaginin commented on code in PR #15871: URL: https://github.com/apache/datafusion/pull/15871#discussion_r2080581411 ## datafusion/functions-aggregate/src/count.rs: ## @@ -764,4 +774,49 @@ mod tests { assert_eq!(accumulator.evaluate()?, ScalarValue::Int64(Some(0)));

[I] [feature] allow pretty-printing [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
lovasoa opened a new issue, #1845: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1845 In Rust, the `Display` trait allows specifying an "alternate" display style for every type https://github.com/apache/datafusion-sqlparser-rs/issues/1634 I suggest using this to a
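The "alternate" style mentioned in #1845 is selected with the `#` flag in a format string (`{:#}`), and an implementation can detect it through `std::fmt::Formatter::alternate()`. The following is a minimal sketch of that mechanism only: `Select` is a hypothetical miniature type invented for illustration, not a sqlparser-rs AST node, and the layout chosen for the pretty form is an assumption rather than anything proposed in the issue.

```rust
use std::fmt;

// Hypothetical miniature query type (assumption: not part of sqlparser-rs).
pub struct Select {
    pub columns: Vec<String>,
    pub table: String,
}

impl fmt::Display for Select {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if f.alternate() {
            // `{:#}` was requested: one clause per line, one column per line.
            writeln!(f, "SELECT")?;
            for (i, col) in self.columns.iter().enumerate() {
                let sep = if i + 1 < self.columns.len() { "," } else { "" };
                writeln!(f, "  {col}{sep}")?;
            }
            write!(f, "FROM {}", self.table)
        } else {
            // Plain `{}`: the compact single-line form.
            write!(f, "SELECT {} FROM {}", self.columns.join(", "), self.table)
        }
    }
}
```

With this in place, `format!("{}", query)` yields the single-line form and `format!("{:#}", query)` yields the multi-line form, so callers opt in to pretty-printing without any new API surface.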

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
huaxingao commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2080636065 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -28,33 +28,33 @@ public class Utils { - /** This method is called from Apache Iceber

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio merged PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080637131 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4695,34 +4698,37 @@ fn test_infer_types_from_between_predicate() { // replace params with values

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio commented on code in PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834#discussion_r2080638281 ## tests/sqlparser_mssql.rs: ## @@ -100,48 +100,52 @@ fn parse_mssql_delimited_identifiers() { #[test] fn parse_create_procedure() { -let sql =