Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-08 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2865358754 > EnforceSorting and EnforceDistribution are initially designed to operate orthogonally, and there are many tests for that. But it seems somehow we broke it. Maybe worth conf

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-08 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2865192764 @kosiew thanks I tried that but am getting this error: ``` Error fetching table metadata: Failed to collect data frame results: Shared(ArrowError(ExternalError(Executio

[I] Pass `PartitionedFile` into `FileSource` for late file stats based pruning [datafusion]

2025-05-08 Thread via GitHub
adriangb opened a new issue, #16000: URL: https://github.com/apache/datafusion/issues/16000 ### Is your feature request related to a problem or challenge? As we continue to make progress landing dynamic filters it opens up the opportunity for new optimizations. This one deals w

Re: [I] Pass `PartitionedFile` into `FileSource` for late file stats based pruning [datafusion]

2025-05-08 Thread via GitHub
adriangb commented on issue #16000: URL: https://github.com/apache/datafusion/issues/16000#issuecomment-2865140566 cc @berkaysynnada @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-08 Thread via GitHub
tlm365 commented on code in PR #15994: URL: https://github.com/apache/datafusion/pull/15994#discussion_r2080913084 ## datafusion/spark/src/function/string/char.rs: ## @@ -0,0 +1,130 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Add configuration for eliminating sort in subquery [datafusion]

2025-05-08 Thread via GitHub
kosiew commented on code in PR #15993: URL: https://github.com/apache/datafusion/pull/15993#discussion_r2080888952 ## datafusion/sql/src/relation/mod.rs: ## @@ -241,7 +246,7 @@ fn optimize_subquery_sort(plan: LogicalPlan) -> Result> } match c { Lo

Re: [PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove merged PR #1724: URL: https://github.com/apache/datafusion-comet/pull/1724

Re: [PR] style: simplify some strings for readability [datafusion]

2025-05-08 Thread via GitHub
kosiew commented on code in PR #15999: URL: https://github.com/apache/datafusion/pull/15999#discussion_r2080846902 ## datafusion/ffi/src/plan_properties.rs: ## @@ -321,7 +321,7 @@ mod tests { let foreign_props: PlanProperties = local_props_ptr.try_into()?; -

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-08 Thread via GitHub
shehabgamin commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2080822728 ## datafusion/sqllogictest/test_files/spark/math/ceil.slt: ## @@ -0,0 +1,141 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contri

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-08 Thread via GitHub
kosiew commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2864869245 > there's 10 v1 files, 5 v2 files, 3 v3 files and 1 v4 files. Ideally ListingTableConfig could just derive the mapping from each. Is that possible with your abstraction? # Here

Re: [PR] Implement RightSemi join for SortMergeJoin [datafusion]

2025-05-08 Thread via GitHub
irenjj commented on PR #15972: URL: https://github.com/apache/datafusion/pull/15972#issuecomment-2864850900 > for _ in 0..1000 { Have already run it over 1000 times locally, and there were no errors. ``` running 1 test test fuzz_cases::join_fuzz::test_right_semi_join_1k_fi

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
codecov-commenter commented on PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#issuecomment-2864777078 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1723?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2080647857 ## src/ast/data_type.rs: ## @@ -716,7 +724,15 @@ impl fmt::Display for DataType { DataType::Unspecified => Ok(()), DataType::T

Re: [PR] Add support for the MATCH and REGEXP binary operators [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio merged PR #1840: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1840

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2080640522 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_str(), N

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio commented on code in PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834#discussion_r2080638281 ## tests/sqlparser_mssql.rs: ## @@ -100,48 +100,52 @@ fn parse_mssql_delimited_identifiers() { #[test] fn parse_create_procedure() { -let sql =

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
iffyio merged PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080637131 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4695,34 +4698,37 @@ fn test_infer_types_from_between_predicate() { // replace params with values

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
huaxingao commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2080636065 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -28,33 +28,33 @@ public class Utils { - /** This method is called from Apache Iceber

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080604387 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,25 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080601400 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,25 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080599583 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,25 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-08 Thread via GitHub
Copilot commented on code in PR #15984: URL: https://github.com/apache/datafusion/pull/15984#discussion_r2080588006 ## datafusion/optimizer/src/test/mod.rs: ## @@ -99,29 +99,20 @@ pub fn get_tpch_table_schema(table: &str) -> Schema { } } Review Comment: [nitpick] Con

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-08 Thread via GitHub
blaginin commented on code in PR #15984: URL: https://github.com/apache/datafusion/pull/15984#discussion_r2080589228 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -1569,37 +1810,60 @@ mod test { let empty = empty_with_type(DataType::Boolean); let

Re: [PR] Handle dicts for distinct count [datafusion]

2025-05-08 Thread via GitHub
blaginin commented on code in PR #15871: URL: https://github.com/apache/datafusion/pull/15871#discussion_r2080581411 ## datafusion/functions-aggregate/src/count.rs: ## @@ -764,4 +774,49 @@ mod tests { assert_eq!(accumulator.evaluate()?, ScalarValue::Int64(Some(0)));

Re: [PR] Fix: `build_predicate_expression` method doesn't process `false` expr correctly [datafusion]

2025-05-08 Thread via GitHub
xudong963 commented on code in PR #15995: URL: https://github.com/apache/datafusion/pull/15995#discussion_r2080006481 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -3585,12 +3605,10 @@ mod tests { prune_with_expr( // false -// constan

Re: [PR] Add `PrimitiveDistinctCountGroupsAccumulator` [datafusion]

2025-05-08 Thread via GitHub
Dandandan commented on PR #15985: URL: https://github.com/apache/datafusion/pull/15985#issuecomment-2863220782 This gets a small performance boost on clickbench query 9 (~9% on my end). I am actually wondering if we can go further. I think we could store something like HashSet<(T::Na

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2079832390 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1217,34 +1217,6 @@ abstract class ParquetReadSuite extends CometTestBase {

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-05-08 Thread via GitHub
Spaarsh commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2863419930 @alamb sure. I have my uni exams going on right now. Because of that and some other commitments, porting these functions might take me a while. If that is alright, then I'll be happ

Re: [PR] Handle dicts for distinct count [datafusion]

2025-05-08 Thread via GitHub
blaginin commented on PR #15871: URL: https://github.com/apache/datafusion/pull/15871#issuecomment-2864380303 ``` group main pr -

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-08 Thread via GitHub
kczimm commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2080478955 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4695,34 +4698,37 @@ fn test_infer_types_from_between_predicate() { // replace params with values let

Re: [PR] refactor: remove deprecated `AvroExec` [datafusion]

2025-05-08 Thread via GitHub
comphead merged PR #15987: URL: https://github.com/apache/datafusion/pull/15987

Re: [PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
codecov-commenter commented on PR #1724: URL: https://github.com/apache/datafusion-comet/pull/1724#issuecomment-2864336010 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1724?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] [feature] allow pretty-printing [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
lovasoa opened a new issue, #1845: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1845 In Rust, the Display trait allows specifying an "alternate" display style for every type https://github.com/apache/datafusion-sqlparser-rs/issues/1634 I suggest using this to a
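
For readers unfamiliar with the "alternate" display style mentioned in this issue, here is a minimal sketch of how the standard `{:#}` flag can switch a `Display` impl between a compact and a multi-line form. The `Select` type and the `example` helper are hypothetical and are not part of sqlparser-rs; only `std::fmt::Formatter::alternate` is a real standard-library call.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for a parsed query (not a sqlparser-rs type).
pub struct Select {
    pub columns: Vec<String>,
    pub table: String,
}

impl fmt::Display for Select {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if f.alternate() {
            // `{:#}` was requested: one column per line, indented by two spaces.
            write!(
                f,
                "SELECT\n  {}\nFROM\n  {}",
                self.columns.join(",\n  "),
                self.table
            )
        } else {
            // Plain `{}`: everything on a single line.
            write!(f, "SELECT {} FROM {}", self.columns.join(", "), self.table)
        }
    }
}

/// Builds a small sample value for illustration.
pub fn example() -> Select {
    Select {
        columns: vec!["a".to_string(), "b".to_string()],
        table: "t".to_string(),
    }
}
```

With this sketch, `format!("{}", example())` yields the single-line form and `format!("{:#}", example())` yields the multi-line form; whether the project would structure pretty-printing this way is not stated in the truncated preview above.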

Re: [PR] refactor: remove deprecated `CsvExec` [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15991: URL: https://github.com/apache/datafusion/pull/15991#issuecomment-2864266158 Thank you @miroim and @xudong963

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2864278552 I think this is a draft, so I am marking it as such to make it clearer which PRs are waiting on review

Re: [PR] refactor: remove deprecated `CsvExec` [datafusion]

2025-05-08 Thread via GitHub
alamb merged PR #15991: URL: https://github.com/apache/datafusion/pull/15991

Re: [PR] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15869: URL: https://github.com/apache/datafusion/pull/15869#issuecomment-2864264598 Thanks again everyone!

Re: [PR] chore(deps-dev): bump org.apache.parquet:parquet-avro from 1.13.1 to 1.15.2 [datafusion-comet]

2025-05-08 Thread via GitHub
hsiang-c commented on PR #1717: URL: https://github.com/apache/datafusion-comet/pull/1717#issuecomment-2864261088 Spark SQL tests might have issues downloading JARs? One failed with ``` [info] - SPARK-21617: ALTER TABLE for non-compatible DataSource tables *** FAILED *** (3

Re: [I] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-08 Thread via GitHub
alamb closed issue #15868: Substrait: Handle inner map fields in schema renaming URL: https://github.com/apache/datafusion/issues/15868

Re: [PR] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-08 Thread via GitHub
alamb merged PR #15869: URL: https://github.com/apache/datafusion/pull/15869

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
hsiang-c commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2080426070 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -33,7 +33,7 @@ import org.apache.spark.sql.execution.datasources.v2.parquet.Parquet

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
hsiang-c commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2080419849 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -28,33 +28,33 @@ public class Utils { - /** This method is called from Apache Iceberg

Re: [PR] fix: Bucketed scan fallback for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove merged PR #1720: URL: https://github.com/apache/datafusion-comet/pull/1720

[PR] style: simplify some strings for readability [datafusion]

2025-05-08 Thread via GitHub
hamirmahal opened a new pull request, #15999: URL: https://github.com/apache/datafusion/pull/15999 ## Which issue does this PR close? - Closes #15998 ## Rationale for this change The goal of this pull request is to improve code readability and maintainability.

Re: [PR] Add support for parsing with semicolons optional [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
aharpervc commented on code in PR #1843: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1843#discussion_r2080184176 ## tests/sqlparser_common.rs: ## @@ -666,6 +666,23 @@ fn parse_select_with_table_alias() { ); } +#[test] +fn parse_consecutive_queries() { R

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-08 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2080285112 ## .github/workflows/iceberg_spark_test.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-08 Thread via GitHub
aharpervc commented on code in PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834#discussion_r2078041514 ## tests/sqlparser_mssql.rs: ## @@ -100,48 +100,52 @@ fn parse_mssql_delimited_identifiers() { #[test] fn parse_create_procedure() { -let sql

[PR] feat(proto): udf decoding fallback [datafusion]

2025-05-08 Thread via GitHub
leoyvens opened a new pull request, #15997: URL: https://github.com/apache/datafusion/pull/15997 ## Which issue does this PR close? - Closes #15996. ## Rationale for this change We need this for more flexible logical plan decoding. ## Are these changes tested?

[PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
mbutrovich opened a new pull request, #1724: URL: https://github.com/apache/datafusion-comet/pull/1724 ## Which issue does this PR close? Closes #. ## Rationale for this change DataSourceExec (and by extension, underlying arrow-rs Parquet reader) does not

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-08 Thread via GitHub
Dandandan commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2863265862 I think something like that is done already in the "convert to state" logic - it will dynamically decide to skip aggregating once it sees that the group vs input rows ratio is smal

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2863794206 Thanks @Spaarsh -- good luck with your exams

[PR] chore(deps): bump testcontainers from 0.23.3 to 0.24.0 [datafusion]

2025-05-08 Thread via GitHub
dependabot[bot] opened a new pull request, #15989: URL: https://github.com/apache/datafusion/pull/15989 Bumps [testcontainers](https://github.com/testcontainers/testcontainers-rs) from 0.23.3 to 0.24.0. Release notes Sourced from https://github.com/testcontainers/testcontainers-rs/

[PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-08 Thread via GitHub
andygrove opened a new pull request, #15994: URL: https://github.com/apache/datafusion/pull/15994 ## Which issue does this PR close? N/A ## Rationale for this change Add another Spark expression. This expression was originally contributed to Comet by @vai

Re: [PR] Update extending-operators.md [datafusion]

2025-05-08 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2863750241 > > > hey @xudong963 , I think there might be something that I am missing. I added the imports, but it keeps failing. Could you please help out? > > > > > >

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub
kczimm commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080118522 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -1494,6 +1494,14 @@ impl LogicalPlan { let mut param_types: HashMap> = HashMap::new(); self.

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub
kczimm commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080123467 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -1507,6 +1515,9 @@ impl LogicalPlan { (_, Some(dt)) => {

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub
kczimm commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080056995 ## datafusion/expr/src/expr.rs: ## @@ -1775,6 +1775,27 @@ impl Expr { | Expr::SimilarTo(Like { expr, pattern, .. }) => { rewri

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-08 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2863608739 BTW, I also simplified the code, although that does not help performance. For example, for `NullState`, I found we actually don't need to introduce the blocked approach (it would even lead to slight

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-08 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2863584067 The current benchmark results on my local 16-core machine: - target_partitions = 8 (amazing 1.71x faster) ``` // main Query 7 iteration 0 took 4256.6 ms and returned 10 rows

[PR] Add configuration for eliminating sort in subquery [datafusion]

2025-05-08 Thread via GitHub
irenjj opened a new pull request, #15993: URL: https://github.com/apache/datafusion/pull/15993 ## Which issue does this PR close? - Part of #15886 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

[PR] Fix: `build_predicate_expression` method doesn't process `false` expr correctly [datafusion]

2025-05-08 Thread via GitHub
xudong963 opened a new pull request, #15995: URL: https://github.com/apache/datafusion/pull/15995 ## Which issue does this PR close? - Closes #. ## Rationale for this change `build_predicate_expression` method doesn't process `false` expr correctly, it'll

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-08 Thread via GitHub
andygrove merged PR #15947: URL: https://github.com/apache/datafusion/pull/15947

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-08 Thread via GitHub
andygrove commented on PR #15947: URL: https://github.com/apache/datafusion/pull/15947#issuecomment-2863214007 I agree with @alamb that we should go ahead and merge this, and then I can update Comet to use it so we have confidence that this approach is working (I think it is). Thanks for th

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
huaxingao commented on code in PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#discussion_r2079865065 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1217,34 +1217,6 @@ abstract class ParquetReadSuite extends CometTestBase {

Re: [I] feat: bucketed scan for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
mbutrovich commented on issue #1719: URL: https://github.com/apache/datafusion-comet/issues/1719#issuecomment-2863289065 Interestingly, native_datafusion currently passes the "bucketed table" Comet test. I suspect the Spark SQL tests are doing some bucket pruning, which is where we end up

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-08 Thread via GitHub
andygrove commented on PR #15994: URL: https://github.com/apache/datafusion/pull/15994#issuecomment-2863253288 @shehabgamin @alamb fyi

Re: [PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove commented on PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#issuecomment-2863268668 Thanks @huaxingao. The implementation changes LGTM, but I would like to understand how this will be tested.

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-08 Thread via GitHub
vaibhawvipul commented on PR #15994: URL: https://github.com/apache/datafusion/pull/15994#issuecomment-2863263757 Thank you for the mention! ❤️

Re: [PR] fix: Allow ORDER BY aggregates not present in SELECT list [datafusion]

2025-05-08 Thread via GitHub
jonahgao commented on code in PR #15876: URL: https://github.com/apache/datafusion/pull/15876#discussion_r2079814968 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -797,26 +807,146 @@ impl LogicalPlanBuilder { } // remove pushed down sort columns -

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-08 Thread via GitHub
andygrove commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2079751198 ## datafusion/spark/src/function/math/ceil_floor.rs: ## @@ -0,0 +1,720 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] minor: Warn if memory pool is dropped with bytes still reserved [datafusion-comet]

2025-05-08 Thread via GitHub
andygrove merged PR #1721: URL: https://github.com/apache/datafusion-comet/pull/1721

Re: [PR] fix: Bucketed scan fallback for native_datafusion Parquet scan [datafusion-comet]

2025-05-08 Thread via GitHub
mbutrovich commented on PR #1720: URL: https://github.com/apache/datafusion-comet/pull/1720#issuecomment-2863012148 Interestingly, we were passing the Comet version of bucketed scan before: https://github.com/apache/datafusion-comet/actions/runs/14893567863/job/41846178932?pr=1720#st

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-08 Thread via GitHub
ctsk commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2863005079 Since partitioning does not appear to be a limiting factor in aggregations, I wonder if it makes sense to investigate a lower-quality pre-aggregation (i.e. let more tuples pass to the fina

Re: [PR] Re-Add CodeCov [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2862874818 Starting with manual triggering sounds like a good idea to me

Re: [PR] Fix: after repartitioning, the `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed [datafusion]

2025-05-08 Thread via GitHub
xudong963 closed pull request #15539: Fix: after repartitioning, the `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed URL: https://github.com/apache/datafusion/pull/15539

Re: [PR] Fix: after repartitioning, the `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed [datafusion]

2025-05-08 Thread via GitHub
xudong963 commented on PR #15539: URL: https://github.com/apache/datafusion/pull/15539#issuecomment-2862820995 Let's close this one. I'll open a new PR to solve this, because we have made many changes to how stats are merged

Re: [PR] chore(deps): bump insta from 1.42.2 to 1.43.1 [datafusion]

2025-05-08 Thread via GitHub
xudong963 merged PR #15988: URL: https://github.com/apache/datafusion/pull/15988

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
alamb commented on PR #15973: URL: https://github.com/apache/datafusion/pull/15973#issuecomment-2862608755 Thank you again @miroim and @berkaysynnada

[PR] chore(deps): bump insta from 1.42.2 to 1.43.1 [datafusion]

2025-05-08 Thread via GitHub
dependabot[bot] opened a new pull request, #15988: URL: https://github.com/apache/datafusion/pull/15988 Bumps [insta](https://github.com/mitsuhiko/insta) from 1.42.2 to 1.43.1. Release notes Sourced from insta's releases (https://github.com/mitsuhiko/insta/releases). 1.43.1

[I] Add flag to control reordering metrics for display [datafusion]

2025-05-08 Thread via GitHub
niebayes opened a new issue, #15992: URL: https://github.com/apache/datafusion/issues/15992 `DisplayableExecution::indent` always reorders metrics. However, one might expect the order of the metrics to conform to the registration order. I propose to add a flag to control the reordering b

Re: [PR] fix: Allow ORDER BY aggregates not present in SELECT list [datafusion]

2025-05-08 Thread via GitHub
UBarney commented on code in PR #15876: URL: https://github.com/apache/datafusion/pull/15876#discussion_r2079325033 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -797,26 +807,146 @@ impl LogicalPlanBuilder { } // remove pushed down sort columns -

Re: [I] Implement method to apply scalar or aggregate function to Array elements [datafusion]

2025-05-08 Thread via GitHub
KR-bluejay commented on issue #15882: URL: https://github.com/apache/datafusion/issues/15882#issuecomment-2862498585 @alamb Following your comment about making array functions more general, I suggest we create a common directory for array operations to reduce code duplication:

[PR] refactor: remove deprecated `CsvExec` [datafusion]

2025-05-08 Thread via GitHub
miroim opened a new pull request, #15991: URL: https://github.com/apache/datafusion/pull/15991 ## Which issue does this PR close? Part of #15950 . ## Rationale for this change The `CsvExec` structure was deprecated in DataFusion 46 and is scheduled for removal. Deve

[PR] Box field to reduce DatafusionError size [datafusion]

2025-05-08 Thread via GitHub
ctsk opened a new pull request, #15990: URL: https://github.com/apache/datafusion/pull/15990 The `field: Column` field in SchemaError::AmbiguousReference grew the size of DatafusionError from 72 bytes to 112 bytes. Putting it on the heap fixes that. ## Are there any user-facing chang
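
As background for the size reduction described in this PR, here is a minimal sketch of why boxing a wide field shrinks an enum: an enum is at least as large as its largest variant, so moving a large payload behind a `Box` replaces it with a single pointer. The types below (`WideField`, `InlineError`, `BoxedError`) are hypothetical stand-ins and do not reproduce DataFusion's actual definitions or the 72 and 112 byte figures quoted above.

```rust
/// Hypothetical wide payload, standing in for something like a column reference.
#[allow(dead_code)]
pub struct WideField {
    relation: Option<String>,
    name: String,
    extra: [u64; 4],
}

/// The payload is stored inline, so every value of the enum is at least this wide.
#[allow(dead_code)]
pub enum InlineError {
    Ambiguous { field: WideField },
    Other(String),
}

/// The payload lives on the heap; the variant itself only holds a pointer.
#[allow(dead_code)]
pub enum BoxedError {
    Ambiguous { field: Box<WideField> },
    Other(String),
}

/// Returns (size of the inline enum, size of the boxed enum) in bytes.
pub fn sizes() -> (usize, usize) {
    (
        std::mem::size_of::<InlineError>(),
        std::mem::size_of::<BoxedError>(),
    )
}
```

Calling `sizes()` shows the boxed enum is smaller than the inline one; the exact numbers depend on the target platform and compiler layout choices. The trade-off is one extra heap allocation when the boxed variant is constructed, which is usually acceptable on an error path.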

Re: [PR] feat: add macros for DataFusionError variants [datafusion]

2025-05-08 Thread via GitHub
Chen-Yuan-Lai commented on PR #15946: URL: https://github.com/apache/datafusion/pull/15946#issuecomment-2862197349 @comphead Thank you for your review. You're right - my current implementation doesn't preserve backtraces. To preserve backtraces, I think there are two options: 1. **

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
miroim commented on PR #15973: URL: https://github.com/apache/datafusion/pull/15973#issuecomment-2862177085 > There are also other deprecated source execs (json, csv, arrow, avro, memory). You are welcome to remove them too I'll take a look at them

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
berkaysynnada commented on PR #15973: URL: https://github.com/apache/datafusion/pull/15973#issuecomment-2862140172 Thank you @miroim. There are also other deprecated source execs (json, csv, arrow, avro, memory). You are welcome to remove them too

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
berkaysynnada commented on code in PR #15973: URL: https://github.com/apache/datafusion/pull/15973#discussion_r2079129222 ## datafusion/datasource-parquet/src/mod.rs: ## @@ -32,511 +30,18 @@ mod row_group_filter; pub mod source; mod writer; -use std::any::Any; -use std::fmt:

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-08 Thread via GitHub
berkaysynnada merged PR #15973: URL: https://github.com/apache/datafusion/pull/15973