[PR] POC Varchar default mapping to utf8view [datafusion]

2025-05-21 Thread via GitHub
zhuqi-lucas opened a new pull request, #16142: URL: https://github.com/apache/datafusion/pull/16142 ## Which issue does this PR close? - Closes [#15096](https://github.com/apache/datafusion/issues/15096) ## Rationale for this change ## What changes are inc

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Rachelint commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101741744 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -130,15 +133,15 @@ where let hash = key.hash

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Rachelint commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2900067762 > 🤖: Benchmark completed Thanks, q4 and q15 are the targets, and they do indeed seem to get faster! -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Rachelint commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101735859 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -74,21 +77,21 @@ macro_rules! hash_float { hash_float!(f16, f3

Re: [PR] Set `TrackConsumersPool` as default in datafusion-cli [datafusion]

2025-05-21 Thread via GitHub
2010YOUY01 merged PR #16081: URL: https://github.com/apache/datafusion/pull/16081

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Rachelint commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101732218 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs: ## @@ -0,0 +1,139 @@ +// Licensed to the Apache Software Fo

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-21 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2900055370 From my understanding DelimScan is a `LogicalPlan::Aggregate` wrapped around a `LogicalPlan::TableScan`, but maybe @irenjj can provide more information ---

Re: [PR] Improve `unproject_sort_expr` to handle arbitrary expressions [datafusion]

2025-05-21 Thread via GitHub
phillipleblanc commented on PR #16127: URL: https://github.com/apache/datafusion/pull/16127#issuecomment-282065 cc @goldmedal

Re: [I] Spark executors failing occasionally on SIGSEGV [datafusion-comet]

2025-05-21 Thread via GitHub
Kontinuation commented on issue #1714: URL: https://github.com/apache/datafusion-comet/issues/1714#issuecomment-2899728834 Iceberg 1.6.1 is known to cause segfault on Spark 3.5.4. See https://github.com/apache/iceberg/pull/11731 and https://github.com/apache/iceberg/issues/12178. Yo

Re: [PR] #5483 [datafusion]

2025-05-21 Thread via GitHub
github-actions[bot] commented on PR #15307: URL: https://github.com/apache/datafusion/pull/15307#issuecomment-2899697745 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-21 Thread via GitHub
adriangb commented on code in PR #15295: URL: https://github.com/apache/datafusion/pull/15295#discussion_r2100981813 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1178,6 +1207,31 @@ impl ListingTable { } } +/// Extension trait for FileSource to allow schema

Re: [PR] feat: Support Type widening: byte → short/int/long, short → int/long [datafusion-comet]

2025-05-21 Thread via GitHub
codecov-commenter commented on PR #1770: URL: https://github.com/apache/datafusion-comet/pull/1770#issuecomment-2899454881 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1770?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: default values for native_datafusion scan [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2101451928 ## native/core/src/parquet/parquet_exec.rs: ## @@ -61,12 +63,14 @@ pub(crate) fn init_datasource_exec( file_groups: Vec>, projection_vector: Opt

Re: [PR] fix: translate missing or corrupt file exceptions in NativeUtil, fall back native scans if asked to ignore [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra commented on code in PR #1765: URL: https://github.com/apache/datafusion-comet/pull/1765#discussion_r2101431726 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -133,23 +136,51 @@ class CometExecIterator( } } - def getNextBatch(): O

Re: [I] Q23 fails when running TPC-DS SF=1 because of invalid offset buffer being exported for empty StringArray. [datafusion-comet]

2025-05-21 Thread via GitHub
Kontinuation commented on issue #1615: URL: https://github.com/apache/datafusion-comet/issues/1615#issuecomment-2899609086 Apache Arrow Java has made a new release with the fix: https://github.com/apache/arrow-java/releases/tag/v18.3.0. We can bump the version of arrow-java to close

Re: [I] Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-21 Thread via GitHub
comphead closed issue #16118: Fix typos and minor grammatical issues in Architecture docs URL: https://github.com/apache/datafusion/issues/16118

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-21 Thread via GitHub
comphead merged PR #16119: URL: https://github.com/apache/datafusion/pull/16119

[I] Fallback to Spark in native_datafusion/native_iceberg_compat if encryption is enabled [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra opened a new issue, #1772: URL: https://github.com/apache/datafusion-comet/issues/1772 ### Describe the bug Not entirely sure if this will work but per the [docs](https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#columnar-encryption) - ``` // Act

Re: [PR] feat: Support Type widening: byte → short/int/long, short → int/long [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra commented on PR #1770: URL: https://github.com/apache/datafusion-comet/pull/1770#issuecomment-2899581968 Do we need to consider unsigned integer types as well?

Re: [PR] fix: cast map types correctly in schema adapter [datafusion-comet]

2025-05-21 Thread via GitHub
codecov-commenter commented on PR #1771: URL: https://github.com/apache/datafusion-comet/pull/1771#issuecomment-2899508183 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1771?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: Add auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra commented on PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#issuecomment-2899517869 Not sure why this would cause the ci failures that we see here. Maybe defer this until some more of the known issues are fixed?

Re: [PR] feat: Add auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra commented on code in PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#discussion_r2101363175 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -93,21 +93,63 @@ case class CometScanRule(session: SparkSession) extends Rule[S

[PR] fix: cast map types correctly in schema adapter [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra opened a new pull request, #1771: URL: https://github.com/apache/datafusion-comet/pull/1771 ## Which issue does this PR close? #1754 ## Rationale for this change In schema_adapter Map types are cast using arrow's cast which assumes that all nested f

Re: [PR] fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing structs [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove commented on PR #1764: URL: https://github.com/apache/datafusion-comet/pull/1764#issuecomment-2899455164 Moving this to draft because we may want to merge https://github.com/apache/datafusion-comet/pull/1771 instead

Re: [PR] Move prepare/parameter handling tests into `params.rs` [datafusion]

2025-05-21 Thread via GitHub
liamzwbao commented on PR #16141: URL: https://github.com/apache/datafusion/pull/16141#issuecomment-2899460526 Resolved! Thanks @alamb

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Dandandan commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100975676 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -74,21 +77,21 @@ macro_rules! hash_float { hash_float!(f16, f3

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-21 Thread via GitHub
qstommyshu commented on PR #16019: URL: https://github.com/apache/datafusion/pull/16019#issuecomment-2899443238 > Thank you @qstommyshu for all the help getting this PR to a good state You’re welcome! I'm happy to help! -- This is an automated message from the Apache Git Service. To

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-21 Thread via GitHub
patrickcsullivan commented on code in PR #16119: URL: https://github.com/apache/datafusion/pull/16119#discussion_r2101301182 ## datafusion/core/src/lib.rs: ## @@ -311,9 +311,9 @@ //! ``` //! //! A [`TableProvider`] provides information for planning and -//! an [`ExecutionPlan

Re: [PR] fix: cast map types correctly in schema adapter [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra commented on PR #1771: URL: https://github.com/apache/datafusion-comet/pull/1771#issuecomment-2899428414 Marking as draft. Spark diffs may need to be updated as well.

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-21 Thread via GitHub
patrickcsullivan commented on code in PR #16119: URL: https://github.com/apache/datafusion/pull/16119#discussion_r2101295764 ## datafusion/core/src/lib.rs: ## @@ -488,16 +488,16 @@ //! DataFusion automatically runs each plan with multiple CPU cores using //! a [Tokio] [`Runtim

Re: [I] Unnecessary casting in stats & filter evaluation [datafusion]

2025-05-21 Thread via GitHub
adriangb commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2898896876 As discussed a bit in https://github.com/apache/datafusion/pull/16086#discussion_r2100826502 there is a fundamental problem that all of the predicates are planned at the table

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2898865038 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Implementation for regex_instr [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-2898907668 @Omega359 I wonder if you might have time to review this PR?

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-21 Thread via GitHub
alamb commented on code in PR #16019: URL: https://github.com/apache/datafusion/pull/16019#discussion_r2100968491 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4650,7 +4650,7 @@ fn test_prepare_statement_infer_types_from_join() { assert_snapshot!( plan,

Re: [PR] Re-Add CodeCov [datafusion]

2025-05-21 Thread via GitHub
blaginin commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2899311669 Thank you for your feedback! I think we can do it this way: - We make sure it doesn't affect test CI speed (I'll do that in this PR) - We merge this PR - We test for a few w

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2898960798 🤖: Benchmark completed Details ``` Comparing HEAD and improve-primitive-group-values Benchmark clickbench_extended.json ---

[PR] feat: Support Type widening: byte → short/int/long, short → int/long [datafusion-comet]

2025-05-21 Thread via GitHub
huaxingao opened a new pull request, #1770: URL: https://github.com/apache/datafusion-comet/pull/1770 ## Which issue does this PR close? Closes #. ## Rationale for this change Support type widening in Spark 4.0. This PR updates the code to pass the following test

Re: [PR] chore: Upgrade rand crate and some other minor crates [datafusion]

2025-05-21 Thread via GitHub
comphead merged PR #16062: URL: https://github.com/apache/datafusion/pull/16062

Re: [PR] Move prepare/parameter handling tests into `params.rs` [datafusion]

2025-05-21 Thread via GitHub
alamb commented on code in PR #16141: URL: https://github.com/apache/datafusion/pull/16141#discussion_r2101199745 ## datafusion/expr/src/logical_plan/statement.rs: ## @@ -110,7 +110,7 @@ impl Statement { Statement::Prepare(Prepare {

Re: [PR] chore(deps): bump syn from 2.0.100 to 2.0.101 [datafusion]

2025-05-21 Thread via GitHub
dependabot[bot] commented on PR #16128: URL: https://github.com/apache/datafusion/pull/16128#issuecomment-2899294917 Looks like syn is up-to-date now, so this is no longer needed.

Re: [PR] chore(deps): bump syn from 2.0.100 to 2.0.101 [datafusion]

2025-05-21 Thread via GitHub
dependabot[bot] closed pull request #16128: chore(deps): bump syn from 2.0.100 to 2.0.101 URL: https://github.com/apache/datafusion/pull/16128

Re: [PR] chore(deps): bump object_store from 0.12.0 to 0.12.1 [datafusion]

2025-05-21 Thread via GitHub
dependabot[bot] commented on PR #16129: URL: https://github.com/apache/datafusion/pull/16129#issuecomment-2899294805 Looks like object_store is up-to-date now, so this is no longer needed.

Re: [PR] chore(deps): bump object_store from 0.12.0 to 0.12.1 [datafusion]

2025-05-21 Thread via GitHub
dependabot[bot] closed pull request #16129: chore(deps): bump object_store from 0.12.0 to 0.12.1 URL: https://github.com/apache/datafusion/pull/16129

Re: [I] Include datatypes in logicalplan for inferred statements [datafusion]

2025-05-21 Thread via GitHub
alamb closed issue #16018: Include datatypes in logicalplan for inferred statements URL: https://github.com/apache/datafusion/issues/16018

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-21 Thread via GitHub
alamb merged PR #16019: URL: https://github.com/apache/datafusion/pull/16019

Re: [PR] chore: Stop running Comet tests with JDK 8 on pull requests [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove closed pull request #1769: chore: Stop running Comet tests with JDK 8 on pull requests URL: https://github.com/apache/datafusion-comet/pull/1769

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-21 Thread via GitHub
alamb commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2899282132 > Thanks [@duongcongtoai](https://github.com/duongcongtoai) ! This is a very good idea! I also think we can start with simple unnest, we may need to introduce some DuckDB structures

Re: [PR] fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing structs [datafusion-comet]

2025-05-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1764: URL: https://github.com/apache/datafusion-comet/pull/1764#discussion_r2101105764 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -153,7 +153,8 @@ class CometNativeReaderSuite extends CometTestBase

Re: [PR] fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing structs [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove commented on PR #1764: URL: https://github.com/apache/datafusion-comet/pull/1764#issuecomment-2899221853 One test keeps failing without any actual test failures. This is happening on other PRs as well.

Re: [PR] fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing structs [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove commented on code in PR #1764: URL: https://github.com/apache/datafusion-comet/pull/1764#discussion_r2101157036 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -153,7 +153,8 @@ class CometNativeReaderSuite extends CometTestBase with A

Re: [PR] Move prepare/parameter handling tests into `params.rs` [datafusion]

2025-05-21 Thread via GitHub
liamzwbao commented on PR #16141: URL: https://github.com/apache/datafusion/pull/16141#issuecomment-2899217415 @alamb PTAL, thanks!

[PR] Move prepare/parameter handling tests into `params.rs` [datafusion]

2025-05-21 Thread via GitHub
liamzwbao opened a new pull request, #16141: URL: https://github.com/apache/datafusion/pull/16141 ## Which issue does this PR close? - Closes #16056. ## Rationale for this change ## What changes are included in this PR? Remove the redundant spac

Re: [PR] Mysql: Add `SRID` column option [datafusion-sqlparser-rs]

2025-05-21 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1852: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1852#discussion_r2101148965 ## src/parser/mod.rs: ## @@ -16571,6 +16575,23 @@ mod tests { } } +#[test] +fn test_mysql_srid_create_table() { +

Re: [PR] Implementation for regex_instr [datafusion]

2025-05-21 Thread via GitHub
blaginin commented on code in PR #15928: URL: https://github.com/apache/datafusion/pull/15928#discussion_r2101076229 ## datafusion/functions/src/regex/regexpinstr.rs: ## @@ -0,0 +1,979 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [PR] adding support for Min/Max over LargeList and FixedSizeList [datafusion]

2025-05-21 Thread via GitHub
logan-keede commented on PR #16071: URL: https://github.com/apache/datafusion/pull/16071#issuecomment-2899131520 cc @alamb

[PR] chore: Stop running Comet tests with JDK 8 on pull requests [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove opened a new pull request, #1769: URL: https://github.com/apache/datafusion-comet/pull/1769 ## Which issue does this PR close? N/A ## Rationale for this change We have many PRs that we cannot merge due to workflows failing due to CI runs getting

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #16019: URL: https://github.com/apache/datafusion/pull/16019#issuecomment-2898938075 Thank you @qstommyshu for all the help getting this PR to a good state

Re: [I] job data cleanup does not work if `pull-staged` strategy selected [datafusion-ballista]

2025-05-21 Thread via GitHub
milenkovicm commented on issue #1219: URL: https://github.com/apache/datafusion-ballista/issues/1219#issuecomment-2899124192 With the `pull-staged` strategy, the executor does not expose a gRPC service, so the scheduler cannot connect to the executor to remove the data.

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Dandandan commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100966267 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -222,3 +184,61 @@ where self.map.shrink_to(count, |_| 0)

Re: [PR] Implementation for regex_instr [datafusion]

2025-05-21 Thread via GitHub
Omega359 commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-2899035444 Of course @alamb, not sure how I missed this one. It may be a day or two though

Re: [I] `ParquetEncryptionITCase` fails with `native_iceberg_compat` [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove closed issue #1488: `ParquetEncryptionITCase` fails with `native_iceberg_compat` URL: https://github.com/apache/datafusion-comet/issues/1488

[PR] feat: job id is incremental [datafusion-ballista]

2025-05-21 Thread via GitHub
milenkovicm opened a new pull request, #1267: URL: https://github.com/apache/datafusion-ballista/pull/1267 # Which issue does this PR close? Closes #. # Rationale for this change Previously, `job id` was generated randomly, without any ordering guarantees, which makes it r

Re: [PR] chore: Use materialized data for filter pushdown tests [datafusion]

2025-05-21 Thread via GitHub
comphead merged PR #16123: URL: https://github.com/apache/datafusion/pull/16123

Re: [PR] Improve the DML / DDL Documentation [datafusion]

2025-05-21 Thread via GitHub
alamb merged PR #16115: URL: https://github.com/apache/datafusion/pull/16115

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-21 Thread via GitHub
alamb commented on code in PR #15295: URL: https://github.com/apache/datafusion/pull/15295#discussion_r2100961322 ## datafusion/datasource/src/nested_schema_adapter.rs: ## @@ -0,0 +1,943 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-21 Thread via GitHub
alamb commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2100929304 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2898922232 > This PR is large and cumbersome to review. > I propose to close it and re-implement as: The break up as you suggest sounds reasonable to me @mbutrovich and @adrian

Re: [PR] chore: [native_iceberg_compat / native_datafusion] Ignore Spark SQL Parquet encryption tests [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove merged PR #1763: URL: https://github.com/apache/datafusion-comet/pull/1763

Re: [PR] Optimize performance of `string::ascii` function [datafusion]

2025-05-21 Thread via GitHub
alamb merged PR #16087: URL: https://github.com/apache/datafusion/pull/16087

Re: [PR] chore: Use materialized data for filter pushdown tests [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #16123: URL: https://github.com/apache/datafusion/pull/16123#issuecomment-2898909729 > So if I just regenerate the parquet using new rand and substitute values I cannot guarantee that predicate is chosen carefully I see -- that makes sense. Let's go with the gene

Re: [I] commit 304488d3... (2025-02-05) broke JOIN ... USING("UPPERCASE_FIELD_NAME") [datafusion]

2025-05-21 Thread via GitHub
alamb commented on issue #16120: URL: https://github.com/apache/datafusion/issues/16120#issuecomment-2898890449 Thank you for the report @brunal and for taking this @jfahne

Re: [PR] Optimize performance of `string::ascii` function [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #16087: URL: https://github.com/apache/datafusion/pull/16087#issuecomment-2898904176 This is better than what is on main, and we can always optimize further. Thanks a lot @tlm365 @findepi and @Dandandan -- always up and to the right

Re: [PR] chore(deps): Update sqlparser to `0.54.0` [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #14255: URL: https://github.com/apache/datafusion/pull/14255#issuecomment-2898891364 This appears to have caused a regression: - https://github.com/apache/datafusion/issues/16120

Re: [I] Rust API - "contains" function expression wrongly declared, not usable [datafusion]

2025-05-21 Thread via GitHub
alamb closed issue #15866: Rust API - "contains" function expression wrongly declared, not usable URL: https://github.com/apache/datafusion/issues/15866

Re: [PR] Fix `contains` function expression [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #16046: URL: https://github.com/apache/datafusion/pull/16046#issuecomment-2898886124 Thanks @liamzwbao @jonathanc-n and @jayzhan211

Re: [PR] Fix `contains` function expression [datafusion]

2025-05-21 Thread via GitHub
alamb merged PR #16046: URL: https://github.com/apache/datafusion/pull/16046

Re: [PR] Improve the DML / DDL Documentation [datafusion]

2025-05-21 Thread via GitHub
alamb commented on PR #16115: URL: https://github.com/apache/datafusion/pull/16115#issuecomment-2898881322 Thanks again @comphead

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-21 Thread via GitHub
etseidl commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2100927019 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-05-21 Thread via GitHub
alamb commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r2100915637 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-21 Thread via GitHub
adriangb commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2100914890 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

Re: [PR] chore: Add test to confirm correctness of queries with string predicate filters [datafusion-comet]

2025-05-21 Thread via GitHub
codecov-commenter commented on PR #1768: URL: https://github.com/apache/datafusion-comet/pull/1768#issuecomment-2898853701 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1768?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-21 Thread via GitHub
alamb commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2100907909 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-21 Thread via GitHub
adriangb commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2100900247 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove commented on PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#issuecomment-2898701486 @parthchandra which Spark SQL tests does this PR help with?

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-21 Thread via GitHub
etseidl commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2100826502 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

Re: [PR] fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing structs [datafusion-comet]

2025-05-21 Thread via GitHub
codecov-commenter commented on PR #1764: URL: https://github.com/apache/datafusion-comet/pull/1764#issuecomment-2898695884 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1764?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Substrait doesn't support alias in projection [datafusion]

2025-05-21 Thread via GitHub
Blizzara commented on issue #6489: URL: https://github.com/apache/datafusion/issues/6489#issuecomment-2898710590 Yea @waynexia I think this one should be fixed! The included test case almost works, the plan isn't exactly the same after the roundtrip, but effectively still the same:

[I] Substrait: Reading a plan with aggregation with two identical grouping exprs fails to produce correct output columns [datafusion]

2025-05-21 Thread via GitHub
Blizzara opened a new issue, #16140: URL: https://github.com/apache/datafusion/issues/16140 ### Describe the bug A Substrait plan with an aggregation that has duplicate entries doesn't provide output columns for all of the duplicates. This causes issues downstream, since expected col

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-21 Thread via GitHub
adriangb commented on PR #16139: URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2898676926 @xudong963 any chance you can review this since you've already approved the same code (with less tests!) in the original PR?

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-21 Thread via GitHub
parthchandra commented on code in PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#discussion_r2100776128 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1233,7 +1233,9 @@ abstract class ParquetReadSuite extends CometTestBase {

[PR] chore: Add test to confirm correctness of queries with string predicate filters [datafusion-comet]

2025-05-21 Thread via GitHub
andygrove opened a new pull request, #1768: URL: https://github.com/apache/datafusion-comet/pull/1768 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1767 ## Rationale for this change Confirm that the Spark SQL

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Dandandan commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100768086 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs: ## @@ -0,0 +1,139 @@ +// Licensed to the Apache Software Fo

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Dandandan commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100764581 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -130,15 +133,15 @@ where let hash = key.hash

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-21 Thread via GitHub
Dandandan commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100719962 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -74,21 +77,21 @@ macro_rules! hash_float { hash_float!(f16, f3
