[PR] chore(deps): bump prost-derive from 0.13.4 to 0.13.5 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] opened a new pull request, #14622: URL: https://github.com/apache/datafusion/pull/14622 Bumps [prost-derive](https://github.com/tokio-rs/prost) from 0.13.4 to 0.13.5. Release notes sourced from prost-derive's releases: https://github.com/tokio-rs/prost/releases

Re: [PR] equivalence classes: use normalized mapping for projection [datafusion]

2025-02-12 Thread via GitHub
berkaysynnada commented on PR #14327: URL: https://github.com/apache/datafusion/pull/14327#issuecomment-2652990411 and sorry for the late response 😞 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[PR] Add DataFrame fill_nan/fill_null [datafusion-python]

2025-02-12 Thread via GitHub
kosiew opened a new pull request, #1019: URL: https://github.com/apache/datafusion-python/pull/1019 # Which issue does this PR close? Closes #922. # Rationale for this change DataFusion currently lacks built-in methods for handling missing values (nulls and

Re: [I] Ad-hoc or scheduled mutation based testing [datafusion]

2025-02-12 Thread via GitHub
2010YOUY01 commented on issue #14589: URL: https://github.com/apache/datafusion/issues/14589#issuecomment-2653111614 > [@alamb](https://github.com/alamb) they did, although they went out of space! If you click on the "this" hyperlink in the text of the issue, you get here https://github.co

Re: [PR] chore(deps): bump prost-build from 0.13.4 to 0.13.5 [datafusion]

2025-02-12 Thread via GitHub
xudong963 merged PR #14623: URL: https://github.com/apache/datafusion/pull/14623

Re: [PR] Add DataFrame fill_nan/fill_null [datafusion-python]

2025-02-12 Thread via GitHub
kosiew commented on code in PR #1019: URL: https://github.com/apache/datafusion-python/pull/1019#discussion_r1952245124 ## python/tests/test_functions.py: ## @@ -1173,3 +1173,57 @@ def test_between_default(df): actual = df.collect()[0].to_pydict() assert actual == e

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1952259827 ## datafusion/expr-common/src/signature.rs: ## @@ -460,6 +521,44 @@ fn get_data_types(native_type: &NativeType) -> Vec { } } +#[derive(Debug, Clone, Eq

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952299587 ## datafusion/common/src/dfschema.rs: ## @@ -1028,20 +1028,48 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_ty

Re: [PR] chore(deps): bump prost from 0.13.4 to 0.13.5 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] closed pull request #14621: chore(deps): bump prost from 0.13.4 to 0.13.5 URL: https://github.com/apache/datafusion/pull/14621

Re: [PR] chore(deps): bump prost from 0.13.4 to 0.13.5 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] commented on PR #14621: URL: https://github.com/apache/datafusion/pull/14621#issuecomment-2653139423 Looks like prost is up-to-date now, so this is no longer needed.

[PR] Fix ci test [datafusion]

2025-02-12 Thread via GitHub
xudong963 opened a new pull request, #14625: URL: https://github.com/apache/datafusion/pull/14625 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] Fix ci test [datafusion]

2025-02-12 Thread via GitHub
xudong963 merged PR #14625: URL: https://github.com/apache/datafusion/pull/14625

Re: [PR] Fix ci test [datafusion]

2025-02-12 Thread via GitHub
xudong963 commented on PR #14625: URL: https://github.com/apache/datafusion/pull/14625#issuecomment-2653399521 thanks all

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-12 Thread via GitHub
berkaysynnada commented on PR #14538: URL: https://github.com/apache/datafusion/pull/14538#issuecomment-2653402466 I'll take a look ASAP, but any additional reviewers are welcome

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952422238 ## datafusion/sqllogictest/test_files/insert_to_external.slt: ## @@ -60,17 +60,16 @@ STORED AS parquet LOCATION 'test_files/scratch/insert_to_external/parquet_

Re: [I] Optimize `repeat` function [datafusion]

2025-02-12 Thread via GitHub
zjregee commented on issue #14610: URL: https://github.com/apache/datafusion/issues/14610#issuecomment-2653494265 Hi, @alamb, I have a few questions and hope to get some help. Is the optimization mentioned here similar to using the following method instead? ```rust for _ in

[PR] chore(deps): bump substrait from 0.53.1 to 0.53.2 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] opened a new pull request, #14627: URL: https://github.com/apache/datafusion/pull/14627 Bumps [substrait](https://github.com/substrait-io/substrait-rs) from 0.53.1 to 0.53.2. Release notes sourced from https://github.com/substrait-io/substrait-rs/releases: substrai

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2653488012 Yes, and not only the 3 I mentioned: FileFormat, FileFormatFactory, etc. I think `File`-related structures should belong in their own crate, and then the implementations like parquet,

Re: [PR] chore(deps): group `prost` and `pbjson` dependabot updates [datafusion]

2025-02-12 Thread via GitHub
alamb merged PR #14626: URL: https://github.com/apache/datafusion/pull/14626

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on code in PR #14616: URL: https://github.com/apache/datafusion/pull/14616#discussion_r1952536901 ## datafusion/catalog-listing/src/file_stream_part.rs: ## @@ -0,0 +1,214 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

[PR] Add support for `ORDER BY ALL` [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
PokIsemaine opened a new pull request, #1724: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1724 Add support for `ORDER BY ALL` ``` SELECT * FROM addresses ORDER BY ALL; ``` https://duckdb.org/docs/sql/query_syntax/orderby.html#order-by-all-examples https://

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952544758 ## datafusion/sqllogictest/test_files/insert_to_external.slt: ## @@ -60,17 +60,16 @@ STORED AS parquet LOCATION 'test_files/scratch/insert_to_external/parquet_

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952543911 ## datafusion/sqllogictest/test_files/insert_to_external.slt: ## @@ -60,17 +60,16 @@ STORED AS parquet LOCATION 'test_files/scratch/insert_to_external/parquet_

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2653497557 `Catalog -> Schema -> Table -> FileFormat -> QueryPlanner` is the dependency based on my previous research.

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on code in PR #14616: URL: https://github.com/apache/datafusion/pull/14616#discussion_r1952536136 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -31,49 +31,17 @@ use crate::datasource::listing::PartitionedFile; use crate::datasource:

[I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-02-12 Thread via GitHub
EmilyMatt opened a new issue, #1389: URL: https://github.com/apache/datafusion-comet/issues/1389 ### Describe the bug In cases where we support a HashAggregate's aggregate functions, we will convert the partial stage HashAggregate, execute it in DF, then use native shuffle to forward

Re: [I] Discuss: Check in Cargo.lock file? [datafusion]

2025-02-12 Thread via GitHub
timsaucer commented on issue #14135: URL: https://github.com/apache/datafusion/issues/14135#issuecomment-2653606832 When I needed to add a new third party github action to the Apache list of approved actions for `datafusion-python`, it was a very easy process. If that's a blocker I can file

[PR] WIP: Update to arrow 52.0.0 [datafusion]

2025-02-12 Thread via GitHub
alamb opened a new pull request, #14628: URL: https://github.com/apache/datafusion/pull/14628 ## Which issue does this PR close? - Part of https://github.com/apache/arrow-rs/issues/7083 ## Rationale for this change Testing that DataFusion has no issues when upgrading to lates

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952513001 ## datafusion/sqllogictest/test_files/insert_to_external.slt: ## @@ -60,17 +60,16 @@ STORED AS parquet LOCATION 'test_files/scratch/insert_to_external/parquet

Re: [I] Improve consistency and documentation on error handling in in UDFs [datafusion]

2025-02-12 Thread via GitHub
davidhewitt commented on issue #11618: URL: https://github.com/apache/datafusion/issues/11618#issuecomment-2653788835 I just got here after discovering that the `get_field` UDF returns `exec_err!` when passed an invalid argument type. (We treat `exec_err!` as 500 and `plan_err!` as 400 in P

[PR] Ok push for now until I understand how to regenerate golden files [datafusion-comet]

2025-02-12 Thread via GitHub
EmilyMatt opened a new pull request, #1390: URL: https://github.com/apache/datafusion-comet/pull/1390 ## What issue does this close? Closes #1389 . ## Rationale for this change As described in the issue, we'd like to prevent situations where despite the Partial aggregate

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-02-12 Thread via GitHub
ch-sc commented on PR #14523: URL: https://github.com/apache/datafusion/pull/14523#issuecomment-2653779520 @berkaysynnada do you have time to take another look? :) `NotEq` leads to the removal of the sort operator. I debugged into this and noticed that the `EnforceSorting` opti

Re: [PR] fix: Reduce timestamp issues in native_datafusion and native_icerberg_compat Parquet modes [datafusion-comet]

2025-02-12 Thread via GitHub
parthchandra commented on code in PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#discussion_r1952694138 ## native/core/src/parquet/parquet_support.rs: ## @@ -596,7 +595,10 @@ fn cast_array( parquet_options: &SparkParquetOptions, ) -> DataFusionResult {

[PR] Update Community Events in concepts-readings-events.md [datafusion]

2025-02-12 Thread via GitHub
oznur-synnada opened a new pull request, #14629: URL: https://github.com/apache/datafusion/pull/14629 * Remove "(Upcoming)" from Community Events held in the past * Correct the date of Amsterdam Apache DataFusion Meetup (January 23, not January 25) ## Which issue does this PR close

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
logan-keede commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2653644761 ![image](https://github.com/user-attachments/assets/d4e27727-d734-4d6f-86bb-f86a845b1ecb) mostly for my clarity also, I am not suggesting that we keep the file_format an

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952620730 ## datafusion/sqllogictest/test_files/insert_to_external.slt: ## @@ -60,17 +60,16 @@ STORED AS parquet LOCATION 'test_files/scratch/insert_to_external/parquet

Re: [PR] fix: Reduce timestamp issues in native_datafusion and native_icerberg_compat Parquet modes [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on code in PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#discussion_r1952620440 ## native/core/src/parquet/parquet_support.rs: ## @@ -596,7 +595,10 @@ fn cast_array( parquet_options: &SparkParquetOptions, ) -> DataFusionResult {

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
jonahgao commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952677078 ## datafusion/common/src/dfschema.rs: ## @@ -1028,21 +1028,41 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_types

Re: [PR] WIP: Add LogicalScalar [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on code in PR #14617: URL: https://github.com/apache/datafusion/pull/14617#discussion_r1952808036 ## datafusion/common/src/scalar/logical/mod.rs: ## @@ -0,0 +1,400 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

[PR] feat: add table source to DML proto to eliminate need for table lookup after deserialisation [datafusion]

2025-02-12 Thread via GitHub
milenkovicm opened a new pull request, #14631: URL: https://github.com/apache/datafusion/pull/14631 ## Which issue does this PR close? this is more request for comment change, at the moment No PR at the moment, would open it after discussion - Closes #. ## Rationale f

Re: [PR] fix: type checking [datafusion-python]

2025-02-12 Thread via GitHub
chenkovsky commented on PR #993: URL: https://github.com/apache/datafusion-python/pull/993#issuecomment-2653946007 @kylebarron should I remove type annotation, and review bug fix first?

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
Blizzara commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r1952817778 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1090,11 +1090,31 @@ impl LogicalPlanBuilder { group_expr: impl IntoIterator>, aggr_expr:

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
Blizzara commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r1952814846 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -300,6 +300,17 @@ async fn aggregate_grouping_rollup() -> Result<()> { ).await } +#[t

Re: [PR] Minor: Add docs and examples for `DataFusionErrorBuilder` [datafusion]

2025-02-12 Thread via GitHub
jonahgao commented on code in PR #14551: URL: https://github.com/apache/datafusion/pull/14551#discussion_r1952818429 ## datafusion/common/src/error.rs: ## @@ -602,6 +607,9 @@ impl DataFusionError { DiagnosticsIterator { head: self }.next() } +/// Return an it

Re: [PR] Update Community Events in concepts-readings-events.md [datafusion]

2025-02-12 Thread via GitHub
berkaysynnada merged PR #14629: URL: https://github.com/apache/datafusion/pull/14629

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2653971828 Ok, I am going to merge this one in and we can keep working on this in follow on PRs. @logan-keede can you update the plan on https://github.com/apache/datafusion/issues/1

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
alamb merged PR #14616: URL: https://github.com/apache/datafusion/pull/14616

Re: [PR] Minor: Add docs and examples for `DataFusionErrorBuilder` [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14551: URL: https://github.com/apache/datafusion/pull/14551#issuecomment-2653981501 Thank you very much for the review @jonahgao

Re: [PR] Minor: Add docs and examples for `DataFusionErrorBuilder` [datafusion]

2025-02-12 Thread via GitHub
alamb commented on code in PR #14551: URL: https://github.com/apache/datafusion/pull/14551#discussion_r1952833858 ## datafusion/common/src/error.rs: ## @@ -602,6 +607,9 @@ impl DataFusionError { DiagnosticsIterator { head: self }.next() } +/// Return an itera

Re: [PR] fix: Reduce timestamp issues in native_datafusion and native_icerberg_compat Parquet modes [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on code in PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#discussion_r1952625044 ## native/core/src/parquet/parquet_support.rs: ## @@ -596,7 +595,10 @@ fn cast_array( parquet_options: &SparkParquetOptions, ) -> DataFusionResult {

Re: [PR] fix: Reduce timestamp issues in native_datafusion and native_icerberg_compat Parquet modes [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on code in PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#discussion_r1952626361 ## native/core/src/parquet/parquet_support.rs: ## @@ -596,7 +595,10 @@ fn cast_array( parquet_options: &SparkParquetOptions, ) -> DataFusionResult {

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-12 Thread via GitHub
UBarney commented on code in PR #14567: URL: https://github.com/apache/datafusion/pull/14567#discussion_r1952625400 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1710,6 +1711,66 @@ fn build_like_match( Some(combined) } +// For predicate `col NOT LIKE 'foo%'`, w

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-12 Thread via GitHub
findepi commented on code in PR #14567: URL: https://github.com/apache/datafusion/pull/14567#discussion_r1952641084 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1710,6 +1711,76 @@ fn build_like_match( Some(combined) } +// For predicate `col NOT LIKE 'const_pre

Re: [PR] chore(deps): bump substrait from 0.53.1 to 0.53.2 [datafusion]

2025-02-12 Thread via GitHub
jonahgao commented on PR #14627: URL: https://github.com/apache/datafusion/pull/14627#issuecomment-2653835578 Thank you @mbrobbel

Re: [PR] chore(deps): bump substrait from 0.53.1 to 0.53.2 [datafusion]

2025-02-12 Thread via GitHub
jonahgao merged PR #14627: URL: https://github.com/apache/datafusion/pull/14627

[I] StaticInvoke class checked with elided types [datafusion-comet]

2025-02-12 Thread via GitHub
EmilyMatt opened a new issue, #1391: URL: https://github.com/apache/datafusion-comet/issues/1391 ### Describe the bug In here https://github.com/apache/datafusion-comet/blob/f099e6e40aa18441c7882e5bffd9d6dfb10c6c19/spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala#L2

[PR] Minor: remove unused `AutoFinishBzEncoder` [datafusion]

2025-02-12 Thread via GitHub
jonahgao opened a new pull request, #14630: URL: https://github.com/apache/datafusion/pull/14630 ## Which issue does this PR close? N/A ## Rationale for this change Follow-up of [chore(deps): bump bzip2 from 0.5.0 to 0.5.1 #14620](https://github.com/apache/datafusion/pul

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-12 Thread via GitHub
UBarney commented on code in PR #14567: URL: https://github.com/apache/datafusion/pull/14567#discussion_r1952755081 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1710,6 +1711,76 @@ fn build_like_match( Some(combined) } +// For predicate `col NOT LIKE 'const_pre

[PR] fix: Passthrough condition in StaticInvoke case block [datafusion-comet]

2025-02-12 Thread via GitHub
EmilyMatt opened a new pull request, #1392: URL: https://github.com/apache/datafusion-comet/pull/1392 ## Which issue does this PR close? Closes #1391 . ## Rationale for this change It is a logic error. ## What changes are included in this PR? Moved to a dire

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2653893209 > also, I am not suggesting that we keep the file_format and physical_plan folders in new solution, > though I was going to suggest(only one folder for all file related) if we move

Re: [I] DDL Statement Propagation (`INSERT INTO` support) [datafusion-ballista]

2025-02-12 Thread via GitHub
milenkovicm commented on issue #1164: URL: https://github.com/apache/datafusion-ballista/issues/1164#issuecomment-2653893307 hey @alamb sorry to bother you again, I put some effort into implementing INSERT INTO, implementation is close to option 1, `Replace TableReference with actual table

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-12 Thread via GitHub
logan-keede commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2654016041 There has been an update to the plan, specifically the addition of a `datafusion-datasource` crate - `datafusion-catalog-listing`: `ListingTable` and associated types like `Part

Re: [I] Optimize `repeat` function [datafusion]

2025-02-12 Thread via GitHub
alamb commented on issue #14610: URL: https://github.com/apache/datafusion/issues/14610#issuecomment-2654019734 Hi @zjregee -- would it be possible to iterate through the input once and calculate how much space is needed and then create a StringBuilder with the appropriate capacity via [`
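
The suggestion in the comment above (one pass over the input to compute the required space, then building the output into a buffer of that capacity) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the issue or from DataFusion: `FlatStrings` and `repeat_all` are hypothetical names, and a plain `String` plus an offsets vector stands in for the `StringBuilder` mentioned in the comment.

```rust
/// Hypothetical stand-in for a string array builder's output:
/// one contiguous data buffer plus the end offset of each value.
struct FlatStrings {
    data: String,
    offsets: Vec<usize>,
}

impl FlatStrings {
    /// Return the i-th repeated string as a slice of the shared buffer.
    fn get(&self, i: usize) -> &str {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Repeat `inputs[i]` exactly `counts[i]` times for every row.
fn repeat_all(inputs: &[&str], counts: &[usize]) -> FlatStrings {
    // Pass 1: compute the total number of bytes the output needs.
    let total: usize = inputs
        .iter()
        .zip(counts)
        .map(|(s, &n)| s.len() * n)
        .sum();

    // Allocate once, so the second pass never has to grow the buffer.
    let mut data = String::with_capacity(total);
    let mut offsets = Vec::with_capacity(inputs.len() + 1);
    offsets.push(0);

    // Pass 2: append each value `n` times and record where it ends.
    for (s, &n) in inputs.iter().zip(counts) {
        for _ in 0..n {
            data.push_str(s);
        }
        offsets.push(data.len());
    }

    FlatStrings { data, offsets }
}
```

With this sketch, `repeat_all(&["ab", "xyz"], &[3, 1])` fills a single 9-byte buffer and `get(0)` yields `"ababab"`. Whether the extra pass pays off for large repetition counts is the question being discussed in the thread.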

Re: [I] Feb 4, 2025: This week(s) in DataFusion [datafusion]

2025-02-12 Thread via GitHub
alamb commented on issue #14491: URL: https://github.com/apache/datafusion/issues/14491#issuecomment-2654024105 Great writeup of the Amsterdam meetup: - https://github.com/apache/datafusion/discussions/12988#discussioncomment-12140634 Thanks @oznur-synnada

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-12 Thread via GitHub
alamb commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2654007044 Update here is that @logan-keede is cranking right along: - https://github.com/apache/datafusion/pull/14616 After some discussion with @jayzhan211 I think we have a good

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
Blizzara commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r1952853782 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1090,11 +1090,31 @@ impl LogicalPlanBuilder { group_expr: impl IntoIterator>, aggr_expr:

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952860229 ## datafusion/common/src/dfschema.rs: ## @@ -1028,21 +1028,41 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_ty

Re: [PR] Add support for `ORDER BY ALL` [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
iffyio commented on code in PR #1724: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1724#discussion_r1952866285 ## src/ast/query.rs: ## @@ -2205,31 +2205,50 @@ pub enum JoinConstraint { #[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)] #[cfg_attr(fea

Re: [I] Optimize `repeat` function [datafusion]

2025-02-12 Thread via GitHub
alamb commented on issue #14610: URL: https://github.com/apache/datafusion/issues/14610#issuecomment-2654015159 Thanks @zjregee > This seems to work well when the number of repetitions is small, because it reduces the number of memory copies, but when the number of repetitions is la

Re: [PR] fix: type checking [datafusion-python]

2025-02-12 Thread via GitHub
kylebarron commented on PR #993: URL: https://github.com/apache/datafusion-python/pull/993#issuecomment-2653967414 Let's cc @timsaucer for thoughts.

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-12 Thread via GitHub
Weijun-H commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2654051392 > > I wonder why tpch_mem_sf10 is slower for some queries? Might it be possible the created memtable is not created evenly because of the new round robin (that might be fixable e.g.

Re: [PR] Introducing mutation testing [datafusion]

2025-02-12 Thread via GitHub
2010YOUY01 commented on PR #14590: URL: https://github.com/apache/datafusion/pull/14590#issuecomment-2653100779 > > Thanks @edmondop I cannot see this flow in the list of PR checks, how long it takes? > > On my fork it didn't appear either, you need to merge the PR. Maybe you can try

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2653103730 `FileCompressionType`, `PartitionedFile`, `FileRange` can be moved to `datasource`. If A and B are tightly coupled, you need to pull the partial structure out to C and import C fo

Re: [PR] chore(deps): bump prost-derive from 0.13.4 to 0.13.5 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] closed pull request #14622: chore(deps): bump prost-derive from 0.13.4 to 0.13.5 URL: https://github.com/apache/datafusion/pull/14622

Re: [PR] chore(deps): bump prost-derive from 0.13.4 to 0.13.5 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] commented on PR #14622: URL: https://github.com/apache/datafusion/pull/14622#issuecomment-2653137159 Looks like prost-derive is up-to-date now, so this is no longer needed.

Re: [PR] Fix ci test [datafusion]

2025-02-12 Thread via GitHub
xudong963 commented on PR #14625: URL: https://github.com/apache/datafusion/pull/14625#issuecomment-2653202385 This PR fixes CI on main; once the PR's own CI passes I'll merge it.

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on PR #14572: URL: https://github.com/apache/datafusion/pull/14572#issuecomment-2653205051 The CI error is caused by: https://github.com/apache/datafusion/pull/14625

Re: [PR] chore(deps): bump clap from 4.5.28 to 4.5.29 [datafusion]

2025-02-12 Thread via GitHub
xudong963 merged PR #14619: URL: https://github.com/apache/datafusion/pull/14619

Re: [I] Nullable doesn't work when create memory table [datafusion]

2025-02-12 Thread via GitHub
xudong963 commented on issue #14522: URL: https://github.com/apache/datafusion/issues/14522#issuecomment-2653115249 Thanks @blaginin, I add a test for the case.

Re: [PR] chore(deps): bump bzip2 from 0.5.0 to 0.5.1 [datafusion]

2025-02-12 Thread via GitHub
xudong963 merged PR #14620: URL: https://github.com/apache/datafusion/pull/14620

Re: [I] Integrate `Analyzer` within LogicalPlan building stage [datafusion]

2025-02-12 Thread via GitHub
xudong963 commented on issue #14618: URL: https://github.com/apache/datafusion/issues/14618#issuecomment-2653171072 +1 for the change

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952317116 ## datafusion/sqllogictest/test_files/insert.slt: ## @@ -296,8 +296,11 @@ insert into table_without_values(field1) values(3); 1 # insert NULL values for th

Re: [PR] chore(deps): bump sqllogictest from 0.26.4 to 0.27.0 [datafusion]

2025-02-12 Thread via GitHub
xudong963 merged PR #14598: URL: https://github.com/apache/datafusion/pull/14598

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1952352719 ## datafusion/sqllogictest/test_files/insert_to_external.slt: ## @@ -60,17 +60,16 @@ STORED AS parquet LOCATION 'test_files/scratch/insert_to_external/parquet

Re: [I] Equivalence class projection does not find new equivalent classes correctly [datafusion]

2025-02-12 Thread via GitHub
berkaysynnada closed issue #14326: Equivalence class projection does not find new equivalent classes correctly URL: https://github.com/apache/datafusion/issues/14326

Re: [PR] equivalence classes: use normalized mapping for projection [datafusion]

2025-02-12 Thread via GitHub
berkaysynnada merged PR #14327: URL: https://github.com/apache/datafusion/pull/14327

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-12 Thread via GitHub
ozankabak commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2653026106 Hi @PokIsemaine, we applied independently (not due to any particular reason, we simply weren't aware of these three projects applying jointly). We will announce here if we bec

[PR] Add test for nullable doesn't work when create memory table [datafusion]

2025-02-12 Thread via GitHub
xudong963 opened a new pull request, #14624: URL: https://github.com/apache/datafusion/pull/14624 ## Which issue does this PR close? - Closes #14522 ## Rationale for this change Add a test for #14522 ## What changes are included in this PR?

Re: [PR] fix: case-sensitive quoted identifiers in DELETE statements [datafusion]

2025-02-12 Thread via GitHub
xudong963 merged PR #14584: URL: https://github.com/apache/datafusion/pull/14584

Re: [I] DELETE statement fails to preserve quoted identifiers [datafusion]

2025-02-12 Thread via GitHub
xudong963 closed issue #14583: DELETE statement fails to preserve quoted identifiers URL: https://github.com/apache/datafusion/issues/14583

[PR] chore(deps): group `prost` and `pbjson` dependabot updates [datafusion]

2025-02-12 Thread via GitHub
mbrobbel opened a new pull request, #14626: URL: https://github.com/apache/datafusion/pull/14626 ## Which issue does this PR close? None. ## Rationale for this change Groups PRs like the ones listed into one: - https://github.com/apache/datafusion/pull/14621 - https

Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2653423772 > `FileCompressionType`, `PartitionedFile`, `FileRange` can be moved to `datasource`. > > If A and B are tightly coupled, you need to pull partial structure out to C and import C f
