Re: [I] [EPIC] Redesign DataFusion main page [datafusion]

2025-02-01 Thread via GitHub
alamb commented on issue #14389: URL: https://github.com/apache/datafusion/issues/14389#issuecomment-2628908272 > I also was very confused to find the [Rust Version Compatibility Policy](https://github.com/apache/datafusion#rust-version-compatibility-policy) in the README.md As long

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628913239 > > So what are the actual regressions to be resolved? I don't remember signature coercible is consistent with the docs before so it is not a regression to be fixed now. If suppor

[PR] Prepare for `45.0.0` release: Version and Changelog [datafusion]

2025-02-01 Thread via GitHub
alamb opened a new pull request, #14397: URL: https://github.com/apache/datafusion/pull/14397 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/14008 ## Rationale for this change Need to release ## What changes are included in

Re: [I] bug: array_position not working as expected [datafusion]

2025-02-01 Thread via GitHub
bubbajoe closed issue #6694: bug: array_position not working as expected URL: https://github.com/apache/datafusion/issues/6694 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] test: add regression test for unnesting dictionary encoded columns [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14395: URL: https://github.com/apache/datafusion/pull/14395#discussion_r1938255523 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -3325,6 +3326,74 @@ async fn unnest_columns() -> Result<()> { Ok(()) } +#[tokio::test] +async fn unnest_dict

[I] Inconsistent and incorrect `struct` field coercion [datafusion]

2025-02-01 Thread via GitHub
alamb opened a new issue, #14396: URL: https://github.com/apache/datafusion/issues/14396 ### Describe the bug When coercing structs with different types DataFusion is inconsistent in its behavior. Sometimes it errors and in other times it is inconsistent ### To Reproduc

Re: [PR] Do not rename struct fields when coercing types in `CASE` [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14384: URL: https://github.com/apache/datafusion/pull/14384#discussion_r1938257756 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -961,23 +961,31 @@ fn struct_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub
alamb commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232 I have created a PR for a version increase and - https://github.com/apache/datafusion/pull/14397 Once that is approved / merged I'll create a release-45 branch to make RC

Re: [I] `Case` coercion of Structs loses field names [datafusion]

2025-02-01 Thread via GitHub
alamb closed issue #14383: `Case` coercion of Structs loses field names URL: https://github.com/apache/datafusion/issues/14383 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] feat: metadata columns [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2628924834 Update here is we are very close to cutting the 45 release branch. See - More details on https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232 Once we do t

[PR] Remove dependency on datafusion_catalog from datafusion-cli [datafusion]

2025-02-01 Thread via GitHub
alamb opened a new pull request, #14398: URL: https://github.com/apache/datafusion/pull/14398 ## Which issue does this PR close? I noticed this unnecessary dependency while working on - https://github.com/apache/datafusion/issues/14008 ## Rationale for this change

Re: [PR] Remove dependency on datafusion_catalog from datafusion-cli [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14398: URL: https://github.com/apache/datafusion/pull/14398#discussion_r1938262671 ## datafusion-cli/Cargo.toml: ## @@ -48,7 +48,6 @@ datafusion = { path = "../datafusion/core", version = "44.0.0", features = [ "unicode_expressions", "co

Re: [PR] Do not rename struct fields when coercing types in `CASE` [datafusion]

2025-02-01 Thread via GitHub
alamb merged PR #14384: URL: https://github.com/apache/datafusion/pull/14384 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add related source code locations to errors [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #13664: URL: https://github.com/apache/datafusion/pull/13664#issuecomment-2628924982 Update here is we are very close to cutting the 45 release branch. See - More details on https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232 Once we do t

Re: [PR] feat: Speed up `struct` and `named_struct` using `invoke_with_args` [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14276: URL: https://github.com/apache/datafusion/pull/14276#issuecomment-2628925076 Update here is we are very close to cutting the 45 release branch. See - More details on https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232 I plan to me

Re: [PR] Add `TableProvider::insert_into` into FFI Bindings [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14391: URL: https://github.com/apache/datafusion/pull/14391#issuecomment-2628925389 I merged up from main to make sure CI is running on the latest commit. Once tests pass I plan to merge -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Add `TableProvider::insert_into` into FFI Bindings [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14391: URL: https://github.com/apache/datafusion/pull/14391#issuecomment-2628930486 I have to go do something else now -- @timsaucer any chance you can figure out the CI failure? -- This is an automated message from the Apache Git Service. To respond to the message

Re: [PR] feat: metadata columns [datafusion]

2025-02-01 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2628928295 > Update here is we are very close to cutting the 45 release branch. See > > * More details on [Release DataFusion `45.0.0` #14008 (comment)](https://github.com/apache/dataf

Re: [PR] Add `TableProvider::insert_into` into FFI Bindings [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14391: URL: https://github.com/apache/datafusion/pull/14391#issuecomment-2628929574 The CI failure https://github.com/apache/datafusion/actions/runs/13088375990/job/3651854?pr=14391 Seems to be due to a logical conflict with https://github.com/apache/dat

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628936121 @shehabgamin can we try to use coerce types for those functions? The change is relatively trivial and causes less breakage compared to the current one -- This is an automated message fr

Re: [PR] Do not parse ASOF and MATCH_CONDITION as table factor aliases [datafusion-sqlparser-rs]

2025-02-01 Thread via GitHub
iffyio merged PR #1698: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1698 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628939208 > @shehabgamin can we try to use Signature::User-defined coerce types for those functions, the change is relatively trivial and the impact is smaller than the current one @f

Re: [PR] doc: update ballista client front page [datafusion-ballista]

2025-02-01 Thread via GitHub
milenkovicm closed pull request #1171: doc: update ballista client front page URL: https://github.com/apache/datafusion-ballista/pull/1171 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] feat: metadata columns [datafusion]

2025-02-01 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2629205242 @alamb @adriangb please review it again. I copied some tests from @adriangb and some tests from spark, supported join, project, subqueryalias and dataframe api. -- This is an

Re: [PR] feat: metadata columns [datafusion]

2025-02-01 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2629206595 > Great let's fix those unit tests on your branch then we can look at the pros/cons of the approaches we've come up with. I don't know whether you agree, when make a design,

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629245515 > If the approach looks good to you too, we can go ahead! Sounds good! This works for now since we'll be working towards a better design in the future. I may add a new `

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629247219 > > If the approach looks good to you too, we can go ahead! > > Sounds good! This works for now since we'll be working towards a better design in the future. > > I ma

Re: [I] Limits are not applied correctly [datafusion]

2025-02-01 Thread via GitHub
adriangb commented on issue #14406: URL: https://github.com/apache/datafusion/issues/14406#issuecomment-2629268321 Here's an assortment of test plans under slight variations that both contain the bug and not. ``` DataFusion CLI v44.0.0 > explain analyze with selection as (

Re: [I] Limits are not applied correctly [datafusion]

2025-02-01 Thread via GitHub
adriangb commented on issue #14406: URL: https://github.com/apache/datafusion/issues/14406#issuecomment-2629269661 Here's another angle of attack. If I disable the optimizer I end up with 12 partitions and... 12 output rows. ``` DataFusion CLI v44.0.0 > with selection as (

Re: [I] [DISCUSSION] Lowering the barrier to new users (Lessons from-799 CMU Optimizer Class) [datafusion]

2025-02-01 Thread via GitHub
ozankabak commented on issue #14373: URL: https://github.com/apache/datafusion/issues/14373#issuecomment-2629269833 I will think about some optimizer-focused projects and circle back next week. @alamb, IMHO the tickets you mention are partly optimizer related, but probably more so about the

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-01 Thread via GitHub
shehabgamin commented on code in PR #14392: URL: https://github.com/apache/datafusion/pull/14392#discussion_r1938420167 ## datafusion/spark/src/agg_funcs/avg.rs: ## @@ -0,0 +1,344 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Do not sort rows in `FirstValueAccumulator` [datafusion]

2025-02-01 Thread via GitHub
jayzhan211 commented on code in PR #14402: URL: https://github.com/apache/datafusion/pull/14402#discussion_r1938420622 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -559,29 +550,18 @@ impl LastValueAccumulator { let sort_columns = ordering_values

Re: [PR] bug: Fix NULL handling in array_slice, introduce `NullHandling` enum to `Signature` [datafusion]

2025-02-01 Thread via GitHub
jkosh44 commented on code in PR #14289: URL: https://github.com/apache/datafusion/pull/14289#discussion_r1938280439 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -186,6 +187,15 @@ impl PhysicalExpr for ScalarFunctionExpr { .map(|e| e.evaluate(batch))

Re: [PR] bug: Fix NULL handling in array_slice, introduce `NullHandling` enum to `Signature` [datafusion]

2025-02-01 Thread via GitHub
jkosh44 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2629048825 @jayzhan211 I just pushed another commit that will correctly return null for batch inputs. This also updates the behavior of `array_pop_front`, `array_pop_back`, and array slicing (i

Re: [PR] build(deps): bump tokio from 1.41.1 to 1.42.0 [datafusion-python]

2025-02-01 Thread via GitHub
timsaucer closed pull request #968: build(deps): bump tokio from 1.41.1 to 1.42.0 URL: https://github.com/apache/datafusion-python/pull/968 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] build(deps): bump tokio from 1.41.1 to 1.42.0 [datafusion-python]

2025-02-01 Thread via GitHub
dependabot[bot] commented on PR #968: URL: https://github.com/apache/datafusion-python/pull/968#issuecomment-2628955424 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor ver

Re: [PR] build(deps): bump tokio from 1.41.1 to 1.42.0 [datafusion-python]

2025-02-01 Thread via GitHub
timsaucer commented on PR #968: URL: https://github.com/apache/datafusion-python/pull/968#issuecomment-2628955407 Close due to no longer applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] Do not sort rows in `FirstValueAccumulator` [datafusion]

2025-02-01 Thread via GitHub
blaginin commented on PR #14402: URL: https://github.com/apache/datafusion/pull/14402#issuecomment-2628995591 ``` groupmain new_lexcmp first_last_ignore_nulls 2.02 4.3±0

Re: [PR] Add nulls checks to generated pruning predicates [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14297: URL: https://github.com/apache/datafusion/pull/14297#discussion_r1938285557 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -287,11 +285,6 @@ pub trait PruningStatistics { /// predicate can never possibly be true). The container ca

Re: [PR] Prepare for `45.0.0` release: Version and Changelog [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14397: URL: https://github.com/apache/datafusion/pull/14397#issuecomment-2628967728 Thank you for the review @andygrove -- I will merge this in shortly after one more round of changelog updates and get a release branch started -- This is an automated message from t

[I] Return `null` for `null` maps in `map_keys` and `map_values` [datafusion]

2025-02-01 Thread via GitHub
cht42 opened a new issue, #14400: URL: https://github.com/apache/datafusion/issues/14400 ### Describe the bug `map_keys` and `map_values` should return `null` instead of empty array for `null` inputs ### To Reproduce _No response_ ### Expected behavior _No

Re: [PR] feat: metadata columns [datafusion]

2025-02-01 Thread via GitHub
adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2629001039 Great let's fix those unit tests on your branch then we can look at the pros/cons of the approaches we've come up with. -- This is an automated message from the Apache Git Service

Re: [PR] Prepare for `45.0.0` release: Version and Changelog [datafusion]

2025-02-01 Thread via GitHub
alamb merged PR #14397: URL: https://github.com/apache/datafusion/pull/14397 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add regexp_extract func [datafusion]

2025-02-01 Thread via GitHub
rluvaton commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1938303236 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,322 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub
alamb commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629031385 I have created a branch: - https://github.com/apache/datafusion/tree/branch-45 Let's start merging stuff to main again. Before I make the RC I want to verify again

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-01 Thread via GitHub
shehabgamin commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629161924 > cc @Blizzara and @shehabgamin as you have mentioned interest in helping here > Will catch up on this thread tonight. So much happening so fast, exciting! -- Th

[PR] Improve Unparser (scalar_to_sql) to respect dialect timestamp type overrides [datafusion]

2025-02-01 Thread via GitHub
sgrebnov opened a new pull request, #14407: URL: https://github.com/apache/datafusion/pull/14407 ## Which issue does this PR close? Unparser dialects specify timestamp data type overrides via `timestamp_cast_dtype` which is NOT currently used when converting timestamp literals result

Re: [PR] Add related source code locations to errors [datafusion]

2025-02-01 Thread via GitHub
alamb merged PR #13664: URL: https://github.com/apache/datafusion/pull/13664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add related source code locations to errors [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #13664: URL: https://github.com/apache/datafusion/pull/13664#issuecomment-2629058844 🚀 BTW @mkarbo and @eliaperantoni it would be great to write a blog post about this feature / solicit some additional help -- This is an automated message from the Apache Git

Re: [PR] Add related source code locations to errors [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #13664: URL: https://github.com/apache/datafusion/pull/13664#issuecomment-2629059247 @eliaperantoni / @mkarbo -- any chance you can file an "EPIC" ticket listing the various subtasks you highlighted on https://github.com/apache/datafusion/pull/13664#issuecomment-262019

Re: [I] Add related source code locations to errors [datafusion]

2025-02-01 Thread via GitHub
alamb closed issue #13662: Add related source code locations to errors URL: https://github.com/apache/datafusion/issues/13662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] bug: Fix NULL handling in array_slice, introduce `NullHandling` enum to `Signature` [datafusion]

2025-02-01 Thread via GitHub
jayzhan211 commented on code in PR #14289: URL: https://github.com/apache/datafusion/pull/14289#discussion_r1938285811 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -186,6 +187,15 @@ impl PhysicalExpr for ScalarFunctionExpr { .map(|e| e.evaluate(batch))

[I] Limits are not applied correctly [datafusion]

2025-02-01 Thread via GitHub
adriangb opened a new issue, #14406: URL: https://github.com/apache/datafusion/issues/14406 ### Describe the bug Outer limits seem to be able to impact the inner limits of a subquery ### To Reproduce Run the following python script to create test data: ```python

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub
alamb commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629106680 Thanks @Omega359 The delta-rs upgrade seems to have gone pretty smoothly: https://github.com/delta-io/delta-rs/pull/3175 In terms of releasing DataFusion 45 I think i

Re: [I] Ensure `to_timestamp` behaves consistently with PostgreSQL [datafusion]

2025-02-01 Thread via GitHub
logan-keede commented on issue #13351: URL: https://github.com/apache/datafusion/issues/13351#issuecomment-2629106897 ![Image](https://github.com/user-attachments/assets/6e35ec72-a747-4ace-8c7a-44407bbc8ab7) Do we want go with this? -- This is an automated message from the Apache Git S

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub
kevinjqliu commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629109322 hey @alamb quick question about the release process, are the releases for datafusion and datafusion-python in lockstep? I see the next release for datafusion is 45.0.0 meanwh

Re: [I] Limits are not applied correctly [datafusion]

2025-02-01 Thread via GitHub
adriangb commented on issue #14406: URL: https://github.com/apache/datafusion/issues/14406#issuecomment-2629110100 @alamb this is not new (should not block release) but seems like a pretty major bug -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] Fix `null` input in `map_keys/values` [datafusion]

2025-02-01 Thread via GitHub
cht42 commented on code in PR #14401: URL: https://github.com/apache/datafusion/pull/14401#discussion_r1938281147 ## datafusion/functions-nested/src/map_keys.rs: ## @@ -126,9 +126,36 @@ fn map_keys_inner(args: &[ArrayRef]) -> Result { }; Ok(Arc::new(ListArray::new( -

Re: [PR] perf: 2X faster no grouping median() function [datafusion]

2025-02-01 Thread via GitHub
Rachelint commented on code in PR #14399: URL: https://github.com/apache/datafusion/pull/14399#discussion_r1938281192 ## datafusion/functions-aggregate/src/median.rs: ## @@ -242,14 +242,26 @@ impl Debug for MedianAccumulator { impl Accumulator for MedianAccumulator { fn

[PR] Fix `null` input in `map_keys/values` [datafusion]

2025-02-01 Thread via GitHub
cht42 opened a new pull request, #14401: URL: https://github.com/apache/datafusion/pull/14401 ## Which issue does this PR close? Closes #14400. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] perf: 2X faster no grouping median() function [datafusion]

2025-02-01 Thread via GitHub
2010YOUY01 commented on code in PR #14399: URL: https://github.com/apache/datafusion/pull/14399#discussion_r1938282667 ## datafusion/functions-aggregate/src/median.rs: ## @@ -242,14 +242,26 @@ impl Debug for MedianAccumulator { impl Accumulator for MedianAccumulator { fn

Re: [PR] Fix: Avoid recursive external error wrapping [datafusion]

2025-02-01 Thread via GitHub
getChan commented on code in PR #14371: URL: https://github.com/apache/datafusion/pull/14371#discussion_r1938297005 ## datafusion/sqllogictest/test_files/errors.slt: ## @@ -161,3 +161,12 @@ create table records (timestamp timestamp, value float) as values ( '2021-01-01 00:

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-01 Thread via GitHub
andygrove commented on code in PR #14392: URL: https://github.com/apache/datafusion/pull/14392#discussion_r1938313126 ## datafusion/spark/src/agg_funcs/avg.rs: ## @@ -0,0 +1,344 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Fix: Avoid recursive external error wrapping [datafusion]

2025-02-01 Thread via GitHub
getChan commented on code in PR #14371: URL: https://github.com/apache/datafusion/pull/14371#discussion_r1938313572 ## datafusion/common/src/error.rs: ## @@ -131,6 +131,10 @@ pub enum DataFusionError { /// Errors from either mapping LogicalPlans to/from Substrait plans

Re: [PR] Fix: Avoid recursive external error wrapping [datafusion]

2025-02-01 Thread via GitHub
getChan commented on code in PR #14371: URL: https://github.com/apache/datafusion/pull/14371#discussion_r1938315198 ## datafusion/common/src/error.rs: ## @@ -131,6 +131,10 @@ pub enum DataFusionError { /// Errors from either mapping LogicalPlans to/from Substrait plans

[PR] Do not sort rows in `FirstValueAccumulator` [datafusion]

2025-02-01 Thread via GitHub
blaginin opened a new pull request, #14402: URL: https://github.com/apache/datafusion/pull/14402 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/14215 ## Rationale for this change Right now, merging / updating batches in `first_value`

Re: [PR] Add `TableProvider::insert_into` into FFI Bindings [datafusion]

2025-02-01 Thread via GitHub
timsaucer commented on PR #14391: URL: https://github.com/apache/datafusion/pull/14391#issuecomment-2628942411 > I have to go do something else now -- @timsaucer any chance you can figure out the CI failure? Yes, I can do this right now -- This is an automated message from the Apac

Re: [PR] feat: remove DataFusion pyarrow feat [datafusion-python]

2025-02-01 Thread via GitHub
timsaucer commented on code in PR #1000: URL: https://github.com/apache/datafusion-python/pull/1000#discussion_r1938270795 ## src/config.rs: ## @@ -40,7 +42,7 @@ impl PyConfig { #[staticmethod] pub fn from_env() -> PyResult { Review Comment: Excellent suggestion!

Re: [PR] docs: Clarify join behavior in `DataFrame::join` [datafusion]

2025-02-01 Thread via GitHub
alamb merged PR #14393: URL: https://github.com/apache/datafusion/pull/14393 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub
alamb commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2628949060 @wiedld tested with InfluxDB and this upgrade works for us -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] Fix join type coercion when joining 2 relations with the same name via `DataFrame` API [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14387: URL: https://github.com/apache/datafusion/pull/14387#discussion_r1938309570 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -1121,6 +1121,39 @@ async fn join() -> Result<()> { Ok(()) } +#[tokio::test] +async fn join_coercion_unnname

Re: [I] Regression: `Invalid comparison operation: Utf8 == Utf8View` error during LEFT ANTI JOIN [datafusion]

2025-02-01 Thread via GitHub
alamb closed issue #13510: Regression: `Invalid comparison operation: Utf8 == Utf8View` error during LEFT ANTI JOIN URL: https://github.com/apache/datafusion/issues/13510 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] Add related source code locations to errors [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #13664: URL: https://github.com/apache/datafusion/pull/13664#issuecomment-2629032156 I have made a release branch for 45 so let's get this one merged now - https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232 I merged up from main to resolv

Re: [PR] feat: Speed up `struct` and `named_struct` using `invoke_with_args` [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14276: URL: https://github.com/apache/datafusion/pull/14276#issuecomment-2629032363 We have a release branch for 45 now -- see https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232 Let's keep the main moving! -- This is an automated messa

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629035590 Given I feel like we have not yet reached consensus on this issue I made a release branch for 45 without this change - https://github.com/apache/datafusion/issues/14008#issuecomment-

Re: [PR] feat: Speed up `struct` and `named_struct` using `invoke_with_args` [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14276: URL: https://github.com/apache/datafusion/pull/14276#issuecomment-2629032401 Thanks again @pepijnve -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [I] Avoid deriving Fields for each invocation of `struct` and `named_struct` [datafusion]

2025-02-01 Thread via GitHub
alamb closed issue #14275: Avoid deriving Fields for each invocation of `struct` and `named_struct` URL: https://github.com/apache/datafusion/issues/14275 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] feat: Speed up `struct` and `named_struct` using `invoke_with_args` [datafusion]

2025-02-01 Thread via GitHub
alamb merged PR #14276: URL: https://github.com/apache/datafusion/pull/14276 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Fix join type coercion when joining 2 relations with the same name via `DataFrame` API [datafusion]

2025-02-01 Thread via GitHub
alamb merged PR #14387: URL: https://github.com/apache/datafusion/pull/14387 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Fix join type coercion when joining 2 relations with the same name via `DataFrame` API [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14387: URL: https://github.com/apache/datafusion/pull/14387#discussion_r1938309987 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -1121,6 +1121,39 @@ async fn join() -> Result<()> { Ok(()) } +#[tokio::test] +async fn join_coercion_unnname

[PR] Minor: fix typo in test name [datafusion]

2025-02-01 Thread via GitHub
alamb opened a new pull request, #14403: URL: https://github.com/apache/datafusion/pull/14403 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/14387 ## Rationale for this change @jonahgao found a typo 🤦 https://github.com/apach

Re: [PR] Fix join type coercion when joining 2 relations with the same name via `DataFrame` API [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #14387: URL: https://github.com/apache/datafusion/pull/14387#issuecomment-2629032730 Thanks for the review @jonahgao

Re: [PR] Fix: Avoid recursive external error wrapping [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14371: URL: https://github.com/apache/datafusion/pull/14371#discussion_r1938310453 ## datafusion/common/src/error.rs: ## @@ -131,6 +131,10 @@ pub enum DataFusionError { /// Errors from either mapping LogicalPlans to/from Substrait plans /

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14392: URL: https://github.com/apache/datafusion/pull/14392#discussion_r1938312400 ## datafusion/spark/src/agg_funcs/avg.rs: ## @@ -0,0 +1,344 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreeme

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-01 Thread via GitHub
alamb commented on code in PR #14392: URL: https://github.com/apache/datafusion/pull/14392#discussion_r1938312603 ## datafusion/spark/src/comet_scalar_funcs.rs: ## @@ -0,0 +1,192 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Add related source code locations to errors [datafusion]

2025-02-01 Thread via GitHub
alamb commented on PR #13664: URL: https://github.com/apache/datafusion/pull/13664#issuecomment-2629041356 And merged up again due to another conflict... 🤦

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-01 Thread via GitHub
andygrove commented on code in PR #14392: URL: https://github.com/apache/datafusion/pull/14392#discussion_r1938313058 ## datafusion/spark/src/agg_funcs/avg.rs: ## @@ -0,0 +1,344 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Fix: Avoid recursive external error wrapping [datafusion]

2025-02-01 Thread via GitHub
getChan commented on code in PR #14371: URL: https://github.com/apache/datafusion/pull/14371#discussion_r1938313572 ## datafusion/common/src/error.rs: ## @@ -131,6 +131,10 @@ pub enum DataFusionError { /// Errors from either mapping LogicalPlans to/from Substrait plans

Re: [I] Support complex datatypes in Comet Scan [datafusion-comet]

2025-02-01 Thread via GitHub
mattwparas commented on issue #434: URL: https://github.com/apache/datafusion-comet/issues/434#issuecomment-2629047198 awesome, thank you for the update!

[PR] docs: Fix create_udf examples [datafusion]

2025-02-01 Thread via GitHub
nuno-faria opened a new pull request, #14405: URL: https://github.com/apache/datafusion/pull/14405 ## Which issue does this PR close? Closes #14404. ## Rationale for this change Provide examples using the most up-to-date API. ## What changes are inc

[I] Fix code example in `library-user-guide/adding-udfs` [datafusion]

2025-02-01 Thread via GitHub
nuno-faria opened a new issue, #14404: URL: https://github.com/apache/datafusion/issues/14404 ### Describe the bug The code example in `library-user-guide/adding-udfs` appears to use an older syntax, resulting in compilation errors. ### To Reproduce Execute the code prov

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-01 Thread via GitHub
Omega359 commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629085175 I just upgraded my project to latest main from DF 42. The primary compilation and test suite issues I encountered after setting `datafusion.execution.parquet.schema_force_view_

Re: [I] Add instructions to reduce rebuild time [datafusion-python]

2025-02-01 Thread via GitHub
timsaucer closed issue #971: Add instructions to reduce rebuild time URL: https://github.com/apache/datafusion-python/issues/971

Re: [PR] feat: remove DataFusion pyarrow feat [datafusion-python]

2025-02-01 Thread via GitHub
timsaucer merged PR #1000: URL: https://github.com/apache/datafusion-python/pull/1000

Re: [PR] build(deps): bump uuid from 1.12.0 to 1.12.1 [datafusion-python]

2025-02-01 Thread via GitHub
dependabot[bot] commented on PR #1002: URL: https://github.com/apache/datafusion-python/pull/1002#issuecomment-2628954719 Looks like uuid is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump uuid from 1.12.0 to 1.12.1 [datafusion-python]

2025-02-01 Thread via GitHub
dependabot[bot] closed pull request #1002: build(deps): bump uuid from 1.12.0 to 1.12.1 URL: https://github.com/apache/datafusion-python/pull/1002

Re: [PR] Do not sort rows in `FirstValueAccumulator` [datafusion]

2025-02-01 Thread via GitHub
blaginin commented on code in PR #14402: URL: https://github.com/apache/datafusion/pull/14402#discussion_r1938293491 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -627,24 +607,19 @@ impl Accumulator for LastValueAccumulator { // last index contains is_set f
