Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-02 Thread via GitHub
ozankabak commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2630036258 @Weijun-H has been working on this with the Synnada team for a while. The initial benchmark results were promising, so we decided to continue development while receiving community

Re: [PR] Add related source code locations to errors [datafusion]

2025-02-02 Thread via GitHub
mkarbo commented on PR #13664: URL: https://github.com/apache/datafusion/pull/13664#issuecomment-2629303588 We are planning to write a blog on this and on the sqlparser work in the very near future! And yes we can file that during the week. Thanks for the help everyone!

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-02 Thread via GitHub
shehabgamin commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629306233 > In optimizer, we rely on the name to do such optimization so if we rename it to name like 'spark_count' we might need to add the spark name to those optimize rules as well, whi

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-02 Thread via GitHub
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629300546 > I'll get to this sometime in the next few hours. Taking a detour to review and comment on https://github.com/apache/datafusion/pull/14392. @jayzhan211 @alamb, I bit off m

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-02 Thread via GitHub
jayzhan211 commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629310335 > > In optimizer, we rely on the name to do such optimization so if we rename it to name like 'spark_count' we might need to add the spark name to those optimize rules as well, wh

Re: [PR] Minor: fix typo in test name [datafusion]

2025-02-02 Thread via GitHub
jonahgao merged PR #14403: URL: https://github.com/apache/datafusion/pull/14403 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] bug: Fix NULL handling in array_slice, introduce `NullHandling` enum to `Signature` [datafusion]

2025-02-02 Thread via GitHub
jayzhan211 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2629388087 We can merge this after the next release is out

Re: [PR] feat: metadata columns [datafusion]

2025-02-02 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2629393602 @adriangb, as I have said, it seems that you are thinking about this from the database side; I'm talking about a compute engine problem. The users I mean are big data engineers. changing

Re: [PR] perf: Improve `median` with no grouping by 2X [datafusion]

2025-02-02 Thread via GitHub
Rachelint merged PR #14399: URL: https://github.com/apache/datafusion/pull/14399

[PR] feat: Support On-Demand Repartition [datafusion]

2025-02-02 Thread via GitHub
Weijun-H opened a new pull request, #14411: URL: https://github.com/apache/datafusion/pull/14411 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [I] Test DataFusion 45 with datafusion-python [datafusion]

2025-02-02 Thread via GitHub
kevinjqliu commented on issue #14410: URL: https://github.com/apache/datafusion/issues/14410#issuecomment-2629479654 Thanks! Happy to help. I see #1009 bumps datafusion-python from 43.0.0 to 44.0.0. I'll create a PR to bump datafusion-python to 45.0.0 and take on dependencies from

[PR] String agg missing functionality [datafusion]

2025-02-02 Thread via GitHub
gabotechs opened a new pull request, #14412: URL: https://github.com/apache/datafusion/pull/14412 ## Which issue does this PR close? Closes #8260. ## Rationale for this change Complete the missing functionality of the STRING_AGG function. ## What changes are includ

Re: [PR] Add Common Subexpression Elimination for `PhysicalExpr` trees [datafusion]

2025-02-02 Thread via GitHub
peter-toth commented on PR #13046: URL: https://github.com/apache/datafusion/pull/13046#issuecomment-2629478989 @andygrove, no problem, I will try to update this PR from `main` soon.

[PR] Add support for DISTINCT and ORDER BY in ARRAY_AGG [datafusion]

2025-02-02 Thread via GitHub
gabotechs opened a new pull request, #14413: URL: https://github.com/apache/datafusion/pull/14413 ## Which issue does this PR close? Closes #12371. ## Rationale for this change Completing ARRAY_AGG functionality as a prerequisite for adding the full functionality of STRI

Re: [I] array_agg cannot perform both distinct and order_by [datafusion]

2025-02-02 Thread via GitHub
gabotechs commented on issue #12371: URL: https://github.com/apache/datafusion/issues/12371#issuecomment-2629487123 Hi @Rachelint 👋, not sure if you were still looking at this, I need support for DISTINCT + ORDER BY in ARRAY_AGG as a prerequisite for https://github.com/apache/datafusion/pu

Re: [PR] Add support for DISTINCT + ORDER BY in ARRAY_AGG [datafusion]

2025-02-02 Thread via GitHub
gabotechs commented on code in PR #14413: URL: https://github.com/apache/datafusion/pull/14413#discussion_r1938546489 ## datafusion/functions-aggregate-common/src/merge_arrays.rs: ## @@ -193,3 +193,149 @@ pub fn merge_ordered_arrays( Ok((merged_values, merged_orderings))

[PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
kevinjqliu opened a new pull request, #1010: URL: https://github.com/apache/datafusion-python/pull/1010 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chan

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
kevinjqliu commented on code in PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#discussion_r1938548438 ## Cargo.toml: ## @@ -35,13 +35,13 @@ substrait = ["dep:datafusion-substrait"] [dependencies] tokio = { version = "1.42", features = ["macros", "rt",

Re: [I] Test DataFusion 45 with datafusion-python [datafusion]

2025-02-02 Thread via GitHub
kevinjqliu commented on issue #14410: URL: https://github.com/apache/datafusion/issues/14410#issuecomment-2629500454 Opened https://github.com/apache/datafusion-python/pull/1010 to upgrade datafusion-python to use datafusion 45.0.0

Re: [PR] Prepare release 44.0.0 [datafusion-python]

2025-02-02 Thread via GitHub
andygrove commented on PR #1009: URL: https://github.com/apache/datafusion-python/pull/1009#issuecomment-2629502197 I can also help with the packaging in the next day or two

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
kevinjqliu commented on code in PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#discussion_r1938551798 ## Cargo.toml: ## @@ -17,7 +17,7 @@ [package] name = "datafusion-python" -version = "43.0.0" +version = "45.0.0" Review Comment: we might want to

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
kevinjqliu commented on code in PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#discussion_r1938551460 ## examples/ffi-table-provider/Cargo.toml: ## @@ -21,15 +21,15 @@ version = "0.1.0" edition = "2021" [dependencies] -datafusion = { version = "44.0.0"

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-02 Thread via GitHub
comphead commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629508048 That is exactly what I was mentioning https://github.com/apache/datafusion/pull/14392#issuecomment-2628334890 not sure how to separate out functions with the same names. I think thi

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-02 Thread via GitHub
comphead commented on PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#issuecomment-2629508370 @andygrove @parthchandra @mbutrovich @kazuyukitanimura can I have a review please?

Re: [PR] docs: Fix create_udf examples [datafusion]

2025-02-02 Thread via GitHub
comphead merged PR #14405: URL: https://github.com/apache/datafusion/pull/14405

Re: [I] Standardize APPROX_PERCENTILE_CONT / PERCENTILE_CONT and similar aggregation functions [datafusion]

2025-02-02 Thread via GitHub
Garamda commented on issue #11732: URL: https://github.com/apache/datafusion/issues/11732#issuecomment-2629341525 The PR is ready for review: https://github.com/apache/datafusion/pull/13511 cc @Dandandan

Re: [I] Epic: Ordered Set Aggregate Functions [datafusion]

2025-02-02 Thread via GitHub
Garamda commented on issue #12824: URL: https://github.com/apache/datafusion/issues/12824#issuecomment-2629341277 The PR is ready for review: https://github.com/apache/datafusion/pull/13511 cc @jayzhan211

Re: [I] Limits are not applied correctly [datafusion]

2025-02-02 Thread via GitHub
zhuqi-lucas commented on issue #14406: URL: https://github.com/apache/datafusion/issues/14406#issuecomment-2629342754 take

Re: [I] DataSink::write_all given invalid RecordBatchStream [datafusion]

2025-02-02 Thread via GitHub
gatesn commented on issue #14394: URL: https://github.com/apache/datafusion/issues/14394#issuecomment-2629360656 I imagine it's downstream, and of course, the debug assert only catches bad RecordBatchStream impls that used the Adapter. This specific bug may not even be caught by the

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-02 Thread via GitHub
andygrove commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629456979 > For Comet, we need the function names to match Spark. Actually, this isn't true. In Comet, we just need the Scala wrapper classes to have the same function names as Spark.

Re: [I] Test DataFusion 45.0.0 with Sail [datafusion]

2025-02-02 Thread via GitHub
Omega359 commented on issue #14408: URL: https://github.com/apache/datafusion/issues/14408#issuecomment-2629465143 I'd be curious if this could be narrowed down via git bisect. Probably would need a small script to do it and a quick test case

[PR] Ballista Release Blog Announcement [datafusion-site]

2025-02-02 Thread via GitHub
milenkovicm opened a new pull request, #53: URL: https://github.com/apache/datafusion-site/pull/53 Ballista 43 release blog announcement! Better late than never :)

Re: [I] Fix code example in `library-user-guide/adding-udfs` [datafusion]

2025-02-02 Thread via GitHub
comphead closed issue #14404: Fix code example in `library-user-guide/adding-udfs` URL: https://github.com/apache/datafusion/issues/14404

Re: [PR] Ballista Release Blog Announcement [datafusion-site]

2025-02-02 Thread via GitHub
milenkovicm commented on PR #53: URL: https://github.com/apache/datafusion-site/pull/53#issuecomment-2629511725 I hope I haven't messed it up; please have a look when you get a chance, @andygrove

[PR] chore: generate change log for 44.0.0 [datafusion-ballista]

2025-02-02 Thread via GitHub
milenkovicm opened a new pull request, #1173: URL: https://github.com/apache/datafusion-ballista/pull/1173 # Which issue does this PR close? Relates to #1172 # Rationale for this change Change log generation for ballista 44 release # What changes are included in

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-02 Thread via GitHub
jayzhan211 commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629290634 > It may be a good idea to prefix all function names with spark_ to avoid confusion, conflicts, or unknown behavior between functions that share the same name. In optimizer

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub
shehabgamin commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629314116 Another regression is that implementing the `is_nullable` function in the `ScalarUDFImpl` trait no longer works. For example: ``` impl ScalarUDFImpl for SparkArray {

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-02 Thread via GitHub
shehabgamin commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629317132 > If the behavior is the same we don't have any reason to copy one to spark crate, adding alias to the function is enough. @jayzhan211 I’ll need to verify this, but I reca

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
kevinjqliu commented on code in PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#discussion_r1938699415 ## Cargo.toml: ## @@ -35,13 +35,13 @@ substrait = ["dep:datafusion-substrait"] [dependencies] tokio = { version = "1.42", features = ["macros", "rt",

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
kevinjqliu commented on PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#issuecomment-2629787697 Converted to draft for now, blocked until datafusion v45 is published

[I] Add array_min function support [datafusion]

2025-02-02 Thread via GitHub
erenavsarogullari opened a new issue, #14416: URL: https://github.com/apache/datafusion/issues/14416 ### Is your feature request related to a problem or challenge? Currently, Spark, Snowflake and Presto support `array_min` function. This can also be useful for DataFusion. ``` ar

Re: [I] Add array_min function support [datafusion]

2025-02-02 Thread via GitHub
erenavsarogullari commented on issue #14416: URL: https://github.com/apache/datafusion/issues/14416#issuecomment-2629798537 take

Re: [I] Add array_min function support [datafusion]

2025-02-02 Thread via GitHub
Omega359 commented on issue #14416: URL: https://github.com/apache/datafusion/issues/14416#issuecomment-2629806061 Possibly related/tangential: https://github.com/apache/datafusion/pull/14392

[PR] feat: Add array_min function [datafusion]

2025-02-02 Thread via GitHub
erenavsarogullari opened a new pull request, #14417: URL: https://github.com/apache/datafusion/pull/14417 ## Which issue does this PR close? Closes #14416. ## What changes are included in this PR? Currently, Spark, Snowflake and Presto support `array_min` function. This can also

Re: [PR] Prepare release 44.0.0 [datafusion-python]

2025-02-02 Thread via GitHub
timsaucer commented on PR #1009: URL: https://github.com/apache/datafusion-python/pull/1009#issuecomment-2629583610 I'm moving this to draft just to make sure it doesn't merge until the release is approved

Re: [PR] feat: Add implicit casting to `TypeSignature::String` [datafusion]

2025-02-02 Thread via GitHub
github-actions[bot] commented on PR #13404: URL: https://github.com/apache/datafusion/pull/13404#issuecomment-2629725618 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] feature request: support minute granularity with date_trunc [datafusion-python]

2025-02-02 Thread via GitHub
netapp-vaughan closed issue #1007: feature request: support minute granularity with date_trunc URL: https://github.com/apache/datafusion-python/issues/1007

[PR] chore: Fixed CI [datafusion]

2025-02-02 Thread via GitHub
Weijun-H opened a new pull request, #14415: URL: https://github.com/apache/datafusion/pull/14415 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [I] array_agg cannot perform both distinct and order_by [datafusion]

2025-02-02 Thread via GitHub
Rachelint commented on issue #12371: URL: https://github.com/apache/datafusion/issues/12371#issuecomment-2629771616 > Hi [@Rachelint](https://github.com/Rachelint) 👋, not sure if you were still looking at this, I need support for DISTINCT + ORDER BY in ARRAY_AGG as a prerequisite for [#1441

[PR] bump arrow version and fix clippy error [datafusion]

2025-02-02 Thread via GitHub
Lordworms opened a new pull request, #14414: URL: https://github.com/apache/datafusion/pull/14414 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
timsaucer commented on code in PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#discussion_r1938599063 ## examples/ffi-table-provider/Cargo.toml: ## @@ -21,15 +21,15 @@ version = "0.1.0" edition = "2021" [dependencies] -datafusion = { version = "44.0.0"

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
timsaucer commented on code in PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#discussion_r1938599194 ## Cargo.toml: ## @@ -17,7 +17,7 @@ [package] name = "datafusion-python" -version = "43.0.0" +version = "45.0.0" Review Comment: I think we want to

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
timsaucer commented on PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#issuecomment-2629590542 Thank you for taking this on! It looks like a smooth update!

Re: [PR] Chore/upgrade datafusion 45 [datafusion-python]

2025-02-02 Thread via GitHub
timsaucer commented on code in PR #1010: URL: https://github.com/apache/datafusion-python/pull/1010#discussion_r1938599713 ## Cargo.toml: ## @@ -35,13 +35,13 @@ substrait = ["dep:datafusion-substrait"] [dependencies] tokio = { version = "1.42", features = ["macros", "rt", "

Re: [PR] disable coercison for unmatched struct type [datafusion]

2025-02-02 Thread via GitHub
jayzhan211 commented on code in PR #14409: URL: https://github.com/apache/datafusion/pull/14409#discussion_r1938599855 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -614,7 +614,11 @@ pub fn try_type_union_resolution_with_struct( let mut keys_string: Option =

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2025-02-02 Thread via GitHub
andygrove commented on PR #13717: URL: https://github.com/apache/datafusion/pull/13717#issuecomment-2629661922 > @andygrove for Comet, would it help if Physical expressions carried their type? Yes, maybe. It would be good to look at an example of this.

Re: [PR] feat: metadata columns [datafusion]

2025-02-02 Thread via GitHub
chenkovsky commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1938650383 ## datafusion/core/src/execution/session_state.rs: ## @@ -1330,7 +1330,7 @@ impl SessionStateBuilder { /// let url = Url::try_from("file://").unwrap();

Re: [PR] feat: metadata columns [datafusion]

2025-02-02 Thread via GitHub
chenkovsky commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1938650952 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -370,26 +370,6 @@ impl LogicalPlan { } } -/// Gather the schema representating the metada

Re: [I] Test DataFusion 45.0.0 with Sail [datafusion]

2025-02-02 Thread via GitHub
findepi commented on issue #14408: URL: https://github.com/apache/datafusion/issues/14408#issuecomment-2629542547 > Physical input schema should be the same as the one converted from logical input schema. > Differences: > - field nullability at index 7 [#98]: (physical) false vs (log

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2025-02-02 Thread via GitHub
findepi commented on PR #13717: URL: https://github.com/apache/datafusion/pull/13717#issuecomment-2629540666 @andygrove for Comet, would it help if Physical expressions carried their type?

Re: [PR] Fix: Avoid recursive external error wrapping [datafusion]

2025-02-02 Thread via GitHub
getChan commented on code in PR #14371: URL: https://github.com/apache/datafusion/pull/14371#discussion_r1938636466 ## datafusion/common/src/error.rs: ## @@ -131,6 +131,10 @@ pub enum DataFusionError { /// Errors from either mapping LogicalPlans to/from Substrait plans

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-02 Thread via GitHub
andygrove commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629666721 I'm going to move this to draft for now. Let's keep the conversation going, though. I plan on working with the community to start adding tests to the `datafusion-comet-spark

Re: [I] Add array_min function support [datafusion]

2025-02-02 Thread via GitHub
erenavsarogullari commented on issue #14416: URL: https://github.com/apache/datafusion/issues/14416#issuecomment-2629881267 @Omega359 Thanks for the comment. Yes, this effort is also related to #14392. I think we have 2 use cases for `array_min` & `array_max` functions (and I think th

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2025-02-02 Thread via GitHub
findepi commented on PR #13717: URL: https://github.com/apache/datafusion/pull/13717#issuecomment-2630130120 Do you have example code in Comet from which the return_type discussion started?

Re: [PR] fix: Limits are not applied correctly [datafusion]

2025-02-02 Thread via GitHub
zhuqi-lucas commented on PR #14418: URL: https://github.com/apache/datafusion/pull/14418#issuecomment-2630139098 ```rust error: use of deprecated constant `arrow::datatypes::MAX_DECIMAL_FOR_EACH_PRECISION`: Use MAX_DECIMAL128_FOR_EACH_PRECISION (note indexes are different) --> dataf

Re: [PR] fix: Limits are not applied correctly [datafusion]

2025-02-02 Thread via GitHub
zhuqi-lucas commented on PR #14418: URL: https://github.com/apache/datafusion/pull/14418#issuecomment-2630140001 cc @alamb @adriangb The PR is ready for review.

Re: [PR] Add parsing for GRANT ROLE and GRANT DATABASE ROLE in Snowflake dialect [datafusion-sqlparser-rs]

2025-02-02 Thread via GitHub
iffyio commented on code in PR #1689: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1689#discussion_r1938890850 ## src/ast/mod.rs: ## @@ -5912,6 +5924,7 @@ impl fmt::Display for GrantObjects { display_comma_separated(schemas)

Re: [PR] Add support for GRANT on some common Snowflake objects [datafusion-sqlparser-rs]

2025-02-02 Thread via GitHub
iffyio merged PR #1699: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1699

[PR] fix: Limits are not applied correctly [datafusion]

2025-02-02 Thread via GitHub
zhuqi-lucas opened a new pull request, #14418: URL: https://github.com/apache/datafusion/pull/14418 ## Which issue does this PR close? Closes [14406](https://github.com/apache/datafusion/issues/14406) ## Rationale for this change Fix the behaviour for limit with Coale

Re: [I] Limits are not applied correctly [datafusion]

2025-02-02 Thread via GitHub
zhuqi-lucas commented on issue #14406: URL: https://github.com/apache/datafusion/issues/14406#issuecomment-2630107041 Submitted a PR to fix it: https://github.com/apache/datafusion/pull/14418

Re: [PR] Add RETURNS TABLE() support for CREATE FUNCTION in Postgresql [datafusion-sqlparser-rs]

2025-02-02 Thread via GitHub
iffyio merged PR #1687: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1687

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-02-02 Thread via GitHub
buraksenn commented on PR #14331: URL: https://github.com/apache/datafusion/pull/14331#issuecomment-2630202903 Then, I could not find time to look at this and it is not working in branches at the moment. Unfortunately I'll be on mandatory military leave for 4 weeks in Turkey so I will not

Re: [I] Querying Parquet file specifically with a predicate returns invalid data error but works in other situations [datafusion]

2025-02-02 Thread via GitHub
alamb commented on issue #14281: URL: https://github.com/apache/datafusion/issues/14281#issuecomment-2629375060 Maybe once the arrow-rs fix is in, we can add a test in datafusion with this particular test file / query to ensure there are no regressions

Re: [I] [DISCUSSION] Lowering the barrier to new users (Lessons from-799 CMU Optimizer Class) [datafusion]

2025-02-02 Thread via GitHub
alamb commented on issue #14373: URL: https://github.com/apache/datafusion/issues/14373#issuecomment-2629378180 > I think we can find more optimizer-focused projects. There is so much to do there 🚀 That would be awesome -- I am sure @lmwnshn would appreciate any other suggestions -

Re: [I] DataSink::write_all given invalid RecordBatchStream [datafusion]

2025-02-02 Thread via GitHub
zhuqi-lucas commented on issue #14394: URL: https://github.com/apache/datafusion/issues/14394#issuecomment-2629374509 Thank you @gatesn for the report. Can you provide the full code or SQL to reproduce it, so we can solve it more quickly?

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub
alamb commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629379592 > hey @alamb quick question about the release process, are the release for datafusion and datafusion-python in locksteps? I see the next release for datafusion is 45.0.0 meanwhile

Re: [I] DataSink::write_all given invalid RecordBatchStream [datafusion]

2025-02-02 Thread via GitHub
gatesn commented on issue #14394: URL: https://github.com/apache/datafusion/issues/14394#issuecomment-2629380503 Yes, this code fails: https://github.com/apache/datafusion/compare/main...gatesn:datafusion:ngates/record-batch-stream-schema?expand=1#diff-9b1672adeba35025e24d21f8d7da2f0

Re: [I] Test DataFusion 45 with datafusion-python [datafusion]

2025-02-02 Thread via GitHub
alamb commented on issue #14410: URL: https://github.com/apache/datafusion/issues/14410#issuecomment-2629380625 I am hoping @kevinjqliu can help with this ticket 🙏

Re: [PR] test: add regression test for unnesting dictionary encoded columns [datafusion]

2025-02-02 Thread via GitHub
alamb merged PR #14395: URL: https://github.com/apache/datafusion/pull/14395

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub
alamb commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629381090 Ok, given the feedback what I think we should do is: 1. Complete the testing on our release branch (`branch-45`) and backport any fixes needed 2. Continue development on main

Re: [I] `make_array` -> `unnest` w/ dict-encoded strings fails [datafusion]

2025-02-02 Thread via GitHub
alamb closed issue #6057: `make_array` -> `unnest` w/ dict-encoded strings fails URL: https://github.com/apache/datafusion/issues/6057

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub
jayzhan211 commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629384465 > Digging into the code, I see that it's deprecated (it should still work even though it's deprecated). What's strange, however, is that the deprecation warning is not propag

Re: [PR] Add regexp_extract func [datafusion]

2025-02-02 Thread via GitHub
Omega359 commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1938507496 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,322 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Prepare release 44.0.0 [datafusion-python]

2025-02-02 Thread via GitHub
timsaucer commented on PR #1009: URL: https://github.com/apache/datafusion-python/pull/1009#issuecomment-2629429628 @andygrove I've run into some trouble with the release instructions. Would you mind doing the packaging and publishing to test pypi? I also don't think I have access to dataf

[PR] Prepare release 44.0.0 [datafusion-python]

2025-02-02 Thread via GitHub
timsaucer opened a new pull request, #1009: URL: https://github.com/apache/datafusion-python/pull/1009 This is to prepare to release datafusion-python 44.0.0.

Re: [I] Test DataFusion 45 with datafusion-python [datafusion]

2025-02-02 Thread via GitHub
timsaucer commented on issue #14410: URL: https://github.com/apache/datafusion/issues/14410#issuecomment-2629430409 Happy to help. I would like to get https://github.com/apache/datafusion-python/pull/1009 wrapped up first, but it's not actually blocking for this test

Re: [PR] feat: metadata columns [datafusion]

2025-02-02 Thread via GitHub
Omega359 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1938511635 ## datafusion/core/src/execution/session_state.rs: ## @@ -1330,7 +1330,7 @@ impl SessionStateBuilder { /// let url = Url::try_from("file://").unwrap();

Re: [PR] Prepare release 44.0.0 [datafusion-python]

2025-02-02 Thread via GitHub
andygrove commented on PR #1009: URL: https://github.com/apache/datafusion-python/pull/1009#issuecomment-2629432630 > @andygrove I've run into some trouble with the release instructions. Would you mind doing the packaging and publishing to test pypi? I also don't think I have access to dat

Re: [PR] feat: metadata columns [datafusion]

2025-02-02 Thread via GitHub
Omega359 commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1938512422 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -370,26 +370,6 @@ impl LogicalPlan { } } -/// Gather the schema representating the metadata

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-02-02 Thread via GitHub
andygrove commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629435886 > @comphead @andygrove @alamb It may be a good idea to prefix all function names with `spark_` to avoid confusion, conflicts, or unknown behavior between functions that share the s

Re: [PR] feat: metadata columns [datafusion]

2025-02-02 Thread via GitHub
chenkovsky commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1938514464 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -370,26 +370,6 @@ impl LogicalPlan { } } -/// Gather the schema representating the metada

Re: [I] Extension Types [datafusion]

2025-02-02 Thread via GitHub
alamb commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2629370589 Coming soon (arrow 54.2 in a month), support for Extension Types: - https://github.com/apache/arrow-rs/issues/4472 - https://github.com/apache/arrow-rs/pull/5822 Courtesy o

Re: [I] Limits are not applied correctly [datafusion]

2025-02-02 Thread via GitHub
zhuqi-lucas commented on issue #14406: URL: https://github.com/apache/datafusion/issues/14406#issuecomment-2629371225 **First round investigation:** The ParquetExec applies the limit at the partition level, so it returns a row from each of the 2 partitions, and the result has 2 rows.
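
The behaviour described in this comment can be illustrated with a minimal sketch. It is not DataFusion code: partitions are modelled as plain in-memory vectors, the limit value of 1 is an assumption inferred from the "2 partitions, 2 rows" observation, and the function names `limit_per_partition` and `limit_global` are hypothetical. It only shows why a limit applied inside each partition needs one more limit after the partitions are merged.

```rust
/// Hypothetical model of a per-partition limit: each partition is cut to
/// at most `limit` rows on its own, then the partitions are concatenated.
/// With N partitions this can yield up to N * limit rows.
fn limit_per_partition(partitions: &[Vec<i64>], limit: usize) -> Vec<i64> {
    partitions
        .iter()
        .flat_map(|p| p.iter().take(limit).copied())
        .collect()
}

/// Hypothetical model of the corrected behaviour: keep the per-partition
/// cut as an early stop, then enforce the limit once more on the merged
/// rows so that at most `limit` rows are returned overall.
fn limit_global(partitions: &[Vec<i64>], limit: usize) -> Vec<i64> {
    limit_per_partition(partitions, limit)
        .into_iter()
        .take(limit)
        .collect()
}
```

With two partitions `[1, 2, 3]` and `[4, 5, 6]` and a limit of 1, the per-partition variant returns two rows (one from each partition), while the variant with the extra merged-stream limit returns a single row. Whether the actual fix in the linked PR takes this shape is not stated in the truncated comment.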

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-02 Thread via GitHub
timsaucer commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2629425413 > Hi [@kevinjqliu](https://github.com/kevinjqliu) -- I am not sure what the plan for datafusion-python is I am currently working on building the 44 release. I hope to ge