Re: [PR] Feat: introduce partition statistics API [datafusion]

2025-04-27 Thread via GitHub
xudong963 commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2062983955 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -430,6 +430,32 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { Ok(Statistics::new

Re: [PR] Feat: introduce partition statistics API [datafusion]

2025-04-27 Thread via GitHub
xudong963 commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2062952007 ## datafusion/datasource/src/file_groups.rs: ## @@ -421,7 +421,7 @@ impl FileGroup { } /// Get the statistics for this group -pub fn statistics(&

Re: [PR] feat: simplify count distinct logical plan [datafusion]

2025-04-27 Thread via GitHub
chenkovsky commented on PR #15867: URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2834018098 I tested a shared concurrent hash set (DashSet) to avoid the clone, but saw no performance gain. Something like ```rust static global_values: LazyLock> = LazyLock::new(|| Das

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-04-27 Thread via GitHub
joroKr21 commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2833989444 This missed the v47 train. Anything else needed to merge? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] Respect ignore_nulls in array_agg [datafusion]

2025-04-27 Thread via GitHub
joroKr21 commented on PR #15544: URL: https://github.com/apache/datafusion/pull/15544#issuecomment-2833988848 This missed the v47 train. Anything else needed to merge?

Re: [PR] Partial fix for #1078 — [Add Dataframe display config] [datafusion-python]

2025-04-27 Thread via GitHub
kosiew commented on PR #1086: URL: https://github.com/apache/datafusion-python/pull/1086#issuecomment-2833965602 Closing this. Moving the configuration from Rust to Python in #1119

Re: [PR] Partial fix for #1078 — [Add Dataframe display config] [datafusion-python]

2025-04-27 Thread via GitHub
kosiew closed pull request #1086: Partial fix for #1078 — [Add Dataframe display config] URL: https://github.com/apache/datafusion-python/pull/1086

[PR] Partial fix for 1078: Enhance DataFrame Formatter Configuration with Memory and Display Controls [datafusion-python]

2025-04-27 Thread via GitHub
kosiew opened a new pull request, #1119: URL: https://github.com/apache/datafusion-python/pull/1119 ## Which issue does this PR close? partial fix for #1078 ## Rationale for this change This change improves the flexibility and performance of DataFrame rendering in no

Re: [I] [Experimental scans] schema adapter does not apply required schema for structs within lists [datafusion-comet]

2025-04-27 Thread via GitHub
comphead commented on issue #1681: URL: https://github.com/apache/datafusion-comet/issues/1681#issuecomment-2833858936 The simplest test ``` test("native reader - read a STRUCT subfield - field from second") { testSingleLineQuery( """ |select named_struct('

[I] main 5a7f638 is broken [datafusion-python]

2025-04-27 Thread via GitHub
kosiew opened a new issue, #1118: URL: https://github.com/apache/datafusion-python/issues/1118 **Describe the bug** When running `pytest` on the `main` branch, tests fail with the following error: ``` import functions as F E ModuleNotFoundError: No module named 'funct

Re: [I] main 5a7f638 is broken [datafusion-python]

2025-04-27 Thread via GitHub
kosiew commented on issue #1118: URL: https://github.com/apache/datafusion-python/issues/1118#issuecomment-2833823260 cc @deanm

[PR] fix: recursive import [datafusion-python]

2025-04-27 Thread via GitHub
chenkovsky opened a new pull request, #1117: URL: https://github.com/apache/datafusion-python/pull/1117 # Which issue does this PR close? No # Rationale for this change `expr` depends on `functions`, and `functions` depends on `expr`, so there is a recursive import.

Re: [I] bug: regexp_match not working? [datafusion]

2025-04-27 Thread via GitHub
jayzhan211 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2833729992 It seems like the syntax you mentioned is not supported yet ``` statement count 0 create table t(a varchar) as values ('a'), ('b'); query error DataFusion e

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-27 Thread via GitHub
jayzhan211 commented on PR #15861: URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2833715141 > Whereas the DataFusion::AvroError is only produced by the avro reader but it affects every place where DataFusionError can appear. How about we convert the error into the

Re: [PR] feat: simplify count distinct logical plan [datafusion]

2025-04-27 Thread via GitHub
jayzhan211 commented on PR #15867: URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2833713281 > BTW, if we want to run count distinct in a big data scenario, we have to use a two-step process, so I think we have to add a configuration option to toggle this optimization. We ca

[I] bug: regexp_match not working? [datafusion]

2025-04-27 Thread via GitHub
juju4 opened a new issue, #15872: URL: https://github.com/apache/datafusion/issues/15872 ### Describe the bug From https://github.com/openobserve/openobserve/discussions/6584 regexp_match does not seem to work with length or space matches. see below. ### To Reproduce `

[PR] Handle dicts for distinct count [datafusion]

2025-04-27 Thread via GitHub
blaginin opened a new pull request, #15871: URL: https://github.com/apache/datafusion/pull/15871 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/258 ## Rationale for this change ## What changes are included in this PR?

Re: [PR] Add `FormatOptions` to Config [datafusion]

2025-04-27 Thread via GitHub
blaginin commented on code in PR #15793: URL: https://github.com/apache/datafusion/pull/15793#discussion_r2062708916 ## datafusion/common/src/config.rs: ## @@ -1995,11 +2052,11 @@ config_namespace! { } } -pub trait FormatOptionsExt: Display {} +pub trait OutputFormatExt:

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-27 Thread via GitHub
codecov-commenter commented on PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680#issuecomment-2833580963 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1680?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-27 Thread via GitHub
rroelke commented on PR #15861: URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2833570923 From the lint description: > Enum size is bounded by the largest variant. Having one large variant can penalize the memory layout of that enum. That is to say, the pres
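
As a rough illustration of the lint description quoted above, the sketch below compares the size of an enum with one large inline variant against the same enum with that variant boxed. `BigPayload`, `Unboxed`, and `Boxed` are hypothetical names invented for this example; they are not DataFusion types and do not reflect the actual change in PR #15861.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large error payload; not a DataFusion type.
#[allow(dead_code)]
struct BigPayload([u8; 256]);

// One large inline variant makes every value of the enum at least that large.
#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Big(BigPayload),
}

// Boxing the large variant stores only a pointer inline.
#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Big(Box<BigPayload>),
}

fn main() {
    println!("Unboxed: {} bytes", size_of::<Unboxed>());
    println!("Boxed:   {} bytes", size_of::<Boxed>());
    // The boxed layout is smaller because the payload lives on the heap.
    assert!(size_of::<Boxed>() < size_of::<Unboxed>());
}
```

The trade-off in this sketch is an extra heap allocation whenever the boxed variant is constructed, in exchange for a smaller footprint everywhere the enum is stored or returned.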

Re: [PR] [wip] docs: Add documentation for accelerating Iceberg Parquet scans with Comet [datafusion-comet]

2025-04-27 Thread via GitHub
huaxingao commented on PR #1683: URL: https://github.com/apache/datafusion-comet/pull/1683#issuecomment-2833571912 Iceberg shades Parquet. In our internal version of Iceberg, we remove the shading. In OSS, when enabling Comet native execution in https://github.com/apache/iceberg/pull/12709

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-27 Thread via GitHub
rroelke commented on code in PR #15861: URL: https://github.com/apache/datafusion/pull/15861#discussion_r2062689096 ## datafusion/common/src/error.rs: ## @@ -59,7 +59,7 @@ pub enum DataFusionError { ParquetError(ParquetError), /// Error when reading Avro data. #[c

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-27 Thread via GitHub
klemniops commented on code in PR #15861: URL: https://github.com/apache/datafusion/pull/15861#discussion_r2062687991 ## datafusion/common/src/error.rs: ## @@ -59,7 +59,7 @@ pub enum DataFusionError { ParquetError(ParquetError), /// Error when reading Avro data. #

Re: [PR] chore: fix clippy::large_enum_variant for DataFusionError [datafusion]

2025-04-27 Thread via GitHub
klemniops commented on PR #15861: URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2833568011 From the lint description: > Enum size is bounded by the largest variant. Having one large variant can penalize the memory layout of that enum. That is to say, the presenc

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-27 Thread via GitHub
comphead commented on code in PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680#discussion_r2062680371 ## native/spark-expr/src/array_funcs/array_repeat.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max [datafusion]

2025-04-27 Thread via GitHub
gabotechs commented on code in PR #15857: URL: https://github.com/apache/datafusion/pull/15857#discussion_r2062676790 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -726,6 +726,8 @@ fn extract_window_frame_target_type(col_type: &DataType) -> Result { Ok(D

[PR] Added support for CREATE DOMAIN and its test suite [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 opened a new pull request, #1830: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1830 This pull request adds support for the [`CREATE DOMAIN`](https://www.postgresql.org/docs/current/sql-createdomain.html) syntax and the tests to validate whether the implement

Re: [PR] 1075/enhancement/Make col class with __getattr__ [datafusion-python]

2025-04-27 Thread via GitHub
deanm commented on PR #1076: URL: https://github.com/apache/datafusion-python/pull/1076#issuecomment-2833514544 I got it from polars, not clever on my part.

[I] `CREATE DOMAIN` is not supported [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 opened a new issue, #1829: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1829 [`CREATE DOMAIN`](https://www.postgresql.org/docs/current/sql-createdomain.html) is not supported. Statements such as the following fail parsing at this time: ```sql

Re: [PR] Add `FormatOptions` to Config [datafusion]

2025-04-27 Thread via GitHub
berkaysynnada commented on code in PR #15793: URL: https://github.com/apache/datafusion/pull/15793#discussion_r2062667570 ## datafusion/common/src/config.rs: ## @@ -1995,11 +2052,11 @@ config_namespace! { } } -pub trait FormatOptionsExt: Display {} +pub trait OutputForma

Re: [PR] Feat: introduce partition statistics API [datafusion]

2025-04-27 Thread via GitHub
berkaysynnada commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2062662498 ## datafusion/datasource/src/file_groups.rs: ## @@ -421,7 +421,7 @@ impl FileGroup { } /// Get the statistics for this group -pub fn statisti

[PR] Added support for `DROP DOMAIN` [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 opened a new pull request, #1828: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1828 This pull request: * Adds support for [`DROP DOMAIN`](https://www.postgresql.org/docs/current/sql-dropdomain.html) syntax resolving issue #1827 * Adds tests for `DROP

[I] `DROP DOMAIN` is not supported [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 opened a new issue, #1827: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1827 The [`DROP DOMAIN`](https://www.postgresql.org/docs/current/sql-dropdomain.html) syntax is not currently supported. Statements such as the following, therefore, cannot be p

Re: [PR] Impl intermeidate result blocked approach sketch [datafusion]

2025-04-27 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2833519008 @Dandandan @alamb this pr may be ready now.

[I] Fix regression on main with circular import on expr [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer opened a new issue, #1116: URL: https://github.com/apache/datafusion-python/issues/1116 **Describe the bug** https://github.com/apache/datafusion-python/pull/1074 introduced a regression on `main`. It has two issues: functions is not imported properly in `expr.py` and also

Re: [PR] 1064/enhancement/add functions to Expr class [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer commented on PR #1074: URL: https://github.com/apache/datafusion-python/pull/1074#issuecomment-2833517926 I must have missed that CI didn't run on this. We have a circular import and broke CI on `main`.

[PR] Resolved bug in `parse_function_arg` [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 opened a new pull request, #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826 This pull request resolves the bug described in issue #1825, which was caused by an incorrect implementation of the named argument parsing. It also adds a few tests to verify

Re: [PR] 1064/enhancement/add functions to Expr class [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer merged PR #1074: URL: https://github.com/apache/datafusion-python/pull/1074

Re: [I] Release DataFusion-Python 47.0.0 [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer commented on issue #1115: URL: https://github.com/apache/datafusion-python/issues/1115#issuecomment-2833485753 If we do not merge in #1112 then we at least need to make a different PR that includes the documentation changes so the site will build without issues and render properl

Re: [PR] feat: add missing PyLogicalPlan to_variant [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer commented on PR #1085: URL: https://github.com/apache/datafusion-python/pull/1085#issuecomment-2833467666 I'm sorry it took me so long to get around to reviewing this. Thank you for the contribution!

Re: [PR] 1075/enhancement/Make col class with __getattr__ [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer merged PR #1076: URL: https://github.com/apache/datafusion-python/pull/1076

Re: [I] Add a Col class instead of just col function to use __getattr__ method [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer closed issue #1075: Add a Col class instead of just col function to use __getattr__ method URL: https://github.com/apache/datafusion-python/issues/1075

[I] Library fails to parse function [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 opened a new issue, #1825: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1825 Attempting to parse a function such as the following currently fails with `ParserError("Expected: ), found: INT")` ```sql CREATE OR REPLACE FUNCTION check_values_differen

Re: [I] Seemingly pointless test [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 commented on issue #1807: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1807#issuecomment-2833477202 Ping to ask what I am expected to do with this; happy to remove the test if it is indeed pointless.

Re: [I] Missing support for `INHERITS` operation from PostgreSQL [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 closed issue #1804: Missing support for `INHERITS` operation from PostgreSQL URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1804

Re: [I] Missing support for `INHERITS` operation from PostgreSQL [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
LucaCappelletti94 commented on issue #1804: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1804#issuecomment-2833476990 Closing issue as relevant pull request has been merged.

Re: [PR] Add DataFrame usage guide with HTML rendering customization options [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer commented on PR #1108: URL: https://github.com/apache/datafusion-python/pull/1108#issuecomment-2833465864 Thank you again!

Re: [PR] Add DataFrame usage guide with HTML rendering customization options [datafusion-python]

2025-04-27 Thread via GitHub
timsaucer merged PR #1108: URL: https://github.com/apache/datafusion-python/pull/1108

Re: [PR] Update extending-operators.md [datafusion]

2025-04-27 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2833455331 > You can rebase with main Does this solve the issue?

[I] Tracking: improve aggreagation fuzzer [datafusion]

2025-04-27 Thread via GitHub
Rachelint opened a new issue, #15870: URL: https://github.com/apache/datafusion/issues/15870 ### Is your feature request related to a problem or challenge? I found that the aggregation fuzzer is still hard to use when I act as a user. Some points I noticed that can be improved:

Re: [PR] Move the udf module to user_defined [datafusion-python]

2025-04-27 Thread via GitHub
crystalxyz commented on PR #1112: URL: https://github.com/apache/datafusion-python/pull/1112#issuecomment-2833418001 @timsaucer The documentation changes look good to me! Do you think that adding a comment in `python/datafusion/user_defined` to explain the renaming would be helpful? People

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
iffyio commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2062561210 ## src/parser/mod.rs: ## @@ -7081,18 +7029,243 @@ impl<'a> Parser<'a> { if let Token::Word(word) = self.peek_token().token {

Re: [PR] Update extending-operators.md [datafusion]

2025-04-27 Thread via GitHub
xudong963 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2833377199 Would anyone happen to know how to preview the HTML format for the PR changes?

Re: [PR] Update extending-operators.md [datafusion]

2025-04-27 Thread via GitHub
xudong963 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2833376316 You can rebase with main

Re: [PR] Add DataFrame usage guide with HTML rendering customization options [datafusion-python]

2025-04-27 Thread via GitHub
kosiew commented on PR #1108: URL: https://github.com/apache/datafusion-python/pull/1108#issuecomment-2833375574 Thank you @timsaucer for the detailed review. I have corrected the above.

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-04-27 Thread via GitHub
xudong963 commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2833359178 Also, there is a newer paper for the topic: https://15799.courses.cs.cmu.edu/spring2025/papers/11-unnesting/neumann-btw2025.pdf

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-04-27 Thread via GitHub
xudong963 commented on PR #15865: URL: https://github.com/apache/datafusion/pull/15865#issuecomment-2833360326 Fyi @friendlymatthew

Re: [PR] feat: simplify count distinct logical plan [datafusion]

2025-04-27 Thread via GitHub
chenkovsky commented on PR #15867: URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2833360269 BTW, if we want to run count distinct in a big data scenario, we have to use a two-step process, so I think we have to add a configuration option to toggle this optimization.

Re: [PR] Support for projection item prefix operator (CONNECT_BY_ROOT) [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
iffyio commented on code in PR #1780: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1780#discussion_r2062522304 ## src/parser/mod.rs: ## @@ -15375,6 +15391,17 @@ impl<'a> Parser<'a> { } } +fn prefixed_expr(expr: Expr, prefix: Option) -> Expr { Review Comm

Re: [PR] Support some of pipe operators [datafusion-sqlparser-rs]

2025-04-27 Thread via GitHub
iffyio commented on code in PR #1759: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1759#discussion_r2062500408 ## src/parser/mod.rs: ## @@ -10574,11 +10598,96 @@ impl<'a> Parser<'a> { for_clause, settings, format

Re: [PR] chore: Make Aggregate transformation more compact [datafusion-comet]

2025-04-27 Thread via GitHub
EmilyMatt commented on code in PR #1670: URL: https://github.com/apache/datafusion-comet/pull/1670#discussion_r2062485241 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -430,55 +430,43 @@ class CometSparkSessionExtensions op,

Re: [PR] chore: Make Aggregate transformation more compact [datafusion-comet]

2025-04-27 Thread via GitHub
EmilyMatt commented on code in PR #1670: URL: https://github.com/apache/datafusion-comet/pull/1670#discussion_r2062480644 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -430,55 +430,43 @@ class CometSparkSessionExtensions op,