Re: [PR] Migrate the internal and testing functions to invoke_with_args [datafusion]

2025-02-16 Thread via GitHub
goldmedal commented on PR #14693: URL: https://github.com/apache/datafusion/pull/14693#issuecomment-2661769748 Thanks @alamb for reviewing 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] MIgrate math function macro to implement invoke_with_args [datafusion]

2025-02-16 Thread via GitHub
goldmedal merged PR #14690: URL: https://github.com/apache/datafusion/pull/14690

Re: [PR] MIgrate math function macro to implement invoke_with_args [datafusion]

2025-02-16 Thread via GitHub
goldmedal commented on PR #14690: URL: https://github.com/apache/datafusion/pull/14690#issuecomment-2661769452 Thanks @alamb 👍

Re: [I] Migrate Crypto function to `inovke_with_args` [datafusion]

2025-02-16 Thread via GitHub
Chen-Yuan-Lai commented on issue #14704: URL: https://github.com/apache/datafusion/issues/14704#issuecomment-2661785557 take

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-16 Thread via GitHub
rkrishn7 commented on code in PR #14538: URL: https://github.com/apache/datafusion/pull/14538#discussion_r1957518449 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -0,0 +1,264 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: Do

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-02-16 Thread via GitHub
Omega359 commented on code in PR #14684: URL: https://github.com/apache/datafusion/pull/14684#discussion_r1957429055 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1779,37 +1779,82 @@ impl DataFrame { .config_options() .sql_parser .enable_

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-02-16 Thread via GitHub
Omega359 commented on code in PR #14684: URL: https://github.com/apache/datafusion/pull/14684#discussion_r1957429972 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1779,37 +1779,82 @@ impl DataFrame { .config_options() .sql_parser .enable_

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-02-16 Thread via GitHub
Omega359 commented on PR #14684: URL: https://github.com/apache/datafusion/pull/14684#issuecomment-2661644184 Quite a speed improvement! I'll look at the code in a bit

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-16 Thread via GitHub
skyzh commented on code in PR #14595: URL: https://github.com/apache/datafusion/pull/14595#discussion_r1957439224 ## datafusion/optimizer/src/decorrelate_lateral_join.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-16 Thread via GitHub
skyzh commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2661667504 The current one-pass subquery unnesting implementation in `PullUpCorrelatedExpr` does not handle cases such as when we have joins in subqueries: ```sql SELECT * FROM t0, LATE

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-16 Thread via GitHub
skyzh commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2661671600 I think this patch will not regress existing supported cases as they get fully decorrelated with the scalar query rule and the predicate subquery rule. The patch unlocks a new set of p

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-16 Thread via GitHub
jayzhan211 commented on code in PR #14689: URL: https://github.com/apache/datafusion/pull/14689#discussion_r1957460734 ## datafusion/core/src/execution/context/csv.rs: ## @@ -116,11 +116,11 @@ mod tests { assert_eq!(results.len(), 1); let expected = [ -

Re: [PR] `AggregateUDFImpl::schema_name` and `AggregateUDFImpl::display_name` for customizable name [datafusion]

2025-02-16 Thread via GitHub
jayzhan211 commented on PR #14695: URL: https://github.com/apache/datafusion/pull/14695#issuecomment-2661723898 Add display name as well

Re: [I] Migrate Datetime functions to `invoke_with_args` [datafusion]

2025-02-16 Thread via GitHub
haydenwoodhead commented on issue #14705: URL: https://github.com/apache/datafusion/issues/14705#issuecomment-2661861634 I'd be keen to give this a go.

Re: [I] Migrate String Functions to `inovke_with_args` [datafusion]

2025-02-16 Thread via GitHub
zjregee commented on issue #14708: URL: https://github.com/apache/datafusion/issues/14708#issuecomment-2661857912 take

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-16 Thread via GitHub
2010YOUY01 commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2661367542 > It is a problem of datafusion-cli. If datafusion-cli decides to hold all the result batches in memory, it should create a memory consumer for itself and reserve memory for the

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-16 Thread via GitHub
zhuqi-lucas commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2661372383 > > This PR can't solve > > select * from t consumes 2GB memory to fully materialize the output, in the best case select * from t order by c1 consumes around 2GB memory, given

Re: [PR] Dataframe with_column and with_column_renamed performance improvements [datafusion]

2025-02-16 Thread via GitHub
Omega359 closed pull request #14653: Dataframe with_column and with_column_renamed performance improvements URL: https://github.com/apache/datafusion/pull/14653

Re: [PR] feat(examples): Add an example of boundary analysis for AND/OR expressions [datafusion]

2025-02-16 Thread via GitHub
clflushopt commented on code in PR #14688: URL: https://github.com/apache/datafusion/pull/14688#discussion_r1957360285 ## datafusion-examples/examples/expr_api.rs: ## @@ -275,6 +279,77 @@ fn range_analysis_demo() -> Result<()> { Ok(()) } +// DataFusion's analysis can inf

[PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-16 Thread via GitHub
Fly-Style opened a new pull request, #14699: URL: https://github.com/apache/datafusion/pull/14699 ## Rationale for this change The Statistics framework in Datafusion is a foundational component for query planning and execution. It provides metadata about datasets, enabling optimizati

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-16 Thread via GitHub
skyzh commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2661674964 And ready for review again :) After trying to understand what's happening in `decorrelate.rs`, I think we need a new code path to support a variety of logical plans produced by lat

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-16 Thread via GitHub
skyzh commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2661676052 And FYI, the Hyper paper https://dl.gi.de/server/api/core/bitstreams/10f3ad20-e9ae-4876-abdf-5c8e83e4c595/content, and the SQL server paper https://www.microsoft.com/en-us/research/wp

[PR] Speed up `chr` UDF (~4x faster) [datafusion]

2025-02-16 Thread via GitHub
simonvandel opened a new pull request, #14700: URL: https://github.com/apache/datafusion/pull/14700 ## Which issue does this PR close? N/A ## Rationale for this change It is wasteful to allocate temporary memory with intermediate `to_string` calls. ```
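The rationale above (avoiding temporary allocations from intermediate `to_string` calls) can be illustrated with a small standalone sketch. This is not the code from PR #14700, whose description is truncated here; the function names `chr_concat_with_to_string` and `chr_concat_with_buffer` are hypothetical, and the example only uses standard-library calls (`char::from_u32`, `char::encode_utf8`).

```rust
/// Baseline (hypothetical): converts each code point to a `char`, then
/// allocates a temporary `String` via `to_string` before appending it.
pub fn chr_concat_with_to_string(code_points: &[u32]) -> Option<String> {
    let mut out = String::new();
    for &cp in code_points {
        // `char::from_u32` returns `None` for invalid code points
        // (for example, surrogates such as 0xD800).
        let ch = char::from_u32(cp)?;
        out.push_str(&ch.to_string()); // one heap allocation per character
    }
    Some(out)
}

/// Alternative (hypothetical): encodes each `char` into a reusable 4-byte
/// stack buffer, so no temporary `String` is created per character.
pub fn chr_concat_with_buffer(code_points: &[u32]) -> Option<String> {
    // Every character needs at least one byte, so this is a lower bound.
    let mut out = String::with_capacity(code_points.len());
    let mut buf = [0u8; 4]; // a UTF-8 encoded `char` is at most 4 bytes
    for &cp in code_points {
        let ch = char::from_u32(cp)?;
        out.push_str(ch.encode_utf8(&mut buf));
    }
    Some(out)
}
```

Both functions produce the same output; the second simply skips the per-character heap allocation. Whether this accounts for the speed-up reported in the PR title is not something this sketch can confirm.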

Re: [I] Documentation regarding running/regenerating stability test plans [datafusion-comet]

2025-02-16 Thread via GitHub
andygrove commented on issue #1393: URL: https://github.com/apache/datafusion-comet/issues/1393#issuecomment-2661638673 > also, I've been able to run the tests by running without "-pl spark" now, but then my code is not run but a default/previous version of Comet somehow? I'm truly stumped

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-16 Thread via GitHub
skyzh commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2661668777 For multiple subqueries like: ``` SELECT * FROM t0, LATERAL (SELECT * FROM t1 WHERE t0.v0 = t1.v0 AND t1.v1 > (SELECT SUM(v1) FROM t2 WHERE t1.v0 = t2.v0)); ``` I'm

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-16 Thread via GitHub
skyzh commented on code in PR #14595: URL: https://github.com/apache/datafusion/pull/14595#discussion_r1957438517 ## datafusion/sqllogictest/test_files/join.slt.part: ## @@ -1312,3 +1312,85 @@ SELECT a+b*2, statement ok drop table t1; + + +statement ok +CREATE TABLE t1(v0 BI

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-16 Thread via GitHub
skyzh commented on code in PR #14595: URL: https://github.com/apache/datafusion/pull/14595#discussion_r1957444892 ## datafusion/optimizer/src/decorrelate_lateral_join.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[PR] Add support for Postgres ALTER TYPE {RENAME TO|{ADD|RENAME} VALUE} [datafusion-sqlparser-rs]

2025-02-16 Thread via GitHub
jvatic opened a new pull request, #1727: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1727 And expose `CREATE TYPE AS ENUM` through `Generic` dialect.

Re: [PR] Dataframe with_column and with_column_renamed performance improvements [datafusion]

2025-02-16 Thread via GitHub
Omega359 commented on PR #14653: URL: https://github.com/apache/datafusion/pull/14653#issuecomment-2661642119 I've made some changes locally where I test to see if the existing plan is a projection but I realized that I can't just rely on that either as the plan could possibly have been man

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-02-16 Thread via GitHub
Omega359 commented on code in PR #14684: URL: https://github.com/apache/datafusion/pull/14684#discussion_r1957428384 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -1617,9 +1617,19 @@ async fn with_column_renamed() -> Result<()> { // accepts table qualifier .

[I] Need help running benchmarks and other pyspark jobs. [datafusion-comet]

2025-02-16 Thread via GitHub
Noah-FetchRewards opened a new issue, #1411: URL: https://github.com/apache/datafusion-comet/issues/1411 ### Describe the bug I am trying to get a pyspark job to run on apache comet using an EKS cluster; however after 20+ hours, I am unable to do so for a variety of reasons. Ex

[I] No way to get the schema for sliding accumulator state [datafusion]

2025-02-16 Thread via GitHub
mwylde opened a new issue, #14701: URL: https://github.com/apache/datafusion/issues/14701 The AggregateUDF trait includes a function `fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<Field>>` to get the types for the intermediate state of the aggregate. This is useful if we need to store th

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-16 Thread via GitHub
jayzhan211 commented on PR #14440: URL: https://github.com/apache/datafusion/pull/14440#issuecomment-2661731950 > Side note: Is this logic from the old [PR](https://github.com/apache/datafusion/pull/14268/files#diff-6e1fb265597317a8256c60670ff3ea7be6896b2df1199a40ca79419ce29b4ce3R610-R626)

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-16 Thread via GitHub
jayzhan211 commented on PR #14440: URL: https://github.com/apache/datafusion/pull/14440#issuecomment-2661729365 > Side note: Is this logic from the old [PR](https://github.com/apache/datafusion/pull/14268/files#diff-6e1fb265597317a8256c60670ff3ea7be6896b2df1199a40ca79419ce29b4ce3R610-R626)

Re: [I] Unable to query file on Kubernetes on AWS EKS, for remote-sql.rs example [datafusion-ballista]

2025-02-16 Thread via GitHub
Noah-FetchRewards commented on issue #1180: URL: https://github.com/apache/datafusion-ballista/issues/1180#issuecomment-2661900847 Sorry @milenkovicm , I am unsure what you mean. My "client" is my local computer running the example ballista code from the repo at https://github.com/apache/d

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-16 Thread via GitHub
rkrishn7 commented on code in PR #14689: URL: https://github.com/apache/datafusion/pull/14689#discussion_r1957547941 ## datafusion/expr/src/planner.rs: ## @@ -211,6 +214,23 @@ pub trait ExprPlanner: Debug + Send + Sync { fn plan_any(&self, expr: RawBinaryExpr) -> Result<PlannerResult<RawBinaryExpr>> {

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-16 Thread via GitHub
jayzhan211 commented on code in PR #14689: URL: https://github.com/apache/datafusion/pull/14689#discussion_r1957604951 ## datafusion/core/src/execution/context/csv.rs: ## @@ -116,11 +116,11 @@ mod tests { assert_eq!(results.len(), 1); let expected = [ -

[PR] minor: enable decimal dictionary sbbf pruning test [datafusion]

2025-02-16 Thread via GitHub
korowa opened a new pull request, #14711: URL: https://github.com/apache/datafusion/pull/14711 ## Which issue does this PR close? Closes #13821. ## Rationale for this change The issue with reading / writing decimal dictionaries has been fixed in `parquet`

[I] Overflow happened on: -2147483648 % -1 [datafusion-comet]

2025-02-16 Thread via GitHub
wForget opened a new issue, #1412: URL: https://github.com/apache/datafusion-comet/issues/1412 ### Describe the bug `-2147483648 % -1` evaluates to 0 in scala, but fails in rust. scala: ![Image](https://github.com/user-attachments/assets/903d305b-7800-46be-81ce-a008c9ca8
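The difference described above can be reproduced with plain Rust integers. In Rust, `i32::MIN % -1` panics because the operation overflows, `checked_rem` reports the case as `None`, and `wrapping_rem` returns 0, which matches the Scala result quoted in the issue. The helper below is a hypothetical sketch (the name `jvm_style_rem` is an assumption) and is not the fix adopted in Comet.

```rust
/// Hypothetical helper: remainder that returns 0 for `i32::MIN % -1`
/// (the result the issue reports for Scala) instead of panicking.
/// Division by zero is reported as `None` rather than panicking.
pub fn jvm_style_rem(lhs: i32, rhs: i32) -> Option<i32> {
    if rhs == 0 {
        // `wrapping_rem` would panic on a zero divisor, so guard it here.
        None
    } else {
        // `wrapping_rem` yields 0 for the single overflowing case,
        // `i32::MIN % -1`, and the ordinary remainder otherwise.
        Some(lhs.wrapping_rem(rhs))
    }
}

/// Strict variant: surfaces the overflow (and division by zero) as `None`
/// so the caller can decide how to handle it.
pub fn strict_rem(lhs: i32, rhs: i32) -> Option<i32> {
    lhs.checked_rem(rhs)
}
```

Which behaviour is appropriate depends on the semantics the engine has to match; the sketch only shows that the standard library exposes both.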

Re: [I] Attach `Diagnostic` to "invalid function argument types" error [datafusion]

2025-02-16 Thread via GitHub
dentiny commented on issue #14431: URL: https://github.com/apache/datafusion/issues/14431#issuecomment-2662134875 Checking the code, I found that for the "invalid argument type" error there are quite a few places we need to update; I updated only one of them in the PR, mainly to make sure I don

Re: [I] Attach `Diagnostic` to "incompatible type in unary expression" error [datafusion]

2025-02-16 Thread via GitHub
alan910127 commented on issue #14433: URL: https://github.com/apache/datafusion/issues/14433#issuecomment-2662141685 I've found that the relevant code is in `datafusion/optimizer/src/analyzer/type_coercion.rs`. However, there's another issue: the `Expr::Not` match arm calls `get_casted_expr

Re: [I] Feature request: hermetic build [datafusion]

2025-02-16 Thread via GitHub
alamb commented on issue #14678: URL: https://github.com/apache/datafusion/issues/14678#issuecomment-2661380706 Thank you @dentiny > it reduces confusion for newcomer to start playing with the project Another way we have tried to make this easier is via the devcontainer

Re: [I] Feature request: documentation on project build instructions [datafusion]

2025-02-16 Thread via GitHub
alamb commented on issue #14681: URL: https://github.com/apache/datafusion/issues/14681#issuecomment-2661380879 Thank you @dentiny Maybe we can also highlight some instructions about using the devcontainer too https://github.com/apache/datafusion/blob/main/.devcontainer/Dockerf

Re: [PR] docs: Add instruction to build [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14694: URL: https://github.com/apache/datafusion/pull/14694#discussion_r1957300909 ## docs/source/contributor-guide/index.md: ## @@ -216,3 +216,23 @@ The good thing about open code and open development is that any issues in one ch Pull requests

[I] alamb's review queue [datafusion]

2025-02-16 Thread via GitHub
alamb opened a new issue, #14698: URL: https://github.com/apache/datafusion/issues/14698 This is my current PR review list. I am putting it on github because: 1. I like how github renders checklists w/ PR titles so it is easy to track (I currently have a local text file...) 2. I though

Re: [PR] Improve EnforceSorting docs. [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14673: URL: https://github.com/apache/datafusion/pull/14673#discussion_r1957303984 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -84,42 +84,54 @@ impl EnforceSorting { } } -/// This object is used within the [`EnforceSo

Re: [PR] Improve EnforceSorting docs. [datafusion]

2025-02-16 Thread via GitHub
alamb merged PR #14673: URL: https://github.com/apache/datafusion/pull/14673

Re: [PR] Improve SQL Planner docs [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14669: URL: https://github.com/apache/datafusion/pull/14669#discussion_r1956363618 ## datafusion/core/src/lib.rs: ## @@ -229,9 +229,9 @@ //! 1. The query string is parsed to an Abstract Syntax Tree (AST) //![`Statement`] using [sqlparser]. /

Re: [I] extended_test (with memory limit tracking) are commented out [datafusion]

2025-02-16 Thread via GitHub
2010YOUY01 commented on issue #14680: URL: https://github.com/apache/datafusion/issues/14680#issuecomment-2661388984 This test is failing because it runs out of disk space, I've checked the remaining disk space after Github's CI runner has setup the rust toolchain and before running any tes

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-16 Thread via GitHub
rluvaton commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2661388762 Hey, is there a reason for not using hdfs-sys or hdrs? It looks like it supports many more versions of hdfs, versions 2.2 to 3.3 https://github.com/Xuanwo/hdfs-sys/blob/ma

Re: [PR] Improve SQL Planner docs [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14669: URL: https://github.com/apache/datafusion/pull/14669#discussion_r1957305049 ## datafusion/expr/src/planner.rs: ## @@ -62,39 +75,44 @@ pub trait ContextProvider { not_impl_err!("Recursive CTE is not implemented") } -/// Ge

Re: [PR] Improve SQL Planner docs [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14669: URL: https://github.com/apache/datafusion/pull/14669#discussion_r1957305082 ## datafusion/sql/src/planner.rs: ## @@ -224,7 +224,24 @@ impl PlannerContext { } } -/// SQL query planner +/// SQL query planner and binder Review Comment:

Re: [I] alamb's review queue [datafusion]

2025-02-16 Thread via GitHub
alamb commented on issue #14698: URL: https://github.com/apache/datafusion/issues/14698#issuecomment-2661386427 ## PRs In need of first Review DataFusion - [ ] https://github.com/apache/datafusion/pull/14196 arrow-rs - [ ] https://github.com/apache/arrow-rs/pull/696

Re: [PR] refactor: do ambiguous_distinct_check in select [datafusion]

2025-02-16 Thread via GitHub
alamb commented on PR #14180: URL: https://github.com/apache/datafusion/pull/14180#issuecomment-2661386974 This PR seems to be failing CI tests -- marking it as draft while those are resolved to make it clear this isn't waiting on review (I am trying to reduce the review queue)

Re: [PR] Improve SQL Planner docs [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14669: URL: https://github.com/apache/datafusion/pull/14669#discussion_r1957305771 ## datafusion/sql/src/planner.rs: ## @@ -224,7 +224,24 @@ impl PlannerContext { } } -/// SQL query planner +/// SQL query planner and binder +/// +/// This s

Re: [PR] Improve SQL Planner docs [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14669: URL: https://github.com/apache/datafusion/pull/14669#discussion_r1957306599 ## datafusion/expr/src/planner.rs: ## @@ -49,11 +54,19 @@ pub trait ContextProvider { not_impl_err!("Table Functions are not supported") } -/// T

Re: [PR] script to export benchmark information as Line Protocol format [datafusion]

2025-02-16 Thread via GitHub
alamb merged PR #14662: URL: https://github.com/apache/datafusion/pull/14662

Re: [PR] optimize performance of the repeat function (up to 50% faster) [datafusion]

2025-02-16 Thread via GitHub
alamb commented on PR #14697: URL: https://github.com/apache/datafusion/pull/14697#issuecomment-2661397049 > Hi, @alamb. > > In addition, thank you very much for your help, but I have tried some suggestions in #14610 to reduce memory copy by using the `write!`, but I have not achieve

Re: [PR] optimize performance of the repeat function (up to 50% faster) [datafusion]

2025-02-16 Thread via GitHub
alamb commented on PR #14697: URL: https://github.com/apache/datafusion/pull/14697#issuecomment-2661398431 BTW I got a flamegraph of this run while using ```shell andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ sudo flamegraph -- target/release/deps/repeat-f851d6dc5c039e6c

Re: [I] Export benchmark information as line protocol [datafusion]

2025-02-16 Thread via GitHub
alamb closed issue #6107: Export benchmark information as line protocol URL: https://github.com/apache/datafusion/issues/6107

Re: [PR] `AggregateUDFImpl::schema_name` for customizable schema name [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14695: URL: https://github.com/apache/datafusion/pull/14695#discussion_r1957309589 ## datafusion/expr/src/udaf.rs: ## @@ -382,6 +390,45 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { /// Returns this function's name fn name(&self) -

Re: [PR] Migrate the internal and testing functions to invoke_with_args [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14693: URL: https://github.com/apache/datafusion/pull/14693#discussion_r1957309927 ## datafusion/core/tests/user_defined/user_defined_scalar_functions.rs: ## @@ -518,16 +514,12 @@ impl ScalarUDFImpl for AddIndexToStringVolatileScalarUDF {

Re: [PR] MIgrate math function macro to implement invoke_with_args [datafusion]

2025-02-16 Thread via GitHub
alamb commented on PR #14690: URL: https://github.com/apache/datafusion/pull/14690#issuecomment-2661400632 Thanks (again) @goldmedal

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-16 Thread via GitHub
2010YOUY01 commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2661365989 Sorry, I mistakenly edited your original reply @Kontinuation, I'm trying to revert it back.

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-02-16 Thread via GitHub
EmilyMatt commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2661320829 The issue can be dissected as this: a. There is no reason to have a Partial aggregate and not a Final one, regardless of shuffle, if we support the aggregate expres

Re: [I] Documentation regarding running/regenerating stability test plans [datafusion-comet]

2025-02-16 Thread via GitHub
EmilyMatt commented on issue #1393: URL: https://github.com/apache/datafusion-comet/issues/1393#issuecomment-2661321597 also, I've been able to run the tests by running without "-pl spark" now, but then my code is not run but a default/previous version of Comet somehow? I'm truly stumped

[PR] Minor: Re-export `datafusion_expr_common` crate [datafusion]

2025-02-16 Thread via GitHub
jayzhan211 opened a new pull request, #14696: URL: https://github.com/apache/datafusion/pull/14696 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

[PR] optimize performance of the repeat function [datafusion]

2025-02-16 Thread via GitHub
zjregee opened a new pull request, #14697: URL: https://github.com/apache/datafusion/pull/14697 ## Which issue does this PR close? - Closes #14610. ## Rationale for this change ## What changes are included in this PR? By calculating the length in advance and allocating memory acco
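The idea named in the description above, computing the output length first and allocating once, can be sketched on a plain `&str`. This is only an illustration under that assumption: the PR itself operates on Arrow string arrays, its description is truncated here, and `repeat_preallocated` is a hypothetical name.

```rust
/// Hypothetical sketch: repeat `s` exactly `n` times, reserving the full
/// output size up front so the buffer never has to grow while appending.
/// Returns `None` if the total length would overflow `usize`.
pub fn repeat_preallocated(s: &str, n: usize) -> Option<String> {
    // Compute the final byte length before allocating anything.
    let total = s.len().checked_mul(n)?;
    let mut out = String::with_capacity(total);
    for _ in 0..n {
        out.push_str(s);
    }
    Some(out)
}
```

The standard library's `str::repeat` already works along similar lines for a single string; the sketch is only meant to make the "calculate first, allocate once" step explicit.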

Re: [PR] Minor: Re-export `datafusion_expr_common` crate [datafusion]

2025-02-16 Thread via GitHub
goldmedal commented on PR #14696: URL: https://github.com/apache/datafusion/pull/14696#issuecomment-2661338454 Related discussion: https://github.com/apache/datafusion/pull/14440#issuecomment-2661228616

Re: [PR] Minor: Re-export `datafusion_expr_common` crate [datafusion]

2025-02-16 Thread via GitHub
goldmedal merged PR #14696: URL: https://github.com/apache/datafusion/pull/14696

Re: [PR] optimize performance of the repeat function [datafusion]

2025-02-16 Thread via GitHub
zjregee commented on PR #14697: URL: https://github.com/apache/datafusion/pull/14697#issuecomment-2661339934 Hi, @alamb. In addition, thank you very much for your help, but I have tried some suggestions in #14610 to reduce memory copy by using the `write!`, but I have not achieved an

Re: [PR] Minor: Re-export `datafusion_expr_common` crate [datafusion]

2025-02-16 Thread via GitHub
jayzhan211 commented on PR #14696: URL: https://github.com/apache/datafusion/pull/14696#issuecomment-2661340709 Thanks @goldmedal

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner [datafusion]

2025-02-16 Thread via GitHub
jayzhan211 commented on code in PR #14689: URL: https://github.com/apache/datafusion/pull/14689#discussion_r1957280402 ## datafusion/expr/src/expr.rs: ## @@ -2272,13 +2273,20 @@ impl Display for SchemaDisplay<'_> { order_by, null_treatment,

Re: [PR] start refactoring process by setting up base + init [datafusion]

2025-02-16 Thread via GitHub
Rachelint commented on PR #14306: URL: https://github.com/apache/datafusion/pull/14306#issuecomment-2661345815 > @Rachelint this is just a reminder. Please disregard if this isn't needed. Sorry, I am back and reviewing now.

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-16 Thread via GitHub
Kontinuation commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2661348012 > * **This PR can't solve** > `select * from t` consumes 2GB memory to fully materialize the output, in the best case `select * from t order by c1` consumes around 2GB memor

Re: [PR] optimize performance of the repeat function (up to 50% faster) [datafusion]

2025-02-16 Thread via GitHub
Dandandan commented on code in PR #14697: URL: https://github.com/apache/datafusion/pull/14697#discussion_r1957315634 ## datafusion/functions/src/string/repeat.rs: ## @@ -151,20 +151,35 @@ where T: OffsetSizeTrait, S: StringArrayType<'a>, { -let mut builder: Gener

Re: [PR] optimize performance of the repeat function (up to 50% faster) [datafusion]

2025-02-16 Thread via GitHub
zjregee commented on code in PR #14697: URL: https://github.com/apache/datafusion/pull/14697#discussion_r1957317071 ## datafusion/functions/src/string/repeat.rs: ## @@ -151,20 +151,35 @@ where T: OffsetSizeTrait, S: StringArrayType<'a>, { -let mut builder: Generic

Re: [PR] docs: Add instruction to build [datafusion]

2025-02-16 Thread via GitHub
dentiny commented on code in PR #14694: URL: https://github.com/apache/datafusion/pull/14694#discussion_r1957718072 ## docs/source/contributor-guide/index.md: ## @@ -216,3 +216,23 @@ The good thing about open code and open development is that any issues in one ch Pull reques

Re: [PR] Add sum statistics and PhysicalExpr::column_statistics [datafusion]

2025-02-16 Thread via GitHub
berkaysynnada commented on PR #13736: URL: https://github.com/apache/datafusion/pull/13736#issuecomment-2662248232 This can be closed?

Re: [PR] Improve `downcast_value!` macro [datafusion]

2025-02-16 Thread via GitHub
findepi merged PR #14683: URL: https://github.com/apache/datafusion/pull/14683

[PR] Map access supports constant-resolvable expressions [datafusion]

2025-02-16 Thread via GitHub
Lordworms opened a new pull request, #14712: URL: https://github.com/apache/datafusion/pull/14712 ## Which issue does this PR close? - Closes #11785 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-16 Thread via GitHub
berkaysynnada commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2662253980 cc @gatesn @ch-sc @suremarc @edmondop as I remember you working with statistics, and these are some related issues: https://github.com/apache/datafusion/issues/14237 https

Re: [PR] Add sum statistics and PhysicalExpr::column_statistics [datafusion]

2025-02-16 Thread via GitHub
gatesn closed pull request #13736: Add sum statistics and PhysicalExpr::column_statistics URL: https://github.com/apache/datafusion/pull/13736

Re: [PR] Add sum statistics and PhysicalExpr::column_statistics [datafusion]

2025-02-16 Thread via GitHub
gatesn commented on PR #13736: URL: https://github.com/apache/datafusion/pull/13736#issuecomment-2662254473 Yes, although is there a timeline for the new stats API? I'm keen to get the change in that adds `PhysicalExpr::column_statistics` or similar to the expression trait. https://github.

Re: [I] Feature request: documentation on project build instructions [datafusion]

2025-02-16 Thread via GitHub
dentiny closed issue #14681: Feature request: documentation on project build instructions URL: https://github.com/apache/datafusion/issues/14681

Re: [PR] Add support for `ORDER BY ALL` [datafusion-sqlparser-rs]

2025-02-16 Thread via GitHub
iffyio commented on code in PR #1724: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1724#discussion_r1957698799 ## src/parser/mod.rs: ## @@ -13405,6 +13419,19 @@ impl<'a> Parser<'a> { }) } +pub fn parse_order_by_all(&mut self) -> Result { Revi

Re: [I] Feature request: documentation on project build instructions [datafusion]

2025-02-16 Thread via GitHub
dentiny commented on issue #14681: URL: https://github.com/apache/datafusion/issues/14681#issuecomment-2662208739 I find project setup already well-documented.

Re: [PR] docs: Add instruction to build [datafusion]

2025-02-16 Thread via GitHub
dentiny commented on code in PR #14694: URL: https://github.com/apache/datafusion/pull/14694#discussion_r1957683279 ## docs/source/contributor-guide/index.md: ## @@ -216,3 +216,23 @@ The good thing about open code and open development is that any issues in one ch Pull reques

Re: [PR] docs: Add instruction to build [datafusion]

2025-02-16 Thread via GitHub
dentiny commented on PR #14694: URL: https://github.com/apache/datafusion/pull/14694#issuecomment-2662225775 > I have some other thought on making this easier to find. What do you think? Both sounds good to me!

Re: [PR] Add support for Postgres ALTER TYPE {RENAME TO|{ADD|RENAME} VALUE} [datafusion-sqlparser-rs]

2025-02-16 Thread via GitHub
iffyio commented on code in PR #1727: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1727#discussion_r1957667202 ## tests/sqlparser_postgres.rs: ## @@ -5320,6 +5320,117 @@ fn parse_create_type_as_enum() { } } +#[test] +fn parse_alter_type() { +struct Te

Re: [PR] Implement SnowFlake ALTER SESSION [datafusion-sqlparser-rs]

2025-02-16 Thread via GitHub
iffyio commented on PR #1712: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1712#issuecomment-2662199671 @osipovartem could you take a look into the CI failure?

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-16 Thread via GitHub
berkaysynnada commented on PR #14538: URL: https://github.com/apache/datafusion/pull/14538#issuecomment-2662269400 > I was thinking I can open up an issue to fix the failing tests due to the sanity checker, and leave that as a follow-up once this is merged. Let me know if that sounds good t

Re: [I] Implement UNION ALL BY NAME [datafusion]

2025-02-16 Thread via GitHub
berkaysynnada closed issue #14508: Implement UNION ALL BY NAME URL: https://github.com/apache/datafusion/issues/14508

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-16 Thread via GitHub
berkaysynnada merged PR #14538: URL: https://github.com/apache/datafusion/pull/14538

Re: [I] Runtime-adaptive data representation [datafusion]

2025-02-16 Thread via GitHub
findepi commented on issue #12720: URL: https://github.com/apache/datafusion/issues/12720#issuecomment-2662312332 Prior discussion https://github.com/apache/datafusion/discussions/7421

[I] pyarrow-19.0.0 breaks unit test [datafusion-python]

2025-02-16 Thread via GitHub
timsaucer opened a new issue, #1023: URL: https://github.com/apache/datafusion-python/issues/1023 **Describe the bug** When running the unit tests with pyarrow-19.0.0 installed the `test_write_compressed_parquet` tests fail with an error: ``` libc++abi: terminating due to un

Re: [PR] Migrate the internal and testing functions to invoke_with_args [datafusion]

2025-02-16 Thread via GitHub
alamb merged PR #14693: URL: https://github.com/apache/datafusion/pull/14693

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-16 Thread via GitHub
alamb commented on code in PR #14689: URL: https://github.com/apache/datafusion/pull/14689#discussion_r1957332659 ## datafusion/expr/src/planner.rs: ## @@ -211,6 +214,23 @@ pub trait ExprPlanner: Debug + Send + Sync { fn plan_any(&self, expr: RawBinaryExpr) -> Result> {

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-16 Thread via GitHub
Dandandan commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2661425398 > > I ran some tests yesterday and I can confirm the runtime improvements. I do get some high memory usage however especially with some queries (TPC-H Query 18 I believe) than when
