Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
Dandandan commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063754335 > > is it using a hash table or open addressing (df doesn't have the latter) > > [@XiangpengHao](https://github.com/XiangpengHao) has mentioned several times that we thi

Re: [PR] Perform type coercion for corr aggregate function [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #15776: URL: https://github.com/apache/datafusion/pull/15776#issuecomment-3063767365 Thanks again @kumarlokesh -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Perform type coercion for corr aggregate function [datafusion]

2025-07-11 Thread via GitHub
alamb merged PR #15776: URL: https://github.com/apache/datafusion/pull/15776

Re: [I] Avoid explicit cast during execution in `corr` aggregate function [datafusion]

2025-07-11 Thread via GitHub
alamb closed issue #13721: Avoid explicit cast during execution in `corr` aggregate function URL: https://github.com/apache/datafusion/issues/13721

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
alamb commented on code in PR #16744: URL: https://github.com/apache/datafusion/pull/16744#discussion_r2201840244 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1739,7 +1739,7 @@ async fn roundtrip_physical_plan_node() { } // Failing due to https://github

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3063812572 Thank you @XiangpengHao FYI @NGA-TRAN and @LiaCastaneda

Re: [PR] Extend binary coercion rules to support Decimal arithmetic operations with integer(signed and unsigned) types [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16668: URL: https://github.com/apache/datafusion/pull/16668#issuecomment-3063821753 I merged up from main to rerun the CI

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
alamb commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2201829283 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<()>

Re: [PR] Per file filter evaluation [datafusion]

2025-07-11 Thread via GitHub
alamb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2201812153 ## datafusion-examples/examples/default_column_values.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
jonathanc-n commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063872693 Another thing we can do is hash it once and use parts of the hash at a time during `RepartitionExec` and building the hashtable. This is made even better with having to do a

Re: [PR] Per file filter evaluation [datafusion]

2025-07-11 Thread via GitHub
adriangb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2201901299 ## datafusion-examples/examples/default_column_values.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
Dandandan commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063915741 Also, referencing the direct indexing / perfect hash join here. I think that should be relatively simple to implement. https://github.com/duckdb/duckdb/pull/1959 #816

Re: [PR] feat(datafusion-proto): allow TableSource to be serialized [datafusion]

2025-07-11 Thread via GitHub
colinmarc commented on PR #16750: URL: https://github.com/apache/datafusion/pull/16750#issuecomment-3063952865 I added a test! Let me know if that seems like enough.

Re: [PR] docs: Remove legacy comment in docs [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove merged PR #2022: URL: https://github.com/apache/datafusion-comet/pull/2022

[I] Serializing custom `TableSource` implementations fails [datafusion]

2025-07-11 Thread via GitHub
colinmarc opened a new issue, #16749: URL: https://github.com/apache/datafusion/issues/16749 ### Describe the bug `LogicalExtensionCodec` allows providing a custom serialization strategy for a `TableProvider`, but the calling code always expects a `DefaultTableSource` to unwrap:

[PR] add filter to handle backtrace [datafusion]

2025-07-11 Thread via GitHub
geetanshjuneja opened a new pull request, #16752: URL: https://github.com/apache/datafusion/pull/16752 ## Which issue does this PR close? - Closes #16146. ## Rationale for this change To run datafusion-cli tests with backtrace=1 ## What changes are incl

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
XiangpengHao commented on PR #16742: URL: https://github.com/apache/datafusion/pull/16742#issuecomment-3062840649 A little more investigation shows that some of the non-determinism is introduced by hashset, so we probably also want to change how we compare plans.

Re: [PR] Benchmark for char expression [datafusion]

2025-07-11 Thread via GitHub
ajita-asthana commented on PR #16743: URL: https://github.com/apache/datafusion/pull/16743#issuecomment-3062843109 Thanks @comphead, I will fix the Linux build failure.

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich merged PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971

Re: [I] Add ANSI support for Remainder [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich closed issue #532: Add ANSI support for Remainder URL: https://github.com/apache/datafusion-comet/issues/532

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich commented on code in PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#discussion_r2201477659 ## native/spark-expr/src/comet_scalar_funcs.rs: ## @@ -53,13 +54,23 @@ macro_rules! make_comet_scalar_udf { ); Ok(Arc::new(ScalarUDF::new_

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich commented on code in PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#discussion_r2200966491 ## native/spark-expr/src/comet_scalar_funcs.rs: ## @@ -53,13 +54,23 @@ macro_rules! make_comet_scalar_udf { ); Ok(Arc::new(ScalarUDF::new_

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-11 Thread via GitHub
rishvin commented on code in PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#discussion_r2201439622 ## native/spark-expr/src/comet_scalar_funcs.rs: ## @@ -53,13 +54,23 @@ macro_rules! make_comet_scalar_udf { ); Ok(Arc::new(ScalarUDF::new_fro

[I] try_ arithmetic functions return incorrect results [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new issue, #2021: URL: https://github.com/apache/datafusion-comet/issues/2021 ### Describe the bug As part of exploring writing unit tests for serde code in https://github.com/apache/datafusion-comet/issues/2020, I discovered that we currently have incorrect behavi

Re: [I] Optimized spill file format [datafusion]

2025-07-11 Thread via GitHub
alamb commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-3063721375 > Yes, I think so. Of course, there's still room to seek further performance optimizations, but for now: Indeed - we can always make the code better :)

Re: [I] Optimized spill file format [datafusion]

2025-07-11 Thread via GitHub
alamb commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-3063722047 Thanks again @ding-young

Re: [I] Optimized spill file format [datafusion]

2025-07-11 Thread via GitHub
alamb closed issue #14078: Optimized spill file format URL: https://github.com/apache/datafusion/issues/14078

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-11 Thread via GitHub
corwinjoy commented on code in PR #16738: URL: https://github.com/apache/datafusion/pull/16738#discussion_r2201788873 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1571,12 +1564,14 @@ fn spawn_parquet_parallel_serialization_task( let max_row_group_rows = w

Re: [D] DISCUSSION: DataFusion Meetup in Boston, USA [datafusion]

2025-07-11 Thread via GitHub
GitHub user alamb added a comment to the discussion: DISCUSSION: DataFusion Meetup in Boston, USA Sounds good -- when you get a chance perhaps you can create a luma event (or some other signup of your choosing) so we can start advertising / starting speakers Here is an example (this fall NYC

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-11 Thread via GitHub
alamb merged PR #16732: URL: https://github.com/apache/datafusion/pull/16732

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16732: URL: https://github.com/apache/datafusion/pull/16732#issuecomment-3063730981 Thanks again @adriangb

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3063739024 > * Doing (some of the) merge algorithm itself in parallel - I am not sure what would the best way forward here, but it seems it could give the largest gains, as merging is currently d

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-11 Thread via GitHub
corwinjoy commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3063738684 Looks good to me, with the exception of multi-row group writing being missing. When you go to rebase to the latest datafusion the diff should get a lot simpler since they have upgr

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
alamb commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063749417 > is it using a hash table or open addressing (df doesn't have the latter) @XiangpengHao has mentioned several times that we think DuckDB uses radix trees (which work l

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-11 Thread via GitHub
corwinjoy commented on code in PR #16738: URL: https://github.com/apache/datafusion/pull/16738#discussion_r2201805504 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1723,28 +1708,47 @@ async fn output_single_parquet_file_parallelized( let (serialize_tx, seriali

[PR] Support for T-SQL [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
yoavcloud opened a new pull request, #1937: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1937 T-SQL scripts do not require a semicolon as a statement delimiter. Added a new parser option to not require semicolons, as introducing this at the dialect level seems to be very intr

[PR] chore(deps): bump sysinfo from 0.35.2 to 0.36.0 [datafusion]

2025-07-11 Thread via GitHub
dependabot[bot] opened a new pull request, #16747: URL: https://github.com/apache/datafusion/pull/16747 Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.35.2 to 0.36.0. Changelog Sourced from https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md sysi

Re: [PR] add Makefile and local setup instruction in README [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on PR #86: URL: https://github.com/apache/datafusion-site/pull/86#issuecomment-3061777542 This `makefile` is amazing and makes me way more productive -- thank you again @kevinjqliu

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on code in PR #83: URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2200411812 ## content/blog/2025-07-11-datafusion-47.0.0.md: ## @@ -0,0 +1,272 @@ +--- +layout: post +title: Apache DataFusion 47.0.0 Released +date: 2025-07-11 +author: PMC +cat

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on code in PR #83: URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2200410610 ## content/blog/2025-07-10-datafusion-47.0.0.md: ## @@ -0,0 +1,256 @@ +--- +layout: post +title: Apache DataFusion 47.0.0 Released +date: 2025-07-10 +author: PMC +cat

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on code in PR #83: URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2200413249 ## content/blog/2025-07-11-datafusion-47.0.0.md: ## @@ -0,0 +1,272 @@ +--- +layout: post +title: Apache DataFusion 47.0.0 Released +date: 2025-07-11 +author: PMC +cat

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on code in PR #83: URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2200412619 ## content/blog/2025-07-11-datafusion-47.0.0.md: ## @@ -0,0 +1,272 @@ +--- +layout: post +title: Apache DataFusion 47.0.0 Released +date: 2025-07-11 +author: PMC +cat

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on code in PR #83: URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2200413022 ## content/blog/2025-07-11-datafusion-47.0.0.md: ## @@ -0,0 +1,272 @@ +--- +layout: post +title: Apache DataFusion 47.0.0 Released +date: 2025-07-11 +author: PMC +cat

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb merged PR #83: URL: https://github.com/apache/datafusion-site/pull/83

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on PR #83: URL: https://github.com/apache/datafusion-site/pull/83#issuecomment-3061841186 The blog is live! https://datafusion.apache.org/blog/2025/07/11/datafusion-47.0.0/

Re: [PR] DataFusion 48.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on PR #84: URL: https://github.com/apache/datafusion-site/pull/84#issuecomment-3061830142 I am hoping to publish this one next Wednesday, 2 days after https://github.com/apache/datafusion-site/pull/79

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on code in PR #83: URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2200389589 ## content/blog/2025-07-10-datafusion-47.0.0.md: ## @@ -0,0 +1,256 @@ +--- +layout: post +title: Apache DataFusion 47.0.0 Released +date: 2025-07-10 +author: PMC +cat

[PR] Use tokio::task::coop::poll_proceed by default in CooperativeStream [datafusion]

2025-07-11 Thread via GitHub
pepijnve opened a new pull request, #16748: URL: https://github.com/apache/datafusion/pull/16748 ## Which issue does this PR close? - Closes #16489. ## Rationale for this change Using `poll_proceed` instead of `consume_budget` allows for budget consumption rollback which

Re: [PR] Use tokio::task::coop::poll_proceed by default in CooperativeStream [datafusion]

2025-07-11 Thread via GitHub
pepijnve commented on PR #16748: URL: https://github.com/apache/datafusion/pull/16748#issuecomment-3061578448 Draft status since this depends on the current git version of Tokio. Can be considered for merging after the next Tokio scheduled release.

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
LiaCastaneda commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2199941659 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Resu

Re: [PR] share staging infrastructure [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on PR #88: URL: https://github.com/apache/datafusion-site/pull/88#issuecomment-3061697398 > @alamb could you maybe push an empty commit to this branch? Or if its easier, add a github suggestion and apply it. That should trigger CI using your github user Done -- pushed

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
LiaCastaneda commented on PR #16742: URL: https://github.com/apache/datafusion/pull/16742#issuecomment-3061154200 Just out of curiosity, do you know if the issue is that a specific node can't serialize?

Re: [PR] Add support for Redshift `SELECT * EXCLUDE` [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
iffyio merged PR #1936: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1936

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
adriangb commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062283423 > @adriangb can you take a look if this is the right way to fix it? I took an initial look and... I'm a bit stumped. I don't fully understand where this is running or how. Wha

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2198593774 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<(

Re: [I] Bug: `make_date(year, month, day)` reports error if one of the fileds is NULL [datafusion]

2025-07-11 Thread via GitHub
Omega359 commented on issue #16746: URL: https://github.com/apache/datafusion/issues/16746#issuecomment-3062326009 Interesting. I would expect a db to error for that, unlike something like spark which I would expect to be lenient (if not in ansi/safe mode). I believe it should be a fairly e

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2200734851 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<(

Re: [PR] fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display [datafusion]

2025-07-11 Thread via GitHub
Loaki07 commented on PR #16726: URL: https://github.com/apache/datafusion/pull/16726#issuecomment-3062153659 Looks like the CI is unable to run `sudo apt-get install -y protobuf-compiler`

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2200796876 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

[PR] minor: Refactor arithmetic serde into separate classes [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new pull request, #2018: URL: https://github.com/apache/datafusion-comet/pull/2018 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[I] [EPIC] Refactor all expression serde logic out of `QueryPlanSerde` [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new issue, #2019: URL: https://github.com/apache/datafusion-comet/issues/2019 ### What is the problem the feature request solves? The `QueryPlanSerde.exprToProtoInternal` method contains logic for serializing Spark expressions to protocol buffer format and also cont

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062503974 @XiangpengHao: If you believe the round-trip bug reproduced in `test_round_trip_tpch_queries` from PR #16742 is distinct, we can file a separate issue and tackle it independently. @

Re: [PR] Snowflake Reserved SQL Keywords as Implicit Table Alias [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
iffyio merged PR #1934: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1934

Re: [PR] Added unquoted identifiers unicode support for mySql, postgreSqp, als… [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
iffyio commented on code in PR #1933: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1933#discussion_r2200438892 ## tests/sqlparser_common.rs: ## @@ -15895,3 +15895,11 @@ fn parse_create_procedure_with_parameter_modes() { _ => unreachable!(), } } + +

Re: [PR] Support for T-SQL [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
iffyio commented on code in PR #1937: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1937#discussion_r2200209520 ## src/parser/mod.rs: ## @@ -222,13 +222,17 @@ pub struct ParserOptions { /// Controls how literal values are unescaped. See /// [`Tokenizer::

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062022168 Thanks @XiangpengHao for the fix. Could you also run the tests [in this PR](https://github.com/apache/datafusion/pull/16742)? The deserialization bug only happens to 1 tpc-h queries

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on PR #16742: URL: https://github.com/apache/datafusion/pull/16742#issuecomment-3062011181 > Just out of curiosity, do you know if the issue is that a specific node can't serialize? Some info and fix: - https://github.com/apache/datafusion/issues/16665#issuecomm

Re: [PR] Support optional semicolon between statements [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
iffyio merged PR #1937: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1937

Re: [PR] minor: Refactor arithmetic serde into separate classes [datafusion-comet]

2025-07-11 Thread via GitHub
codecov-commenter commented on PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#issuecomment-3062533657 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2018?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] feat(datafusion-proto): allow TableSource to be serialized [datafusion]

2025-07-11 Thread via GitHub
colinmarc opened a new pull request, #16750: URL: https://github.com/apache/datafusion/pull/16750 Currently, only instances of `TableProvider` are considered by `LogicalExtensionCodec`, and are automatically wrapped in a `DefaultTableSource` when deserializing. That doesn't work with custom

Re: [I] Optimize performance of `ByteViewGroupValueBuilder` on batches with inlined views [datafusion]

2025-07-11 Thread via GitHub
zhuqi-lucas commented on issue #16330: URL: https://github.com/apache/datafusion/issues/16330#issuecomment-3062751769 @Dandandan @Rachelint I submitted a PR to experiment with this and see the performance gain or loss.

Re: [PR] Benchmark for char expression [datafusion]

2025-07-11 Thread via GitHub
comphead commented on PR #16743: URL: https://github.com/apache/datafusion/pull/16743#issuecomment-3062760181 I changed the PR header to `related` instead of `closed`, as #16009 expects both the benchmark and the optimization implementation, which I believe is in another PR of yours https://github.com/apache/da

[I] Implement unit tests for serde logic [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new issue, #2020: URL: https://github.com/apache/datafusion-comet/issues/2020 ### What is the problem the feature request solves? We currently rely on end-to-end integration tests to ensure that expressions are serialized correctly. This has generally been ok, but w

[PR] Perf: Optimize performance of ByteViewGroupValueBuilder on batches with inlined views [datafusion]

2025-07-11 Thread via GitHub
zhuqi-lucas opened a new pull request, #16751: URL: https://github.com/apache/datafusion/pull/16751 ## Which issue does this PR close? Optimize the following cases and add more fast paths: `do_append_val_inner`, `do_equal_to_inner`. This is wasteful if there is no data buffe

[PR] Snowflake create database [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
osipovartem opened a new pull request, #1939: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1939 https://docs.snowflake.com/en/sql-reference/sql/create-database Added support for ```sql CREATE [ OR REPLACE ] [ TRANSIENT ] DATABASE [ IF NOT EXISTS ] [ CLONE

Re: [PR] Fix invalid intervals in `satisfy_greater` [datafusion]

2025-07-11 Thread via GitHub
ozankabak commented on PR #16745: URL: https://github.com/apache/datafusion/pull/16745#issuecomment-3062949373 Thanks for taking a look at this. A cursory look suggests when a strict inequality is being propagated, if the next value of other side's lower bound is greater than the uppe

Re: [I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-11 Thread via GitHub
colinmarc commented on issue #1181: URL: https://github.com/apache/datafusion-python/issues/1181#issuecomment-3064059046 I explored the solution space a bit today, and I don't think this problem is really solvable with the APIs as they currently exist. Just to be clear about what is

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-11 Thread via GitHub
rishvin commented on PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#issuecomment-3064076095 > @rishvin, This PR is still a draft, but could you review the changes to the `remainder` code to ensure I didn't miss anything from your changes? Thanks @andygrove for th

Re: [I] [EPIC] A collection of items to improve developer / CI speed [datafusion]

2025-07-11 Thread via GitHub
blaginin commented on issue #13813: URL: https://github.com/apache/datafusion/issues/13813#issuecomment-3064077568 Added cache to CI runners to get some speedup: https://github.com/apache/datafusion/pull/16709 Also pinged the infra team to ask whether we can get larger runners: https://issues.ap

[PR] ensure MemTable has at least one partition [datafusion]

2025-07-11 Thread via GitHub
waynexia opened a new pull request, #16754: URL: https://github.com/apache/datafusion/pull/16754 ## Which issue does this PR close? - Related to https://github.com/datafusion-contrib/datafusion-postgres/pull/108. ## Rationale for this change When creating

Re: [I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-11 Thread via GitHub
timsaucer commented on issue #1181: URL: https://github.com/apache/datafusion-python/issues/1181#issuecomment-3064351821 I agree with your assessment. I am starting to think your original suggestion was the correct one. I'm sorry I took a detour in the above approach. I think

Re: [PR] chore: Make `GroupValues` and APIs on `PhysicalGroupBy` aggregation APIs public [datafusion]

2025-07-11 Thread via GitHub
haohuaijin commented on code in PR #16733: URL: https://github.com/apache/datafusion/pull/16733#discussion_r2196439508 ## datafusion/physical-plan/src/aggregates/group_values/mod.rs: ## @@ -121,13 +121,15 @@ pub(crate) trait GroupValues: Send { /// will be chosen. ///

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove commented on code in PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#discussion_r2202060380 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -0,0 +1,293 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove commented on code in PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#discussion_r2202060728 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -0,0 +1,293 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
XiangpengHao commented on PR #16742: URL: https://github.com/apache/datafusion/pull/16742#issuecomment-3062716781 I checked #16744 with this test, and can confirm that most tests still fail. A closer look shows that this is mostly due to the field "human_display", the deserialized on

Re: [I] Add support for StringDecode in Spark 4.0.0 [datafusion-comet]

2025-07-11 Thread via GitHub
peter-toth commented on issue #1942: URL: https://github.com/apache/datafusion-comet/issues/1942#issuecomment-3062725854 take

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
XiangpengHao commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062727458 > > @adriangb can you take a look if this is the right way to fix it? > > I took an initial look and... I'm a bit stumped. I don't fully understand where this is running o

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
XiangpengHao commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062721623 > @XiangpengHao: If you believe the round-trip bug reproduced in `test_round_trip_tpch_queries` from PR #16742 is distinct, we can file a separate issue and tackle it independen

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
Dandandan commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063067577 Besides profiling, I would suggest researching how other engines run joins and extracting some high-level learnings from that: * is it using a hash t

[I] Replace configure_me with maintained alternative [datafusion-ballista]

2025-07-11 Thread via GitHub
milenkovicm opened a new issue, #1281: URL: https://github.com/apache/datafusion-ballista/issues/1281 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** `configure_me` dependency does not look maintained, it is a blocker to updat

Re: [PR] docs: Add guide showing comparison between Comet and Gluten [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove merged PR #2012: URL: https://github.com/apache/datafusion-comet/pull/2012

Re: [PR] docs: Add guide showing comparison between Comet and Gluten [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove commented on PR #2012: URL: https://github.com/apache/datafusion-comet/pull/2012#issuecomment-3063409220 Thanks for the review @kazuyukitanimura. I will go ahead and merge this as a starting point for this content. I am sure we will add more to it soon.

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
ryanschneider commented on PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#issuecomment-3063425629 @iffyio I went with the new ParserState::ColumnDefinition idea mentioned here: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2200796876 I

Re: [I] Improve documentation publishing to avoid maintaining separate template files [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich closed issue #2016: Improve documentation publishing to avoid maintaining separate template files URL: https://github.com/apache/datafusion-comet/issues/2016

Re: [PR] chore: Improve process for generating dynamic content into documentation [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich merged PR #2017: URL: https://github.com/apache/datafusion-comet/pull/2017

Re: [PR] minor: Refactor to move some shuffle-related logic from `QueryPlanSerde` to `CometExecRule` [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich merged PR #2015: URL: https://github.com/apache/datafusion-comet/pull/2015

Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-07-11 Thread via GitHub
rishvin commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3063537883 Should be able to open Comet's PR after https://github.com/apache/datafusion-comet/issues/1993 is closed.

[I] Add support for Snowflake CREATE DATABASE [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
osipovartem opened a new issue, #1938: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1938 https://docs.snowflake.com/en/sql-reference/sql/create-database

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove commented on PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#issuecomment-3063567453 @rishvin, This PR is still a draft, but could you review the changes to the `remainder` code to ensure I didn't miss anything from your changes?
