Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for starts_with [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on PR #14812: URL: https://github.com/apache/datafusion/pull/14812#issuecomment-2686817387 Thanks @zjregee -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for starts_with [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 merged PR #14812: URL: https://github.com/apache/datafusion/pull/14812

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
zhuqi-lucas commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2686901486 Thanks @SemyonSinchenko for the review. It's not too bad in my case; it takes more time, which is expected for huge file generation. But my computer has 48GB of memory; I assume lower mem

Re: [PR] Random test cleanups use Expr::value [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
iffyio merged PR #1749: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1749

Re: [PR] Parse ALTER TABLE AUTO_INCREMENT operation for MySQL [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
iffyio merged PR #1748: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1748

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
iffyio commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r1972915911 ## src/ast/dml.rs: ## @@ -138,6 +143,30 @@ pub struct CreateTable { pub engine: Option, pub comment: Option, pub auto_increment_offset: O

[PR] Fix the null handling for to_char function [datafusion]

2025-02-26 Thread via GitHub
kosiew opened a new pull request, #14908: URL: https://github.com/apache/datafusion/pull/14908 ## Which issue does this PR close? - Closes #14884. ## Rationale for this change Currently, when passing a NULL value to to_char, it returns an empty string instead

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
SemyonSinchenko commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687145015 @zhuqi-lucas Did you try to increase the `batch_size` argument? It is designed to avoid OOMs but the small batch size can also reduce the generation speed. If your computer h

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
tomershaniii commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r1973021692 ## tests/sqlparser_mssql.rs: ## @@ -1590,6 +1590,30 @@ fn parse_create_table_with_valid_options() { comment: None,

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
2010YOUY01 commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687143242 > > The data generation will take long time for big data. > > How bad is it? I can try to dig into the problem and try to improve it on the side of `falsa` (generation libra

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
2010YOUY01 commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687165121 Thank you for the benchmark, I've tested it locally and it's working well. I have several small suggestions: 1. Add document for this new join benchmark https://github.com/apac

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
zhuqi-lucas commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687169996 Thanks @2010YOUY01 @SemyonSinchenko for the review. I tried again and it's not a problem for me now; the earlier problem may have been due to insufficient disk space, and I have cleaned up some disk usage.

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-26 Thread via GitHub
wiedld commented on PR #14637: URL: https://github.com/apache/datafusion/pull/14637#issuecomment-2686820953 Separate refactoring PR: https://github.com/apache/datafusion/pull/14907

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-26 Thread via GitHub
shehabgamin commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2686821819 @alamb @jayzhan211 @xudong963 Unfortunately https://github.com/apache/datafusion/pull/14824 did not fix the wildcard issue. I'm currently working on Sail's upcoming release,

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]

2025-02-26 Thread via GitHub
zjregee commented on code in PR #14865: URL: https://github.com/apache/datafusion/pull/14865#discussion_r1972787989 ## datafusion/functions/src/string/btrim.rs: ## @@ -19,20 +19,28 @@ use crate::string::common::*; use crate::utils::{make_scalar_function, utf8_to_str_type}; use

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2686936016 ``` query I SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col); 4 ``` Is there other query reproducible in main branch? The query given d

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2687033429 > If something built and rewrite by analyzer before, it will now fail If you have count wildcard and is rewritten by analyzer, it fails because we remove the count wild

[PR] minor: Update docs and error messages about what SQL dialects are supported [datafusion]

2025-02-26 Thread via GitHub
AdamGS opened a new pull request, #14893: URL: https://github.com/apache/datafusion/pull/14893 ## Which issue does this PR close? - Closes #14892. ## Rationale for this change ## What changes are included in this PR? Adding DuckDB and Databricks to the

Re: [PR] Implement builder style API for ParserOptions [datafusion]

2025-02-26 Thread via GitHub
diegoreis42 closed pull request #14885: Implement builder style API for ParserOptions URL: https://github.com/apache/datafusion/pull/14885

Re: [I] Change in 46: `count_all()` expr_fn function now displayed as `count(1)` rather than `count(*)` [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14894: URL: https://github.com/apache/datafusion/issues/14894#issuecomment-2684997552 There is a proposed fix in https://github.com/apache/datafusion/pull/14824#discussion_r1971470451 Maybe someone can apply the fix and update the tests ?

[I] Change in 46: `count_all()` expr_fn function now displayed as `count(1)` rather than `count(*)` [datafusion]

2025-02-26 Thread via GitHub
alamb opened a new issue, #14894: URL: https://github.com/apache/datafusion/issues/14894 This is fallout from https://github.com/apache/datafusion/pull/14824 There is a small change in behavior between DataFusion 45 and DataFusion 46. There were several tests changed (see link

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 merged PR #14865: URL: https://github.com/apache/datafusion/pull/14865

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on PR #14865: URL: https://github.com/apache/datafusion/pull/14865#issuecomment-2685144945 Thanks @zjregee

Re: [PR] Implement builder style API for ParserOptions [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14887: URL: https://github.com/apache/datafusion/pull/14887#issuecomment-2685448270 Thanks again @kosiew

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14813: URL: https://github.com/apache/datafusion/pull/14813#issuecomment-2685455359 > I will go ahead and merge this since it is a follow-up to a previously discussed (and reviewed) PR/feature. It would still be great to have some post-merge review on this when you ha

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2685451356 > It would be really great to write up a document (maybe a blog post) that explains this change and gives help for people upgrading. Specifically some examples of creating ParquetExec

Re: [I] minor: List of supported SQL dialects is out of date [datafusion]

2025-02-26 Thread via GitHub
alamb closed issue #14892: minor: List of supported SQL dialects is out of date URL: https://github.com/apache/datafusion/issues/14892

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14823: URL: https://github.com/apache/datafusion/pull/14823

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14877: URL: https://github.com/apache/datafusion/pull/14877#issuecomment-2685473497 Thanks again @zhuqi-lucas

Re: [I] Code clean for new datafusion-cli streaming printing logic [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14886: URL: https://github.com/apache/datafusion/issues/14886#issuecomment-2685472967 I think this is a nice first issue for someone who wants to make the code nicer, and it is currently tested and could be handled without deep datafusion knowledge. It does r

[PR] Update regenerate sql dep, revert runner changes. [datafusion]

2025-02-26 Thread via GitHub
Omega359 opened a new pull request, #14901: URL: https://github.com/apache/datafusion/pull/14901 ## Which issue does this PR close? Related to https://github.com/apache/datafusion/pull/14824 ## Rationale for this change The forked sqllogictest-rs dependency required for t

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2685457670 The tests are green again on main! https://github.com/apache/datafusion/actions/runs/13545248421/job/37855153112

Re: [PR] chore: Attach Diagnostic to "function x does not exist" error [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14849: URL: https://github.com/apache/datafusion/pull/14849#issuecomment-2685460366 🚀 -- thanks again @onlyjackfrost and @eliaperantoni

Re: [I] Attach `Diagnostic` to "function x does not exist" error [datafusion]

2025-02-26 Thread via GitHub
alamb closed issue #14430: Attach `Diagnostic` to "function x does not exist" error URL: https://github.com/apache/datafusion/issues/14430

Re: [PR] chore: Attach Diagnostic to "function x does not exist" error [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14849: URL: https://github.com/apache/datafusion/pull/14849

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14823: URL: https://github.com/apache/datafusion/pull/14823#issuecomment-2685462913 🚀

Re: [PR] minor: Update docs and error messages about what SQL dialects are supported [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14893: URL: https://github.com/apache/datafusion/pull/14893

Re: [PR] minor: Update docs and error messages about what SQL dialects are supported [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14893: URL: https://github.com/apache/datafusion/pull/14893#issuecomment-2685458851 Thanks again @AdamGS

Re: [I] Datafusion can't seem to cast evolving structs [datafusion]

2025-02-26 Thread via GitHub
TheBuilderJR commented on issue #14757: URL: https://github.com/apache/datafusion/issues/14757#issuecomment-2685460077 @Lordworms can you share your branch? I'm happy to take a look as well if you don't have the bandwidth.

Re: [PR] refactor: make SqlToRel::new derive the parser options from the context provider [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14822: URL: https://github.com/apache/datafusion/pull/14822

Re: [PR] Implement builder style API for ParserOptions [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14887: URL: https://github.com/apache/datafusion/pull/14887

Re: [PR] refactor: make SqlToRel::new derive the parser options from the context provider [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14822: URL: https://github.com/apache/datafusion/pull/14822#issuecomment-2685466420 > @alamb Sure, I'll add some tests soon. Let's do it in a follow on PR to reduce the review queue

[PR] Align SQL formatting and add all missing table options [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
benrsatori opened a new pull request, #1746: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1746 (no comment)

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14877: URL: https://github.com/apache/datafusion/pull/14877

Re: [I] Make SqlToRel respect parser options from ContextProvider [datafusion]

2025-02-26 Thread via GitHub
alamb closed issue #13700: Make SqlToRel respect parser options from ContextProvider URL: https://github.com/apache/datafusion/issues/13700

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2685483396 > JSON bench will have a different schema for row and looks like datafusion(arrow-json) can't support this? E.g. type of subject for these two row are different. I think th

Re: [PR] Minor: Add Development Environment to Documentation Index [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14890: URL: https://github.com/apache/datafusion/pull/14890#issuecomment-2685486525 > Oh I should be more careful, thank you! No worries -- it is easy to fix. Maybe we should turn doc build warnings into errors (like we do for code 🤔 )

Re: [PR] Align SQL formatting and add all missing table options [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
benrsatori closed pull request #1746: Align SQL formatting and add all missing table options URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1746

Re: [PR] minor: Update docs and error messages about what SQL dialects are supported [datafusion]

2025-02-26 Thread via GitHub
AdamGS commented on PR #14893: URL: https://github.com/apache/datafusion/pull/14893#issuecomment-2685493174 Thank you for the quick review and feedback! and not just for this PR

Re: [PR] minor: Update docs and error messages about what SQL dialects are supported [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14893: URL: https://github.com/apache/datafusion/pull/14893#issuecomment-2685498545 > Thank you for the quick review and feedback! and not just for this PR No problem -- your PRs are easy to review / merge (I think they are important and the code / comments are

[PR] Parse ALTER TABLE AUTO_INCREMENT operation for MySQL [datafusion-sqlparser-rs]

2025-02-26 Thread via GitHub
mvzink opened a new pull request, #1748: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1748 (no comment)

Re: [PR] Cancellation benchmark [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14818: URL: https://github.com/apache/datafusion/pull/14818#issuecomment-2685749538 Thanks again @carols10cents

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2685750238 @carols10cents has made a benchmark here: - https://github.com/apache/datafusion/pull/14818

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
SemyonSinchenko commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2685735020 > The data generation will take long time for big data. How bad is it? I can try to dig into the problem and try to improve it on the side of `falsa` (generation librar

Re: [PR] Cancellation benchmark [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14818: URL: https://github.com/apache/datafusion/pull/14818

Re: [PR] chore: faster maven mirror [datafusion-comet]

2025-02-26 Thread via GitHub
comphead merged PR #1447: URL: https://github.com/apache/datafusion-comet/pull/1447

Re: [PR] Require `Debug` for `DataSource` [datafusion]

2025-02-26 Thread via GitHub
comphead commented on code in PR #14882: URL: https://github.com/apache/datafusion/pull/14882#discussion_r1972094393 ## datafusion/datasource/src/source.rs: ## @@ -35,7 +35,7 @@ use datafusion_physical_expr_common::sort_expr::LexOrdering; /// Common behaviors in Data Sources

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-02-26 Thread via GitHub
vbarua commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1971967557 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1327,19 +1327,37 @@ pub async fn from_read_rel( table_ref: TableReference, schema: DF

Re: [PR] Include struct name on FileScanConfig debug impl [datafusion]

2025-02-26 Thread via GitHub
comphead merged PR #14883: URL: https://github.com/apache/datafusion/pull/14883

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
wiedld commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1972157836 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -98,66 +98,91 @@ fn pushdown_sorts_helper( .ordering_satisfy_requirement(&pare

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1972119757 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2641,4 +2641,57 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] Feat: support array_compact function [datafusion-comet]

2025-02-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1321: URL: https://github.com/apache/datafusion-comet/pull/1321#discussion_r1972211143 ## native/core/src/execution/planner.rs: ## @@ -830,6 +830,25 @@ impl PhysicalPlanner { )); Ok(array_has_any_expr)

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-02-26 Thread via GitHub
westonpace commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1972236021 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-02-26 Thread via GitHub
ZENOTME commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2685969063 > Thanks for checking it out [@ZENOTME](https://github.com/ZENOTME) > > > JSON bench will have a different schema for row and looks like datafusion(arrow-json) can't suppo

[PR] Move HashJoin from `RawTable` to `HashTable` [datafusion]

2025-02-26 Thread via GitHub
Dandandan opened a new pull request, #14904: URL: https://github.com/apache/datafusion/pull/14904 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] Change tpch validation to use `exec_sql_on_tables` [datafusion-ray]

2025-02-26 Thread via GitHub
andygrove commented on code in PR #66: URL: https://github.com/apache/datafusion-ray/pull/66#discussion_r1972273649 ## src/util.rs: ## @@ -397,6 +402,52 @@ fn print_node(plan: &Arc, indent: usize, output: &mut String) } } +async fn exec_sql(query: String, tables: Vec<(S

Re: [I] NoSuchMethodError with Spark 3.5.3 (EMR 7.6) [datafusion-comet]

2025-02-26 Thread via GitHub
andygrove commented on issue #1451: URL: https://github.com/apache/datafusion-comet/issues/1451#issuecomment-2686057451 > I am getting the following error when running a job on EMR 7.6 with spark code also compiled with version 3.5.3 and using comet: 3.5_2.12-0.6.0 Did you compile Co

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2686065559 > There is a new test on main ([added 4 hours ago](https://github.com/apache/datafusion/blob/f5b7affecd90e9be26289d869c4a542359cb98e3/datafusion/core/tests/physical_optimizer/enforce_so

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2685167196 > Change in 46: `count_all()` expr_fn function now displayed as `count(1)` rather than `count(*)` #14894 Thanks! Note I did file - https://github.com/apache/datafusion/issues/

Re: [PR] feat: `tree` / pretty explain [datafusion]

2025-02-26 Thread via GitHub
irenjj commented on code in PR #14677: URL: https://github.com/apache/datafusion/pull/14677#discussion_r1971677974 ## datafusion/physical-plan/src/render_tree.rs: ## @@ -0,0 +1,158 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2685178672 That is all the more reason to perhaps defer to the spark function PR / focus on that

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2685178514 That is all the more reason to perhaps defer to the spark function PR / focus on that

Re: [PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-26 Thread via GitHub
Dandandan commented on code in PR #14888: URL: https://github.com/apache/datafusion/pull/14888#discussion_r1971695362 ## datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs: ## @@ -122,14 +123,21 @@ impl SimplifyExpressions { // Preserve expression names t

Re: [I] Slowdown in ClickBench Q36-Q37 between DataFusion 43.0.0 and 44.0.0 [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14481: URL: https://github.com/apache/datafusion/issues/14481#issuecomment-2685206026 > Would love to help on this issue. It can probably be generalized in some way but I'm open to any thoughts you have. Thank you @AdamGS that would be amazing

[PR] Rename `DataSource` and `FileSource` fields for consistency [datafusion]

2025-02-26 Thread via GitHub
alamb opened a new pull request, #14898: URL: https://github.com/apache/datafusion/pull/14898 ## Which issue does this PR close? - Related to https://github.com/delta-io/delta-rs/pull/3261 - Related to https://github.com/apache/datafusion/issues/14123 - Follow on to https://githu

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-26 Thread via GitHub
alamb merged PR #14824: URL: https://github.com/apache/datafusion/pull/14824

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2684989262 Let's get the tests clean

Re: [I] Extended sqllite tests are failing on main [datafusion]

2025-02-26 Thread via GitHub
alamb closed issue #14853: Extended sqllite tests are failing on main URL: https://github.com/apache/datafusion/issues/14853

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-26 Thread via GitHub
alamb commented on code in PR #14824: URL: https://github.com/apache/datafusion/pull/14824#discussion_r1971606770 ## datafusion/optimizer/tests/optimizer_integration.rs: ## @@ -198,7 +198,7 @@ fn between_date32_plus_interval() -> Result<()> { WHERE col_date32 between '1998-

Re: [I] Regression since 45.0.0: `select count(), count(*)` does not work [datafusion]

2025-02-26 Thread via GitHub
alamb closed issue #14855: Regression since 45.0.0: `select count(), count(*)` does not work URL: https://github.com/apache/datafusion/issues/14855

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-26 Thread via GitHub
alamb commented on code in PR #14824: URL: https://github.com/apache/datafusion/pull/14824#discussion_r1971612657 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -2455,7 +2455,7 @@ async fn test_count_wildcard_on_sort() -> Result<()> { let ctx = create_join_context()?;

Re: [PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-26 Thread via GitHub
joroKr21 commented on code in PR #14888: URL: https://github.com/apache/datafusion/pull/14888#discussion_r1971631661 ## datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs: ## @@ -122,14 +123,21 @@ impl SimplifyExpressions { // Preserve expression names to

[I] Remove duplicated alias in Sort [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 opened a new issue, #14895: URL: https://github.com/apache/datafusion/issues/14895 ### Describe the bug ``` query TT explain SELECT count(*) order by count(*); logical_plan 01)Projection: count(*) 02)--Sort: count(Int64(1)) AS count(*) AS count(*) ASC

Re: [PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-26 Thread via GitHub
alamb commented on code in PR #14888: URL: https://github.com/apache/datafusion/pull/14888#discussion_r1971620620 ## datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs: ## @@ -122,14 +123,21 @@ impl SimplifyExpressions { // Preserve expression names to av

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on code in PR #14865: URL: https://github.com/apache/datafusion/pull/14865#discussion_r1971659351 ## datafusion/functions/src/string/btrim.rs: ## @@ -19,20 +19,28 @@ use crate::string::common::*; use crate::utils::{make_scalar_function, utf8_to_str_type};

[I] minor: `SessionStateBuilder::with_default_features` ergonomics [datafusion]

2025-02-26 Thread via GitHub
milenkovicm opened a new issue, #14899: URL: https://github.com/apache/datafusion/issues/14899 ### Is your feature request related to a problem or challenge? Just a small note about ergonomics of `SessionStateBuilder`, as I spent some time to figure out why: ```rust let

Re: [PR] Rename `DataSource` and `FileSource` fields for consistency [datafusion]

2025-02-26 Thread via GitHub
alamb commented on code in PR #14898: URL: https://github.com/apache/datafusion/pull/14898#discussion_r1971829934 ## datafusion/datasource/src/source.rs: ## @@ -139,51 +142,50 @@ impl ExecutionPlan for DataSourceExec { partition: usize, context: Arc, ) ->

Re: [PR] minor: Update docs and error messages about what SQL dialects are supported [datafusion]

2025-02-26 Thread via GitHub
AdamGS commented on PR #14893: URL: https://github.com/apache/datafusion/pull/14893#issuecomment-2685408832 I think this is the sort of thing that is very easy in JVM-land with reflection, but I really don't know how to do it in Rust. The only idea I have is to add a `DialectRegistry`,

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2685435062 As I have been working through actually upgrading delta.rs with these changes, it turns out it was more effort than expected (like everything in software). I have created a few PR

Re: [PR] Implement builder style API for ParserOptions [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14885: URL: https://github.com/apache/datafusion/pull/14885#issuecomment-2685446386 Thanks for understanding

Re: [PR] Implement builder style API for ParserOptions [datafusion]

2025-02-26 Thread via GitHub
alamb commented on PR #14887: URL: https://github.com/apache/datafusion/pull/14887#issuecomment-2685448073 Since this is a minor additive API I am just going to merge it in

Re: [I] Implement builder style API for `ParserOptions` [datafusion]

2025-02-26 Thread via GitHub
alamb closed issue #14879: Implement builder style API for `ParserOptions` URL: https://github.com/apache/datafusion/issues/14879

[PR] Change tpch validation to use `exec_sql_on_tables` [datafusion-ray]

2025-02-26 Thread via GitHub
vmingchen opened a new pull request, #66: URL: https://github.com/apache/datafusion-ray/pull/66 Fixes https://github.com/apache/datafusion-ray/issues/65 `exec_sql_on_tables` is a util function added by this PR that uses DataFusion without Ray to execute queries. This ensures the valid

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-26 Thread via GitHub
jayzhan211 commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2685102083 Thanks @alamb. I will file a related issue as a follow-up

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-26 Thread via GitHub
ozankabak merged PR #14813: URL: https://github.com/apache/datafusion/pull/14813

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-26 Thread via GitHub
ozankabak commented on PR #14813: URL: https://github.com/apache/datafusion/pull/14813#issuecomment-2685221570 I will go ahead and merge this since it is a follow-up to a previously discussed (and reviewed) PR/feature. It would still be great to have some post-merge review on this when you

Re: [PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-26 Thread via GitHub
alamb commented on code in PR #14888: URL: https://github.com/apache/datafusion/pull/14888#discussion_r1971708109 ## datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs: ## @@ -122,14 +123,21 @@ impl SimplifyExpressions { // Preserve expression names to av

[I] Statistics: Migrate to Distribution from Precision [datafusion]

2025-02-26 Thread via GitHub
ozankabak opened a new issue, #14896: URL: https://github.com/apache/datafusion/issues/14896 ### Is your feature request related to a problem or challenge? DataFusion doesn't have a great statistics infrastructure, which will be a long project to fix. Luckily, we have begun the process
