Re: [PR] Chore: refactor DataSink traits to avoid duplication [datafusion]

2025-01-15 Thread via GitHub
berkaysynnada merged PR #14121: URL: https://github.com/apache/datafusion/pull/14121 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

[I] create view with multi union use the first union schema as the final view schema [datafusion]

2025-01-15 Thread via GitHub
Curricane opened a new issue, #14132: URL: https://github.com/apache/datafusion/issues/14132 ### Describe the bug create view with multi union use the first union schema as the final view schema. eg: use the latest datafusion-cli ```sql CREATE UNBOUNDED EXTERNAL TABLE test1 (

Re: [PR] WIP Upgrade to arrow-rs/parquet `54.0.0` [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #13663: URL: https://github.com/apache/datafusion/pull/13663#issuecomment-2592485890 > Do you still need help on this ? I can help to file a MR tmr to upgrade arrow, parquet & pyo3 for the project. Let me know @Owen-CH-Leung -- Yes, please I am still looking for

Re: [PR] Return err if wildcard is not expanded before type coercion [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14130: URL: https://github.com/apache/datafusion/pull/14130#issuecomment-2592492306 I think you can fix the CI test by using `assert_contains!` rather than checking the exact error message

Re: [PR] fix: encode should work with non-UTF-8 binaries [datafusion]

2025-01-15 Thread via GitHub
Omega359 commented on PR #14087: URL: https://github.com/apache/datafusion/pull/14087#issuecomment-2592917242 LGTM, thanks!

[PR] bugfix: create view with multi union may get wrong schema [datafusion]

2025-01-15 Thread via GitHub
Curricane opened a new pull request, #14133: URL: https://github.com/apache/datafusion/pull/14133 ## Which issue does this PR close? https://github.com/apache/datafusion/issues/14132 Closes #. ## Rationale for this change ## What changes are included in thi

Re: [PR] Feat: Support array_join [datafusion-comet]

2025-01-15 Thread via GitHub
SteNicholas commented on code in PR #1290: URL: https://github.com/apache/datafusion-comet/pull/1290#discussion_r1916191228 ## native/proto/src/proto/expr.proto: ## @@ -86,6 +86,7 @@ message Expr { ArrayInsert array_insert = 59; BinaryExpr array_contains = 60; Bin

Re: [I] Extension Types [datafusion]

2025-01-15 Thread via GitHub
tobixdev commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2592666978 I second @paleolimbot here, as I am also interested in this topic. First of all, thank you for all your efforts in this area! To provide some context, we are currently

Re: [PR] Add support for Snowflake column aliases that use SQL keywords [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
yoavcloud commented on PR #1632: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1632#issuecomment-2592830492 @iffyio in this iteration I re-used the logic but also aligned parsing of table aliases to column aliases, enabling customization by dialects. LMKWYT

[PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen opened a new pull request, #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657 In BigQuery, it's possible to have statements like these (added tests for them): ```sql CREATE TABLE foo (x INT64) OPTIONS() ``` ```sql CREATE TABLE db.schem

Re: [PR] feat: Add HasRowIdMapping interface [datafusion-comet]

2025-01-15 Thread via GitHub
viirya merged PR #1288: URL: https://github.com/apache/datafusion-comet/pull/1288

Re: [PR] feat: Add HasRowIdMapping interface [datafusion-comet]

2025-01-15 Thread via GitHub
viirya commented on PR #1288: URL: https://github.com/apache/datafusion-comet/pull/1288#issuecomment-2591897788 Thanks @huaxingao

Re: [PR] doc-gen: make user_doc to work with predefined consts [datafusion]

2025-01-15 Thread via GitHub
ding-young commented on code in PR #14086: URL: https://github.com/apache/datafusion/pull/14086#discussion_r1916385694 ## datafusion/macros/src/user_doc.rs: ## @@ -24,11 +25,12 @@ use syn::{parse_macro_input, DeriveInput, LitStr}; /// from it by constructing a `DocumentBuilder(

Re: [PR] doc-gen: make user_doc to work with predefined consts [datafusion]

2025-01-15 Thread via GitHub
ding-young commented on PR #14086: URL: https://github.com/apache/datafusion/pull/14086#issuecomment-2592453533 @comphead Thank you for the review. Please let me know if you have any suggestions about the documentation or anything else.

Re: [PR] doc-gen: make user_doc to work with predefined consts [datafusion]

2025-01-15 Thread via GitHub
ding-young commented on code in PR #14086: URL: https://github.com/apache/datafusion/pull/14086#discussion_r1916385249 ## datafusion/macros/src/user_doc.rs: ## @@ -192,13 +186,18 @@ pub fn user_doc(args: TokenStream, input: TokenStream) -> TokenStream { let input = parse_m

Re: [PR] fix: API build problem due to missing dependency [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove merged PR #1159: URL: https://github.com/apache/datafusion-ballista/pull/1159

Re: [PR] Fix combine with session config [datafusion]

2025-01-15 Thread via GitHub
jayzhan211 merged PR #14139: URL: https://github.com/apache/datafusion/pull/14139

Re: [PR] bugfix: create view with multi union may get wrong schema [datafusion]

2025-01-15 Thread via GitHub
Curricane commented on PR #14133: URL: https://github.com/apache/datafusion/pull/14133#issuecomment-2594286581 > Hi @Curricane -- thank you for the fix 🙏 > > Can you please add some tests, ideally in sqllogictest format to ensure this behavior is not broken in the future. > >

[PR] Test: Validate memory limit for sort queries [datafusion]

2025-01-15 Thread via GitHub
2010YOUY01 opened a new pull request, #14142: URL: https://github.com/apache/datafusion/pull/14142 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/13431 ## Rationale for this change Datafusion supports memory-limited queries: it's im

[PR] chore: Add array types to fuzz testing utility [datafusion-comet]

2025-01-15 Thread via GitHub
andygrove opened a new pull request, #1292: URL: https://github.com/apache/datafusion-comet/pull/1292 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] doc: add new logo to readme [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove merged PR #1161: URL: https://github.com/apache/datafusion-ballista/pull/1161

[PR] Comet parquet exec merge from main(20250114) [datafusion-comet]

2025-01-15 Thread via GitHub
parthchandra opened a new pull request, #1293: URL: https://github.com/apache/datafusion-comet/pull/1293 Brings comet-parquet-exec almost up to date with main There are three new test failures which will be addressed in subsequent PRs - ``` - Broadcast HashJoin without join filter

Re: [PR] chore: replace print statements with logs [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove merged PR #1162: URL: https://github.com/apache/datafusion-ballista/pull/1162

Re: [PR] [DO NOT MERGE] Merge latest from main into comet-parquet-exec branch [datafusion-comet]

2025-01-15 Thread via GitHub
andygrove closed pull request #1291: [DO NOT MERGE] Merge latest from main into comet-parquet-exec branch URL: https://github.com/apache/datafusion-comet/pull/1291

Re: [PR] Return err if wildcard is not expanded before type coercion [datafusion]

2025-01-15 Thread via GitHub
xudong963 merged PR #14130: URL: https://github.com/apache/datafusion/pull/14130

Re: [PR] Return err if wildcard is not expanded before type coercion [datafusion]

2025-01-15 Thread via GitHub
xudong963 commented on PR #14130: URL: https://github.com/apache/datafusion/pull/14130#issuecomment-2594170021 Thanks @alamb @jonahgao, I'll merge it

Re: [PR] Minor: move resolve_overlap a method on `OrderingEquivalenceClass` [datafusion]

2025-01-15 Thread via GitHub
jayzhan211 merged PR #14138: URL: https://github.com/apache/datafusion/pull/14138

Re: [PR] update release scripts to release to Apache DataFusion not Apache Arrow [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove merged PR #1163: URL: https://github.com/apache/datafusion-ballista/pull/1163

[I] Error when use `user` field in where caluse [datafusion]

2025-01-15 Thread via GitHub
haohuaijin opened a new issue, #14141: URL: https://github.com/apache/datafusion/issues/14141 ### Describe the bug ``` DataFusion CLI v44.0.0 > create table t(a int, b int, user text) as values (1,2,'test'), (2,3,null); 0 row(s) fetched. Elapsed 0.051 seconds. > sele

Re: [I] Extension Types [datafusion]

2025-01-15 Thread via GitHub
jayzhan-synnada commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2594223771 The current status is that we have several changes in branch `logical-types` for #12622. Where the `Scalar` is introduced and the next step is to complete the tasks left

Re: [I] Extension Types [datafusion]

2025-01-15 Thread via GitHub
jayzhan211 commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2594225106 The current status is that we have several changes in branch `logical-types` for #12622. Where the `Scalar` is introduced and the next step is to complete the tasks left in #

Re: [PR] doc-gen: migrate scalar functions (encoding & regex) documentation [datafusion]

2025-01-15 Thread via GitHub
Chen-Yuan-Lai commented on PR #13919: URL: https://github.com/apache/datafusion/pull/13919#issuecomment-2594262550 @alamb @comphead I think this PR is ready

Re: [PR] doc-gen: migrate scalar functions (encoding & regex) documentation [datafusion]

2025-01-15 Thread via GitHub
comphead merged PR #13919: URL: https://github.com/apache/datafusion/pull/13919

Re: [PR] bugfix: create view with multi union may get wrong schema [datafusion]

2025-01-15 Thread via GitHub
jonahgao merged PR #14133: URL: https://github.com/apache/datafusion/pull/14133

Re: [PR] Comet parquet exec merge from main(20250114) [datafusion-comet]

2025-01-15 Thread via GitHub
andygrove commented on PR #1293: URL: https://github.com/apache/datafusion-comet/pull/1293#issuecomment-2594155855 TPC-H times: comet_native 327.14 comet_datafusion 341.61 comet_iceberg_compat 297.71 :fire: (first sub-300 timing I have seen) Our published time for 0.5.0 i

Re: [I] Improve Aggregate with Limit [datafusion]

2025-01-15 Thread via GitHub
ctsk commented on issue #13729: URL: https://github.com/apache/datafusion/issues/13729#issuecomment-2594124886 I checked out your second point and was able to replicate the difference in speed between the two queries. I don't yet see how a TopK operator would help the non-ordered query.

[I] Add tests for PR #14133 [datafusion]

2025-01-15 Thread via GitHub
jonahgao opened a new issue, #14140: URL: https://github.com/apache/datafusion/issues/14140 We add some tests for PR #14133 > Hi @Curricane -- thank you for the fix 🙏 > > Can you please add some tests, ideally in sqllogictest format to ensure this behavior is not broken in th

Re: [PR] bugfix: create view with multi union may get wrong schema [datafusion]

2025-01-15 Thread via GitHub
jonahgao commented on PR #14133: URL: https://github.com/apache/datafusion/pull/14133#issuecomment-2594345585 @Curricane Filed a todo issue #14140 for adding tests.

Re: [I] create view with multi union use the first union schema as the final view schema [datafusion]

2025-01-15 Thread via GitHub
jonahgao closed issue #14132: create view with multi union use the first union schema as the final view schema URL: https://github.com/apache/datafusion/issues/14132

[PR] update release scripts to release to Apache DataFusion not Apache Arrow [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove opened a new pull request, #1163: URL: https://github.com/apache/datafusion-ballista/pull/1163 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing cha

Re: [PR] flamegraph [datafusion]

2025-01-15 Thread via GitHub
github-actions[bot] commented on PR #13455: URL: https://github.com/apache/datafusion/pull/13455#issuecomment-2594306893 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feat: add `alias()` method for DataFrame [datafusion]

2025-01-15 Thread via GitHub
jonahgao commented on PR #14127: URL: https://github.com/apache/datafusion/pull/14127#issuecomment-2594318289 Thanks @comphead @alamb for the review

Re: [I] A memory-limited sort query fails [datafusion]

2025-01-15 Thread via GitHub
2010YOUY01 commented on issue #14143: URL: https://github.com/apache/datafusion/issues/14143#issuecomment-2594537073 take

Re: [PR] Test: Validate memory limit for sort queries [datafusion]

2025-01-15 Thread via GitHub
2010YOUY01 commented on code in PR #14142: URL: https://github.com/apache/datafusion/pull/14142#discussion_r1917754162 ## datafusion/core/tests/memory_limit/memory_limit_validation/utils.rs: ## @@ -0,0 +1,192 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[I] A memory-limited sort query fails [datafusion]

2025-01-15 Thread via GitHub
2010YOUY01 opened a new issue, #14143: URL: https://github.com/apache/datafusion/issues/14143 ### Describe the bug Compile and run datafusion-cli ``` cargo run -- --mem-pool-type fair -m 80M -c 'select c1, c1 as c2 from generate_series(1,1000) as t1(c1) order by c2 DESC, c1

Re: [I] SQL/PGQ or even GQL support [datafusion]

2025-01-15 Thread via GitHub
gsvgit commented on issue #13545: URL: https://github.com/apache/datafusion/issues/13545#issuecomment-2594583174 My presentation on the topic for DataFusion community meeting: [SemyonGrigorev_DataFusion_PGQ.pdf](https://github.com/user-attachments/files/18433964/SemyonGrigorev_DataFusion_PGQ

Re: [PR] Simplify `collect_left_input` function in hash join [datafusion]

2025-01-15 Thread via GitHub
lewiszlw closed pull request #14148: Simplify `collect_left_input` function in hash join URL: https://github.com/apache/datafusion/pull/14148

Re: [PR] Simplify `collect_left_input` function in hash join [datafusion]

2025-01-15 Thread via GitHub
lewiszlw commented on PR #14148: URL: https://github.com/apache/datafusion/pull/14148#issuecomment-2594664241 Emm... we have tests that create and run HashJoinExec directly, not optimized by PhysicalOptimizer. Closing.

Re: [PR] Deduplicate function `get_final_indices_from_shared_bitmap` [datafusion]

2025-01-15 Thread via GitHub
jonahgao commented on code in PR #14145: URL: https://github.com/apache/datafusion/pull/14145#discussion_r1917839619 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1112,6 +1113,14 @@ pub(crate) fn need_produce_result_in_final(join_type: JoinType) -> bool { ) } +

Re: [PR] Deduplicate function `get_final_indices_from_shared_bitmap` [datafusion]

2025-01-15 Thread via GitHub
lewiszlw commented on code in PR #14145: URL: https://github.com/apache/datafusion/pull/14145#discussion_r1917842401 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1112,6 +1113,14 @@ pub(crate) fn need_produce_result_in_final(join_type: JoinType) -> bool { ) } +

Re: [I] Error when joining dataframes with duplicate column names if dataframes generated from file [datafusion]

2025-01-15 Thread via GitHub
chenkovsky commented on issue #14147: URL: https://github.com/apache/datafusion/issues/14147#issuecomment-2594725070 ``` x1.write_csv("df1.csv", with_header=True) x2.write_csv("df2.csv", with_header=True) ```

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-15 Thread via GitHub
berkaysynnada commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2594728046 > Looks like I got hit by some new ColumnStatistics tests on main. Should be fixed now 🤞 > > @berkaysynnada can you expand on the rationale for the V2 stats? I understan

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917882700 ## src/parser/mod.rs: ## @@ -7341,6 +7341,10 @@ impl<'a> Parser<'a> { pub fn parse_options(&mut self, keyword: Keyword) -> Result, ParserErr

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917893541 ## src/parser/mod.rs: ## @@ -7341,6 +7341,10 @@ impl<'a> Parser<'a> { pub fn parse_options(&mut self, keyword: Keyword) -> Result, ParserErr

[I] Can `ctx.register_json` directly register and use a standard `JSON` array? `NdJson` is insufficient for handling general data. [datafusion]

2025-01-15 Thread via GitHub
shencangsheng opened a new issue, #14146: URL: https://github.com/apache/datafusion/issues/14146 ### Is your feature request related to a problem or challenge? I hope to directly use standard JSON, as most developers prefer this data format. ```json [ { "use

[I] Error when joining dataframes with duplicate column names if dataframes generated from file [datafusion]

2025-01-15 Thread via GitHub
fullstart opened a new issue, #14147: URL: https://github.com/apache/datafusion/issues/14147 ### Describe the bug Encountered an issue joining dataframes with duplicate column names if they are generated from a file read (I tried csv and parquet). Dataframes produced from python dict do

[I] Add a hint about expected extension in error message in register_csv, register_parquet, register_json [datafusion]

2025-01-15 Thread via GitHub
cj-zhukov opened a new issue, #14144: URL: https://github.com/apache/datafusion/issues/14144 ### Describe the bug When attempting to register an existing file with a different format using register_csv, register_json, or register_parquet, instead of receiving an error, the operation s

Re: [PR] adding RowrsReader and writer [datafusion]

2025-01-15 Thread via GitHub
Lordworms commented on PR #14149: URL: https://github.com/apache/datafusion/pull/14149#issuecomment-2594663505 I have two follow-up PRs to implement SortPreservingMergeStream in Row format and change the logic in SortExec

[PR] adding RowrsReader and writer [datafusion]

2025-01-15 Thread via GitHub
Lordworms opened a new pull request, #14149: URL: https://github.com/apache/datafusion/pull/14149 ## Which issue does this PR close? part of #7053 Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these c

Re: [PR] Deduplicate function `get_final_indices_from_shared_bitmap` [datafusion]

2025-01-15 Thread via GitHub
jonahgao merged PR #14145: URL: https://github.com/apache/datafusion/pull/14145

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917876805 ## src/parser/mod.rs: ## @@ -7341,6 +7341,10 @@ impl<'a> Parser<'a> { pub fn parse_options(&mut self, keyword: Keyword) -> Result, ParserErr

Re: [I] Use `NullBufferBuilder` instead of `BooleanBufferBuilder` for creating Null masks [datafusion]

2025-01-15 Thread via GitHub
Chen-Yuan-Lai commented on issue #14115: URL: https://github.com/apache/datafusion/issues/14115#issuecomment-2594710273 Thanks @alamb for pointing out the detail. I will take it into consideration.

Re: [I] parquet RowGroup pruning for `Dictionary(Decimal)` type incorrect [datafusion]

2025-01-15 Thread via GitHub
korowa commented on issue #13821: URL: https://github.com/apache/datafusion/issues/13821#issuecomment-2594590175 @kosiew the root cause of the issue is how arrow writer handles data for `Dictionary(Decimal)`, and I suppose it'll mostly be fixed by https://github.com/apache/arrow-rs/pull/698

Re: [I] Add a hint about expected extension in error message in register_csv, register_parquet, register_json [datafusion]

2025-01-15 Thread via GitHub
cj-zhukov commented on issue #14144: URL: https://github.com/apache/datafusion/issues/14144#issuecomment-2594596672 take

[PR] Deduplicate function `get_final_indices_from_shared_bitmap` [datafusion]

2025-01-15 Thread via GitHub
lewiszlw opened a new pull request, #14145: URL: https://github.com/apache/datafusion/pull/14145 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? We have two identical functions `get_

[PR] Simplify `collect_left_input` function in hash join [datafusion]

2025-01-15 Thread via GitHub
lewiszlw opened a new pull request, #14148: URL: https://github.com/apache/datafusion/pull/14148 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917904556 ## src/parser/mod.rs: ## @@ -7341,6 +7341,10 @@ impl<'a> Parser<'a> { pub fn parse_options(&mut self, keyword: Keyword) -> Result, ParserErr

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917906505 ## tests/sqlparser_bigquery.rs: ## @@ -2244,3 +2244,15 @@ fn test_any_type() { fn test_any_type_dont_break_custom_type() { bigquery_and_gene

[PR] POC: Use IndexSet rather than `Vec` for OrderingEquivalenceClass [datafusion]

2025-01-15 Thread via GitHub
alamb opened a new pull request, #14136: URL: https://github.com/apache/datafusion/pull/14136 WIP as I am still in the process of - [ ] I need to fix `resolve_overlap` - [ ] Test timing ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issu

[PR] Update datafusion-testing git hash [datafusion]

2025-01-15 Thread via GitHub
Omega359 opened a new pull request, #14137: URL: https://github.com/apache/datafusion/pull/14137 ## Which issue does this PR close? Closes #. ## Rationale for this change Update datafusion-testing to latest hash to fix test issues with extended github action work

[PR] Minor: move resolve_overlap a method on `OrderingEquivalenceClass` [datafusion]

2025-01-15 Thread via GitHub
alamb opened a new pull request, #14138: URL: https://github.com/apache/datafusion/pull/14138 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change I am testing out a change to the internal representati

Re: [PR] Minor: move resolve_overlap a method on `OrderingEquivalenceClass` [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14138: URL: https://github.com/apache/datafusion/pull/14138#discussion_r1917264140 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -156,6 +156,26 @@ impl OrderingEquivalenceClass { } } +/// Trims `orderings[idx]`

[PR] Fix combine with session config [datafusion]

2025-01-15 Thread via GitHub
XiangpengHao opened a new pull request, #14139: URL: https://github.com/apache/datafusion/pull/14139 ## Which issue does this PR close? Closes #. ## Rationale for this change Currently `default_from_session_config` takes no effect; this PR fixes it. It also added a `#[mus

Re: [PR] chore: extract math_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-15 Thread via GitHub
rluvaton commented on PR #1219: URL: https://github.com/apache/datafusion-comet/pull/1219#issuecomment-2593045630 @andygrove Yes, would you mind first merging https://github.com/apache/datafusion-comet/pull/1223 so if I have more conflicts I can resolve them all at once

Re: [PR] Return err if wildcard is not expanded before type coercion [datafusion]

2025-01-15 Thread via GitHub
jonahgao commented on code in PR #14130: URL: https://github.com/apache/datafusion/pull/14130#discussion_r1916749598 ## datafusion/optimizer/tests/optimizer_integration.rs: ## @@ -387,6 +389,30 @@ fn select_correlated_predicate_subquery_with_uppercase_ident() { assert_eq!(

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-15 Thread via GitHub
gatesn commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2592991088 Looks like I got hit by some new ColumnStatistics tests on main. Should be fixed now 🤞 @berkaysynnada can you expand on the rationale for the V2 stats? I understand that it's

Re: [PR] Propagate table constraints through physical plans to optimize sort operations [datafusion]

2025-01-15 Thread via GitHub
gokselk commented on code in PR #14111: URL: https://github.com/apache/datafusion/pull/14111#discussion_r1916731975 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -210,30 +221,25 @@ impl FileScanConfig { self } -/// Project the sch

[I] Discuss: Check in Cargo.lock file? [datafusion]

2025-01-15 Thread via GitHub
alamb opened a new issue, #14135: URL: https://github.com/apache/datafusion/issues/14135 ### Is your feature request related to a problem or challenge? Broken out of a discussion on a PR here: - https://github.com/apache/datafusion/pull/14071#discussion_r1910286465 As describ

Re: [PR] Minor: Document the rationale for the lack of Cargo.lock [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14071: URL: https://github.com/apache/datafusion/pull/14071#discussion_r1917019302 ## README.md: ## @@ -146,3 +146,27 @@ stable API, we also improve the API over time. As a result, we typically deprecate methods before removing them, according to

Re: [PR] Propagate table constraints through physical plans to optimize sort operations [datafusion]

2025-01-15 Thread via GitHub
gokselk commented on code in PR #14111: URL: https://github.com/apache/datafusion/pull/14111#discussion_r1916736891 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -548,6 +570,88 @@ impl EquivalenceProperties { true } +/// Checks if the sort

Re: [PR] feat: metadata columns [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2593033656 > Can these metadata columns utilize normal column properties, like ordering equivalences, constantness, distinctness etc.? For example, AFAIU rowid is an ordered column, and if I sort

Re: [PR] Minor: Document the rationale for the lack of Cargo.lock [datafusion]

2025-01-15 Thread via GitHub
comphead merged PR #14071: URL: https://github.com/apache/datafusion/pull/14071

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2593391114 I will update the datafusion-cli cargo file as well

Re: [PR] feat: Supporting `SAMPLE` parsing [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
seve-martinez closed pull request #1566: feat: Supporting `SAMPLE` parsing URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1566

Re: [PR] feat: Supporting `SAMPLE` parsing [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
seve-martinez commented on PR #1566: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1566#issuecomment-2593395062 > I did not notice this PR when I started working on parsing the SAMPLE option, but since then it was merged: #1580 > > I can share that we used to parse the

Re: [I] create view with multi union use the first union schema as the final view schema [datafusion]

2025-01-15 Thread via GitHub
matthewmturner commented on issue #14132: URL: https://github.com/apache/datafusion/issues/14132#issuecomment-2593195210 @xudong963 this sounds similar to the issue we saw?

Re: [PR] Implemented `simplify` for the `starts_with` function to convert it into a LIKE expression. [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14119: URL: https://github.com/apache/datafusion/pull/14119#discussion_r1916850331 ## datafusion/functions/src/string/starts_with.rs: ## @@ -98,6 +99,27 @@ impl ScalarUDFImpl for StartsWithFunc { } } +fn simplify( +&self

Re: [PR] Add sqlite sqllogictest run to extended.yml [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14101: URL: https://github.com/apache/datafusion/pull/14101#issuecomment-2593141596 Let's merge this in and get some experience with running the extended suite on main. Thank you @Omega359 and @comphead

Re: [PR] Add sqlite sqllogictest run to extended.yml [datafusion]

2025-01-15 Thread via GitHub
alamb merged PR #14101: URL: https://github.com/apache/datafusion/pull/14101

Re: [PR] Minor: fix duplicated SharedBitmapBuilder definitions [datafusion]

2025-01-15 Thread via GitHub
alamb merged PR #14122: URL: https://github.com/apache/datafusion/pull/14122

Re: [I] Automatically run sqlitetests regularly (but not with all PRs) to DataFusion [datafusion]

2025-01-15 Thread via GitHub
alamb closed issue #13967: Automatically run sqlitetests regularly (but not with all PRs) to DataFusion URL: https://github.com/apache/datafusion/issues/13967

Re: [PR] Minor: fix duplicated SharedBitmapBuilder definitions [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14122: URL: https://github.com/apache/datafusion/pull/14122#issuecomment-2593146891 Thanks @jonahgao and @lewiszlw ❤️

Re: [I] Async User Defined Functions (UDF) [datafusion]

2025-01-15 Thread via GitHub
adriangb commented on issue #6518: URL: https://github.com/apache/datafusion/issues/6518#issuecomment-2593297771 I just came across this use case today and am very interested, it would be amazing if DataFusion just had https://github.com/apache/datafusion/issues/6518#issuecomment-2585270509

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-15 Thread via GitHub
mnpw commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2593062436 @alamb Here's my understanding of the issue: - `datafusion` crate already depends on `datafusion-physical-optimizer`. - SanityChecker's tests depend on `datafusion::dat

[PR] Remove dependency on physical-optimizer on functions-aggregates [datafusion]

2025-01-15 Thread via GitHub
alamb opened a new pull request, #14134: URL: https://github.com/apache/datafusion/pull/14134 Draft it is based on https://github.com/apache/datafusion/pull/14083 from @mnpw ## Which issue does this PR close? This is a follow up to - https://github.com/apache/datafusion/pul

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14083: URL: https://github.com/apache/datafusion/pull/14083#discussion_r1916965668 ## datafusion/physical-optimizer/Cargo.toml: ## @@ -36,10 +36,14 @@ recursive_protection = ["dep:recursive"] [dependencies] arrow = { workspace = true } +arrow-s

Re: [PR] feat: add `alias()` method for DataFrame [datafusion]

2025-01-15 Thread via GitHub
comphead merged PR #14127: URL: https://github.com/apache/datafusion/pull/14127

Re: [I] Can no longer easily join duplicate schemas as of version 43 [datafusion]

2025-01-15 Thread via GitHub
comphead closed issue #14112: Can no longer easily join duplicate schemas as of version 43 URL: https://github.com/apache/datafusion/issues/14112

Re: [PR] doc-gen: make user_doc to work with predefined consts [datafusion]

2025-01-15 Thread via GitHub
comphead merged PR #14086: URL: https://github.com/apache/datafusion/pull/14086
