Re: [PR] Find a way to communicate the ordering of a file back with the existi… [datafusion]

2024-12-30 Thread via GitHub
zhuqi-lucas commented on PR #13933: URL: https://github.com/apache/datafusion/pull/13933#issuecomment-2566119945 Thank you @alamb for clarifying; let me try to change the PR based on the conclusion of our discussion. -- This is an automated message from the Apache Git Service. To respond to the messag

Re: [PR] Consolidate example to_date.rs into dateframe.rs [datafusion]

2024-12-30 Thread via GitHub
alamb commented on code in PR #13939: URL: https://github.com/apache/datafusion/pull/13939#discussion_r1899450386 ## datafusion-examples/examples/dataframe.rs: ## @@ -206,3 +215,38 @@ async fn write_out(ctx: &SessionContext) -> std::result::Result<(), DataFusionEr Ok(())

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13931: URL: https://github.com/apache/datafusion/pull/13931#issuecomment-2565305163 This PR is looking nice -- @vbarua and @Blizzara let me know when it is ready for a final review.

[I] @comphead I found that `use_doc()` only supports one alternative syntax. Should we modify it to support multiple alternative syntaxes? [datafusion]

2024-12-30 Thread via GitHub
alamb opened a new issue, #13944: URL: https://github.com/apache/datafusion/issues/13944 @comphead I found that `use_doc()` only supports one alternative syntax. Should we modify it to support multiple alternative syntaxes? _Originally posted by @Chen-Yuan-Lai in http

Re: [PR] feat: add `AsyncCatalogProvider` helpers for asynchronous catalogs [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13800: URL: https://github.com/apache/datafusion/pull/13800#issuecomment-2565306691 Just checking in on this PR -- as I understand it, the remaining item is to update the async catalog example to use the new structures. @westonpace, do you plan to do so?

Re: [PR] Find a way to communicate the ordering of a file back with the existi… [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13933: URL: https://github.com/apache/datafusion/pull/13933#issuecomment-2565321646 Hi @zhuqi-lucas -- sorry if we caused confusion here. I agree with @berkaysynnada and @ozankabak that ordering information is already represented in plans using [`EquivalenceProper

Re: [I] Find a way to communicate the ordering of a file back with the existing listing table implementation [datafusion]

2024-12-30 Thread via GitHub
alamb commented on issue #13891: URL: https://github.com/apache/datafusion/issues/13891#issuecomment-2565322252 Copying from https://github.com/apache/datafusion/pull/13933#issuecomment-2565321646: So this would look something like: 1. `COPY (SELECT ... ORDER BY x, y) to 'foo.parqu

Re: [PR] doc-gen: migrate scalar functions (string) documentation 1/4 [datafusion]

2024-12-30 Thread via GitHub
alamb commented on code in PR #13924: URL: https://github.com/apache/datafusion/pull/13924#discussion_r1899456067 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -852,10 +852,6 @@ btrim(str[, trim_str]) Alternative Syntax -```sql -trim(BOTH trim_str FROM str)

Re: [I] Support explain query when running dfbench with clickbench [datafusion]

2024-12-30 Thread via GitHub
alamb closed issue #13941: Support explain query when running dfbench with clickbench URL: https://github.com/apache/datafusion/issues/13941

Re: [PR] Support explain query when running dfbench with clickbench [datafusion]

2024-12-30 Thread via GitHub
alamb merged PR #13942: URL: https://github.com/apache/datafusion/pull/13942

Re: [PR] Supporting writing schema metadata when writing Parquet in parallel [datafusion]

2024-12-30 Thread via GitHub
alamb commented on code in PR #13866: URL: https://github.com/apache/datafusion/pull/13866#discussion_r1899464378 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -750,6 +749,28 @@ impl ParquetSink { } } +/// Create writer properties based upon

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2024-12-30 Thread via GitHub
alamb commented on code in PR #13527: URL: https://github.com/apache/datafusion/pull/13527#discussion_r1899467353 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -243,6 +292,7 @@ pub fn create_physical_expr( Arc::new(fun.clone()), input_phy_ex

Re: [PR] doc-gen: migrate scalar functions (string) documentation 2/4 [datafusion]

2024-12-30 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #13925: URL: https://github.com/apache/datafusion/pull/13925#discussion_r1899876008 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -971,8 +971,8 @@ concat_ws(separator, str[, ..., str_n]) Arguments -- **separator**: Separ

[I] Refactor sqllogictest to extract postgres functionality into a separate file [datafusion]

2024-12-30 Thread via GitHub
Omega359 opened a new issue, #13948: URL: https://github.com/apache/datafusion/issues/13948 ### Is your feature request related to a problem or challenge? As mentioned in https://github.com/apache/datafusion/pull/13936#discussion_r1899138589 as part of https://github.com/apache/dataf

Re: [PR] Add sqlite test files, progress bar, and automatic postgres container management into sqllogictests [datafusion]

2024-12-30 Thread via GitHub
Omega359 commented on PR #13936: URL: https://github.com/apache/datafusion/pull/13936#issuecomment-2565637239 > Also, it would be nice to file a ticket / follow on to clean up / modularize the postgres container management code in the sqllogictest runner. I can do that if you don't have a

[I] Schema error when spilling with multiple aggregations [datafusion]

2024-12-30 Thread via GitHub
Friede80 opened a new issue, #13949: URL: https://github.com/apache/datafusion/issues/13949 ### Describe the bug There is an issue when using multiple aggregations and triggering a spill in a `GroupedHashAggregateStream`. A query like `SELECT MIN(b), MAX(b) FROM table GROUP BY

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899690187 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -101,14 +105,330 @@ use substrait::{ version, }; -use super::state::SubstraitPlanningState; +///

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899693626 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -185,257 +501,290 @@ pub fn to_substrait_extended_expr( })) } -/// Convert DataFusion LogicalPla

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899694806 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -101,14 +105,330 @@ use substrait::{ version, }; -use super::state::SubstraitPlanningState; +///

Re: [PR] doc-gen: migrate scalar functions (string) documentation 1/4 [datafusion]

2024-12-30 Thread via GitHub
comphead merged PR #13924: URL: https://github.com/apache/datafusion/pull/13924

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899698878 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -185,257 +501,290 @@ pub fn to_substrait_extended_expr( })) } -/// Convert DataFusion LogicalPla

Re: [PR] doc-gen: migrate scalar functions (string) documentation 2/4 [datafusion]

2024-12-30 Thread via GitHub
comphead commented on code in PR #13925: URL: https://github.com/apache/datafusion/pull/13925#discussion_r1899698871 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -971,8 +971,8 @@ concat_ws(separator, str[, ..., str_n]) Arguments -- **separator**: Separator

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899707055 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -998,450 +1304,418 @@ pub fn make_binary_op_scalar_func( /// Convert DataFusion Expr to Substrait Rex

Re: [PR] doc-gen: migrate scalar functions (datetime) documentation 2/2 [datafusion]

2024-12-30 Thread via GitHub
comphead commented on PR #13921: URL: https://github.com/apache/datafusion/pull/13921#issuecomment-2565769912 I think we are very close

Re: [PR] Parallelize pruning utf8 fuzz test [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13947: URL: https://github.com/apache/datafusion/pull/13947#issuecomment-2565770848 Thanks for the review @adriangb ❤️ > Looks great to me! 12s is much better. I bet we can get that way down as well if we improve the speed of each test. Then we might even

Re: [PR] doc-gen: migrate scalar functions (datetime) documentation 2/2 [datafusion]

2024-12-30 Thread via GitHub
comphead commented on code in PR #13921: URL: https://github.com/apache/datafusion/pull/13921#discussion_r1899707297 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -2233,17 +2233,17 @@ to_date('2017-05-31', '%Y-%m-%d') ```sql > select to_date('2023-01-31'); -+-

Re: [PR] Revert "Update sqllogictest requirement from 0.24.0 to 0.25.0 (#13917)" [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13945: URL: https://github.com/apache/datafusion/pull/13945#issuecomment-2565772506 Thanks for the review @jonahgao

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899710875 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -101,14 +105,330 @@ use substrait::{ version, }; -use super::state::SubstraitPlanningState; +///

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899711801 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1727,29 +2001,26 @@ fn make_substrait_window_function( } } -#[allow(deprecated)] #[allow(clipp

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899713365 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -998,450 +1304,418 @@ pub fn make_binary_op_scalar_func( /// Convert DataFusion Expr to Substrait Rex

Re: [I] Support pruning on string columns using starts_with [datafusion]

2024-12-30 Thread via GitHub
adriangb commented on issue #507: URL: https://github.com/apache/datafusion/issues/507#issuecomment-2565776344 @alamb I think we should re-open this for `starts_with`.

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899715775 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -998,450 +1304,418 @@ pub fn make_binary_op_scalar_func( /// Convert DataFusion Expr to Substrait Rex

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899716140 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -185,257 +501,290 @@ pub fn to_substrait_extended_expr( })) } -/// Convert DataFusion LogicalPla

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899718265 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -571,6 +571,21 @@ async fn roundtrip_self_implicit_cross_join() -> Result<()> { roundtrip

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899786865 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -730,32 +1035,23 @@ fn to_substrait_named_struct(schema: &DFSchemaRef) -> Result { } fn to_substrai

Re: [PR] feat(substrait): modular substrait producer [datafusion]

2024-12-30 Thread via GitHub
vbarua commented on PR #13931: URL: https://github.com/apache/datafusion/pull/13931#issuecomment-2565907299 Thanks for the feedback @Blizzara. Went ahead and made changes based on what you suggested, and also answered some questions.

[PR] Consolidate example dataframe_subquery.rs into dataframe.rs [datafusion]

2024-12-30 Thread via GitHub
zjregee opened a new pull request, #13950: URL: https://github.com/apache/datafusion/pull/13950 ## Which issue does this PR close? Closes #13911. ## Rationale for this change 1. Make it easier to find relevant examples. 2. Make local dev faster with fewer distinct bina

Re: [PR] Revert "Update sqllogictest requirement from 0.24.0 to 0.25.0 (#13917)" [datafusion]

2024-12-30 Thread via GitHub
jonahgao merged PR #13945: URL: https://github.com/apache/datafusion/pull/13945

[PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2024-12-30 Thread via GitHub
bombsimon opened a new pull request, #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628 Adds support for the `SETTINGS` and `FORMAT` keywords used by ClickHouse when inserting data with a syntax other than SQL. This can happen e.g. when using the ClickHouse CLI tool to

[I] Improve speed of `datafusion::fuzz fuzz_cases::pruning::test_fuzz_utf8` test [datafusion]

2024-12-30 Thread via GitHub
alamb opened a new issue, #13946: URL: https://github.com/apache/datafusion/issues/13946 After https://github.com/apache/datafusion/pull/12978 the `test_fuzz_utf8` test takes over a minute to run on my machine. This ticket tracks improving the speed. > Thanks @adria

Re: [I] Support pruning on string columns using starts_with [datafusion]

2024-12-30 Thread via GitHub
alamb closed issue #507: Support pruning on string columns using starts_with URL: https://github.com/apache/datafusion/issues/507

Re: [PR] Add fuzz testing for UTF8 LIKE pruning [datafusion]

2024-12-30 Thread via GitHub
alamb closed pull request #13253: Add fuzz testing for UTF8 LIKE pruning URL: https://github.com/apache/datafusion/pull/13253

Re: [PR] Implement predicate pruning for `like` expressions (prefix matching) [datafusion]

2024-12-30 Thread via GitHub
alamb merged PR #12978: URL: https://github.com/apache/datafusion/pull/12978
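
For readers unfamiliar with the idea behind this PR title, here is a minimal sketch of how prefix matching can be checked against min/max statistics. It is not the code merged in #12978. The function name `may_contain_prefix` is hypothetical, and the sketch assumes exact, byte-wise ordered min/max values for a container such as a Parquet row group. For a predicate like `col LIKE 'foo%'`, the container can be skipped when the function returns false.

```rust
/// Conservative check (hypothetical helper, not DataFusion's API):
/// could a container whose values all lie in [min, max] hold a string
/// that starts with `prefix`? Returning `true` means "cannot rule it out".
fn may_contain_prefix(min: &str, max: &str, prefix: &str) -> bool {
    // If max sorts before the prefix, every value is below all matches.
    let max_reaches_prefix = max >= prefix;
    // If min sorts after the prefix and does not itself start with it,
    // every value is above all strings that start with the prefix.
    let min_not_past_prefix = min < prefix || min.starts_with(prefix);
    max_reaches_prefix && min_not_past_prefix
}
```

The check relies on Rust comparing `&str` values byte by byte, so the result is only meaningful if the statistics were produced with the same ordering.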

Re: [PR] Implement predicate pruning for `like` expressions (prefix matching) [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #12978: URL: https://github.com/apache/datafusion/pull/12978#issuecomment-2565495535 > Yeah this is what I was hinting at in [#12978 (comment)](https://github.com/apache/datafusion/pull/12978#issuecomment-2542335627). > > I'm happy to throw threads at it for

Re: [PR] Implement predicate pruning for `like` expressions (prefix matching) [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #12978: URL: https://github.com/apache/datafusion/pull/12978#issuecomment-2565499527 Thank you again @adriangb for bearing with us -- I know this took a long time. However, I am pretty stoked that we now have this optimization and it is an example of the very ca

Re: [I] Dec 13, 2024: This week(s) in DataFusion [datafusion]

2024-12-30 Thread via GitHub
alamb commented on issue #13760: URL: https://github.com/apache/datafusion/issues/13760#issuecomment-2565500273 - Merged https://github.com/apache/datafusion/pull/12978 from @adriangb -- quite a cool parquet based optimization

Re: [PR] doc-gen: migrate scalar functions (string) documentation 1/4 [datafusion]

2024-12-30 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #13924: URL: https://github.com/apache/datafusion/pull/13924#discussion_r1899559374 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -852,10 +852,6 @@ btrim(str[, trim_str]) Alternative Syntax -```sql -trim(BOTH trim_str FR

Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

2024-12-30 Thread via GitHub
codecov-commenter commented on PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#issuecomment-2565536446 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1190?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Find a way to communicate the ordering of a file back with the existi… [datafusion]

2024-12-30 Thread via GitHub
zhuqi-lucas commented on PR #13933: URL: https://github.com/apache/datafusion/pull/13933#issuecomment-2565203405 Thank you @berkaysynnada for the review and the good suggestions. I agree with you that this is not a perfect approach; I will try to make the following improvements to this PR: 1. Try to make the

Re: [PR] Find a way to communicate the ordering of a file back with the existi… [datafusion]

2024-12-30 Thread via GitHub
berkaysynnada commented on PR #13933: URL: https://github.com/apache/datafusion/pull/13933#issuecomment-2565152257 I believe extending the Statistics with sort information is dangerous, as it deviates from the single-responsibility principle and creates the burden of maintaining order infor

[I] Why do QueryPlanner and PhysicalPlanner exist as independent concepts? [datafusion]

2024-12-30 Thread via GitHub
njsmith opened a new issue, #13943: URL: https://github.com/apache/datafusion/issues/13943 ### Is your feature request related to a problem or challenge? I wrote a `LogicalPlan::Extension` and an `ExtensionPlanner` to convert it into a physical plan. Then I tried to figure out how to

Re: [PR] Add sqlite test files, progress bar, and automatic postgres container management into sqllogictests [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13936: URL: https://github.com/apache/datafusion/pull/13936#issuecomment-2565355212 > Git submodules - for me this worked (as documented in the description above): > `git submodule update --init --remote --recursive` 🤔 -- it still doesn't work for me.

Re: [I] [EPIC] Improvements to GroupColumn multi-column aggregation performance [datafusion]

2024-12-30 Thread via GitHub
alamb closed issue #12680: [EPIC] Improvements to GroupColumn multi-column aggregation performance URL: https://github.com/apache/datafusion/issues/12680

Re: [I] [EPIC] Improvements to GroupColumn multi-column aggregation performance [datafusion]

2024-12-30 Thread via GitHub
alamb commented on issue #12680: URL: https://github.com/apache/datafusion/issues/12680#issuecomment-2565360495 I believe all the currently known items for this epic are completed so closing this issue for now. Let's open new issues if/when needed

Re: [PR] Consolidate example to_date.rs into dateframe.rs [datafusion]

2024-12-30 Thread via GitHub
alamb merged PR #13939: URL: https://github.com/apache/datafusion/pull/13939

Re: [PR] Consolidate example to_date.rs into dateframe.rs [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13939: URL: https://github.com/apache/datafusion/pull/13939#issuecomment-2565371886 Thanks again @comphead for the review

Re: [PR] Supporting writing schema metadata when writing Parquet in parallel [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13866: URL: https://github.com/apache/datafusion/pull/13866#issuecomment-2565373933 I'll plan to merge this tomorrow unless others would like time to review. Thanks again @wiedld

[PR] Alamb/downgrade sqllogictest [datafusion]

2024-12-30 Thread via GitHub
alamb opened a new pull request, #13945: URL: https://github.com/apache/datafusion/pull/13945 ## Which issue does this PR close? - Reverts https://github.com/apache/datafusion/pull/13917 ## Rationale for this change - @Omega359 is adding the sqlite sqllogictests i

Re: [PR] Update sqllogictest requirement from 0.24.0 to 0.25.0 [datafusion]

2024-12-30 Thread via GitHub
alamb commented on code in PR #13917: URL: https://github.com/apache/datafusion/pull/13917#discussion_r1899486562 ## datafusion/sqllogictest/test_files/group_by.slt: ## @@ -80,7 +80,7 @@ SELECT col1 * cor0.col1 * 56 AS col1 FROM tab2 AS cor0 GROUP BY cor0.col1 208376 94136

Re: [PR] extract expressions to folders based on spark grouping [datafusion-comet]

2024-12-30 Thread via GitHub
rluvaton commented on PR #1206: URL: https://github.com/apache/datafusion-comet/pull/1206#issuecomment-2565417237 Thanks @andygrove. The PR is now ready; I moved all the functions to the correct location.

Re: [I] Improve speed of `datafusion::fuzz fuzz_cases::pruning::test_fuzz_utf8` test [datafusion]

2024-12-30 Thread via GitHub
adriangb commented on issue #13946: URL: https://github.com/apache/datafusion/issues/13946#issuecomment-2565546423 Amazing thanks Andrew!

Re: [I] java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager [datafusion-comet]

2024-12-30 Thread via GitHub
ramyadass commented on issue #864: URL: https://github.com/apache/datafusion-comet/issues/864#issuecomment-2566140875 I get a similar error when invoking with spark-shell. I tried the following: a) Uploaded the jar to HDFS: hdfs:///home/hadoop/libraries/comet-spark-spark3.5_2.12-0.4.0.jar an

Re: [PR] feat: support `RightAnti` for `SortMergeJoin` [datafusion]

2024-12-30 Thread via GitHub
jayzhan-synnada commented on code in PR #13680: URL: https://github.com/apache/datafusion/pull/13680#discussion_r1899954923 ## datafusion/sqllogictest/test_files/sort_merge_join.slt: ## @@ -647,6 +647,54 @@ NULL NULL 7 9 NULL NULL 8 10 NULL NULL 9 11 +query II +select * from

Re: [I] Improve speed of `datafusion::fuzz fuzz_cases::pruning::test_fuzz_utf8` test [datafusion]

2024-12-30 Thread via GitHub
alamb commented on issue #13946: URL: https://github.com/apache/datafusion/issues/13946#issuecomment-2565575247 - Proposal: https://github.com/apache/datafusion/pull/13947#top

Re: [PR] feat: support `RightAnti` for `SortMergeJoin` [datafusion]

2024-12-30 Thread via GitHub
irenjj commented on PR #13680: URL: https://github.com/apache/datafusion/pull/13680#issuecomment-2565594292 Thanks @comphead. I've made the changes based on your suggestions. Could you please take another look?

Re: [PR] feat: support `RightAnti` for `SortMergeJoin` [datafusion]

2024-12-30 Thread via GitHub
irenjj commented on code in PR #13680: URL: https://github.com/apache/datafusion/pull/13680#discussion_r1899605324 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2910,6 +2984,310 @@ mod tests { Ok(()) } +#[tokio::test] +async fn join_rig

[PR] Correctly tokenize nested comments [datafusion-sqlparser-rs]

2024-12-30 Thread via GitHub
hansott opened a new pull request, #1629: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1629 The tokenizer currently throws an EOF error for `select 'foo' /*/**/*/`, while this is supported by Postgres: `last_ch` causes problems when tokenizing nested co
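
To illustrate the case described above, here is a minimal sketch of nested block-comment scanning with a depth counter. It is not the sqlparser-rs tokenizer or the change made in this PR, and the function name `skip_nested_comment` is hypothetical. The sketch consumes `/*` and `*/` two bytes at a time, so the `/` in `/*/` is never reused as part of a closing delimiter.

```rust
/// Hypothetical helper (not part of sqlparser-rs): given the byte offset of
/// a `/*`, return the offset just past the matching `*/`, honoring nesting.
/// Returns `None` if `start` is not at `/*` or the comment is unterminated.
fn skip_nested_comment(input: &str, start: usize) -> Option<usize> {
    if !input.get(start..)?.starts_with("/*") {
        return None;
    }
    let bytes = input.as_bytes();
    let mut depth = 0usize;
    let mut i = start;
    while i + 1 < bytes.len() {
        match (bytes[i], bytes[i + 1]) {
            // Opening delimiter: go one level deeper, consume both bytes.
            (b'/', b'*') => {
                depth += 1;
                i += 2;
            }
            // Closing delimiter: come back up; done when depth reaches zero.
            (b'*', b'/') => {
                depth -= 1;
                i += 2;
                if depth == 0 {
                    return Some(i);
                }
            }
            // Any other byte is comment content.
            _ => i += 1,
        }
    }
    None
}
```

Scanning bytes is safe for UTF-8 input here because `/` and `*` are ASCII and never occur inside a multi-byte character.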

Re: [PR] Parallelize pruning utf8 fuzz test [datafusion]

2024-12-30 Thread via GitHub
alamb commented on code in PR #13947: URL: https://github.com/apache/datafusion/pull/13947#discussion_r1899592668 ## datafusion/core/tests/fuzz_cases/pruning.rs: ## @@ -38,151 +38,266 @@ use parquet::{ file::properties::{EnabledStatistics, WriterProperties}, }; use rand::

[PR] Parallelize pruning utf8 fuzz test [datafusion]

2024-12-30 Thread via GitHub
alamb opened a new pull request, #13947: URL: https://github.com/apache/datafusion/pull/13947 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/13946 ## Rationale for this change 1. After https://github.com/apache/datafusion/p

Re: [PR] Parallelize pruning utf8 fuzz test [datafusion]

2024-12-30 Thread via GitHub
alamb commented on PR #13947: URL: https://github.com/apache/datafusion/pull/13947#issuecomment-2565576230 There may be additional improvements we could make to improve the overall speed of these tests but I focused on parallelization first

Re: [PR] Add sqlite test files, progress bar, and automatic postgres container management into sqllogictests [datafusion]

2024-12-30 Thread via GitHub
jonahgao commented on code in PR #13936: URL: https://github.com/apache/datafusion/pull/13936#discussion_r1899597047 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -147,60 +294,148 @@ async fn run_tests() -> Result<()> { } } -async fn run_test_file(test_file: Tes

Re: [PR] Correctly tokenize nested comments [datafusion-sqlparser-rs]

2024-12-30 Thread via GitHub
hansott commented on PR #1629: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1629#issuecomment-2565672288 cc @iffyio

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2024-12-30 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1899871918 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,24 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [PR] feat: add `AsyncCatalogProvider` helpers for asynchronous catalogs [datafusion]

2024-12-30 Thread via GitHub
westonpace commented on PR #13800: URL: https://github.com/apache/datafusion/pull/13800#issuecomment-2566110720 I do, but it won't happen before Thursday, apologies (finishing up winter break).