Re: [PR] chore: improve fallback message when comet native shuffle is not enabled [datafusion-comet]

2024-05-18 Thread via GitHub
codecov-commenter commented on PR #445: URL: https://github.com/apache/datafusion-comet/pull/445#issuecomment-2118934008 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/445?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[PR] build(deps): bump prost from 0.12.4 to 0.12.6 [datafusion-python]

2024-05-18 Thread via GitHub
dependabot[bot] opened a new pull request, #705: URL: https://github.com/apache/datafusion-python/pull/705 Bumps [prost](https://github.com/tokio-rs/prost) from 0.12.4 to 0.12.6. Commits [d42c85e](https://github.com/tokio-rs/prost/commit/d42c85e790263f78f6c626ceb0dac5fda0edcb41)

[PR] build(deps): bump syn from 2.0.63 to 2.0.64 [datafusion-python]

2024-05-18 Thread via GitHub
dependabot[bot] opened a new pull request, #706: URL: https://github.com/apache/datafusion-python/pull/706 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.63 to 2.0.64. Release notes Sourced from [syn's releases](https://github.com/dtolnay/syn/releases). 2.0.64 Su

[PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-05-18 Thread via GitHub
dependabot[bot] opened a new pull request, #707: URL: https://github.com/apache/datafusion-python/pull/707 Bumps [object_store](https://github.com/apache/arrow-rs) from 0.9.1 to 0.10.1. Changelog Sourced from https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md object_

[PR] build(deps): bump prost-types from 0.12.3 to 0.12.6 [datafusion-python]

2024-05-18 Thread via GitHub
dependabot[bot] opened a new pull request, #708: URL: https://github.com/apache/datafusion-python/pull/708 Bumps [prost-types](https://github.com/tokio-rs/prost) from 0.12.3 to 0.12.6. Commits https://github.com/tokio-rs/prost/commit/d42c85e790263f78f6c626ceb0dac5fda0edcb41 d4

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-18 Thread via GitHub
tshauck commented on code in PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#discussion_r1605893537 ## docs/source/contributor-guide/adding_a_new_expression.md: ## @@ -0,0 +1,212 @@ + + +# Adding a Expression + +There are a number of Spark expression that are n

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-18 Thread via GitHub
tshauck commented on code in PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#discussion_r1605893848 ## docs/source/contributor-guide/adding_a_new_expression.md: ## @@ -0,0 +1,212 @@ + + +# Adding a Expression + +There are a number of Spark expression that are n

[PR] fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec [datafusion-comet]

2024-05-18 Thread via GitHub
viirya opened a new pull request, #447: URL: https://github.com/apache/datafusion-comet/pull/447 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

[I] CometNativeExec.doCanonicalize should canonicalize SparkPlan in Product parameters [datafusion-comet]

2024-05-18 Thread via GitHub
viirya opened a new issue, #448: URL: https://github.com/apache/datafusion-comet/issues/448 ### Describe the bug `SparkPlan.doCanonicalize` default implementation canonicalizes expressions in Product parameters, but not for `SparkPlan` because derived classes in Spark don't have su

[PR] feat: add hex scalar function [datafusion-comet]

2024-05-18 Thread via GitHub
tshauck opened a new pull request, #449: URL: https://github.com/apache/datafusion-comet/pull/449 ## Which issue does this PR close? Related to https://github.com/apache/datafusion-comet/issues/341. ## Rationale for this change I recently added `unhex` so this PR adds `he

Re: [PR] fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec [datafusion-comet]

2024-05-18 Thread via GitHub
viirya commented on code in PR #447: URL: https://github.com/apache/datafusion-comet/pull/447#discussion_r1605914565 ## spark/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q5a/explain.txt: ## @@ -72,70 +72,16 @@ TakeOrderedAndProject (137) :

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-18 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2119039796 Thanks for your update! I'll work on the tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-18 Thread via GitHub
huaxingao commented on code in PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#discussion_r1605917689 ## common/src/main/java/org/apache/comet/parquet/CometParquetToSparkSchemaConverter.scala: ## @@ -0,0 +1,403 @@ +/* + * Licensed to the Apache Software Foundat

Re: [I] `select array_concat([])` panicked [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 commented on issue #10200: URL: https://github.com/apache/datafusion/issues/10200#issuecomment-2119054943 Actually, I'm thinking about whether we should change the behavior of array_concat similar to postgres and duckdb. It is one of the earliest array functions that we don't f

Re: [PR] Improve round-robin repartitioning [datafusion]

2024-05-18 Thread via GitHub
github-actions[bot] closed pull request #6047: Improve round-robin repartitioning URL: https://github.com/apache/datafusion/pull/6047

Re: [I] select multiple columns in a single `Expr` [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 commented on issue #10102: URL: https://github.com/apache/datafusion/issues/10102#issuecomment-2119060296 I didn't find equivalent behavior in postgres. I'm not sure whether we should support this kind of `returns subset of columns based on column name matching`

[PR] Draft: Add pyi stubs for type hinting [datafusion-python]

2024-05-18 Thread via GitHub
timsaucer opened a new pull request, #709: URL: https://github.com/apache/datafusion-python/pull/709 # Which issue does this PR close? This PR does not close an issue, but it aims to address part of the discussion in https://github.com/apache/datafusion-python/issues/440 . This takes

[PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-18 Thread via GitHub
goldmedal opened a new pull request, #10573: URL: https://github.com/apache/datafusion/pull/10573 ## Which issue does this PR close? Closes #10557 ## Rationale for this change ## What changes are included in this PR? Only implement the default dialect in this PR.

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-18 Thread via GitHub
goldmedal commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2119066019 As mentioned in `dialect.rs` https://github.com/apache/datafusion/blob/e7858ff0ab1c282ab46bd93cabc3dc83db583165/datafusion/sql/src/unparser/dialect.rs#L19 I think

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-18 Thread via GitHub
tisonkun commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1605946538 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -689,7 +689,7 @@ select column1, column2, column3, column4, column5 from nested_arrays; # values table

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-18 Thread via GitHub
tisonkun commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1605947314 ## datafusion/sqllogictest/test_files/array.slt: ## Review Comment: New failure: ``` Running "array.slt" External error: query failed: DataFusio

Re: [I] CometNativeExec.doCanonicalize should canonicalize SparkPlan in Product parameters [datafusion-comet]

2024-05-18 Thread via GitHub
viirya closed issue #448: CometNativeExec.doCanonicalize should canonicalize SparkPlan in Product parameters URL: https://github.com/apache/datafusion-comet/issues/448

Re: [PR] fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec [datafusion-comet]

2024-05-18 Thread via GitHub
viirya commented on PR #447: URL: https://github.com/apache/datafusion-comet/pull/447#issuecomment-2119108275 Merged. Thanks @sunchao

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-18 Thread via GitHub
backkem commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2119108342 Yes, these are basically the same object. The one in DataFusion was put there temporarily until the trait extension in the sqlparser repo is landed and pushed to crates.io.

Re: [PR] fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec [datafusion-comet]

2024-05-18 Thread via GitHub
viirya merged PR #447: URL: https://github.com/apache/datafusion-comet/pull/447

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-18 Thread via GitHub
tisonkun commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1605950326 ## datafusion/sqllogictest/test_files/array.slt: ## Review Comment: Can be a bug after the JSON path parse changes.

[PR] Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 opened a new pull request, #10574: URL: https://github.com/apache/datafusion/pull/10574 ## Which issue does this PR close? Closes #. ## Rationale for this change 1. add ahash for common, used for distinct count accumulator #10484 2. move other g

Re: [PR] Improve signature of `get_field` function [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 merged PR #10569: URL: https://github.com/apache/datafusion/pull/10569

Re: [I] Improve signature of `get_field` is function [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 closed issue #10566: Improve signature of `get_field` is function URL: https://github.com/apache/datafusion/issues/10566

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1605955848 ## docs/source/user-guide/expressions.md: ## @@ -304,6 +304,16 @@ select log(-1), log(0), sqrt(-1); | rollup(exprs)

[PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-19 Thread via GitHub
timsaucer opened a new pull request, #710: URL: https://github.com/apache/datafusion-python/pull/710 # Which issue does this PR close? Closes #696. # Rationale for this change This PR sets up a workflow to generate the TPC-H 1 GB data set in CI, runs the 22 examples, and c

Re: [I] Ensure examples stay updated in CI. [datafusion-python]

2024-05-19 Thread via GitHub
timsaucer commented on issue #696: URL: https://github.com/apache/datafusion-python/issues/696#issuecomment-2119241255 As an update, I've got the tests written as you describe. I removed the reference files from the repo. Now it's checking against the official answer files. There is one sp

Re: [I] Connection reset by peer on AWS S3 object store. [datafusion]

2024-05-19 Thread via GitHub
Smotrov commented on issue #10478: URL: https://github.com/apache/datafusion/issues/10478#issuecomment-2119249340 > We've been hitting this issue too, the object store doesn't retry these errors even if you set retries: In my case, retrying doesn't help. It is listed in the code above.

[PR] fix double blog path [datafusion-site]

2024-05-19 Thread via GitHub
andygrove opened a new pull request, #3: URL: https://github.com/apache/datafusion-site/pull/3 (no comment)

Re: [PR] fix double blog path [datafusion-site]

2024-05-19 Thread via GitHub
andygrove commented on PR #3: URL: https://github.com/apache/datafusion-site/pull/3#issuecomment-2119262109 @alamb The published site at https://datafusion.apache.org/blog/ is based on this PR which fixed an issue where the blog posts were under `/blog/blog/`

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-19 Thread via GitHub
andygrove commented on PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#issuecomment-2119274581 This is looking great @tshauck. Could you fix the merge conflict?

[PR] bug: Fix fuzz testcase for cast string to integer [datafusion-comet]

2024-05-19 Thread via GitHub
vaibhawvipul opened a new pull request, #450: URL: https://github.com/apache/datafusion-comet/pull/450 ## Which issue does this PR close? Closes #431 . ## Rationale for this change Removing leading whitespaces: In some inputs, Spark's error messages conta

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-19 Thread via GitHub
tshauck commented on PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#issuecomment-2119308662 Thanks, @andygrove -- I think the conflict is resolved now.

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-19 Thread via GitHub
andygrove commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1606076912 ## tpch/queries/q15.sql: ## @@ -0,0 +1,33 @@ +-- SQLBench-H query 15 derived from TPC-H query 15 under the terms of the TPC Fair Use Policy. +-- TPC-H queri

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-19 Thread via GitHub
jmhain commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1606080413 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -689,7 +689,7 @@ select column1, column2, column3, column4, column5 from nested_arrays; # values table q

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-19 Thread via GitHub
jmhain commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1606083522 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -689,7 +689,7 @@ select column1, column2, column3, column4, column5 from nested_arrays; # values table q

Re: [PR] feat: extend unnest to support Struct datatype [datafusion]

2024-05-19 Thread via GitHub
duongcongtoai commented on code in PR #10429: URL: https://github.com/apache/datafusion/pull/10429#discussion_r1606095994 ## datafusion/expr/src/expr_schema.rs: ## @@ -123,7 +123,8 @@ impl ExprSchemable for Expr { Ok(field.data_type().clone())

Re: [I] Connection reset by peer on AWS S3 object store. [datafusion]

2024-05-19 Thread via GitHub
Smotrov commented on issue #10478: URL: https://github.com/apache/datafusion/issues/10478#issuecomment-2119348218 > I believe it may be fixed by [apache/arrow-rs#5609](https://github.com/apache/arrow-rs/pull/5609) which is in object store release 0.10.0 https://github.com/apache/

Re: [I] Make ASF public press release [datafusion]

2024-05-19 Thread via GitHub
alamb commented on issue #10403: URL: https://github.com/apache/datafusion/issues/10403#issuecomment-2119352799 The ASF guidelines involve working on the wording with the PMC, so unfortunately I cannot share the content publicly before then

Re: [I] Row groups are read out of order or with completely different values [datafusion]

2024-05-19 Thread via GitHub
alamb commented on issue #10572: URL: https://github.com/apache/datafusion/issues/10572#issuecomment-2119371107 Thank you for the report and the reproducer ❤️ > read row groups in order they were written This is not my expectation. DataFusion reads row groups in paralle

Re: [I] UserDefindedLogicalNode::from_template does not return a Result<...>. [datafusion]

2024-05-19 Thread via GitHub
alamb commented on issue #10571: URL: https://github.com/apache/datafusion/issues/10571#issuecomment-2119371269 I agree it would be much better to change the API and return a `Result<..>`. Thank you for the report

Re: [I] Dynamic schema for custom TableProvider [datafusion]

2024-05-19 Thread via GitHub
alamb commented on issue #10559: URL: https://github.com/apache/datafusion/issues/10559#issuecomment-2119371654 > how can we make the schema generic as well so users can select any fields from all tables the said API has? One way you might be able to do this would be creating a Schem

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-19 Thread via GitHub
advancedxy commented on PR #433: URL: https://github.com/apache/datafusion-comet/pull/433#issuecomment-2119563653 Gently ping @viirya @sunchao and @andygrove

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-19 Thread via GitHub
comphead commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2119569734 @viirya I'm planning to merge this PR soon as it fixes the crash, and addresses your concern (please see the slt test covering this specific case). All other improvements can be in

[PR] Improve `UserDefinedLogicalNode::from_template` API [datafusion]

2024-05-19 Thread via GitHub
lewiszlw opened a new pull request, #10575: URL: https://github.com/apache/datafusion/pull/10575 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/10571. ## Rationale for this change ## What changes are included in this

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-19 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606287947 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -989,8 +996,21 @@ impl SMJStream { } } Ordering::Equal =>

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-19 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606289302 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -989,8 +996,21 @@ impl SMJStream { } } Ordering::Equal =>

[PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-19 Thread via GitHub
advancedxy opened a new pull request, #451: URL: https://github.com/apache/datafusion-comet/pull/451 ## Which issue does this PR close? Closes #. ## Rationale for this change Follow up of #426, supports generating random rows for specified struct type. ## What changes are

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-19 Thread via GitHub
advancedxy commented on PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#issuecomment-2119750888 @andygrove would you mind taking a look at this?

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-19 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606310433 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1365,6 +1402,69 @@ fn get_filter_column( filter_columns } +/// Get `buffered_indices` rows

Re: [I] Row groups are read out of order or with completely different values [datafusion]

2024-05-19 Thread via GitHub
twitu commented on issue #10572: URL: https://github.com/apache/datafusion/issues/10572#issuecomment-2119764091 Setting `datafusion.optimizer.repartition_file_scans` to `false` like this fixes things. :heavy_check_mark: ```rust let session_cfg = SessionConfig::new(

[PR] Migrate testing optimizer rules [datafusion]

2024-05-20 Thread via GitHub
lewiszlw opened a new pull request, #10576: URL: https://github.com/apache/datafusion/pull/10576 ## Which issue does this PR close? part of https://github.com/apache/datafusion/issues/9637. ## Rationale for this change ## What changes are included in this

Re: [I] UserDefindedLogicalNode::from_template does not return a Result<...>. [datafusion]

2024-05-20 Thread via GitHub
LorrensP-2158466 commented on issue #10571: URL: https://github.com/apache/datafusion/issues/10571#issuecomment-2119941766 I don't see this as a really difficult API change. Is it ok if I do this?

[PR] Improve ContextProvider [datafusion]

2024-05-20 Thread via GitHub
lewiszlw opened a new pull request, #10577: URL: https://github.com/apache/datafusion/pull/10577 ## Which issue does this PR close? Renaming like `SchemaProvider::table_names`, add docs and remove deprecated code. ## Rationale for this change ## What chan

[PR] Update prost-build requirement from =0.12.4 to =0.12.6 [datafusion]

2024-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #10578: URL: https://github.com/apache/datafusion/pull/10578 Updates the requirements on [prost-build](https://github.com/tokio-rs/prost) to permit the latest version. Commits https://github.com/tokio-rs/prost/commit/d42c85e790263f78f6

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-20 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1606501759 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -607,6 +608,15 @@ async fn qualified_catalog_schema_table_reference() -> Result<()> { r

Re: [I] API in ParquetExec to pass in RowSelections to `ParquetExec` (enable custom indexes, finer grained pushdown) [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #9929: URL: https://github.com/apache/datafusion/issues/9929#issuecomment-2120223824 Update here is that I found it was maybe too large a step to get to the row level access initially -- instead I started with a basic example of building a *file level index* -- htt

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10527: URL: https://github.com/apache/datafusion/pull/10527#discussion_r1606615253 ## datafusion/optimizer/src/single_distinct_to_groupby.rs: ## @@ -131,177 +126,190 @@ fn contains_grouping_set(expr: &[Expr]) -> bool { impl OptimizerRule for Singl

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-20 Thread via GitHub
peter-toth commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2120329963 I'm still working on an alternative to this PR and will need a couple of more days to test a few different ideas...

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-20 Thread via GitHub
ozankabak commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2120338004 No worries. Will be happy to review and help iterate once you are ready

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-20 Thread via GitHub
vidyasankarv commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2120433348 https://github.com/apache/datafusion-comet/suites/23883332179/logs?attempt=2 In the logs for ubuntu-latest/java 17-spark-3.4-scala-2.12/java - which included the fuzz te

Re: [I] UserDefindedLogicalNode::from_template does not return a Result<...>. [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10571: URL: https://github.com/apache/datafusion/issues/10571#issuecomment-2120463462 > I don't see this as a really difficult API change. Is it ok if I do this? Edit: there is a PR already, did not see it, sorry. @lewiszlw beat us to it! (BTW

Re: [PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-05-20 Thread via GitHub
therealsharath commented on PR #707: URL: https://github.com/apache/datafusion-python/pull/707#issuecomment-2120466027 @dependabot rebase

Re: [PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-05-20 Thread via GitHub
dependabot[bot] commented on PR #707: URL: https://github.com/apache/datafusion-python/pull/707#issuecomment-2120466102 Sorry, only users with push access can use that command.

Re: [PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-05-20 Thread via GitHub
therealsharath commented on PR #707: URL: https://github.com/apache/datafusion-python/pull/707#issuecomment-2120467630 Hello, is it possible to merge this in, since https://github.com/apache/arrow-rs/issues/5589 was fixed in object store `0.10.1`? Thanks!

Re: [I] DataFusion to run SQL queries on Parquet files with error No suitable object store found for file [datafusion]

2024-05-20 Thread via GitHub
aditanase commented on issue #9280: URL: https://github.com/apache/datafusion/issues/9280#issuecomment-2120467689 I was recently trying to query the NYC dataset from ballista. Path looks something like https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet

Re: [PR] Implement Unparse `GroupingSet` Expr --> String Support sql [datafusion]

2024-05-20 Thread via GitHub
alamb merged PR #10555: URL: https://github.com/apache/datafusion/pull/10555

Re: [I] `GroupingSet` Expr --> String Support [datafusion]

2024-05-20 Thread via GitHub
alamb closed issue #10521: `GroupingSet` Expr --> String Support URL: https://github.com/apache/datafusion/issues/10521

Re: [I] Complete support for `Expr --> String ` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #9726: URL: https://github.com/apache/datafusion/issues/9726#issuecomment-2120522626 With the completion of https://github.com/apache/datafusion/pull/10555 from @xinlifoobar I think this epic is now done!

Re: [I] Complete support for `Expr --> String ` [datafusion]

2024-05-20 Thread via GitHub
alamb closed issue #9726: Complete support for `Expr --> String ` URL: https://github.com/apache/datafusion/issues/9726

Re: [PR] chore: improve fallback message when comet native shuffle is not enabled [datafusion-comet]

2024-05-20 Thread via GitHub
viirya merged PR #445: URL: https://github.com/apache/datafusion-comet/pull/445

Re: [PR] chore: improve fallback message when comet native shuffle is not enabled [datafusion-comet]

2024-05-20 Thread via GitHub
viirya commented on PR #445: URL: https://github.com/apache/datafusion-comet/pull/445#issuecomment-2120582785 Merged. Thanks @andygrove @advancedxy

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-20 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1601980707 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -165,6 +168,53 @@ pub fn to_substrait_rel( }))), })) } +

Re: [PR] test: parametrize test_array_functions [datafusion-python]

2024-05-20 Thread via GitHub
Michael-J-Ward commented on PR #678: URL: https://github.com/apache/datafusion-python/pull/678#issuecomment-2120634135 @andygrove could we merge this?

Re: [PR] Add script to generate TPC-H data and convert it to Parquet using DataFusion [datafusion-benchmarks]

2024-05-20 Thread via GitHub
viirya commented on code in PR #2: URL: https://github.com/apache/datafusion-benchmarks/pull/2#discussion_r1606905078 ## tpch/tpchgen.py: ## @@ -0,0 +1,89 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
comphead commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606905348 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1365,6 +1402,69 @@ fn get_filter_column( filter_columns } +/// Get `buffered_indices` row

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
comphead commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2120655736 > I've seen some issues in this patch. It doesn't look like a correct fix. The tests are currently in sync with what hash join returns; is there a test showing the opposite?

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606910619 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1365,6 +1402,69 @@ fn get_filter_column( filter_columns } +/// Get `buffered_indices` rows

Re: [I] DataFusion to run SQL queries on Parquet files with error No suitable object store found for file [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #9280: URL: https://github.com/apache/datafusion/issues/9280#issuecomment-2120680185 @aditanase how are you running the external statement? It seems to work well from `datafusion-cli` ```shell andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606924344 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1365,6 +1402,69 @@ fn get_filter_column( filter_columns } +/// Get `buffered_indices` rows

Re: [PR] Minor: Move proxy to datafusion common [datafusion]

2024-05-20 Thread via GitHub
alamb merged PR #10561: URL: https://github.com/apache/datafusion/pull/10561

Re: [I] `GroupingSet` Expr --> String Support [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10521: URL: https://github.com/apache/datafusion/issues/10521#issuecomment-2120689989 Thanks again @xinlifoobar

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-20 Thread via GitHub
Michael-J-Ward commented on code in PR #710: URL: https://github.com/apache/datafusion-python/pull/710#discussion_r1606931001 ## .github/workflows/test.yaml: ## @@ -111,3 +134,9 @@ jobs: source venv/bin/activate pip install -e . -vv pytest -v . +

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606942756 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -989,8 +996,21 @@ impl SMJStream { } } Ordering::Equal =>

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606944266 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -989,8 +996,21 @@ impl SMJStream { } } Ordering::Equal =>

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606948252 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -989,8 +996,21 @@ impl SMJStream { } } Ordering::Equal =>

[I] DataFusion weekly project plan (Andrew Lamb) - May 20, 2024 [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10579: URL: https://github.com/apache/datafusion/issues/10579 Follow on to https://github.com/apache/datafusion/issues/10482 My (personal) North ⭐ : 1000 projects are built using DataFusion 📈 **It would be great for other contributors to DataFusion wh

Re: [I] DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10482: URL: https://github.com/apache/datafusion/issues/10482#issuecomment-2120727825 Next week: https://github.com/apache/datafusion/issues/10579

Re: [I] DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 [datafusion]

2024-05-20 Thread via GitHub
alamb closed issue #10482: DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 URL: https://github.com/apache/datafusion/issues/10482

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2120729543 > I've seen some issues in this patch. It doesn't look like a correct fix. Took another look. Looks okay to me.
