[I] Advanced example for building an external index for Row Groups *within* parquet files [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10580: URL: https://github.com/apache/datafusion/issues/10580 ### Is your feature request related to a problem or challenge? It is common in databases and other analytic systems to have additional external "indexes" (perhaps stored in the "metadata catalog",

Re: [PR] fix double blog path [datafusion-site]

2024-05-20 Thread via GitHub
alamb merged PR #3: URL: https://github.com/apache/datafusion-site/pull/3 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.

Re: [PR] fix double blog path [datafusion-site]

2024-05-20 Thread via GitHub
alamb commented on PR #3: URL: https://github.com/apache/datafusion-site/pull/3#issuecomment-2120752894 🚀

Re: [I] Using `Expr::field` panics [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10565: URL: https://github.com/apache/datafusion/issues/10565#issuecomment-2120754098 Thank you @jayzhan211 🙏 -- I will review it now

Re: [PR] Remove `Expr::GetIndexedField` and fix panic of `field`, `index` and `range` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10568: URL: https://github.com/apache/datafusion/pull/10568#discussion_r1606971468 ## datafusion/core/tests/expr_api/mod.rs: ## @@ -61,7 +63,7 @@ fn test_eq_with_coercion() { #[test] fn test_get_field() { evaluate_expr_test( -get_fie

Re: [PR] Remove `Expr::GetIndexedField`, replace `Expr::{field,index,range}` with `FieldAccessor`, `IndexAccessor`, and `SliceAccessor` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10568: URL: https://github.com/apache/datafusion/pull/10568#discussion_r1606978717 ## datafusion/functions/src/core/expr_ext.rs: ## @@ -0,0 +1,68 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-20 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1606981891 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -607,6 +608,15 @@ async fn qualified_catalog_schema_table_reference() -> Result<()> { r

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2120775312 https://github.com/apache/datafusion/pull/10392 is the upgrade to sqlparser -- I think it is pretty close but @tisonkun hit an issue during upgrade.

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2120780299 Hi @leoluan2009 In my opinion, I don't think DataFusion needs JIT to get good performance. In general, I find the paper ["Everything You Always Wanted to Know About C

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-20 Thread via GitHub
timsaucer commented on code in PR #710: URL: https://github.com/apache/datafusion-python/pull/710#discussion_r1606991649 ## .github/workflows/test.yaml: ## @@ -111,3 +134,9 @@ jobs: source venv/bin/activate pip install -e . -vv pytest -v . + +

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-20 Thread via GitHub
tisonkun commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1605950326 ## datafusion/sqllogictest/test_files/array.slt: ## Review Comment: Can be a bug after the JSON path parse changes - https://github.com/sqlparser-rs/sqlpars

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-20 Thread via GitHub
tisonkun commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2120801283 > #10392 is the upgrade to sqlparser -- I think it is pretty close but @tisonkun hit an issue during upgrade. We may need a 0.46.1 for resolving the regressions: *

[I] chore: extended explain info can be an object instead of class [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra opened a new issue, #452: URL: https://github.com/apache/datafusion-comet/issues/452 ### Describe the bug ExtendedExplainInfo is declared as a class, but it can be an object instead. ### Steps to reproduce _No response_ ### Expected behavior _N

Re: [PR] feat: Add logging to explain reasons for Comet not being able to run a query stage natively [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra commented on code in PR #397: URL: https://github.com/apache/datafusion-comet/pull/397#discussion_r1607016398 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -734,6 +734,23 @@ class CometSparkSessionExtensions } else {

Re: [PR] Add script to generate TPC-H data and convert it to Parquet using DataFusion [datafusion-benchmarks]

2024-05-20 Thread via GitHub
andygrove commented on PR #2: URL: https://github.com/apache/datafusion-benchmarks/pull/2#issuecomment-2120828494 Thanks for the review @viirya

Re: [PR] Add script to generate TPC-H data and convert it to Parquet using DataFusion [datafusion-benchmarks]

2024-05-20 Thread via GitHub
andygrove merged PR #2: URL: https://github.com/apache/datafusion-benchmarks/pull/2

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1607045862 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,55 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.next

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10575: URL: https://github.com/apache/datafusion/pull/10575#discussion_r1607048271 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -76,27 +76,20 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { /// For example: `TopK: k=

Re: [PR] Improve ContextProvider [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10577: URL: https://github.com/apache/datafusion/pull/10577#issuecomment-2120882095 I'll leave this open for a day as it is an API change, in case anyone else wants a chance to review

Re: [PR] Update prost-build requirement from =0.12.4 to =0.12.6 [datafusion]

2024-05-20 Thread via GitHub
comphead merged PR #10578: URL: https://github.com/apache/datafusion/pull/10578

Re: [PR] Minor: Fix name in ArrayFunctionRewriter, error not panic if `Expr::GetStructField` is planned [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10564: URL: https://github.com/apache/datafusion/pull/10564#issuecomment-2120885272 @jayzhan211 has a better fix in https://github.com/apache/datafusion/pull/10568

Re: [PR] Minor: Fix name in ArrayFunctionRewriter, error not panic if `Expr::GetStructField` is planned [datafusion]

2024-05-20 Thread via GitHub
alamb closed pull request #10564: Minor: Fix name in ArrayFunctionRewriter, error not panic if `Expr::GetStructField` is planned URL: https://github.com/apache/datafusion/pull/10564

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-20 Thread via GitHub
comphead commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1607055206 ## datafusion/sql/Cargo.toml: ## @@ -47,6 +47,7 @@ arrow-schema = { workspace = true } datafusion-common = { workspace = true, default-features = true } datafus

[PR] Minor: Fix ArrayFunctionRewriter name [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new pull request, #10581: URL: https://github.com/apache/datafusion/pull/10581 This confused me while debugging something else

Re: [PR] refactor: reduce allocations in push down filter [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10567: URL: https://github.com/apache/datafusion/pull/10567#discussion_r1607060754 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -861,16 +861,12 @@ impl OptimizerRule for PushDownFilter { .collect(); l

Re: [PR] Add examples of how to convert logical plan to/from sql strings [datafusion]

2024-05-20 Thread via GitHub
alamb merged PR #10558: URL: https://github.com/apache/datafusion/pull/10558

Re: [I] Add an example of how to convert LogicalPlan to/from SQL Strings [datafusion]

2024-05-20 Thread via GitHub
alamb closed issue #10550: Add an example of how to convert LogicalPlan to/from SQL Strings URL: https://github.com/apache/datafusion/issues/10550

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-20 Thread via GitHub
comphead commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1607067858 ## datafusion/sql/src/unparser/expr.rs: ## @@ -504,6 +508,14 @@ impl Unparser<'_> { .collect::>>() } +pub(super) fn new_ident_quoted_if_ne

Re: [PR] Minor: Improve documentation in sql_to_plan example [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10582: URL: https://github.com/apache/datafusion/pull/10582#discussion_r1607076013 ## datafusion-examples/examples/plan_to_sql.rs: ## @@ -22,36 +22,45 @@ use datafusion::sql::unparser::expr_to_sql; use datafusion_sql::unparser::dialect::CustomDial

[I] HashJoin LeftAnti Join handles nulls incorrectly [datafusion]

2024-05-20 Thread via GitHub
viirya opened a new issue, #10583: URL: https://github.com/apache/datafusion/issues/10583 ### Describe the bug While working on https://github.com/apache/datafusion-comet/pull/437, a few Spark join tests failed when delegating to DataFusion HashJoin. This is because Dat

[PR] HashJoin LeftAnti Join should handle nulls correctly [datafusion]

2024-05-20 Thread via GitHub
viirya opened a new pull request, #10584: URL: https://github.com/apache/datafusion/pull/10584 ## Which issue does this PR close? Closes #10583. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] HashJoin LeftAnti Join should handle nulls correctly [datafusion]

2024-05-20 Thread via GitHub
viirya commented on PR #10584: URL: https://github.com/apache/datafusion/pull/10584#issuecomment-2120931328 Added the test case first. I will find some time to work on the fix.

Re: [PR] HashJoin LeftAnti Join should handle nulls correctly [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10584: URL: https://github.com/apache/datafusion/pull/10584#discussion_r1607085982 ## datafusion/sqllogictest/test_files/join.slt: ## @@ -793,3 +793,19 @@ DROP TABLE companies statement ok DROP TABLE leads + + +# LeftAnti Join with null +state

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607087345 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607091830 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607104678 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[I] Incorrect statistics read for `i8` `i16` [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10585: URL: https://github.com/apache/datafusion/issues/10585 ### Describe the bug As @NGA-TRAN found in https://github.com/apache/datafusion/pull/10537 when i8 and i16 values are written to parquet and then the statistics are extracted, the returned min/m

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
NGA-TRAN commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607104991 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Incorrect statistics read for `i8` `i16` columns in parquet [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10585: URL: https://github.com/apache/datafusion/issues/10585#issuecomment-2120956336 Possibly related to https://github.com/apache/datafusion/issues/9779

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607107595 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607110465 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[I] DataFusion ignores "column order" parquet statistics specification [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10586: URL: https://github.com/apache/datafusion/issues/10586 ### Describe the bug As @tustvold points out, there is a [`column_order` API](https://docs.rs/parquet/latest/parquet/file/metadata/struct.FileMetaData.html#method.column_order) defined in parquet

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607118413 ## datafusion/core/src/datasource/physical_plan/parquet/arrow_statistics.rs: ## @@ -0,0 +1,43 @@ +use arrow_array::ArrayRef; +use arrow_schema::DataType; +use datafu

[I] DataFusion reads Date32 and Date64 parquet statistics in as [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10587: URL: https://github.com/apache/datafusion/issues/10587 ### Describe the bug When reading a Date32 or Date64 column from a parquet file, DataFusion currently returns an Int32 array ### To Reproduce You can see the issue in https:/

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607125587 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,654 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10537: URL: https://github.com/apache/datafusion/pull/10537#issuecomment-2120986025 I have filed the following tickets * https://github.com/apache/datafusion/issues/10585 * https://github.com/apache/datafusion/issues/10586 * #10587 I think this PR is

Re: [I] Row groups are read out of order or with completely different values [datafusion]

2024-05-20 Thread via GitHub
twitu closed issue #10572: Row groups are read out of order or with completely different values URL: https://github.com/apache/datafusion/issues/10572

Re: [PR] Minor: Improve documentation in sql_to_plan example [datafusion]

2024-05-20 Thread via GitHub
edmondop commented on code in PR #10582: URL: https://github.com/apache/datafusion/pull/10582#discussion_r1607146553 ## datafusion-examples/examples/plan_to_sql.rs: ## @@ -22,36 +22,45 @@ use datafusion::sql::unparser::expr_to_sql; use datafusion_sql::unparser::dialect::CustomD

[PR] Minor: Consolidate some integration tests into `core_integration` [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new pull request, #10588: URL: https://github.com/apache/datafusion/pull/10588 ## Which issue does this PR close? ## Rationale for this change In an effort to make it faster to develop and test datafusion, it would be nice if the resources required to run th

Re: [PR] Minor: Consolidate some integration tests into `core_integration` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10588: URL: https://github.com/apache/datafusion/pull/10588#discussion_r1607149443 ## datafusion/core/tests/custom_sources.rs: ## @@ -1,308 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one Review Comment: This was moved to

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #433: URL: https://github.com/apache/datafusion-comet/pull/433#discussion_r1607151015 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1452,17 +1452,55 @@ class CometExpressionSuite extends CometTestBase with A

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb merged PR #10537: URL: https://github.com/apache/datafusion/pull/10537

Re: [I] DataFusion reads Date32 and Date64 parquet statistics in as [datafusion]

2024-05-20 Thread via GitHub
edmondop commented on issue #10587: URL: https://github.com/apache/datafusion/issues/10587#issuecomment-2121024857 @alamb the title here doesn't make much sense, are you saying that the `min` and `max` are not extracted as Date32/Date64?

[I] Pass per-field BigQuery `OPTIONS` values to the LogicalPlan's Arrow Schema [datafusion]

2024-05-20 Thread via GitHub
davisp opened a new issue, #10589: URL: https://github.com/apache/datafusion/issues/10589 ### Is your feature request related to a problem or challenge? I've been reading and learning the TableProvider APIs and have finally gotten around to taking a serious look at implementing suppor

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1607158744 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with Ad

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#discussion_r1607164180 ## docs/source/index.rst: ## @@ -58,7 +58,11 @@ as a native runtime to achieve improvement in terms of query efficiency and quer Comet Plugin Overv

[PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-20 Thread via GitHub
davisp opened a new pull request, #10590: URL: https://github.com/apache/datafusion/pull/10590 ## Which issue does this PR close? Closes #10589 ## Rationale for this change Provide per-column key/value options in the `CREATE EXTERNAL TABLE` statement. ## What changes

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-20 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2121051712 Also, for anyone more familiar with datafusion and/or sqlparser, one thing I wasn't 100% on was how to represent the metadata value. For now I've just called format on it, but I have

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-20 Thread via GitHub
tshauck commented on code in PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#discussion_r1607185036 ## docs/source/index.rst: ## @@ -58,7 +58,11 @@ as a native runtime to achieve improvement in terms of query efficiency and quer Comet Plugin Overview

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-20 Thread via GitHub
tshauck commented on PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#issuecomment-2121071344 @andygrove Conflict should actually be fixed now, thanks @kazuyukitanimura

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-20 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1607197818 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSpa

[PR] feat: eliminate group by constant optimizer rule [datafusion]

2024-05-20 Thread via GitHub
korowa opened a new pull request, #10591: URL: https://github.com/apache/datafusion/pull/10591 ## Which issue does this PR close? Closes #. ## Rationale for this change Initial intention was to improve clickbench q34 -- it contains aggregation by constant

Re: [PR] bug fix: Fix fuzz testcase for cast string to integer [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove commented on code in PR #450: URL: https://github.com/apache/datafusion-comet/pull/450#discussion_r1607200546 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -533,11 +533,16 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelpe

Re: [PR] bug fix: Fix fuzz testcase for cast string to integer [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove commented on code in PR #450: URL: https://github.com/apache/datafusion-comet/pull/450#discussion_r1607203214 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -533,11 +533,16 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelpe

Re: [PR] bug fix: Fix fuzz testcase for cast string to integer [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #450: URL: https://github.com/apache/datafusion-comet/pull/450#discussion_r1607216900 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -533,11 +533,16 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPl

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1607218060 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with Ad

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1607249542 ## spark/src/main/scala/org/apache/spark/sql/comet/DecimalPrecision.scala: ## @@ -107,11 +108,4 @@ object DecimalPrecision { case e => e }

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2121165967 @viirya Please take another look cc @andygrove @comphead

[PR] fix: Enable cast string to int tests and fix compatibility issue [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove opened a new pull request, #453: URL: https://github.com/apache/datafusion-comet/pull/453 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/431 ## Rationale for this change Enable cast string to int as a

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-20 Thread via GitHub
appletreeisyellow commented on code in PR #10527: URL: https://github.com/apache/datafusion/pull/10527#discussion_r1607253589 ## datafusion/optimizer/src/single_distinct_to_groupby.rs: ## @@ -131,177 +126,190 @@ fn contains_grouping_set(expr: &[Expr]) -> bool { impl OptimizerRu

Re: [PR] fix: Enable cast string to int tests and fix compatibility issue [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove commented on code in PR #453: URL: https://github.com/apache/datafusion-comet/pull/453#discussion_r1607253762 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -82,7 +82,7 @@ macro_rules! cast_utf8_to_int { for i in 0..len { if $array.is

Re: [PR] fix: Enable cast string to int tests and fix compatibility issue [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove commented on code in PR #453: URL: https://github.com/apache/datafusion-comet/pull/453#discussion_r1607254539 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1029,34 +1021,22 @@ fn do_cast_string_to_int< type_name: &str, min_value: T, ) -> Comet

Re: [PR] fix: Enable cast string to int tests and fix compatibility issue [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove commented on code in PR #453: URL: https://github.com/apache/datafusion-comet/pull/453#discussion_r1607255075 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1070,7 +1050,7 @@ fn do_cast_string_to_int< if ch == '.' { if eval_m

Re: [PR] bug fix: Fix fuzz testcase for cast string to integer [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove commented on code in PR #450: URL: https://github.com/apache/datafusion-comet/pull/450#discussion_r1607255755 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -533,11 +533,16 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelpe

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove merged PR #422: URL: https://github.com/apache/datafusion-comet/pull/422

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-20 Thread via GitHub
appletreeisyellow commented on code in PR #10527: URL: https://github.com/apache/datafusion/pull/10527#discussion_r1607264284 ## datafusion/optimizer/src/single_distinct_to_groupby.rs: ## @@ -131,177 +126,190 @@ fn contains_grouping_set(expr: &[Expr]) -> bool { impl OptimizerRu

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-20 Thread via GitHub
appletreeisyellow commented on PR #10527: URL: https://github.com/apache/datafusion/pull/10527#issuecomment-2121182146 @alamb Thanks for the review! I have updated the code according to your feedback > I found whitespace blind diff easier to review: [#10527 (files)](https://github.co

Re: [I] DataFusion reads Date32 and Date64 parquet statistics in as Int32Array [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10587: URL: https://github.com/apache/datafusion/issues/10587#issuecomment-2121184299 Thanks for pointing that out @edmondop -- yes the min/max seem to be extracted as `Int32Array`s

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2121191619 What do we think about merging this PR and filing a follow on ticket to unify the APIs?

Re: [PR] Docs: Update PR workflow documentation [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10532: URL: https://github.com/apache/datafusion/pull/10532#issuecomment-2121194261 I plan to incorporate the feedback on this PR, I just haven't had a chance yet. I hope to do so over the next few days

[PR] test: add more tests for statistics reading [datafusion]

2024-05-20 Thread via GitHub
NGA-TRAN opened a new pull request, #10592: URL: https://github.com/apache/datafusion/pull/10592 ## Which issue does this PR close? More tests for https://github.com/apache/datafusion/issues/10453 ## Rationale for this change ## What changes are included in th

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
comphead merged PR #10304: URL: https://github.com/apache/datafusion/pull/10304

Re: [I] Sort Merge Join. LeftSemi issues [datafusion]

2024-05-20 Thread via GitHub
comphead closed issue #10379: Sort Merge Join. LeftSemi issues URL: https://github.com/apache/datafusion/issues/10379

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra commented on code in PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#discussion_r1607363360 ## common/src/main/java/org/apache/comet/parquet/CometParquetToSparkSchemaConverter.scala: ## @@ -0,0 +1,403 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] fix: Enable cast string to int tests and fix compatibility issue [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra commented on PR #453: URL: https://github.com/apache/datafusion-comet/pull/453#issuecomment-2121348910 So do you think the perf improvement is because we are no longer trimming?

Re: [PR] test: Fix explain with exteded info comet test [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra commented on code in PR #436: URL: https://github.com/apache/datafusion-comet/pull/436#discussion_r1607380766 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1399,7 +1399,7 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

[I] chore [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra opened a new issue, #454: URL: https://github.com/apache/datafusion-comet/issues/454 ### Describe the bug Separate out extended explain info unit test into spark dependent and spark independent parts ### Steps to reproduce _No response_ ### Expected b

Re: [I] Cast String to Date ANSI Mode - Spark 3.2 - Mismatch between Spark and Comet Errors [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra commented on issue #440: URL: https://github.com/apache/datafusion-comet/issues/440#issuecomment-2121399387 Is this an issue of just a mismatch between error messages? Or is the cast actually not doing the right thing with Spark 3.2?

Re: [PR] fix: Enable cast string to int tests and fix compatibility issue [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove commented on PR #453: URL: https://github.com/apache/datafusion-comet/pull/453#issuecomment-2121447145 > So do you think the perf improvement is because we are no longer trimming? We are still trimming, but we are no longer performing the redundant conditional logic in the m

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-20 Thread via GitHub
huaxingao commented on code in PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#discussion_r1607409997 ## common/src/main/java/org/apache/comet/parquet/CometParquetToSparkSchemaConverter.scala: ## @@ -0,0 +1,403 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-20 Thread via GitHub
timsaucer commented on PR #710: URL: https://github.com/apache/datafusion-python/pull/710#issuecomment-2121459794 It looks like CI is running correctly and also caching the data. I’ll rebase in the morning and get the PR ready to merge.

Re: [PR] Remove `Expr::GetIndexedField`, replace `Expr::{field,index,range}` with `FieldAccessor`, `IndexAccessor`, and `SliceAccessor` [datafusion]

2024-05-20 Thread via GitHub
jayzhan211 commented on PR #10568: URL: https://github.com/apache/datafusion/pull/10568#issuecomment-2121463057 Thanks, @alamb !

Re: [PR] Remove `Expr::GetIndexedField`, replace `Expr::{field,index,range}` with `FieldAccessor`, `IndexAccessor`, and `SliceAccessor` [datafusion]

2024-05-20 Thread via GitHub
jayzhan211 merged PR #10568: URL: https://github.com/apache/datafusion/pull/10568

Re: [I] Remove `Expr::GetIndexedField` and `GetFieldAccess` and always use function `get_field` for indexing [datafusion]

2024-05-20 Thread via GitHub
jayzhan211 closed issue #10374: Remove `Expr::GetIndexedField` and `GetFieldAccess` and always use function `get_field` for indexing URL: https://github.com/apache/datafusion/issues/10374

Re: [I] Using `Expr::field` panics [datafusion]

2024-05-20 Thread via GitHub
jayzhan211 closed issue #10565: Using `Expr::field` panics URL: https://github.com/apache/datafusion/issues/10565

Re: [I] Implement a way to preserve partitioning through `UnionExec` without losing ordering [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10314: URL: https://github.com/apache/datafusion/issues/10314#issuecomment-2121468243 > Hi @alamb, I am trying to work on this. > > I am not very familiar on the `InterleaveExec` in the optimizer. As initial thought, the interleaveExec is acting as a **Repart

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1607425124 ## datafusion/sqllogictest/test_files/array.slt: ## Review Comment: Thanks @tisonkun -- sounds like we should fix that upstream and then I can maybe make a ne

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2121484917 🚀

Re: [PR] bug fix: Fix fuzz testcase for cast string to integer [datafusion-comet]

2024-05-20 Thread via GitHub
vaibhawvipul closed pull request #450: bug fix: Fix fuzz testcase for cast string to integer URL: https://github.com/apache/datafusion-comet/pull/450
