Re: [PR] chore: Upgrade to latest DataFusion [datafusion-comet]

2024-12-18 Thread via GitHub
andygrove commented on PR #1154: URL: https://github.com/apache/datafusion-comet/pull/1154#issuecomment-2552058716 > > Stop running the test suite with miri > > Hmmm Since this is not an official release yet, should we wait for the miri fix? Yes, I think we should. There is a

Re: [I] Compute ScalarFunction properties including `return_type` and `nullable` on creation [datafusion]

2024-12-18 Thread via GitHub
findepi commented on issue #13825: URL: https://github.com/apache/datafusion/issues/13825#issuecomment-2552051471 > Compute the `return_type` and `nullable` as early as possible. Correct, but not only that. We should clearly separate function calls where coercions need to be applied

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-18 Thread via GitHub
andygrove commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2552057681 The main thing we are waiting on for Comet is https://github.com/apache/datafusion/pull/13778 -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] test: enabling Spark tests with offHeap requirement [datafusion-comet]

2024-12-18 Thread via GitHub
kazuyukitanimura merged PR #1177: URL: https://github.com/apache/datafusion-comet/pull/1177

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-18 Thread via GitHub
vbarua commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1890594630 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -21,23 +21,24 @@ use datafusion::arrow::array::{GenericListArray, MapArray}; use datafusion::arrow::dat

Re: [PR] test: enabling Spark tests with offHeap requirement [datafusion-comet]

2024-12-18 Thread via GitHub
kazuyukitanimura commented on PR #1177: URL: https://github.com/apache/datafusion-comet/pull/1177#issuecomment-2551862159 Merged, thanks @andygrove

[PR] chore: Add more criterion benchmarks for shuffle writer [datafusion-comet]

2024-12-18 Thread via GitHub
andygrove opened a new pull request, #1180: URL: https://github.com/apache/datafusion-comet/pull/1180 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1123 ## Rationale for this change With the metrics that were

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-18 Thread via GitHub
andygrove commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2552031463 > @andygrove are you willing to try and make this release later this week? Maybe we can rally people to agree on what we need for the next release and try and get the upgrades

Re: [I] [EPIC] Add support for all array expressions [datafusion-comet]

2024-12-18 Thread via GitHub
jatin510 commented on issue #1042: URL: https://github.com/apache/datafusion-comet/issues/1042#issuecomment-2552035359 @andygrove ``` select array(1, 2, 3) ; 24/12/19 00:13:47 WARN CometSparkSessionExtensions$CometExecRule: Comet cannot execute some parts of this plan nativel

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-18 Thread via GitHub
findepi commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1890695092 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .map(|

Re: [PR] [do not review] experimental support for lz4 compression (not working) [datafusion-comet]

2024-12-18 Thread via GitHub
andygrove commented on code in PR #1181: URL: https://github.com/apache/datafusion-comet/pull/1181#discussion_r1890865548 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1543,14 +1569,38 @@ pub(crate) fn write_ipc_compressed( // write ipc_length placeholder

[PR] [do-not-merge] Diff updated comet-parquet-exec feature branch against main [datafusion-comet]

2024-12-18 Thread via GitHub
mbutrovich opened a new pull request, #1182: URL: https://github.com/apache/datafusion-comet/pull/1182 @andygrove suggested it might be helpful to see what the `comet-parquet-exec` branch with main merged into it looks like against `upstream/main` to see if the diff looks reasonable. Please

[PR] [comet-parquet-exec] Merge upstream/main and resolve conflicts [datafusion-comet]

2024-12-18 Thread via GitHub
mbutrovich opened a new pull request, #1183: URL: https://github.com/apache/datafusion-comet/pull/1183 This reflects a merge of upstream/main (as of this morning) and then a resolution of the conflicts. This catches up comet-parquet-exec feature branch on about a month of changes, including

Re: [PR] [comet-parquet-exec] Merge upstream/main and resolve conflicts [datafusion-comet]

2024-12-18 Thread via GitHub
mbutrovich commented on PR #1183: URL: https://github.com/apache/datafusion-comet/pull/1183#issuecomment-2552324567 See https://github.com/apache/datafusion-comet/pull/1182 to see the diff of this branch versus upstream/main, which should give an idea of what comet-parquet-exec feature bra

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-18 Thread via GitHub
alamb commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2551216606 I think we should start pushing to make this release before it accumulates more API changes such as - https://github.com/apache/datafusion/pull/13823 @andygrove are you w

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-18 Thread via GitHub
Blizzara commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1890180571 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -94,13 +102,467 @@ use substrait::proto::{ join_rel, plan_rel, r#type, read_rel::ReadType,

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
timsaucer commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1890188171 ## datafusion/ffi/src/plan_properties.rs: ## @@ -220,50 +235,89 @@ impl TryFrom for PlanProperties { RErr(e) => Err(DataFusionError::Plan(e.to_strin

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
ozankabak commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1890195326 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -488,95 +495,152 @@ impl ExecutionPlanProperties for &dyn ExecutionPlan { self.properties().out

[PR] Minor: new_zero impl for Date32/64 [datafusion]

2024-12-18 Thread via GitHub
berkaysynnada opened a new pull request, #13828: URL: https://github.com/apache/datafusion/pull/13828 ## Which issue does this PR close? Closes #. ## Rationale for this change Just a minor change. Date32 and Date64 types can be created with new_zero just

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
timsaucer commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1890183926 ## datafusion/ffi/src/plan_properties.rs: ## @@ -220,50 +235,89 @@ impl TryFrom for PlanProperties { RErr(e) => Err(DataFusionError::Plan(e.to_strin

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
ozankabak commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1890197827 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -488,95 +495,152 @@ impl ExecutionPlanProperties for &dyn ExecutionPlan { self.properties().out

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
berkaysynnada commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1890178866 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -488,95 +495,152 @@ impl ExecutionPlanProperties for &dyn ExecutionPlan { self.properties()

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-18 Thread via GitHub
Blizzara commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1890179391 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -94,13 +102,400 @@ use substrait::proto::{ join_rel, plan_rel, r#type, read_rel::ReadType,

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
ozankabak commented on PR #13823: URL: https://github.com/apache/datafusion/pull/13823#issuecomment-2551270608 > So maybe we can figure out the FFI stuff, and then wait to merge this PR until 44 is released. I will try hard to focus on the 44 release later this week Sure, it makes sens

Re: [PR] feat: Improve shuffle metrics (second attempt) [datafusion-comet]

2024-12-18 Thread via GitHub
mbutrovich commented on code in PR #1175: URL: https://github.com/apache/datafusion-comet/pull/1175#discussion_r1890341254 ## docs/source/user-guide/metrics.md: ## @@ -0,0 +1,62 @@ + + +# Comet Metrics + +## Spark SQL Metrics + +Set `spark.comet.metrics.detailed=true` to see all

Re: [I] Support any table nesting level in SQL queries (i.e `SELECT * FROM one.two.three.four.five`) [datafusion]

2024-12-18 Thread via GitHub
jonahgao commented on issue #13822: URL: https://github.com/apache/datafusion/issues/13822#issuecomment-2551515840 `SELECT * FROM one.two.three.four.five` can also be resolved as: `catalog`: None `schema`: one.two.three.four `table`: five Not sure if they can be distinguished.
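The two readings of a five-part name can be made concrete with a small std-only sketch. `TableRef`, `resolve_schema_path` and `resolve_catalog_first` are hypothetical (this is not DataFusion's `TableReference` API): reading A is the one described in the comment above, while reading B is an assumed alternative that takes the first part as the catalog. Both accept the same input, which is why a convention is needed to choose between them.

```rust
/// Hypothetical resolved table reference (not DataFusion's `TableReference`).
#[derive(Debug, PartialEq)]
struct TableRef {
    catalog: Option<String>,
    schema: String,
    table: String,
}

/// Reading A: no catalog; every part except the last forms the schema path.
fn resolve_schema_path(parts: &[&str]) -> Option<TableRef> {
    let (table, schema) = parts.split_last()?;
    if schema.is_empty() {
        return None; // a bare table name needs no nested-schema rule
    }
    Some(TableRef {
        catalog: None,
        schema: schema.join("."),
        table: table.to_string(),
    })
}

/// Reading B (assumed alternative): the first part is the catalog,
/// the middle parts form the schema path, the last part is the table.
fn resolve_catalog_first(parts: &[&str]) -> Option<TableRef> {
    let (table, rest) = parts.split_last()?;
    let (catalog, schema) = rest.split_first()?;
    if schema.is_empty() {
        return None; // needs at least catalog.schema.table
    }
    Some(TableRef {
        catalog: Some(catalog.to_string()),
        schema: schema.join("."),
        table: table.to_string(),
    })
}
```

The functions take identifier parts that are already split, as a SQL parser would produce them, so quoted names that contain dots are not a concern in this sketch.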

Re: [PR] feat: `parse_float_as_decimal` supports scientific notation and Decimal256 [datafusion]

2024-12-18 Thread via GitHub
jonahgao commented on code in PR #13806: URL: https://github.com/apache/datafusion/pull/13806#discussion_r1890331182 ## datafusion/sql/src/expr/value.rs: ## @@ -315,45 +321,84 @@ const fn try_decode_hex_char(c: u8) -> Option { } } -/// Parse Decimal128 from a string -///

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-18 Thread via GitHub
jayzhan211 commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1890992144 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .ma

Re: [I] Can not create a `List` of `FixedSizedList` in SQL [datafusion]

2024-12-18 Thread via GitHub
jayzhan211 commented on issue #13819: URL: https://github.com/apache/datafusion/issues/13819#issuecomment-2552554818 ``` D select [array_value(1,2,3)]; ┌───┐ │ main.list_value(array_value(1, 2, 3)) │ │ integer[3][] │

Re: [I] Compute ScalarFunction properties including `return_type` and `nullable` on creation [datafusion]

2024-12-18 Thread via GitHub
jayzhan211 commented on issue #13825: URL: https://github.com/apache/datafusion/issues/13825#issuecomment-2552558419 > from those where they shouldn't I think the example in the comment requires coercion. ExprSchemable::get_type is asking for the return type of the function. To compu

Re: [I] Compute ScalarFunction properties including `return_type` and `nullable` on creation [datafusion]

2024-12-18 Thread via GitHub
jayzhan211 commented on issue #13825: URL: https://github.com/apache/datafusion/issues/13825#issuecomment-2552559631 `ScalarFunction::new(udf: Arc, args: Vec)`. We might also need `schema`, need to take a look whether we have valid schema at this point
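Computing the properties once, when the call node is built, can be sketched as follows. Every name here (`UdfImpl`, `Arg`, `ScalarFunction::try_new`, the cut-down `DataType`) is hypothetical and std-only; it only mirrors the idea in #13825. `Coalesce` follows the usual SQL rule that the result is NULL only when all arguments can be NULL. A real implementation would apply type coercion before this point; the sketch simply rejects mismatched argument types.

```rust
use std::sync::Arc;

/// Hypothetical, cut-down stand-in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical argument summary: its type and whether it may be NULL.
#[derive(Debug, Clone)]
struct Arg {
    data_type: DataType,
    nullable: bool,
}

/// Hypothetical UDF trait: derives output properties from argument properties.
trait UdfImpl {
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String>;
    fn is_nullable(&self, args_nullable: &[bool]) -> bool;
}

/// COALESCE-like function: NULL only if every argument can be NULL.
struct Coalesce;

impl UdfImpl for Coalesce {
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String> {
        let first = arg_types.first().ok_or("coalesce needs at least one argument")?;
        if arg_types.iter().all(|t| t == first) {
            Ok(first.clone())
        } else {
            Err(format!("coalesce arguments must share one type, got {:?}", arg_types))
        }
    }

    fn is_nullable(&self, args_nullable: &[bool]) -> bool {
        args_nullable.iter().all(|n| *n)
    }
}

/// Call node that computes and caches its properties at creation time,
/// so later planning steps read them instead of recomputing them.
#[allow(dead_code)]
struct ScalarFunction {
    udf: Arc<dyn UdfImpl>,
    args: Vec<Arg>,
    return_type: DataType,
    nullable: bool,
}

impl ScalarFunction {
    fn try_new(udf: Arc<dyn UdfImpl>, args: Vec<Arg>) -> Result<Self, String> {
        let types: Vec<DataType> = args.iter().map(|a| a.data_type.clone()).collect();
        let nulls: Vec<bool> = args.iter().map(|a| a.nullable).collect();
        let return_type = udf.return_type(&types)?;
        let nullable = udf.is_nullable(&nulls);
        Ok(Self { udf, args, return_type, nullable })
    }
}
```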

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-18 Thread via GitHub
jayzhan211 commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1890996124 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .ma

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-18 Thread via GitHub
jayzhan211 commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1890998009 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .ma

Re: [I] DataFusion should support casting strings such as "4e7" to decimal [datafusion]

2024-12-18 Thread via GitHub
andygrove commented on issue #10315: URL: https://github.com/apache/datafusion/issues/10315#issuecomment-2552112849 At first glance, `parse_decimal` looks more mature and more efficient than `parse_string_to_decimal_native`. For example, `parse_string_to_decimal_native` has some expen
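For reference, handling an exponent such as `4e7` mostly comes down to shifting the decimal point before scaling. The std-only sketch below is neither `parse_decimal` nor `parse_string_to_decimal_native`; `parse_sci_to_decimal` is a hypothetical helper that returns the unscaled `i128` for a given scale, truncates digits beyond that scale (an assumption: a real cast may round or return an error) and does not check precision.

```rust
/// Returns 10^n as i128 when it fits (n in 0..=38), otherwise None.
fn pow10(n: i64) -> Option<i128> {
    if (0..=38).contains(&n) {
        Some(10i128.pow(n as u32))
    } else {
        None
    }
}

/// Hypothetical helper: parse "4e7", "-1.25E-1", "12.5", ... into an unscaled
/// i128 at `scale`. Extra fractional digits are truncated (assumption).
/// Returns None for malformed input or overflow.
fn parse_sci_to_decimal(s: &str, scale: u32) -> Option<i128> {
    let s = s.trim();
    // Split "mantissa[e|E]exponent"; a missing exponent means 0.
    let (mantissa, exp) = match s.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&s[..i], s[i + 1..].parse::<i32>().ok()?),
        None => (s, 0),
    };
    // Optional sign on the mantissa.
    let (negative, mantissa) = match mantissa.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, mantissa.strip_prefix('+').unwrap_or(mantissa)),
    };
    // Integer and fractional digit runs.
    let (int_part, frac_part) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    // Accumulate every digit into one integer, checking for overflow.
    let mut digits: i128 = 0;
    for c in int_part.chars().chain(frac_part.chars()) {
        let d = c.to_digit(10)? as i128; // any non-digit => None
        digits = digits.checked_mul(10)?.checked_add(d)?;
    }
    // value = digits * 10^(exp - frac_len); unscaled = value * 10^scale.
    let shift = exp as i64 + scale as i64 - frac_part.len() as i64;
    let unscaled = if digits == 0 {
        0
    } else if shift >= 0 {
        digits.checked_mul(pow10(shift)?)?
    } else {
        // Drop digits below the target scale; a divisor beyond 10^38 leaves 0.
        pow10(-shift).map_or(0, |d| digits / d)
    };
    Some(if negative { -unscaled } else { unscaled })
}
```

With scale 2, `"4e7"` yields the unscaled value `4_000_000_000`, i.e. `40000000.00`.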

[I] OOM in `GroupedHashAggregateStream::group_aggregate_batch()` [datafusion]

2024-12-18 Thread via GitHub
avantgardnerio opened a new issue, #13831: URL: https://github.com/apache/datafusion/issues/13831 ### Describe the bug When attempting to accumulate large text fields with a `group by`, it was observed that `group_aggregate_batch()` can OOM despite ostensibly using the `MemoryPool`.
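One way to picture "ostensibly using the `MemoryPool`": a pool can only refuse memory that an operator reports to it. The sketch uses hypothetical, single-threaded `SimplePool` and `Reservation` types that are loosely modelled on, not copied from, DataFusion's memory pool and reservation API. The toy `StringAggState` reserves the bytes of each text value before storing it, so an oversized group fails with an error instead of allocating. It does not claim to identify the root cause of #13831.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical fixed-size pool. It only knows about memory that
/// consumers explicitly report through reservations.
struct SimplePool {
    limit: usize,
    used: Cell<usize>,
}

impl SimplePool {
    fn new(limit: usize) -> Rc<Self> {
        Rc::new(Self { limit, used: Cell::new(0) })
    }
    fn used(&self) -> usize {
        self.used.get()
    }
    /// Reserve `bytes` more, or return an error instead of exceeding the limit.
    fn try_grow(&self, bytes: usize) -> Result<(), String> {
        match self.used.get().checked_add(bytes).filter(|n| *n <= self.limit) {
            Some(n) => {
                self.used.set(n);
                Ok(())
            }
            None => Err(format!(
                "cannot reserve {} bytes ({} of {} in use)",
                bytes,
                self.used.get(),
                self.limit
            )),
        }
    }
    fn shrink(&self, bytes: usize) {
        self.used.set(self.used.get() - bytes);
    }
}

/// Hypothetical per-operator reservation, resized to the operator's footprint.
struct Reservation {
    pool: Rc<SimplePool>,
    size: usize,
}

impl Reservation {
    fn new(pool: Rc<SimplePool>) -> Self {
        Self { pool, size: 0 }
    }
    fn try_resize(&mut self, new_size: usize) -> Result<(), String> {
        if new_size > self.size {
            self.pool.try_grow(new_size - self.size)?;
        } else {
            self.pool.shrink(self.size - new_size);
        }
        self.size = new_size;
        Ok(())
    }
}

impl Drop for Reservation {
    // Return the reserved bytes to the pool when the operator is done.
    fn drop(&mut self) {
        self.pool.shrink(self.size);
    }
}

/// Toy accumulator that keeps text values for one group.
struct StringAggState {
    values: Vec<String>,
    reservation: Reservation,
}

impl StringAggState {
    fn update(&mut self, value: &str) -> Result<(), String> {
        // Reserve the bytes *before* holding them, so the pool can refuse.
        let needed = self.values.iter().map(|v| v.len()).sum::<usize>() + value.len();
        self.reservation.try_resize(needed)?;
        self.values.push(value.to_string());
        Ok(())
    }
}
```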

Re: [I] DataFusion should support casting strings such as "4e7" to decimal [datafusion]

2024-12-18 Thread via GitHub
himadripal commented on issue #10315: URL: https://github.com/apache/datafusion/issues/10315#issuecomment-2551863649 I'm working on this issue and need help here - I added a test case to debug, at the moment string to decimal conversion is using [parse_string_to_decimal_native](https://git

Re: [PR] feat: Improve shuffle metrics (second attempt) [datafusion-comet]

2024-12-18 Thread via GitHub
andygrove merged PR #1175: URL: https://github.com/apache/datafusion-comet/pull/1175

Re: [PR] Fix BigQuery hyphenated ObjectName with numbers [datafusion-sqlparser-rs]

2024-12-18 Thread via GitHub
iffyio merged PR #1598: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1598

[PR] [do not review] experimental support for lz4 compression (not working) [datafusion-comet]

2024-12-18 Thread via GitHub
andygrove opened a new pull request, #1181: URL: https://github.com/apache/datafusion-comet/pull/1181 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Support arbitrary composite access expressions [datafusion-sqlparser-rs]

2024-12-18 Thread via GitHub
iffyio commented on code in PR #1600: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1600#discussion_r1890779913 ## tests/sqlparser_common.rs: ## @@ -12506,6 +12506,92 @@ fn parse_create_table_with_bit_types() { } } +#[test] +fn parse_composed_access_expr()

Re: [PR] Allow foreign table constraint without columns [datafusion-sqlparser-rs]

2024-12-18 Thread via GitHub
iffyio commented on code in PR #1608: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1608#discussion_r1890788634 ## src/parser/mod.rs: ## @@ -6830,7 +6830,15 @@ impl<'a> Parser<'a> { let columns = self.parse_parenthesized_column_list(Mandatory, fa

Re: [PR] feat: `parse_float_as_decimal` supports scientific notation and Decimal256 [datafusion]

2024-12-18 Thread via GitHub
jonahgao merged PR #13806: URL: https://github.com/apache/datafusion/pull/13806

Re: [PR] feat: `parse_float_as_decimal` supports scientific notation and Decimal256 [datafusion]

2024-12-18 Thread via GitHub
jonahgao commented on PR #13806: URL: https://github.com/apache/datafusion/pull/13806#issuecomment-2552591655 Thank you @findepi for the review

Re: [PR] fix: Ignore empty files in ListingTable when listing files with or without partition filters, as well as when inferring schema [datafusion]

2024-12-18 Thread via GitHub
goldmedal merged PR #13750: URL: https://github.com/apache/datafusion/pull/13750

Re: [PR] fix: Ignore empty files in ListingTable when listing files with or without partition filters, as well as when inferring schema [datafusion]

2024-12-18 Thread via GitHub
goldmedal commented on PR #13750: URL: https://github.com/apache/datafusion/pull/13750#issuecomment-2551718731 Thanks @Blizzara and @alamb for reviewing

Re: [I] Support any table nesting level in SQL queries (i.e `SELECT * FROM one.two.three.four.five`) [datafusion]

2024-12-18 Thread via GitHub
jonahgao commented on issue #13822: URL: https://github.com/apache/datafusion/issues/13822#issuecomment-2551730481 > Correct, my proposal was to basically establish a convention on how to distinguish them. Currently it returns an error, so adding this as a convention seemed relatively harml

Re: [I] Ignore empty (parquet) files when using ListingTable [datafusion]

2024-12-18 Thread via GitHub
goldmedal closed issue #13737: Ignore empty (parquet) files when using ListingTable URL: https://github.com/apache/datafusion/issues/13737

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-18 Thread via GitHub
Blizzara commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1890511596 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -21,23 +21,24 @@ use datafusion::arrow::array::{GenericListArray, MapArray}; use datafusion::arrow::d

Re: [PR] feat: Improve shuffle metrics (second attempt) [datafusion-comet]

2024-12-18 Thread via GitHub
andygrove commented on code in PR #1175: URL: https://github.com/apache/datafusion-comet/pull/1175#discussion_r1890516093 ## docs/source/user-guide/metrics.md: ## @@ -0,0 +1,62 @@ + + +# Comet Metrics + +## Spark SQL Metrics + +Set `spark.comet.metrics.detailed=true` to see all

[PR] wip: array remove [datafusion-comet]

2024-12-18 Thread via GitHub
jatin510 opened a new pull request, #1179: URL: https://github.com/apache/datafusion-comet/pull/1179 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
jayzhan-synnada opened a new pull request, #13823: URL: https://github.com/apache/datafusion/pull/13823 ## Which issue does this PR close? Closes #. ## Rationale for this change Existing `execution_mode` is not enough to represent different combinations of execut
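The idea of splitting one mode into two independent properties can be sketched as two enums. The variant names, `combine_boundedness` and `can_run_on` are illustrative assumptions based on the PR title, not the definitions merged in #13823 (those may carry more detail).

```rust
/// How an operator emits output (simplified, hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum EmissionType {
    /// Emits rows as input arrives (e.g. a filter).
    Incremental,
    /// Emits only after consuming all input (e.g. a full sort).
    Final,
    /// Emits some output incrementally and the rest at the end.
    Both,
}

/// Whether an output stream ends (simplified, hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Boundedness {
    Bounded,
    Unbounded,
}

/// Assumed rule for operators that pass input through:
/// the output is unbounded if any child is unbounded.
fn combine_boundedness(children: &[Boundedness]) -> Boundedness {
    if children.iter().any(|b| *b == Boundedness::Unbounded) {
        Boundedness::Unbounded
    } else {
        Boundedness::Bounded
    }
}

/// An operator that emits only after seeing all input never produces
/// anything on an unbounded input, because "all input" never arrives.
/// `Both` still yields its incremental part, so it is allowed here.
fn can_run_on(input: Boundedness, emission: EmissionType) -> bool {
    !(input == Boundedness::Unbounded && emission == EmissionType::Final)
}
```

Keeping the two properties separate lets a planner describe combinations such as an unbounded, incrementally emitting operator without needing one enum variant per combination.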

Re: [PR] replace CASE expressions in predicate pruning with boolean algebra [datafusion]

2024-12-18 Thread via GitHub
findepi commented on PR #13795: URL: https://github.com/apache/datafusion/pull/13795#issuecomment-2550668767 > I was afraid that writers might populate the statistics with a default value (e.g. `0`) if all of the rows are null instead of `null`. And that some other pass might then remove th

Re: [PR] Chore: Do not return empty record batches from streams [datafusion]

2024-12-18 Thread via GitHub
ozankabak merged PR #13794: URL: https://github.com/apache/datafusion/pull/13794

Re: [I] Theoretical integer overflow in `StringArrayBuilder` / `LargeStringArrayBuilder` [datafusion]

2024-12-18 Thread via GitHub
alamb closed issue #13796: Theoretical integer overflow in `StringArrayBuilder` / `LargeStringArrayBuilder` URL: https://github.com/apache/datafusion/issues/13796

Re: [PR] Handle possible overflows in StringArrayBuilder / LargeStringArrayBuilder [datafusion]

2024-12-18 Thread via GitHub
alamb merged PR #13802: URL: https://github.com/apache/datafusion/pull/13802
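Handling this kind of overflow amounts to checked arithmetic on the offsets buffer. In the Arrow layout, `Utf8` arrays store `i32` offsets (`LargeUtf8` uses `i64`), so the end offset of the next value must still fit in that type. `next_offset` and `Utf8Builder` are hypothetical std-only sketches, not DataFusion's `StringArrayBuilder`.

```rust
/// Hypothetical helper: end offset of the next value, or None if it
/// would not fit in an i32 offset.
fn next_offset(last: i32, len: usize) -> Option<i32> {
    if len > i32::MAX as usize {
        return None; // the value alone is too large for i32 offsets
    }
    last.checked_add(len as i32)
}

/// Hypothetical minimal builder: value i spans data[offsets[i]..offsets[i + 1]].
struct Utf8Builder {
    data: Vec<u8>,
    offsets: Vec<i32>,
}

impl Utf8Builder {
    fn new() -> Self {
        Self { data: Vec::new(), offsets: vec![0] }
    }

    /// Append a value, failing instead of wrapping the offset.
    fn try_append(&mut self, value: &str) -> Result<(), String> {
        let last = *self.offsets.last().expect("offsets always start with 0");
        let end = next_offset(last, value.len()).ok_or_else(|| {
            format!("offset overflow: {} + {} bytes exceeds i32::MAX", last, value.len())
        })?;
        self.data.extend_from_slice(value.as_bytes());
        self.offsets.push(end);
        Ok(())
    }
}
```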

Re: [I] Introduce a way to represent constrained statistics / bounds on values in Statistics [datafusion]

2024-12-18 Thread via GitHub
ozankabak commented on issue #8078: URL: https://github.com/apache/datafusion/issues/8078#issuecomment-2551052253 ```rust let input_intervals: Vec<&Interval> = ; // wrap input intervals with Statistics let temp_statistics = Statistics::new_from_bounds(&input_intervals); // com
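The comment above propagates value bounds through expressions. The core interval arithmetic can be sketched with a hypothetical closed `Interval` over `i64` (not DataFusion's own interval type); `Statistics::new_from_bounds` is a proposal in the thread and is not modelled here.

```rust
/// Hypothetical closed interval [lo, hi] over i64.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lo: i64,
    hi: i64,
}

impl Interval {
    fn new(lo: i64, hi: i64) -> Option<Self> {
        if lo <= hi {
            Some(Self { lo, hi })
        } else {
            None
        }
    }

    /// [a, b] + [c, d] = [a + c, b + d]; None if a bound overflows.
    fn add(self, other: Self) -> Option<Self> {
        Some(Self {
            lo: self.lo.checked_add(other.lo)?,
            hi: self.hi.checked_add(other.hi)?,
        })
    }

    /// Tri-state `self > other`: Some(true) or Some(false) when the
    /// bounds decide it for every value, None when the ranges overlap.
    fn gt(self, other: Self) -> Option<bool> {
        if self.lo > other.hi {
            Some(true)
        } else if self.hi <= other.lo {
            Some(false)
        } else {
            None
        }
    }
}
```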

[I] Add version checking to FFI crate [datafusion]

2024-12-18 Thread via GitHub
timsaucer opened a new issue, #13827: URL: https://github.com/apache/datafusion/issues/13827 ### Is your feature request related to a problem or challenge? This came up as part of this PR: https://github.com/apache/datafusion/pull/13823 As the FFI crate evolves we need some way
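A common pattern for this kind of check is a small fixed header, placed first in every exported struct, that the consumer validates before reading anything else. `FFI_MAJOR_VERSION`, `FfiHeader`, `FfiPlanProperties` and `check_compatible` are hypothetical; the approach the FFI crate actually adopts is left to #13827.

```rust
/// Hypothetical ABI version compiled into both provider and consumer.
const FFI_MAJOR_VERSION: u32 = 1;

/// Hypothetical C-compatible header, always the first field, so a consumer
/// can read it before touching fields whose layout may have changed.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct FfiHeader {
    major_version: u32,
    struct_size: usize,
}

/// Hypothetical exported struct; real function pointers would follow the header.
#[repr(C)]
struct FfiPlanProperties {
    header: FfiHeader,
    output_partitions: u32,
}

/// Reject structs built by an incompatible library instead of
/// reading fields that may sit at different offsets.
fn check_compatible(header: &FfiHeader, expected_size: usize) -> Result<(), String> {
    if header.major_version != FFI_MAJOR_VERSION {
        return Err(format!(
            "FFI version mismatch: library {} vs. host {}",
            header.major_version, FFI_MAJOR_VERSION
        ));
    }
    if header.struct_size < expected_size {
        return Err(format!(
            "struct too small: {} < {} bytes",
            header.struct_size, expected_size
        ));
    }
    Ok(())
}
```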

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-18 Thread via GitHub
Blizzara commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1890181294 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -21,23 +21,24 @@ use datafusion::arrow::array::{GenericListArray, MapArray}; use datafusion::arrow::d

Re: [PR] fix: pruning by bloom filters for dictionary columns [datafusion]

2024-12-18 Thread via GitHub
adriangb commented on PR #13768: URL: https://github.com/apache/datafusion/pull/13768#issuecomment-2551319990 Thank you all for working on this. DataFusion is restoring my belief in open source one interaction at a time.

Re: [PR] replace CASE expressions in predicate pruning with boolean algebra [datafusion]

2024-12-18 Thread via GitHub
adriangb commented on PR #13795: URL: https://github.com/apache/datafusion/pull/13795#issuecomment-2551315475 So is that assumption incorrect? Then it's a bug in the current implementation.

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
berkaysynnada commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1890174570 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -488,95 +495,152 @@ impl ExecutionPlanProperties for &dyn ExecutionPlan { self.properties()

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-18 Thread via GitHub
Blizzara commented on PR #13803: URL: https://github.com/apache/datafusion/pull/13803#issuecomment-2551239991 > I think it makes sense to do that, and I'm happy to own it, but I would prefer to do that as a followup. I structured the `consumer.rs` changes in order to make it easy to see tha

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
ozankabak commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1890196572 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -488,95 +495,152 @@ impl ExecutionPlanProperties for &dyn ExecutionPlan { self.properties().out

Re: [I] Support any table nesting level in SQL queries (i.e `SELECT * FROM one.two.three.four.five`) [datafusion]

2024-12-18 Thread via GitHub
jonahgao commented on issue #13822: URL: https://github.com/apache/datafusion/issues/13822#issuecomment-2551545523 > the dotted syntax is not only for resolving table names, it's also used for column resolution: Yes, this depends on the position: whether it's in the relation position

Re: [PR] feat: `parse_float_as_decimal` supports scientific notation and Decimal256 [datafusion]

2024-12-18 Thread via GitHub
findepi commented on code in PR #13806: URL: https://github.com/apache/datafusion/pull/13806#discussion_r1890381499 ## datafusion/sql/src/expr/value.rs: ## @@ -315,45 +321,84 @@ const fn try_decode_hex_char(c: u8) -> Option { } } -/// Parse Decimal128 from a string -///

Re: [PR] feat: `parse_float_as_decimal` supports scientific notation and Decimal256 [datafusion]

2024-12-18 Thread via GitHub
findepi commented on PR #13806: URL: https://github.com/apache/datafusion/pull/13806#issuecomment-2551546930 shipit

Re: [I] Support any table nesting level in SQL queries (i.e `SELECT * FROM one.two.three.four.five`) [datafusion]

2024-12-18 Thread via GitHub
phillipleblanc commented on issue #13822: URL: https://github.com/apache/datafusion/issues/13822#issuecomment-2551563666 > `SELECT * FROM one.two.three.four.five` can also be resolved as: `catalog`: None `schema`: one.two.three.four `table`: five > > Not sure if they can be distinguis

Re: [PR] Preserve constant values across union operations [datafusion]

2024-12-18 Thread via GitHub
gokselk commented on code in PR #13805: URL: https://github.com/apache/datafusion/pull/13805#discussion_r1890263071 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -3651,4 +3661,38 @@ mod tests { sort_expr } + +#[test] +fn test_union_cons
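The rule implied by the PR title can be sketched as an intersection: after a union, rows come from every input, so a column is constant only if each input has it constant with the same value. `Constants` (column name to `i64` literal) and `union_constants` are hypothetical simplifications; DataFusion's equivalence properties represent constants differently.

```rust
use std::collections::HashMap;

/// Hypothetical: constants known for one union input, column name -> literal.
type Constants = HashMap<String, i64>;

/// Keep a column only if every input has it constant with the same value;
/// otherwise rows from different inputs could differ in that column.
fn union_constants(inputs: &[Constants]) -> Constants {
    let (first, rest) = match inputs.split_first() {
        Some(pair) => pair,
        None => return Constants::new(),
    };
    first
        .iter()
        .filter(|(col, val)| rest.iter().all(|c| c.get(*col) == Some(*val)))
        .map(|(col, val)| (col.clone(), *val))
        .collect()
}
```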

Re: [PR] feat: add helpers for users with asynchornous catalogs [datafusion]

2024-12-18 Thread via GitHub
westonpace commented on PR #13800: URL: https://github.com/apache/datafusion/pull/13800#issuecomment-2551377966 > The question is who's responsible for providing this consistency. Is this a catalog or table provider (eg it should self-wrap in ResolvedCatalogProvider), or is it the engine i

Re: [I] Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries [datafusion]

2024-12-18 Thread via GitHub
alamb commented on issue #13748: URL: https://github.com/apache/datafusion/issues/13748#issuecomment-2551382739 > FYI I am pretty sure I was seeing this with some of the sqlite test files, specifically https://github.com/Omega359/arrow-datafusion/blob/feature/sqllogictest_add_sqlite/datafus

Re: [PR] feat: add helpers for users with asynchornous catalogs [datafusion]

2024-12-18 Thread via GitHub
westonpace commented on code in PR #13800: URL: https://github.com/apache/datafusion/pull/13800#discussion_r1890277251 ## datafusion/catalog/src/async.rs: ## @@ -0,0 +1,764 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] feat: add helpers for users with asynchornous catalogs [datafusion]

2024-12-18 Thread via GitHub
westonpace commented on code in PR #13800: URL: https://github.com/apache/datafusion/pull/13800#discussion_r1890304069 ## datafusion/catalog/src/async.rs: ## @@ -0,0 +1,764 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] feat: add helpers for users with asynchornous catalogs [datafusion]

2024-12-18 Thread via GitHub
westonpace commented on code in PR #13800: URL: https://github.com/apache/datafusion/pull/13800#discussion_r1890302675 ## datafusion/catalog/src/async.rs: ## @@ -0,0 +1,764 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] feat: add helpers for users with asynchornous catalogs [datafusion]

2024-12-18 Thread via GitHub
westonpace commented on code in PR #13800: URL: https://github.com/apache/datafusion/pull/13800#discussion_r1890289732 ## datafusion/catalog/src/async.rs: ## @@ -0,0 +1,764 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] feat: add helpers for users with asynchornous catalogs [datafusion]

2024-12-18 Thread via GitHub
westonpace commented on code in PR #13800: URL: https://github.com/apache/datafusion/pull/13800#discussion_r1890306938 ## datafusion/catalog/src/async.rs: ## @@ -0,0 +1,764 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] feat: add helpers for users with asynchornous catalogs [datafusion]

2024-12-18 Thread via GitHub
westonpace commented on code in PR #13800: URL: https://github.com/apache/datafusion/pull/13800#discussion_r1890308118 ## datafusion/catalog/src/async.rs: ## @@ -0,0 +1,764 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-18 Thread via GitHub
vbarua commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1890403551 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -21,23 +21,24 @@ use datafusion::arrow::array::{GenericListArray, MapArray}; use datafusion::arrow::dat

Re: [I] Support any table nesting level in SQL queries (i.e `SELECT * FROM one.two.three.four.five`) [datafusion]

2024-12-18 Thread via GitHub
phillipleblanc commented on issue #13822: URL: https://github.com/apache/datafusion/issues/13822#issuecomment-2551578173 I'll leave this open for a day to see if anyone has a better idea and/or has an idea on how the column parsing could be handled and/or how to customize the `object name`

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-18 Thread via GitHub
Blizzara commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2551616532 (Just for what it's worth, I appreciate smaller releases more often compared to bigger ones less often, even if it means more times I need to fix downstream stuff due to b

[I] ScalarFunctionExpr does not preserve the nullable flag on roundtrip [datafusion]

2024-12-18 Thread via GitHub
ccciudatu opened a new issue, #13829: URL: https://github.com/apache/datafusion/issues/13829 ### Describe the bug A physical plan that contains scalar functions will always set the nullable flag to `true` after deserialization. ### To Reproduce The `coalesce` function re

[PR] [bugfix] ScalarFunctionExpr does not preserve the nullable flag on roundtrip [datafusion]

2024-12-18 Thread via GitHub
ccciudatu opened a new pull request, #13830: URL: https://github.com/apache/datafusion/pull/13830 ## Which issue does this PR close? Closes #13829. ## Rationale for this change Preserve the nullable flag for scalar functions after a ser/deser roundtrip.

Re: [PR] Preserve constant values across union operations [datafusion]

2024-12-18 Thread via GitHub
gokselk commented on code in PR #13805: URL: https://github.com/apache/datafusion/pull/13805#discussion_r1889853843 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -3651,4 +3661,38 @@ mod tests { sort_expr } + +#[test] +fn test_union_cons

[I] Compute ScalarFunction properties including `return_type` and `nullable` on creation [datafusion]

2024-12-18 Thread via GitHub
jayzhan211 opened a new issue, #13825: URL: https://github.com/apache/datafusion/issues/13825 ### Is your feature request related to a problem or challenge? Continue on the discussion from https://github.com/apache/datafusion/pull/13756#discussion_r1887762971 We create scalar f

Re: [PR] Adding node_id to ExecutionPlanProperties [datafusion]

2024-12-18 Thread via GitHub
berkaysynnada commented on PR #12186: URL: https://github.com/apache/datafusion/pull/12186#issuecomment-2551158629 Apologies for the delay in responding. I have started working on this issue and will open a draft PR to facilitate discussion, FYI @emgeee. I plan to share it today or tomorrow

Re: [I] Support any table nesting level in SQL queries (i.e `SELECT * FROM one.two.three.four.five`) [datafusion]

2024-12-18 Thread via GitHub
findepi commented on issue #13822: URL: https://github.com/apache/datafusion/issues/13822#issuecomment-2550662936 Ordinary (single level) schema name can contain a dot. > I've also considered concatenating the middle namespaces with `.` and not changing DataFusion, but that would requi

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
findepi commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1889834200 ## datafusion/ffi/src/plan_properties.rs: ## @@ -220,50 +235,89 @@ impl TryFrom for PlanProperties { RErr(e) => Err(DataFusionError::Plan(e.to_string(

Re: [PR] feat: support normalized expr in CSE [datafusion]

2024-12-18 Thread via GitHub
peter-toth commented on PR #13315: URL: https://github.com/apache/datafusion/pull/13315#issuecomment-2551100328 Yeah, this PR looks good to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Document SQL dialect guidance [datafusion]

2024-12-18 Thread via GitHub
findepi commented on code in PR #13706: URL: https://github.com/apache/datafusion/pull/13706#discussion_r1889798457 ## docs/source/user-guide/sql/dialect.md: ## @@ -0,0 +1,53 @@ + + +# SQL Dialect + +The included SQL supported in Apache DataFusion mostly follows the [PostgreSQL

Re: [I] Support any table nesting level in SQL queries (i.e `SELECT * FROM one.two.three.four.five`) [datafusion]

2024-12-18 Thread via GitHub
phillipleblanc commented on issue #13822: URL: https://github.com/apache/datafusion/issues/13822#issuecomment-2550609611 I'm happy to work on this implementation, assuming I get consensus that this is a good idea.

Re: [PR] Preserve constant values across union operations [datafusion]

2024-12-18 Thread via GitHub
ozankabak commented on PR #13805: URL: https://github.com/apache/datafusion/pull/13805#issuecomment-2550590605 I wonder if we should change `across_partitions` to an `enum`; i.e. ```rust enum PartitionValues { Uniform(Option), Heterogenous(Option>) } ``` with

Re: [I] Building project takes a *long* time (esp compilation time for `datafusion` core crate) [datafusion]

2024-12-18 Thread via GitHub
findepi commented on issue #13814: URL: https://github.com/apache/datafusion/issues/13814#issuecomment-2550624455 > I would love to figure out how to break the datafusion core crate into smaller pieces / crates that can be compiled in parallel yes! and move around sqlparser dependency

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
findepi commented on PR #13823: URL: https://github.com/apache/datafusion/pull/13823#issuecomment-2550715538 Thanks for expanding docstrings in JoinType. Can you please move them to separate PR? This would help with review and merging.

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-18 Thread via GitHub
findepi commented on code in PR #13823: URL: https://github.com/apache/datafusion/pull/13823#discussion_r1889838140 ## datafusion/common/src/join_type.rs: ## @@ -28,21 +28,26 @@ use crate::{DataFusionError, Result}; /// Join type #[derive(Debug, Clone, Copy, PartialEq, Eq, Par

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-18 Thread via GitHub
jayzhan211 commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1889979703 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .ma