Re: [PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]
varun-bhardwaj-sde commented on PR #14792: URL: https://github.com/apache/datafusion/pull/14792#issuecomment-2680505232 it was closed by misktake i raised another PR https://github.com/apache/datafusion/pull/14864 for this task -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]
varun-bhardwaj-sde commented on PR #14864: URL: https://github.com/apache/datafusion/pull/14864#issuecomment-2680507929 @niebayes can you please review this once and guide me if I am doing any mistake here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]
varun-bhardwaj-sde opened a new pull request, #14864: URL: https://github.com/apache/datafusion/pull/14864 ## Which issue does this PR close? - Closes #14705. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested? ## Are there any user-facing changes? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]
bharath-techie commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2680694026 Thanks @chenkovsky for confirming. We are new to datafusion , but at high level looks like this feature will need a deeper integration in the ParquetExec flow and we might need changes in `ParquetRecordBatchStream` in `arrow-rs` as it performs pruning and at datafusion layer we might not be able to figure out the actual row ids because of it. Experts can comment on this / see if there are any other ways that they can think of. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Examples: boundary analysis example for `AND/OR` conjunctions [datafusion]
ozankabak commented on PR #14735: URL: https://github.com/apache/datafusion/pull/14735#issuecomment-2680697590 Indeed we will open an epic to plan the remaining statistics infrastructure work. I plan to put something together today/tomorrow. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]
zjregee opened a new pull request, #14865: URL: https://github.com/apache/datafusion/pull/14865 ## Which issue does this PR close? - Part of #14759. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested? Yes. ## Are there any user-facing changes? None. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]
zjregee commented on PR #14865: URL: https://github.com/apache/datafusion/pull/14865#issuecomment-2680958642 I think this PR is ready for review now. cc: @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for starts_with [datafusion]
zjregee commented on PR #14812: URL: https://github.com/apache/datafusion/pull/14812#issuecomment-2680962268 I think this PR is ready for review now. cc: @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add support column prefix index for MySQL [datafusion-sqlparser-rs]
zzzdong closed pull request #1732: Add support column prefix index for MySQL URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1732 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add support column prefix index for MySQL [datafusion-sqlparser-rs]
zzzdong commented on PR #1732: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1732#issuecomment-2680992482 Considering the work on the index parsing in #1707, this will be closed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1969168798 ## datafusion/functions-nested/src/sort.rs: ## @@ -143,6 +168,10 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { return exec_err!("array_sort expects one to three arguments"); } +if args.iter().any(|array| array.logical_null_count() > 0) { Review Comment: ``` D select array_sort([1,2,3], null); ┌┐ │ array_sort(main.list_value(1, 2, 3), NULL) │ │ int32│ ├┤ ││ └┘ ``` We need to check 2nd arg -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Examples: boundary analysis example for `AND/OR` conjunctions [datafusion]
clflushopt commented on PR #14735: URL: https://github.com/apache/datafusion/pull/14735#issuecomment-2680365175 @alamb any suggestions on what to improve here vis-a-vis documentation or the example since #14699 has been merged ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Migrate Datetime functions to `invoke_with_args` [datafusion]
varun-bhardwaj-sde commented on issue #14705: URL: https://github.com/apache/datafusion/issues/14705#issuecomment-2680482983 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Examples: boundary analysis example for `AND/OR` conjunctions [datafusion]
clflushopt commented on PR #14735: URL: https://github.com/apache/datafusion/pull/14735#issuecomment-2680374034 @alamb @ozankabak do we plan to have open issues for the follow up changes described in #14699 ? I am specifically trying to figure out whether we should address those items as part of #3929 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] ExternalSorter Fails to Spill Dictionaries [datafusion]
davidhewitt commented on issue #4658: URL: https://github.com/apache/datafusion/issues/4658#issuecomment-2680773931 Ok great, I'll work on that today 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]
Arpit-Bandejiya commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2680785661 Found this PR in arrow-rs : https://github.com/apache/arrow-rs/pull/6624 . @XiangpengHao I see the PR is in draft from sometime. Is there any other we are trying to do it? Can you please share it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Store spans for Value expressions [datafusion-sqlparser-rs]
iffyio commented on code in PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#discussion_r1969034324 ## tests/sqlparser_bigquery.rs: ## @@ -29,19 +29,19 @@ use test_utils::*; #[test] fn parse_literal_string() { let sql = concat!( -"SELECT ", -"'single', ", -r#""double", "#, -"'''triple-single''', ", -r#triple-double""", "#, -r#"'single\'escaped', "#, -r#"'''triple-single\'escaped''', "#, -r#"'''triple-single'unescaped''', "#, -r#""double\"escaped", "#, -r#triple-double\"escaped""", "#, -r#triple-double"unescaped""", "#, -r#triple-double'unescaped""", "#, -r#"'''triple-single"unescaped'''"#, +"SELECT ",// line 1, column 1 +"'single', ", // line 1, column 7 +r#""double", "#, // line 1, column 14 +"'''triple-single''', ", // line 1, column 22 +r#triple-double""", "#, // line 1, column 33 +r#"'single\'escaped', "#, // line 1, column 43 +r#"'''triple-single\'escaped''', "#, // line 1, column 55 +r#"'''triple-single'unescaped''', "#, // line 1, column 68 +r#""double\"escaped", "#, // line 1, column 83 +r#triple-double\"escaped""", "#, // line 1, column 92 +r#triple-double"unescaped""", "#, // line 1, column 105 +r#triple-double'unescaped""", "#, // line 1, column 118 +r#"'''triple-single"unescaped'''"#, // line 1, column 131 Review Comment: do the added comments here need cleanup or do they serve a purpose? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Store spans for Value expressions [datafusion-sqlparser-rs]
iffyio merged PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1969021807 ## datafusion/functions-nested/src/sort.rs: ## @@ -143,6 +168,10 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { return exec_err!("array_sort expects one to three arguments"); } +if args.iter().any(|array| array.logical_null_count() > 0) { Review Comment: I think we still need to check for `args[1..]` for null `desc` or `nulls_first`? but to be align with the sort options, I think checking `is_null(0)` should be enough. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: adjust create and drop trigger for mysql dialect [datafusion-sqlparser-rs]
iffyio merged PR #1734: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1734 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add support column prefix index for MySQL [datafusion-sqlparser-rs]
iffyio commented on code in PR #1732: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1732#discussion_r1969053806 ## src/parser/mod.rs: ## @@ -7646,6 +7646,39 @@ impl<'a> Parser<'a> { } } +pub fn parse_index_exprs(&mut self) -> Result, ParserError> { +self.parse_parenthesized(|p| p.parse_comma_separated(Parser::parse_index_expr)) +} + +pub fn parse_index_expr(&mut self) -> Result { +let expr = self.parse_expr()?; +let collation = if self.parse_keyword(Keyword::COLLATE) { +Some(self.parse_object_name(false)?) +} else { +None +}; + +let (operator_class, order_options) = if self.peek_keyword(Keyword::ASC) +|| self.peek_keyword(Keyword::DESC) +|| self.peek_keyword(Keyword::NULLS) +{ +let order_options = self.parse_order_by_options()?; +(None, order_options) +} else { +let operator_class = self.maybe_parse(|p| p.parse_expr())?; Review Comment: hmm I couldn't see `operator_class` being used in the introduced tests or documented so its unclear what the syntax is? ## src/parser/mod.rs: ## @@ -7646,6 +7646,39 @@ impl<'a> Parser<'a> { } } +pub fn parse_index_exprs(&mut self) -> Result, ParserError> { +self.parse_parenthesized(|p| p.parse_comma_separated(Parser::parse_index_expr)) +} + +pub fn parse_index_expr(&mut self) -> Result { Review Comment: Since this is flagged as a public function can we add a short description mentioning what it does? ## src/parser/mod.rs: ## @@ -7646,6 +7646,39 @@ impl<'a> Parser<'a> { } } +pub fn parse_index_exprs(&mut self) -> Result, ParserError> { +self.parse_parenthesized(|p| p.parse_comma_separated(Parser::parse_index_expr)) +} + +pub fn parse_index_expr(&mut self) -> Result { +let expr = self.parse_expr()?; +let collation = if self.parse_keyword(Keyword::COLLATE) { +Some(self.parse_object_name(false)?) +} else { +None +}; + +let (operator_class, order_options) = if self.peek_keyword(Keyword::ASC) +|| self.peek_keyword(Keyword::DESC) +|| self.peek_keyword(Keyword::NULLS) Review Comment: can we introduce a test case that covers the `ASC/DESC` and `nulls_first` syntax? ## src/parser/mod.rs: ## @@ -14688,6 +14721,8 @@ impl Word { #[cfg(test)] mod tests { +use std::vec; + Review Comment: hmm is this needed? ## src/parser/mod.rs: ## @@ -7646,6 +7646,39 @@ impl<'a> Parser<'a> { } } +pub fn parse_index_exprs(&mut self) -> Result, ParserError> { +self.parse_parenthesized(|p| p.parse_comma_separated(Parser::parse_index_expr)) +} + +pub fn parse_index_expr(&mut self) -> Result { +let expr = self.parse_expr()?; +let collation = if self.parse_keyword(Keyword::COLLATE) { +Some(self.parse_object_name(false)?) +} else { +None +}; + +let (operator_class, order_options) = if self.peek_keyword(Keyword::ASC) +|| self.peek_keyword(Keyword::DESC) +|| self.peek_keyword(Keyword::NULLS) Review Comment: Instead of using `peek`, we can probably use `self.maybe_parse()` so that we don't need to maintain this list of keywords in sync with the `parse_order_by_options` function e.g. ```rust let (operator_class, order_options) = if let Some(order_options ) = self.maybe_parse(|p| p.parse_order_by_options())? { (None, order_options) } else { let operator_class = self.maybe_parse(|p| p.parse_expr())?; let order_options = self.parse_order_by_options()?; (operator_class, order_options) }; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Parse signed/unsigned integer data type in MySQL CAST [datafusion-sqlparser-rs]
iffyio commented on code in PR #1739: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1739#discussion_r1969077053 ## src/ast/data_type.rs: ## @@ -238,6 +238,26 @@ pub enum DataType { UnsignedBigInt(Option), /// Unsigned Int8 with optional display width e.g. INT8 UNSIGNED or INT8(11) UNSIGNED UnsignedInt8(Option), +/// Signed integer as used in [MySQL CAST] target types, with optional `INTEGER` suffix: +/// `SIGNED [INTEGER]` +/// +/// Note that this doesn't accept a display width and is reversed from the syntax used in column +/// definitions ([`DataType::Int`]): `INTEGER [SIGNED]` +/// +/// Semantically equivalent to `BIGINT`. +/// +/// [MySQL CAST]: https://dev.mysql.com/doc/refman/8.4/en/cast-functions.html +Signed(bool), Review Comment: thinking it feels a unexpected to use `Signed(true)` to represent `SIGNED INTEGER`? and the repr assumes a mysql-only type which might not be the case going forward. Feels like it'd be clear to represent them explicitly as `SIGNED` and `SIGNED INTEGER` where both can be reused by other dialects later if needed? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Enable Dataframe to be converted into views which can be used in register_table [datafusion-python]
kosiew commented on code in PR #1016: URL: https://github.com/apache/datafusion-python/pull/1016#discussion_r1969081210 ## src/dataframe.rs: ## @@ -156,6 +174,22 @@ impl PyDataFrame { PyArrowType(self.df.schema().into()) } +/// Convert this DataFrame into a Table that can be used in register_table +/// By convention, into_... methods consume self and return the new object. +/// Disabling the clippy lint, so we can use &self +/// because we're working with Python bindings +/// where objects are shared +#[allow(clippy::wrong_self_convention)] +fn into_view(&self) -> PyDataFusionResult { +// Call the underlying Rust DataFrame::into_view method. +// Note that the Rust method consumes self; here we clone the inner Arc +// so that we don’t invalidate this PyDataFrame. +let table_provider = self.df.as_ref().clone().into_view(); +let table_provider = PyTableProvider::new(table_provider); Review Comment: ```suggestion let table_provider = PyTableProvider::new(table_provider); ``` ## src/dataframe.rs: ## @@ -156,6 +174,22 @@ impl PyDataFrame { PyArrowType(self.df.schema().into()) } +/// Convert this DataFrame into a Table that can be used in register_table +/// By convention, into_... methods consume self and return the new object. +/// Disabling the clippy lint, so we can use &self +/// because we're working with Python bindings +/// where objects are shared +#[allow(clippy::wrong_self_convention)] +fn into_view(&self) -> PyDataFusionResult { +// Call the underlying Rust DataFrame::into_view method. +// Note that the Rust method consumes self; here we clone the inner Arc +// so that we don’t invalidate this PyDataFrame. +let table_provider = self.df.as_ref().clone().into_view(); +let table_provider = PyTableProvider::new(table_provider); Review Comment: ```suggestion let table_provider = PyTableProvider::new(table_provider); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] ExternalSorter Fails to Spill Dictionaries [datafusion]
tustvold commented on issue #4658: URL: https://github.com/apache/datafusion/issues/4658#issuecomment-2679171994 > I assume by row format you mean [arrow-row](https://arrow.apache.org/rust/arrow_row/index.html), however it's not clear to me if there's a standard way to serialize these to a file. I could create something simple which just writes the streams of rows as binary (probably a length then the bytes, repeated?). I suspect this is what I was going for, although it was 2 years ago so can't confess to really remembering and a lot has likely changed since then. > It looks like an alternative to using the row format in datafusion might be to support delta-encoded dictionaries in arrow-rs. https://github.com/apache/arrow-rs/issues/6783 This is likely an approach, although it would involve re-encoding the dictionaries as there is no mechanism to remap an existing key. IMO I'd be tempted to start a discussion on the mailing list about why this constraint exists, it seems somewhat nonsensical to me given that the footer identifies where the dictionary blocks are, and therefore it is trivial to determine which dictionary to use. If anything the support for delta dictionaries is more surprising... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: Re-organize shuffle writer code [datafusion-comet]
mbutrovich commented on PR #1439: URL: https://github.com/apache/datafusion-comet/pull/1439#issuecomment-2679374261 This is a great help as we look to improve the shuffle performance. Thanks @andygrove! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: migrate invoke_batch to invoke_with_args for unicode function [datafusion]
alamb commented on PR #14856: URL: https://github.com/apache/datafusion/pull/14856#issuecomment-2679749126 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Create an MSRV policy in this crate [datafusion-sqlparser-rs]
mvzink commented on issue #1744: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1744#issuecomment-2679756348 From looking into #1612, some test code uses associated type bounds ([stabilized in 1.79](https://github.com/rust-lang/rust/pull/122055/)). It could be rewritten if an earlier MSRV is desired (1.75 worked without it). That's moot if following a more progressive policy like DataFusion though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add `range` table function [datafusion]
alamb commented on PR #14830: URL: https://github.com/apache/datafusion/pull/14830#issuecomment-2679748456 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: migrate invoke_batch to invoke_with_args for unicode function [datafusion]
alamb merged PR #14856: URL: https://github.com/apache/datafusion/pull/14856 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add `range` table function [datafusion]
alamb merged PR #14830: URL: https://github.com/apache/datafusion/pull/14830 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Migrate Unicode function to `invoke_with_args` [datafusion]
alamb closed issue #14709: Migrate Unicode function to `invoke_with_args` URL: https://github.com/apache/datafusion/issues/14709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]
alamb commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1968480652 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -98,66 +98,91 @@ fn pushdown_sorts_helper( .ordering_satisfy_requirement(&parent_reqs); if is_sort(plan) { -let sort_fetch = plan.fetch(); -let required_ordering = plan +let current_sort_fetch = plan.fetch(); +let parent_req_fetch = requirements.data.fetch; + +let child_reqs = plan Review Comment: I found the overloading of "requirements" confusing here -- there are `LexRequirements` and `requirements: SortPushdown` Perhaps we could rename `requirements` to `sort_pushdown` to mirror the argument -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]
alamb commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1968482276 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -98,66 +98,91 @@ fn pushdown_sorts_helper( .ordering_satisfy_requirement(&parent_reqs); if is_sort(plan) { -let sort_fetch = plan.fetch(); -let required_ordering = plan +let current_sort_fetch = plan.fetch(); +let parent_req_fetch = requirements.data.fetch; + +let child_reqs = plan .output_ordering() .cloned() .map(LexRequirement::from) .unwrap_or_default(); -if !satisfy_parent { -// Make sure this `SortExec` satisfies parent requirements: -let sort_reqs = requirements.data.ordering_requirement.unwrap_or_default(); -// It's possible current plan (`SortExec`) has a fetch value. -// And if both of them have fetch values, we should use the minimum one. -if let Some(fetch) = sort_fetch { -if let Some(requirement_fetch) = requirements.data.fetch { -requirements.data.fetch = Some(fetch.min(requirement_fetch)); -} -} -let fetch = requirements.data.fetch.or(sort_fetch); +let parent_is_stricter = plan +.equivalence_properties() +.requirements_compatible(&parent_reqs, &child_reqs); Review Comment: I don't see where these `requirements_compatible` calls come from. Is there an existing location? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] DeltaLake integration not working (Python) [datafusion]
mag1cfrog commented on issue #14842: URL: https://github.com/apache/datafusion/issues/14842#issuecomment-2679777967 not sure really relevant, but I'm also having some compilation issue when trying to use deltalake with datafusion in Rust. Here's what I get: ```bash Compiling datafusion v44.0.0 Compiling datafusion-proto v44.0.0 Compiling deltalake-core v0.24.0 error[E0308]: mismatched types --> /home/hanbo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/deltalake-core-0.24.0/src/kernel/models/actions.rs:567:9 | 566 | fn try_from(value: &TableFeatures) -> Result { | - expected `Result` because of return type 567 | ReaderFeatures::try_from(value.as_ref()) | expected `strum::ParseError`, found a different `strum::ParseError` | = note: `strum::ParseError` and `strum::ParseError` have similar names, but are actually distinct types note: `strum::ParseError` is defined in crate `strum` --> /home/hanbo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/strum-0.26.3/src/lib.rs:42:1 | 42 | pub enum ParseError { | ^^^ note: `strum::ParseError` is defined in crate `strum` --> /home/hanbo/.cargo/git/checkouts/strum-222d21164b266718/cc240c3/strum/src/lib.rs:42:1 | 42 | pub enum ParseError { | ^^^ = note: perhaps two different versions of crate `strum` are being used? error[E0308]: mismatched types --> /home/hanbo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/deltalake-core-0.24.0/src/kernel/models/actions.rs:575:9 | 574 | fn try_from(value: &TableFeatures) -> Result { | - expected `Result` because of return type 575 | WriterFeatures::try_from(value.as_ref()) | expected `strum::ParseError`, found a different `strum::ParseError` | = note: `strum::ParseError` and `strum::ParseError` have similar names, but are actually distinct types note: `strum::ParseError` is defined in crate `strum` --> /home/hanbo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/strum-0.26.3/src/lib.rs:42:1 | 42 | pub enum ParseError { | ^^^ note: `strum::ParseError` is defined in crate `strum` --> /home/hanbo/.cargo/git/checkouts/strum-222d21164b266718/cc240c3/strum/src/lib.rs:42:1 | 42 | pub enum ParseError { | ^^^ = note: perhaps two different versions of crate `strum` are being used? For more information about this error, try `rustc --explain E0308`. error: could not compile `deltalake-core` (lib) due to 2 previous errors ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Remove the need for registering an ObjectStore for remote files [datafusion-python]
robtandy commented on issue #899: URL: https://github.com/apache/datafusion-python/issues/899#issuecomment-2679361030 @kylebarron What are your thoughts on how to approach this? I'm happy to try to address it and submit a PR for it. I'd like the same thing for github.com/apache/datafusion-ray, as after the rewrite, I need to add support for object stores back in and I'd love it to be consistent with how it will work in datafusion-python. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: Strip debuginfo symbols for release [datafusion]
comphead commented on code in PR #14843: URL: https://github.com/apache/datafusion/pull/14843#discussion_r1968211931 ## Cargo.toml: ## @@ -159,19 +159,20 @@ url = "2.5.4" [profile.release] codegen-units = 1 lto = true +debug = false Review Comment: Agree -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Parse signed/unsigned integer data types correctly in MySQL CAST [datafusion-sqlparser-rs]
mvzink commented on code in PR #1739: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1739#discussion_r1968070815 ## src/ast/mod.rs: ## @@ -798,9 +798,15 @@ pub enum Expr { kind: CastKind, expr: Box, data_type: DataType, -// Optional CAST(string_expression AS type FORMAT format_string_expression) as used by BigQuery -// https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#formatting_syntax +/// Optional CAST(string_expression AS type FORMAT format_string_expression) as used by [BigQuery] +/// +/// [BigQuery]: https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#formatting_syntax format: Option, +/// Whether this was parsed as a [MySQL]-style cast, which has a different syntax for Review Comment: I simplified this patch to take that approach and not introduce the error path for MySQL (leaving existing permissive parsing in place, since I noticed that's already the approach taken with other datatypes and dialects). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: Strip debuginfo symbols for release [datafusion]
comphead commented on PR #14843: URL: https://github.com/apache/datafusion/pull/14843#issuecomment-2679541024 Hey @rkrishn7 this is a good call yes, although DF does not catch panics, we cannot say the for the dependencies. I'll experiment later with panic mode for DF crate level only, leaving other dependencies compile flags intact. But for this PR it is safer to remove abort on panics -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] Improve performance of join building performance (up to 1.28x speedup) [datafusion]
Dandandan opened a new pull request, #14861: URL: https://github.com/apache/datafusion/pull/14861 ## Which issue does this PR close? - Closes #14859 ## Rationale for this change Performance improvements for join (up to 1.28x) ``` Benchmark tpch_mem_sf10.json ┏━━┳━━━┳━┳━━━┓ ┃ Query┃ main ┃ join_vectorization3 ┃Change ┃ ┡━━╇━━━╇━╇━━━┩ │ QQuery 1 │ 770.46ms │796.52ms │ no change │ │ QQuery 2 │ 125.30ms │106.59ms │ +1.18x faster │ │ QQuery 3 │ 259.14ms │220.22ms │ +1.18x faster │ │ QQuery 4 │ 134.04ms │122.23ms │ +1.10x faster │ │ QQuery 5 │ 525.49ms │456.57ms │ +1.15x faster │ │ QQuery 6 │ 45.93ms │ 42.05ms │ +1.09x faster │ │ QQuery 7 │ 1079.09ms │905.24ms │ +1.19x faster │ │ QQuery 8 │ 336.17ms │296.16ms │ +1.14x faster │ │ QQuery 9 │ 881.10ms │687.21ms │ +1.28x faster │ │ QQuery 10│ 406.44ms │353.14ms │ +1.15x faster │ │ QQuery 11│ 94.73ms │ 82.00ms │ +1.16x faster │ │ QQuery 12│ 280.55ms │250.40ms │ +1.12x faster │ │ QQuery 13│ 292.29ms │256.12ms │ +1.14x faster │ │ QQuery 14│ 47.03ms │ 43.25ms │ +1.09x faster │ │ QQuery 15│ 137.05ms │117.04ms │ +1.17x faster │ │ QQuery 16│ 94.72ms │ 82.67ms │ +1.15x faster │ │ QQuery 17│ 910.07ms │717.32ms │ +1.27x faster │ │ QQuery 18│ 3759.48ms │ 3025.62ms │ +1.24x faster │ │ QQuery 19│ 183.15ms │167.47ms │ +1.09x faster │ │ QQuery 20│ 242.99ms │195.70ms │ +1.24x faster │ │ QQuery 21│ 1582.68ms │ 1292.53ms │ +1.22x faster │ │ QQuery 22│ 95.75ms │ 97.61ms │ no change │ └──┴───┴─┴───┘ ┏┳┓ ┃ Benchmark Summary ┃┃ ┡╇┩ │ Total Time (main) │ 12283.64ms │ │ Total Time (join_vectorization3) │ 10313.65ms │ │ Average Time (main)│ 558.35ms │ │ Average Time (join_vectorization3) │ 468.80ms │ │ Queries Faster │ 20 │ │ Queries Slower │ 0 │ │ Queries with No Change │ 2 │ └┴┘ Note: Skipping /Users/danielheres/Code/arrow-datafusion-personal/benchmarks/results/main/tpch_sf1.json as /Users/danielheres/Code/arrow-datafusion-personal/benchmarks/results/join_vectorization3/tpch_sf1.json does not exist Benchmark tpch_sf10.json ┏━━┳━━━┳━┳━━━┓ ┃ Query┃ main ┃ join_vectorization3 ┃Change ┃ ┡━━╇━━━╇━╇━━━┩ │ QQuery 1 │ 981.31ms │946.03ms │ no change │ │ QQuery 2 │ 157.88ms │144.81ms │ +1.09x faster │ │ QQuery 3 │ 475.15ms │409.54ms │ +1.16x faster │ │ QQuery 4 │ 240.46ms │219.94ms │ +1.09x faster │ │ QQuery 5 │ 700.03ms │604.97ms │ +1.16x faster │ │ QQuery 6 │ 157.46ms │149.52ms │ +1.05x faster │ │ QQuery 7 │ 1072.19ms │881.00ms │ +1.22x faster │ │ QQuery 8 │ 740.30ms │663.77ms │ +1.12x faster │ │ QQuery 9 │ 1189.08ms │984.28ms │ +1.21x faster │ │ QQuery 10│ 666.24ms │605.10ms │ +1.10x faster │ │ QQuery 11│ 101.91ms │ 98.58ms │ no change │ │ QQuery 12│ 338.89ms │317.19ms │ +1.07x faster │ │ QQuery 13│ 475.00ms │418.44ms │ +1.14x faster │ │ QQuery 14│ 266.15ms │245.19ms │ +1.09x faster │ │ QQuery 15│ 442.98ms │403.32ms │ +1.10x faster │ │ QQuery 16│ 111.29ms │107.48ms │ no change │ │ QQuery 17│ 1249.82ms │ 1031.22ms │ +1.21x faster │ │ QQuery 18│ 2052.52ms │ 1672.16ms │ +1.23x faster │ │ QQuery 19│ 464.19ms │437.39ms │ +1.06x faster │ │ QQuery 20│ 470.89ms │391.63ms │ +1.20x faster │ │ QQuery 21│ 1637.42ms │ 1390.00ms │ +1.18x faster │ │ QQuery 22│ 156.41ms │148.43ms │ +1.05x faster │ └──┴───┴─┴───┘ ┏┳┓ ┃ Benchmark Summary
Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]
anlinc commented on PR #14553: URL: https://github.com/apache/datafusion/pull/14553#issuecomment-2679664474 @Blizzara @alamb I am closing this in favor of the latest iteration here: https://github.com/apache/datafusion/pull/14860, which addresses the discussions in this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]
anlinc closed pull request #14553: fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait URL: https://github.com/apache/datafusion/pull/14553 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Support User-Defined Sorting [datafusion]
tobixdev commented on issue #14828: URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2679679211 So i dug a little deeper on how we could implement this functionality. ### arrow-rs Firstly, I think we require two "flavors" of sorting - one for `arrow-ord` and one for `arrow-row` as DataFusion uses both of these APIs. AFAIK, in `arrow-rs` we don't have a "user defined type" registry that we could leverage. So when a user calls, for example, `lexsort`, there is also no way to "lookup" whether we have a user defined type that can be applied to a particular column. Therefore, we must pass in the information on how to sort a column to the sorting procedure. One possible way for getting this information into the called functions is to extend `SortField` (`arrow-row`) and `SortColumn` (`arrow-ord`). I think this extension could happen in one of two ways: 1. *Provide an Implementation*: In this approach users directly provide an implementation. In `arrow-row` this could be a custom byte encoder, in `arrow-ord` this could be a function that maps to a `Ordering`. 2. *Provide a User Defined Type*: In this approach users provide a user defined type. Somehow we would then need the ability to deduce the implementations from 1. based on the given type. Maybe one way to go forward is implement 1. and then implement 2. if we think this is sensible. I am a bit unsure on how we can directly use the arrow extension types for this use case. If anyone has a better idea here, I'd love to hear your take on that (@mbrobbel maybe you have an opinion if this is within the scope of arrow extension types). ### DataFusion Here I think we should go with user defined types and attach the sorting information to that type. This should be possible as we can have a "central registry" (e.g., `SessionContext`) that holds all available user defined types and we can look up whether this particular column has a user defined type with a user defined ordering. If we have a use case, we could also think of adding the ability to override this ordering behavior. I don't have more details on how we could implement this. I think trying to implement 1. in arrow-rs is the first step, and then we should check on how we can connect that to user defined types in DataFusion. If this sounds good, I can start to work on a draft. Maybe lmk if you think that extending `SortField` and `SortColumn` is a reasonable approach here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Store spans for Value expressions [datafusion-sqlparser-rs]
alamb commented on code in PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#discussion_r1968440688 ## src/ast/mod.rs: ## @@ -8789,9 +8796,9 @@ mod tests { #[test] fn test_interval_display() { let interval = Expr::Interval(Interval { -value: Box::new(Expr::Value(Value::SingleQuotedString(String::from( -"123:45.67", -, +value: Box::new(Expr::Value( Review Comment: minor stuff we could use the `Expr::value` here too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[I] Create an MSRV policy in this crate [datafusion-sqlparser-rs]
alamb opened a new issue, #1744: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1744 TLDR woudl be: 1. Define a MSRV policy (maybe copy the existing DataFusion one) 2. Implement some sort of CI check (again can copy / reuse the datafusion one if we want) 3. 🤔 interestingly we don't seem to have an MSRV policy in this crate (at least not that I could find in https://github.com/apache/datafusion-sqlparser-rs/blob/main/README.md) We do have such a thing in DataFusion here: https://github.com/apache/datafusion?tab=readme-ov-file#rust-version-compatibility-policy Maybe it is worth considering adding some documentation / policy in the sqlparser crate too 🤔 _Originally posted by @alamb in https://github.com/apache/datafusion-sqlparser-rs/issues/1736#issuecomment-2676808035_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Improve performance of join building performance (up to 1.28x speedup) [datafusion]
Dandandan closed pull request #14861: Improve performance of join building performance (up to 1.28x speedup) URL: https://github.com/apache/datafusion/pull/14861 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Improve join building performance [datafusion]
Dandandan closed issue #14859: Improve join building performance URL: https://github.com/apache/datafusion/issues/14859 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]
bharath-techie commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2679245307 Hi @chenkovsky , Thanks a ton for quick POC on this. :) The row ids seems to be specific to each batch and not across the entire parquet file - is my understanding correct ? Reason is our use case will mainly benefit from parquet file level row ids. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] [WIP] Move `FileSourceConfig` and `FileStream` to the new `datafusion-datasource` [datafusion]
AdamGS commented on code in PR #14838: URL: https://github.com/apache/datafusion/pull/14838#discussion_r1968175868 ## datafusion/common/src/test_util.rs: ## @@ -28,7 +28,7 @@ use std::{error::Error, path::PathBuf}; /// /// Expects to be called about like this: /// -/// `assert_batch_eq!(expected_lines: &[&str], batches: &[RecordBatch])` Review Comment: this was just a typo -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Support bounds evaluation for temporal data types [datafusion]
ch-sc commented on code in PR #14523: URL: https://github.com/apache/datafusion/pull/14523#discussion_r1968018999 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -398,6 +399,51 @@ impl PhysicalExpr for InListExpr { self.static_filter.clone(), ))) } + +/// The output interval is computed by checking if the list item intervals are +/// a subset of, overlap, or are disjoint with the input expression's interval. +/// +/// If [InListExpr::negated] is true, the output interval gets negated. +/// +/// # Example: +/// If the input expression's interval is a superset of the +/// conjunction of the list items intervals, the output +/// interval is [`Interval::CERTAINLY_TRUE`]. +/// +/// ```text +/// interval of expr: - +/// Some list items:..|..|.|.|... +/// +/// output interval:[`true`, `true`] +/// ``` +fn evaluate_bounds(&self, children: &[&Interval]) -> Result { +let expr_bounds = children[0]; + +debug_assert!( +children.len() >= 2, +"InListExpr requires at least one list item" +); + +// conjunction of list item intervals +let list_bounds = children +.iter() +.skip(2) +.try_fold(children[1].clone(), |acc, item| acc.union(*item))?; + +if self.negated { +expr_bounds.contains(list_bounds)?.boolean_negate() +} else { +expr_bounds.contains(list_bounds) Review Comment: `contains` for `[3,5]` and `[0,9]` should return `UNCERTAIN`. I added a test with your example in in_list.rs. `CERTAINLY_TRUE` should only be returned if expr_bounds is a superset of the union of the list bounds. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] [WIP] Move `FileSourceConfig` and `FileStream` to the new `datafusion-datasource` [datafusion]
AdamGS commented on code in PR #14838: URL: https://github.com/apache/datafusion/pull/14838#discussion_r1968197144 ## datafusion/datasource/Cargo.toml: ## @@ -69,6 +71,7 @@ xz2 = { version = "0.1", optional = true, features = ["static"] } zstd = { version = "0.13", optional = true, default-features = false } [dev-dependencies] +datafusion = { workspace = true, default-features = true } Review Comment: seems like some was unnecessary, and some were already moved into `memory.rs`, so I just moved it up under `#[cfg(test)]`. ## datafusion/datasource/Cargo.toml: ## @@ -69,6 +71,7 @@ xz2 = { version = "0.1", optional = true, features = ["static"] } zstd = { version = "0.13", optional = true, default-features = false } [dev-dependencies] +datafusion = { workspace = true, default-features = true } Review Comment: seems like some were unnecessary, and some were already moved into `memory.rs`, so I just moved it up under `#[cfg(test)]`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: Strip debuginfo symbols for release [datafusion]
comphead commented on PR #14843: URL: https://github.com/apache/datafusion/pull/14843#issuecomment-2679349895 DF does not catch panics, so the process will crash anyway no matter what the setting is. Although there are some test cases which does `catch_unwind` but we do run tests in debug mode. Another thing for abort it doesn't call object destructors if process crashed which for DF imho it is not an issue. Potentially DF even can gain some performance which I will check separately and share the results here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[I] Experimental native scan test failures [datafusion-comet]
mbutrovich opened a new issue, #1441: URL: https://github.com/apache/datafusion-comet/issues/1441 We have two experimental native scans based on DataFusion's ParquetExec operator. This issue will track the remaining test failures and any related notes as we bring the test failures down to zero. **ParquetReadV1Suite:** - test multiple pages with different sizes and nulls (prefetch enabled) - test multiple pages with different sizes and nulls - scan metrics (prefetch enabled) - scan metrics **CometJoinSuite:** - HashJoin struct key (`native_datafusion` only) **CometCastSuite:** - cast TimestampType to LongType - cast TimestampType to StringType - cast TimestampType to DateType - cast StructType to StringType **CometArrayExpressionSuite:** - array_intersect **CometExecSuite:** - Comet native metrics: scan - explain native plan (`native_datafusion` only) **CometExpressionSuite:** - round - hex -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: Strip debuginfo symbols for release [datafusion]
rkrishn7 commented on PR #14843: URL: https://github.com/apache/datafusion/pull/14843#issuecomment-2679441117 > DF does not catch panics, so the process will crash anyway no matter what the setting is. @comphead If I'm not mistaken, DF may catch panics in certain cases. For example, if a serialization task panics during writing to some object store (see [here](https://github.com/apache/datafusion/blob/e799097cbd2079d658ddc6243817ac529e2f9807/datafusion/datasource/src/write/orchestration.rs#L135)). I confirmed this by inserting a dummy panic in the serialization task. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]
kevinjqliu commented on PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024#issuecomment-2679610682 woot! I see 45.2.0 on https://pypi.org/project/datafusion/ thanks for working on the release! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] build(deps): bump prost-types from 0.13.4 to 0.13.5 [datafusion-python]
dependabot[bot] commented on PR #1021: URL: https://github.com/apache/datafusion-python/pull/1021#issuecomment-2679578887 Looks like prost-types is up-to-date now, so this is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Release DataFusion `46.0.0` [datafusion]
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2679454880 I made a PR for testing in delta here: - https://github.com/delta-io/delta-rs/pull/3261 Still has some issues to work out -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]
alamb commented on PR #14860: URL: https://github.com/apache/datafusion/pull/14860#issuecomment-2679711714 Hi @anlinc It looks like there is a small clippy failure with this PR: https://github.com/apache/datafusion/actions/runs/13507870042/job/37742937287?pr=14860 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Move `FileSourceConfig` and `FileStream` to the new `datafusion-datasource` [datafusion]
alamb commented on code in PR #14838: URL: https://github.com/apache/datafusion/pull/14838#discussion_r1968468146 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -15,19 +15,611 @@ // specific language governing permissions and limitations // under the License. -use std::{borrow::Cow, collections::HashMap, marker::PhantomData, sync::Arc}; +//! [`FileScanConfig`] to configure scanning of possibly partitioned +//! file sources. + +use std::{ +any::Any, borrow::Cow, collections::HashMap, fmt::Debug, fmt::Formatter, +fmt::Result as FmtResult, marker::PhantomData, sync::Arc, +}; use arrow::{ array::{ ArrayData, ArrayRef, BufferBuilder, DictionaryArray, RecordBatch, RecordBatchOptions, }, buffer::Buffer, -datatypes::{ArrowNativeType, DataType, SchemaRef, UInt16Type}, +datatypes::{ArrowNativeType, DataType, Field, Schema, SchemaRef, UInt16Type}, +}; +use datafusion_common::{ +exec_err, stats::Precision, ColumnStatistics, Constraints, Result, Statistics, }; -use datafusion_common::{exec_err, Result}; use datafusion_common::{DataFusionError, ScalarValue}; -use log::warn; +use datafusion_execution::{ +object_store::ObjectStoreUrl, SendableRecordBatchStream, TaskContext, +}; +use datafusion_physical_expr::{ +expressions::Column, EquivalenceProperties, LexOrdering, Partitioning, +PhysicalSortExpr, +}; +use datafusion_physical_plan::{ +display::{display_orderings, ProjectSchemaDisplay}, +metrics::ExecutionPlanMetricsSet, +projection::{all_alias_free_columns, new_projections_for_columns, ProjectionExec}, +DisplayAs, DisplayFormatType, ExecutionPlan, +}; +use log::{debug, warn}; + +use crate::{ +display::FileGroupsDisplay, +file::FileSource, +file_compression_type::FileCompressionType, +file_stream::FileStream, +source::{DataSource, DataSourceExec}, +statistics::MinMaxStatistics, +PartitionedFile, +}; + +/// The base configurations for a [`DataSourceExec`], the a physical plan for +/// any given file format. +/// +/// Use [`Self::build`] to create a [`DataSourceExec`] from a ``FileScanConfig`. +/// +/// # Example +/// ```ignore +/// # use std::sync::Arc; +/// # use arrow::datatypes::{Field, Fields, DataType, Schema}; +/// # use datafusion_datasource::PartitionedFile; +/// # use datafusion_datasource::file_scan_config::FileScanConfig; +/// # use datafusion_execution::object_store::ObjectStoreUrl; +/// # use datafusion::datasource::physical_plan::ArrowSource; Review Comment: This is a weird example anyways because it actually refers to `ParquetFiles` I took the liberty of pushing a commit to this branch to restore the doc test by mocking out a ParquetSource in 35e6ab727 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Store spans for Value expressions [datafusion-sqlparser-rs]
lovasoa commented on code in PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#discussion_r1968471231 ## src/ast/mod.rs: ## @@ -8789,9 +8796,9 @@ mod tests { #[test] fn test_interval_display() { let interval = Expr::Interval(Interval { -value: Box::new(Expr::Value(Value::SingleQuotedString(String::from( -"123:45.67", -, +value: Box::new(Expr::Value( Review Comment: Yes! But I don't have the strength to fight ast-grep again 🙂↔️ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]
alamb commented on code in PR #14860: URL: https://github.com/apache/datafusion/pull/14860#discussion_r1968445509 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -63,6 +64,32 @@ use indexmap::IndexSet; /// Default table name for unnamed table pub const UNNAMED_TABLE: &str = "?table?"; +#[derive(Debug, Clone)] +pub struct LogicalPlanBuilderOptions { +/// Flag indicating whether the plan builder should add +/// functionally dependent expressions as additional aggregation groupings. +add_implicit_group_by_exprs: bool, +} + +impl Default for LogicalPlanBuilderOptions { +fn default() -> Self { +Self { +add_implicit_group_by_exprs: false, +} +} +} + +impl LogicalPlanBuilderOptions { +pub fn new() -> Self { +Default::default() +} + +pub fn with_add_implicit_group_by_exprs(mut self, add: bool) -> Self { Review Comment: ```suggestion /// Should the builder add functionally dependent expressions as additional aggregation groupings. pub fn with_add_implicit_group_by_exprs(mut self, add: bool) -> Self { ``` ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -63,6 +64,32 @@ use indexmap::IndexSet; /// Default table name for unnamed table pub const UNNAMED_TABLE: &str = "?table?"; +#[derive(Debug, Clone)] Review Comment: ```suggestion /// Options for [`LogicalPlanBuilder`] #[derive(Debug, Clone)] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] [WIP] Move `FileSourceConfig` and `FileStream` to the new `datafusion-datasource` [datafusion]
AdamGS commented on code in PR #14838: URL: https://github.com/apache/datafusion/pull/14838#discussion_r1968201066 ## datafusion/core/Cargo.toml: ## @@ -40,7 +40,7 @@ nested_expressions = ["datafusion-functions-nested"] # This feature is deprecated. Use the `nested_expressions` feature instead. array_expressions = ["nested_expressions"] # Used to enable the avro format -avro = ["apache-avro", "num-traits", "datafusion-common/avro"] +avro = ["apache-avro", "num-traits", "datafusion-common/avro", "datafusion-datasource/avro"] Review Comment: I think the next PR will try and do that, but some will always stay even if just for stuff like `AvroError`/`ParquetError` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] [WIP] Move `FileSourceConfig` and `FileStream` to the new `datafusion-datasource` [datafusion]
AdamGS commented on code in PR #14838: URL: https://github.com/apache/datafusion/pull/14838#discussion_r1968368078 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -15,19 +15,611 @@ // specific language governing permissions and limitations // under the License. -use std::{borrow::Cow, collections::HashMap, marker::PhantomData, sync::Arc}; +//! [`FileScanConfig`] to configure scanning of possibly partitioned +//! file sources. + +use std::{ +any::Any, borrow::Cow, collections::HashMap, fmt::Debug, fmt::Formatter, +fmt::Result as FmtResult, marker::PhantomData, sync::Arc, +}; use arrow::{ array::{ ArrayData, ArrayRef, BufferBuilder, DictionaryArray, RecordBatch, RecordBatchOptions, }, buffer::Buffer, -datatypes::{ArrowNativeType, DataType, SchemaRef, UInt16Type}, +datatypes::{ArrowNativeType, DataType, Field, Schema, SchemaRef, UInt16Type}, +}; +use datafusion_common::{ +exec_err, stats::Precision, ColumnStatistics, Constraints, Result, Statistics, }; -use datafusion_common::{exec_err, Result}; use datafusion_common::{DataFusionError, ScalarValue}; -use log::warn; +use datafusion_execution::{ +object_store::ObjectStoreUrl, SendableRecordBatchStream, TaskContext, +}; +use datafusion_physical_expr::{ +expressions::Column, EquivalenceProperties, LexOrdering, Partitioning, +PhysicalSortExpr, +}; +use datafusion_physical_plan::{ +display::{display_orderings, ProjectSchemaDisplay}, +metrics::ExecutionPlanMetricsSet, +projection::{all_alias_free_columns, new_projections_for_columns, ProjectionExec}, +DisplayAs, DisplayFormatType, ExecutionPlan, +}; +use log::{debug, warn}; + +use crate::{ +display::FileGroupsDisplay, +file::FileSource, +file_compression_type::FileCompressionType, +file_stream::FileStream, +source::{DataSource, DataSourceExec}, +statistics::MinMaxStatistics, +PartitionedFile, +}; + +/// The base configurations for a [`DataSourceExec`], the a physical plan for +/// any given file format. +/// +/// Use [`Self::build`] to create a [`DataSourceExec`] from a ``FileScanConfig`. +/// +/// # Example +/// ```ignore +/// # use std::sync::Arc; +/// # use arrow::datatypes::{Field, Fields, DataType, Schema}; +/// # use datafusion_datasource::PartitionedFile; +/// # use datafusion_datasource::file_scan_config::FileScanConfig; +/// # use datafusion_execution::object_store::ObjectStoreUrl; +/// # use datafusion::datasource::physical_plan::ArrowSource; Review Comment: Had to ignore this very good rustdoc test because we don't have `ArrowSource` here. My plan is to only have that as a temporary thing between PRs, with the fix being either: 1. Have some `InMemoryFileSource` here that just gets `Vec` and knows how to return it through `FileSource` (or maybe even the `MockSource` I added for tests somewhere in this PR). 2. Potentially put `ArrowSource` in this crate as the "canonical" source, especially given how small it is at less than 200 lines of actual code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] build(deps): bump prost from 0.13.4 to 0.13.5 [datafusion-python]
dependabot[bot] closed pull request #1022: build(deps): bump prost from 0.13.4 to 0.13.5 URL: https://github.com/apache/datafusion-python/pull/1022 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] build(deps): bump prost from 0.13.4 to 0.13.5 [datafusion-python]
dependabot[bot] commented on PR #1022: URL: https://github.com/apache/datafusion-python/pull/1022#issuecomment-267957 Looks like prost is up-to-date now, so this is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] build(deps): bump prost-types from 0.13.4 to 0.13.5 [datafusion-python]
dependabot[bot] closed pull request #1021: build(deps): bump prost-types from 0.13.4 to 0.13.5 URL: https://github.com/apache/datafusion-python/pull/1021 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]
timsaucer merged PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Set projection before configuring the source [datafusion]
milenkovicm commented on PR #14685: URL: https://github.com/apache/datafusion/pull/14685#issuecomment-2679834606 Would it make sense to add a test which would cover plan round trip? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Add example to spark-expr crate [datafusion-comet]
andygrove closed issue #1365: Add example to spark-expr crate URL: https://github.com/apache/datafusion-comet/issues/1365 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]
chenkovsky commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2679992458 > Hi @chenkovsky , > Thanks a ton for quick POC on this. :) > > The row ids seems to be specific to each batch and not across the entire parquet file - is my understanding correct ? > > Reason is our use case will mainly benefit from parquet file level row ids. @bharath-techie yes,This is just an example, not for real situation. If it needs to meet the actual requirements, I will need more time. i think i have to learn more about parquet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1968654009 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { val cometMessage = if (cometException.getCause != null) cometException.getCause.getMessage else cometException.getMessage - if (CometSparkSessionExtensions.isSpark40Plus) { -// for Spark 4 we expect to sparkException carries the message -assert( - sparkException.getMessage -.replace(".WITH_SUGGESTION] ", "]") -.startsWith(cometMessage)) - } else if (CometSparkSessionExtensions.isSpark34Plus) { -// for Spark 3.4 we expect to reproduce the error message exactly -assert(cometMessage == sparkMessage) + // for comet decimal conversion throws ArrowError(string) from arrow - across spark versions the message dont match. + if (sparkMessage.contains("cannot be represented as")) { +cometMessage.contains("cannot be represented as") || cometMessage.contains( + "too large to store") } else { Review Comment: > Or you can move the message check below and switch the expected messages based on the fromType/toType. I'll try this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] DataFusion 45 blog post [datafusion-site]
phillipleblanc commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1968653103 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,315 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +author: pmc +categories: [release] +--- + + + + + +## Introduction + +We are very proud to announce [DataFusion 45.0.0]. This blog highlights some of the +many major improvements since we released [DataFusion 40.0.0] and a preview of +what the community is thinking about in the next 6 months. It has been an exciting +period of development for DataFusion! + +[DataFusion 40.0.0]: https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/ +[DataFusion 45.0.0]: https://crates.io/crates/datafusion/45.0.0 + +[Apache DataFusion] is an extensible query engine, written in [Rust], that +uses [Apache Arrow] as its in-memory format. DataFusion is used by developers to +create new, fast data centric systems such as databases, dataframe libraries, +machine learning and streaming applications. While [DataFusion’s primary design +goal] is to accelerate the creation of other data centric systems, it has a +reasonable experience directly out of the box as a [dataframe library], +[python library] and [command line SQL tool]. + +[apache datafusion]: https://datafusion.apache.org/ +[rust]: https://www.rust-lang.org/ +[apache arrow]: https://arrow.apache.org +[DataFusion’s primary design goal]: https://datafusion.apache.org/user-guide/introduction.html#project-goals +[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html +[python library]: https://datafusion.apache.org/python/ +[command line SQL tool]: https://datafusion.apache.org/user-guide/cli/ + +DataFusion's core thesis is that as a community, together we can build much more +advanced technology than any of us as individuals or companies could do alone. +Without DataFusion, highly performant vectorized query engines would remain +the domain of a few large companies and world-class research institutions. +With DataFusion, we can all build on top of a shared foundation, and focus on +what makes our projects unique. + + +## Community Growth 📈 + +In the last 6 months, between `40.0.0` and `45.0.0`, our community continues to +grow in new and exciting ways. + +1. We added several PMC members and new committers: [@jayzhan211] and [@jonahgao] joined the PMC, + [@2010YOUY01], [@rachelint], [@findpi], [@iffyio], [@goldmedal], [@Weijun-H], [@Michael-J-Ward] and [@korowa] + joined as committers. See the [mailing list] for more details. +2. In the [core DataFusion repo] alone we reviewed and accepted almost 1600 PRs from 206 different + committers, created over 1100 issues and closed 751 of them 🚀. All changes are listed in the detailed + [changelogs]. +3. DataFusion focused meetups happened in multiple cities around the world: [Hangzhou], [Belgrade], [New York], + [Seattle], [Chicago], [Boston] and [Amsterdam] as well as a Rust NYC meetup in NYC focussed on DataFusion. Review Comment: ```suggestion [Seattle], [Chicago], [Boston] and [Amsterdam] as well as a Rust NYC meetup in NYC focused on DataFusion. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1968655592 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { val cometMessage = if (cometException.getCause != null) cometException.getCause.getMessage else cometException.getMessage - if (CometSparkSessionExtensions.isSpark40Plus) { -// for Spark 4 we expect to sparkException carries the message -assert( - sparkException.getMessage -.replace(".WITH_SUGGESTION] ", "]") -.startsWith(cometMessage)) - } else if (CometSparkSessionExtensions.isSpark34Plus) { -// for Spark 3.4 we expect to reproduce the error message exactly -assert(cometMessage == sparkMessage) + // for comet decimal conversion throws ArrowError(string) from arrow - across spark versions the message dont match. + if (sparkMessage.contains("cannot be represented as")) { +cometMessage.contains("cannot be represented as") || cometMessage.contains( + "too large to store") } else { Review Comment: > One or the other way. Otherwise we cannot confidently say that the test is passing due to the cannot be represented as message or the too large to store message. I'm contemplating creating a different test/check function for decimal to decimal test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Browser-accessible official DataFusion playground / DataFusion fiddle [datafusion]
waynexia commented on issue #13818: URL: https://github.com/apache/datafusion/issues/13818#issuecomment-2680102365 I'm imaging a highly encapsulated JS component like ```tsx ``` and the user who wants an interactive terminal can render a DataFusion webpage with one line, who wants to use DataFusion WASM to do computation can invoke it as a JS function -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Release DataFusion `46.0.0` [datafusion]
shehabgamin commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2680109294 I will test with Sail by Wednesday! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add DataFrame fill_null [datafusion]
kosiew commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2680284864 > its documented in https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html not in the code itself I added fill_null example usage in dataframe/mod.rs for rustdoc to update https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]
xudong963 commented on code in PR #14207: URL: https://github.com/apache/datafusion/pull/14207#discussion_r1968751449 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -3154,3 +3164,104 @@ fn optimize_away_unnecessary_repartition2() -> Result<()> { Ok(()) } + +#[tokio::test] +async fn apply_enforce_distribution_multiple_times() -> Result<()> { +// Create a configuration +let config = SessionConfig::new(); +let ctx = SessionContext::new_with_config(config); +let testdata = datafusion::test_util::arrow_test_data(); +let csv_file = format!("{testdata}/csv/aggregate_test_100.csv"); +// Create table schema and data +let sql = format!( +"CREATE EXTERNAL TABLE aggregate_test_100 ( +c1 VARCHAR NOT NULL, +c2 TINYINT NOT NULL, +c3 SMALLINT NOT NULL, +c4 SMALLINT, +c5 INT, +c6 BIGINT NOT NULL, +c7 SMALLINT NOT NULL, +c8 INT NOT NULL, +c9 BIGINT UNSIGNED NOT NULL, +c10 VARCHAR NOT NULL, +c11 FLOAT NOT NULL, +c12 DOUBLE NOT NULL, +c13 VARCHAR NOT NULL +) +STORED AS CSV +LOCATION '{csv_file}' +OPTIONS ('format.has_header' 'true')" +); + +ctx.sql(sql.as_str()).await?; + +let df = ctx.sql("SELECT * FROM(SELECT * FROM aggregate_test_100 UNION ALL SELECT * FROM aggregate_test_100) ORDER BY c13 LIMIT 5").await?; +let logical_plan = df.logical_plan().clone(); +let analyzed_logical_plan = ctx.state().analyzer().execute_and_check( +logical_plan, +ctx.state().config_options(), +|_, _| (), +)?; +let optimized_logical_plan = ctx.state().optimizer().optimize( +analyzed_logical_plan, +&ctx.state(), +|_, _| (), +)?; + +let optimizers: Vec> = vec![ +Arc::new(OutputRequirements::new_add_mode()), +Arc::new(EnforceDistribution::new()), +Arc::new(EnforceSorting::new()), +Arc::new(ProjectionPushdown::new()), +Arc::new(CoalesceBatches::new()), +Arc::new(EnforceDistribution::new()), // -- Add enforce distribution rule again +// The second `EnforceDistribution` should be run before removing `OutputRequirements` to reproduce the bug. +Arc::new(OutputRequirements::new_remove_mode()), +Arc::new(ProjectionPushdown::new()), +Arc::new(LimitPushdown::new()), +Arc::new(SanityCheckPlan::new()), Review Comment: https://github.com/apache/datafusion/pull/14207/commits/ffb1eb3d3545ff367bbfac715cb005651f433848 done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add DataFrame fill_null [datafusion]
kosiew commented on code in PR #14769: URL: https://github.com/apache/datafusion/pull/14769#discussion_r1968745726 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1926,6 +1930,71 @@ impl DataFrame { plan, }) } + +/// Fill null values in specified columns with a given value +/// If no columns are specified, applies to all columns +/// Only fills if the value can be cast to the column's type +/// +/// # Arguments +/// * `value` - Value to fill nulls with +/// * `columns` - Optional list of column names to fill. If None, fills all columns +pub fn fill_null( +&self, +value: ScalarValue, +columns: Option>, Review Comment: Fixed. Thanks for spotting this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add DataFrame fill_null [datafusion]
kosiew commented on code in PR #14769: URL: https://github.com/apache/datafusion/pull/14769#discussion_r1968745726 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1926,6 +1930,71 @@ impl DataFrame { plan, }) } + +/// Fill null values in specified columns with a given value +/// If no columns are specified, applies to all columns +/// Only fills if the value can be cast to the column's type +/// +/// # Arguments +/// * `value` - Value to fill nulls with +/// * `columns` - Optional list of column names to fill. If None, fills all columns +pub fn fill_null( +&self, +value: ScalarValue, +columns: Option>, Review Comment: Fixed/ Thanks for spotting this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add DataFrame fill_null [datafusion]
kosiew commented on code in PR #14769: URL: https://github.com/apache/datafusion/pull/14769#discussion_r1967089979 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1926,6 +1930,71 @@ impl DataFrame { plan, }) } + +/// Fill null values in specified columns with a given value +/// If no columns are specified, applies to all columns +/// Only fills if the value can be cast to the column's type +/// +/// # Arguments +/// * `value` - Value to fill nulls with +/// * `columns` - Optional list of column names to fill. If None, fills all columns +pub fn fill_null( +&self, +value: ScalarValue, +columns: Option>, +) -> Result { +let cols = match columns { +Some(names) => self.find_columns(&names)?, +None => self Review Comment: I was following the Pandas convention, where no limits (columns) means fill_null for all columns https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html https://github.com/user-attachments/assets/c4eab1eb-f1d2-4133-90c3-0a2c57b0349d"; /> I could amend to no op as well if this is the preferred convention -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]
kazuyukitanimura commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1968649099 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { val cometMessage = if (cometException.getCause != null) cometException.getCause.getMessage else cometException.getMessage - if (CometSparkSessionExtensions.isSpark40Plus) { -// for Spark 4 we expect to sparkException carries the message -assert( - sparkException.getMessage -.replace(".WITH_SUGGESTION] ", "]") -.startsWith(cometMessage)) - } else if (CometSparkSessionExtensions.isSpark34Plus) { -// for Spark 3.4 we expect to reproduce the error message exactly -assert(cometMessage == sparkMessage) + // for comet decimal conversion throws ArrowError(string) from arrow - across spark versions the message dont match. + if (sparkMessage.contains("cannot be represented as")) { +cometMessage.contains("cannot be represented as") || cometMessage.contains( + "too large to store") } else { Review Comment: Another way to remove the `if` block is to convert the error message to make it similar to Spark This is the place that we are defining Spark error messages https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/error.rs#L36 You can check how these are used. Or you can move the message check below and switch the expected messages based on the fromType/toType. One or the other way. Otherwise we cannot confidently say that the test is passing due to the `cannot be represented as` message or the `too large to store` message. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] refactor: use TypeSignature::Coercible for crypto functions [datafusion]
jayzhan211 commented on PR #14826: URL: https://github.com/apache/datafusion/pull/14826#issuecomment-2680085289 > [SQL] EXPLAIN SELECT digest(column1_utf8view, 'md5') as c FROM test; [Diff] (-expected|+actual) logical_plan - 01)Projection: digest(test.column1_utf8view, Utf8View("md5")) AS c + 01)Projection: digest(test.column1_utf8view, Utf8("md5")) AS c 02)--TableScan: test projection=[column1_utf8view] This looks like an improvement, we don't need to cast it to utf8view -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1968652251 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { val cometMessage = if (cometException.getCause != null) cometException.getCause.getMessage else cometException.getMessage - if (CometSparkSessionExtensions.isSpark40Plus) { -// for Spark 4 we expect to sparkException carries the message -assert( - sparkException.getMessage -.replace(".WITH_SUGGESTION] ", "]") -.startsWith(cometMessage)) - } else if (CometSparkSessionExtensions.isSpark34Plus) { -// for Spark 3.4 we expect to reproduce the error message exactly -assert(cometMessage == sparkMessage) + // for comet decimal conversion throws ArrowError(string) from arrow - across spark versions the message dont match. + if (sparkMessage.contains("cannot be represented as")) { +cometMessage.contains("cannot be represented as") || cometMessage.contains( + "too large to store") } else { Review Comment: > Another way to remove the if block is to convert the error message to make it similar to Spark This is the place that we are defining Spark error messages https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/error.rs#L36 You can check how these are used This one I checked - problem here is from native execution, we get back -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1968652251 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { val cometMessage = if (cometException.getCause != null) cometException.getCause.getMessage else cometException.getMessage - if (CometSparkSessionExtensions.isSpark40Plus) { -// for Spark 4 we expect to sparkException carries the message -assert( - sparkException.getMessage -.replace(".WITH_SUGGESTION] ", "]") -.startsWith(cometMessage)) - } else if (CometSparkSessionExtensions.isSpark34Plus) { -// for Spark 3.4 we expect to reproduce the error message exactly -assert(cometMessage == sparkMessage) + // for comet decimal conversion throws ArrowError(string) from arrow - across spark versions the message dont match. + if (sparkMessage.contains("cannot be represented as")) { +cometMessage.contains("cannot be represented as") || cometMessage.contains( + "too large to store") } else { Review Comment: > Another way to remove the if block is to convert the error message to make it similar to Spark This is the place that we are defining Spark error messages https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/error.rs#L36 You can check how these are used This one I checked - problem here is from native execution, we get back `Arrow(ArrowError)` which only has a string, precision and scale information is not present. Also to construct an error message from a string - we need to check specific string in the message. We can try to change the ArrowError to have parameter but that will be a big change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] chore: remove jackson dependency [datafusion-comet]
kazuyukitanimura opened a new pull request, #1442: URL: https://github.com/apache/datafusion-comet/pull/1442 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? Removed jackson dependency ## How are these changes tested? CI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: remove jackson dependency [datafusion-comet]
codecov-commenter commented on PR #1442: URL: https://github.com/apache/datafusion-comet/pull/1442#issuecomment-2680235325 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1442?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report All modified and coverable lines are covered by tests :white_check_mark: > Project coverage is 34.77%. Comparing base [(`f09f8af`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) to head [(`1783e2f`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/1783e2f9222776be82c6805e5ac54d40fb871d7b?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache). > Report is 47 commits behind head on main. Additional details and impacted files ```diff @@ Coverage Diff @@ ## main#1442 +/- ## = - Coverage 56.12% 34.77% -21.36% - Complexity 976 1071 +95 = Files 119 143 +24 Lines 1174349374+37631 Branches 225110804 +8553 = + Hits 659117168+10577 - Misses 401228777+24765 - Partials 1140 3429 +2289 ``` [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1442?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache). :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Implement actual count wildcard in physical layer and fix duplicated schema name error from count wildcard [datafusion]
jayzhan211 commented on code in PR #14824: URL: https://github.com/apache/datafusion/pull/14824#discussion_r1968726701 ## datafusion/core/tests/dataframe/dataframe_functions.rs: ## @@ -1145,9 +1145,9 @@ async fn test_count_wildcard() -> Result<()> { .build() .unwrap(); -let expected = "Sort: count(*) ASC NULLS LAST [count(*):Int64]\ -\n Projection: count(*) [count(*):Int64]\ -\nAggregate: groupBy=[[test.b]], aggr=[[count(*)]] [b:UInt32, count(*):Int64]\ +let expected = "Sort: count(Int64(1)) ASC NULLS LAST [count(Int64(1)):Int64]\ Review Comment: count_all() is count(1) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: use edition 2024 [datafusion-sqlparser-rs]
alamb commented on PR #1736: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736#issuecomment-2679691606 > Adding an MSRV sounds good to me! Filed an issue to track: - https://github.com/apache/datafusion-sqlparser-rs/issues/1744 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[I] Compilation error due to rkyv/rkyv#434 [datafusion]
ryzhyk opened a new issue, #14862: URL: https://github.com/apache/datafusion/issues/14862 ### Describe the bug When datafusion is used in a workspace that enables the `rkyv-64` feature in the `chrono` crate, this triggers a Rust compilation error: ``` error[E0277]: can't compare `Option<&std::string::String>` with `Option<&mut std::string::String>` --> datafusion/expr/src/logical_plan/plan.rs:2872:43 | 2872 | merged.retain(|k, v| input.get(k) == Some(v)); | ^^ no implementation for `Option<&std::string::String> == Option<&mut std::string::String>` | = help: the trait `PartialEq>` is not implemented for `Option<&std::string::String>` = help: the following other types implement trait `PartialEq`: `Option` implements `PartialEq` `Option` implements `PartialEq>` ``` The root cause of the error is incorrect type unification in the Rust compiler, as explained in https://github.com/rkyv/rkyv/issues/434. I will submit a PR with a workaround. ### To Reproduce Change `Cargo.toml` to enable the `rkyv-64` feature in chrono and try building the `datafusion-expr` crate: ``` chrono = { version = "0.4.38", default-features = false, features = ["rkyv-64"] } ``` ### Expected behavior _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] Workaround for compilation error due to rkyv#434. [datafusion]
ryzhyk opened a new pull request, #14863: URL: https://github.com/apache/datafusion/pull/14863 ## Which issue does this PR close? - Closes #14862 ## Rationale for this change When datafusion is used in a workspace that enables the `rkyv-64` feature in the `chrono` crate, this triggered a Rust compilation error: ``` error[E0277]: can't compare `Option<&std::string::String>` with `Option<&mut std::string::String>`. ``` The root cause of the error is incorrect type unification in the Rust compiler, as explained in https://github.com/rkyv/rkyv/issues/434. ## What changes are included in this PR? The workaround pushes the compiler in the right direction by converting the mutable reference to an immutable one manually. ## Are these changes tested? I don't think this requires additional tests. ## Are there any user-facing changes? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] test: Register Spark-compatible expressions with a DataFusion context [datafusion-comet]
andygrove merged PR #1432: URL: https://github.com/apache/datafusion-comet/pull/1432 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Blog / Example of how to compile DataFusion to WASM [datafusion]
waynexia commented on issue #13715: URL: https://github.com/apache/datafusion/issues/13715#issuecomment-2680048213 There is a developer guide for building, debugging, and publishing the WASM bindings: https://github.com/datafusion-contrib/datafusion-wasm-bindings/blob/main/CONTRIBUTING.md (forget to link 🙈) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Implement actual count wildcard in physical layer and fix duplicated schema name error from count wildcard [datafusion]
jayzhan211 commented on code in PR #14824: URL: https://github.com/apache/datafusion/pull/14824#discussion_r1968712649 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -2455,7 +2455,7 @@ async fn test_count_wildcard_on_sort() -> Result<()> { let ctx = create_join_context()?; let sql_results = ctx -.sql("select b,count(*) from t1 group by b order by count(*)") +.sql("select b,count(1) from t1 group by b order by count(1)") Review Comment: count_all() is count(1) now, so we need to change to count(1) to have a consistent name (count(*) is count(a) AS count(*) now). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1968617896 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! [DedicatedExecutor] for running CPU-bound tasks on a separate tokio runtime. + +use crate::SendableRecordBatchStream; +use async_trait::async_trait; +use datafusion::physical_plan::stream::RecordBatchStreamAdapter; +use datafusion_common::error::GenericError; +use datafusion_common::DataFusionError; +use futures::stream::BoxStream; +use futures::{ +future::{BoxFuture, Shared}, +Future, FutureExt, Stream, StreamExt, TryFutureExt, +}; +use log::{info, warn}; +use object_store::path::Path; +use object_store::{ +GetOptions, GetResult, GetResultPayload, ListResult, MultipartUpload, ObjectMeta, +ObjectStore, PutMultipartOpts, PutOptions, PutPayload, PutResult, UploadPart, +}; +use std::cell::RefCell; +use std::pin::Pin; +use std::sync::RwLock; +use std::task::{Context, Poll}; +use std::{fmt::Display, sync::Arc, time::Duration}; +use tokio::runtime::Builder; +use tokio::task::JoinHandle; +use tokio::{ +runtime::Handle, +sync::{oneshot::error::RecvError, Notify}, +task::JoinSet, +}; +use tokio_stream::wrappers::ReceiverStream; + +/// Create a [`DedicatedExecutorBuilder`] from a tokio [`Builder`] +impl From for DedicatedExecutorBuilder { +fn from(value: Builder) -> Self { +Self::new_from_builder(value) +} +} + +/// Manages a separate tokio [`Runtime`] (thread pool) for executing CPU bound +/// tasks such as DataFusion `ExecutionPlans`. +/// +/// See [`DedicatedExecutorBuilder`] for creating a new instance. +/// +/// A `DedicatedExecutor` can helps avoid issues when runnnig IO and CPU bound tasks on the +/// same thread pool by running futures (and any `tasks` that are +/// `tokio::task::spawned` by them) on a separate tokio [`Executor`]. +/// +/// `DedicatedExecutor`s can be `clone`ed and all clones share the same thread pool. +/// +/// Since the primary use for a `DedicatedExecutor` is offloading CPU bound +/// work, IO work can not be performed on tasks launched in the Executor. +/// +/// To perform IO, see: +/// - [`Self::spawn_io`] +/// - [`Self::wrap_object_store`] +/// +/// When [`DedicatedExecutorBuilder::build`] is called, a reference to the +/// "current" tokio runtime will be stored and used, via [`register_io_runtime`] by all +/// threads spawned by the executor. Any I/O done by threads in this +/// [`DedicatedExecutor`] should use [`spawn_io`], which will run them on the +/// I/O runtime. +/// +/// # TODO examples +/// +/// # Background +/// +/// Tokio has the notion of the "current" runtime, which runs the current future +/// and any tasks spawned by it. Typically, this is the runtime created by +/// `tokio::main` and is used for the main application logic and I/O handling +/// +/// For CPU bound work, such as DataFusion plan execution, it is important to +/// run on a separate thread pool to avoid blocking the I/O handling for extended +/// periods of time in order to avoid long poll latencies (which decreases the +/// throughput of small requests under concurrent load). +/// +/// # IO Scheduling +/// +/// I/O, such as network calls, should not be performed on the runtime managed +/// by [`DedicatedExecutor`]. As tokio is a cooperative scheduler, long-running +/// CPU tasks will not be preempted and can therefore starve servicing of other +/// tasks. This manifests in long poll-latencies, where a task is ready to run +/// but isn't being scheduled to run. For CPU-bound work this isn't a problem as +/// there is no external party waiting on a response, however, for I/O tasks, +/// long poll latencies can prevent timely servicing of IO, which can have a +/// significant detrimental effect. +/// +/// # Details +/// +/// The worker thread priority is set to low so that such tasks do +/// not starve other more important tasks (such as answering health checks) +/// +/// Follows the example from stack overflow and spawns a new +/// thread to install a Tokio runtime "context" +///
Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]
AdamGS commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2679630951 https://github.com/apache/datafusion/pull/14838 turned out to be easier than I thought, mostly because @logan-keede did much of the work to move test infrastructure over. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]
kazuyukitanimura commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1968689241 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { val cometMessage = if (cometException.getCause != null) cometException.getCause.getMessage else cometException.getMessage - if (CometSparkSessionExtensions.isSpark40Plus) { -// for Spark 4 we expect to sparkException carries the message -assert( - sparkException.getMessage -.replace(".WITH_SUGGESTION] ", "]") -.startsWith(cometMessage)) - } else if (CometSparkSessionExtensions.isSpark34Plus) { -// for Spark 3.4 we expect to reproduce the error message exactly -assert(cometMessage == sparkMessage) + // for comet decimal conversion throws ArrowError(string) from arrow - across spark versions the message dont match. + if (sparkMessage.contains("cannot be represented as")) { +cometMessage.contains("cannot be represented as") || cometMessage.contains( + "too large to store") } else { Review Comment: Just a thought. For the former approach, can we get precision info similar to https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/conversion_funcs/cast.rs#L932 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Feat: Implement hf:// / "hugging face" integration in datafusion-cli [datafusion]
github-actions[bot] closed pull request #10792: Feat: Implement hf:// / "hugging face" integration in datafusion-cli URL: https://github.com/apache/datafusion/pull/10792 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Adding node_id to ExecutionPlanProperties [datafusion]
github-actions[bot] closed pull request #12186: Adding node_id to ExecutionPlanProperties URL: https://github.com/apache/datafusion/pull/12186 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] Add support for `CREATE SCHEMA` options [datafusion-sqlparser-rs]
iffyio opened a new pull request, #1742: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1742 ```sql CREATE SCHEMA DEFAULT COLLATE 'foo' myschema OPTIONS(key='value'); ``` [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_schema_statement) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] Add support for `DROP MATERIALIZED VIEW` [datafusion-sqlparser-rs]
iffyio opened a new pull request, #1743: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1743 Adds support for e.g. ```sql DROP MATERIALIZED VIEW IF EXISTS a.b.c ``` [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#drop_materialized_view_statement) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] [WIP] Store spans for Value expressions [datafusion-sqlparser-rs]
lovasoa commented on code in PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#discussion_r1967169666 ## src/ast/value.rs: ## @@ -26,10 +26,25 @@ use bigdecimal::BigDecimal; #[cfg(feature = "serde")] use serde::{Deserialize, Serialize}; -use crate::ast::Ident; +use crate::{ast::Ident, tokenizer::Span}; #[cfg(feature = "visitor")] use sqlparser_derive::{Visit, VisitMut}; +/// Primitive SQL values such as number and string +#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)] +#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] +#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))] +pub struct ValueWrapper { +pub value: Value, +pub span: Span, +} Review Comment: I don't know. What do you think is best? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org