Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-03-18 Thread via GitHub


shehabgamin commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2735261314

   I feel like this may be important enough to try to get into the release. 
Does anyone else have thoughts?
   
   https://github.com/apache/datafusion/issues/15174


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub


adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2735401711

   Inspired by discussion in https://github.com/apache/datafusion/pull/13054 I 
went with adding this to `ExecutionPlan`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub


adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2735408031

   Tomorrow I plan on doing some tracer bullet testing to see if this approach 
works at all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub


adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2735413403

   cc @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: `core_expressions` feature flag broken, move `overlay` into `core` functions [datafusion]

2025-03-18 Thread via GitHub


alamb commented on PR #15217:
URL: https://github.com/apache/datafusion/pull/15217#issuecomment-2734530255

   > hey @alamb, I have already added a re-export at the end of 
`datafusion/functions/src/string/overlay.rs` like this
   
   Thanks @shruti2522  - that looks good to me
   
   I double checked and it is appearing as we would expect
   
   ![Screenshot 2025-03-18 at 3 43 22 
PM](https://github.com/user-attachments/assets/2fc77338-9fcc-4e2d-976a-87158e88f76a)
   
   
   Thank you 🙏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub


comphead commented on code in PR #1456:
URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2001739415


##
native/core/Cargo.toml:
##
@@ -77,6 +77,7 @@ jni = { version = "0.21", features = ["invocation"] }
 lazy_static = "1.4"
 assertables = "7"
 hex = "0.4.3"
+datafusion-functions-nested = "46.0.0"

Review Comment:
   I'm not sure if it is possible to make it `workspace = true`, this package 
is optional for test only and optional packages are not allowed on workspace 
level. 
   
   I can include it in workspace as non optional but it will be compiled into 
the target binary which is probably not expected



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub


akurmustafa commented on PR #59:
URL: https://github.com/apache/datafusion-site/pull/59#issuecomment-2734756045

   > > Thanks @alamb, I was working on to add the example you gave ("DataFusion 
can find / use orderings based on query intermediates"). Should we add this to 
the document what do you think? We can leave the document in current form also?
   > 
   > I think this would be great!
   > 
   > We could also write an entire other blog post about it too -- whatever you 
prefer / have time for!
   
   I think, for the completeness we can add this section to the current post. 
However, independent of this I think we can write a dedicated blog post for 
this definitely (with lots of examples, and with a focus on queries and plans 
). 
   
   However, before that post I want to migrate following 
[post](https://akurmustafa.github.io/blogs/query_optimization_filter_pushdown/query_optimization_filter_pushdown.html),
 to the Datafusion website. This post is an analysis of filter pushdown with a 
more theoretical focus: "when pushing down filter makes sense? (e.g. 
always:))". During migration I plan to update that post a lot, however main 
approach and content will be very similar.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-18 Thread via GitHub


alamb merged PR #15209:
URL: https://github.com/apache/datafusion/pull/15209


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore(deps): bump uuid from 1.15.1 to 1.16.0 [datafusion]

2025-03-18 Thread via GitHub


xudong963 merged PR #15292:
URL: https://github.com/apache/datafusion/pull/15292


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-03-18 Thread via GitHub


shehabgamin commented on code in PR #15168:
URL: https://github.com/apache/datafusion/pull/15168#discussion_r2002067067


##
datafusion/spark/src/function/math/expm1.rs:
##
@@ -0,0 +1,169 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::function::error_utils::{
+invalid_arg_count_exec_err, unsupported_data_type_exec_err,
+};
+use arrow::array::{ArrayRef, AsArray};
+use arrow::datatypes::{DataType, Float64Type};
+use datafusion_common::{Result, ScalarValue};
+use datafusion_expr::{
+ColumnarValue, ScalarFunctionArgs, ScalarUDFImpl, Signature, Volatility,
+};
+use datafusion_macros::user_doc;
+use std::any::Any;
+use std::sync::Arc;
+
+#[user_doc(
+doc_section(label = "Spark Math Functions"),
+description = "Returns exp(expr) - 1 as a Float64.",
+syntax_example = "expm1(expr)",
+sql_example = r#"```sql
+> select expm1(0);
+++
+| expm1(0)   |
+++
+| 0.0|
+++
+> select expm1(1);
+++
+| expm1(1)   |
+++
+| 50 |
+++
+```"#,
+argument(
+name = "expr",
+description = "An expression that evaluates to a numeric."
+)
+)]
+#[derive(Debug)]
+pub struct SparkExpm1 {
+signature: Signature,
+aliases: Vec,
+}
+
+impl Default for SparkExpm1 {
+fn default() -> Self {
+Self::new()
+}
+}
+
+impl SparkExpm1 {
+pub fn new() -> Self {
+Self {
+signature: Signature::user_defined(Volatility::Immutable),
+aliases: vec!["spark_expm1".to_string()],
+}
+}
+}
+
+impl ScalarUDFImpl for SparkExpm1 {
+fn as_any(&self) -> &dyn Any {
+self
+}
+
+fn name(&self) -> &str {
+"expm1"
+}
+
+fn signature(&self) -> &Signature {
+&self.signature
+}
+
+fn return_type(&self, _arg_types: &[DataType]) -> Result {
+Ok(DataType::Float64)
+}
+
+fn invoke_with_args(&self, args: ScalarFunctionArgs) -> 
Result {
+if args.args.len() != 1 {
+return Err(invalid_arg_count_exec_err("expm1", (1, 1), 
args.args.len()));
+}
+match &args.args[0] {
+ColumnarValue::Scalar(ScalarValue::Float64(value)) => Ok(
+ColumnarValue::Scalar(ScalarValue::Float64(value.map(|x| 
x.exp_m1(,
+),
+ColumnarValue::Array(array) => match array.data_type() {
+DataType::Float64 => Ok(ColumnarValue::Array(Arc::new(
+array
+.as_primitive::()
+.unary::<_, Float64Type>(|x| x.exp_m1()),
+)
+as ArrayRef)),
+other => Err(unsupported_data_type_exec_err(
+"expm1",
+format!("{}", DataType::Float64).as_str(),
+other,
+)),
+},
+other => Err(unsupported_data_type_exec_err(
+"expm1",
+format!("{}", DataType::Float64).as_str(),
+&other.data_type(),
+)),
+}
+}
+
+fn aliases(&self) -> &[String] {
+&self.aliases
+}
+
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return Err(invalid_arg_count_exec_err("expm1", (1, 1), 
arg_types.len()));
+}
+if arg_types[0].is_numeric() {
+Ok(vec![DataType::Float64])
+} else {
+Err(unsupported_data_type_exec_err(
+"expm1",
+"Numeric Type",
+&arg_types[0],
+))
+}
+}
+}
+
+#[cfg(test)]
+mod tests {
+use crate::function::math::expm1::SparkExpm1;
+use crate::function::utils::test::test_scalar_function;
+use arrow::array::{Array, Float64Array};
+use arrow::datatypes::DataType::Float64;
+use datafusion_common::{Result, ScalarValue};
+use datafusion_expr::{ColumnarValue, ScalarUDFImpl};
+
+macro_rules! test_expm1_float64_invoke {
+($INPUT:expr, $EXPECTED:expr) => {
+test_scalar_function!(
+SparkExpm1::new(),
+  

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-03-18 Thread via GitHub


comphead commented on code in PR #60:
URL: https://github.com/apache/datafusion-site/pull/60#discussion_r2002069929


##
content/blog/2025-03-18-parquet-pruning.md:
##
@@ -0,0 +1,111 @@
+---
+layout: post
+title: Parquet pruning in DataFusion: Read Only What Matters
+date: 2025-03-18
+author: Xiangpeng Hao
+categories: [performance]
+---
+
+
+
+
+_Editor's Note: This blog was first published on [Xiangpeng Hao's blog]. 
Thanks to [InfluxData] for sponsoring this work as part of his PhD funding._
+
+[Xiangpeng Hao's blog]: https://blog.xiangpeng.systems/posts/parquet-to-arrow/
+[InfluxData]: https://www.influxdata.com/
+
+
+Parquet has become the industry standard for storing columnar data, and 
reading Parquet efficiently is crucial for query performance.
+
+To optimize this, DataFusion implements advanced Parquet support for effective 
data pruning and decoding.

Review Comment:
   I think we need to specify what exactly optimized?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improvement/improve wildcard error 15004 [datafusion]

2025-03-18 Thread via GitHub


Jiashu-Hu commented on code in PR #15287:
URL: https://github.com/apache/datafusion/pull/15287#discussion_r2001662346


##
datafusion/sql/src/select.rs:
##
@@ -826,6 +827,13 @@ impl SqlToRel<'_, S> {
 .map(|expr| rebase_expr(expr, &aggr_projection_exprs, input))
 .collect::>>()?;
 
+// check if the columns in the SELECT list are in the GROUP BY clause
+// or are part of an aggregate function, if not, throw an error.
+validate_columns_in_group_by_or_aggregate(

Review Comment:
   hey jay, thanks for your review! I've unified these 2 also updated error 
message in the test file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Parse `SUBSTR` as alias for `SUBSTRING` [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub


mvzink opened a new pull request, #1769:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1769

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001376856


##
datafusion/core/tests/parquet/custom_reader.rs:
##
@@ -96,17 +97,15 @@ async fn 
route_data_access_ops_to_parquet_file_reader_factory() {
 let task_ctx = session_ctx.task_ctx();
 let read = collect(parquet_exec, task_ctx).await.unwrap();
 
-let expected = [
-"+-+++",
-"| c1  | c2 | c3 |",
-"+-+++",
-"| Foo | 1  | 10 |",
-"| | 2  | 20 |",
-"| bar |||",
-"+-+++",
-];
-
-assert_batches_sorted_eq!(expected, &read);
+assert_snapshot!(batches_to_sort_string(&read), @r"

Review Comment:
   change to batch_to_string



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-18 Thread via GitHub


timsaucer merged PR #15280:
URL: https://github.com/apache/datafusion/pull/15280


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub


akurmustafa commented on PR #59:
URL: https://github.com/apache/datafusion-site/pull/59#issuecomment-2734859868

   With the 
[commit](https://github.com/apache/datafusion-site/pull/59/commits/85eea6a572f95972a155ee9926319112e7149ce8),
 I have added the @alamb's suggestion to the post.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub


andygrove commented on code in PR #1456:
URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2002089283


##
native/core/src/execution/planner.rs:
##
@@ -3004,4 +3006,130 @@ mod tests {
 type_info: None,
 }
 }
+
+#[test]
+fn test_create_array() {
+let session_ctx = SessionContext::new();
+session_ctx.register_udf(ScalarUDF::from(
+datafusion_functions_nested::make_array::MakeArray::new(),
+));
+let task_ctx = session_ctx.task_ctx();
+let planner = PhysicalPlanner::new(Arc::from(session_ctx));
+
+// Create a plan for
+// ProjectionExec: expr=[make_array(col_0@0) as col_0]
+// ScanExec: source=[CometScan parquet  (unknown)], schema=[col_0: 
Int32]
+let op_scan = Operator {
+plan_id: 0,
+children: vec![],
+op_struct: Some(OpStruct::Scan(spark_operator::Scan {
+fields: vec![
+spark_expression::DataType {
+type_id: 3, // Int32
+type_info: None,
+},
+spark_expression::DataType {
+type_id: 3, // Int32
+type_info: None,
+},
+spark_expression::DataType {
+type_id: 3, // Int32
+type_info: None,
+},
+],
+source: "".to_string(),
+})),
+};
+
+let array_col = spark_expression::Expr {
+expr_struct: Some(Bound(spark_expression::BoundReference {
+index: 0,
+datatype: Some(spark_expression::DataType {
+type_id: 3,
+type_info: None,
+}),
+})),
+};
+
+let array_col_1 = spark_expression::Expr {
+expr_struct: Some(Bound(spark_expression::BoundReference {
+index: 1,
+datatype: Some(spark_expression::DataType {
+type_id: 3,
+type_info: None,
+}),
+})),
+};
+
+let projection = Operator {
+children: vec![op_scan],
+plan_id: 0,
+op_struct: Some(OpStruct::Projection(spark_operator::Projection {
+project_list: vec![spark_expression::Expr {
+expr_struct: 
Some(ExprStruct::ScalarFunc(spark_expression::ScalarFunc {
+func: "make_array".to_string(),
+args: vec![array_col, array_col_1],
+return_type: None,
+})),
+}],
+})),
+};
+
+let a = Int32Array::from(vec![0, 3]);
+let b = Int32Array::from(vec![1, 4]);
+let c = Int32Array::from(vec![2, 5]);
+let input_batch = InputBatch::Batch(vec![Arc::new(a), Arc::new(b), 
Arc::new(c)], 2);
+
+let (mut scans, datafusion_plan) =
+planner.create_plan(&projection, &mut vec![], 1).unwrap();
+scans[0].set_input_batch(input_batch);
+
+let mut stream = datafusion_plan.native_plan.execute(0, 
task_ctx).unwrap();
+
+let runtime = tokio::runtime::Runtime::new().unwrap();
+let (tx, mut rx) = mpsc::channel(1);
+
+// Separate thread to send the EOF signal once we've processed the 
only input batch
+runtime.spawn(async move {
+// Create a dictionary array with 100 values, and use it as input 
to the execution.
+let a = Int32Array::from(vec![0, 3]);
+let b = Int32Array::from(vec![1, 4]);
+let c = Int32Array::from(vec![2, 5]);
+let input_batch1 = InputBatch::Batch(vec![Arc::new(a), 
Arc::new(b), Arc::new(c)], 2);
+let input_batch2 = InputBatch::EOF;
+
+let batches = vec![input_batch1, input_batch2];
+
+for batch in batches.into_iter() {
+tx.send(batch).await.unwrap();
+}
+});
+
+runtime.block_on(async move {

Review Comment:
   It's nice to see an end-to-end test like this in the Rust project



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub


Omega359 commented on code in PR #59:
URL: https://github.com/apache/datafusion-site/pull/59#discussion_r2002099018


##
content/images/ordering_analysis/query_window_plan.png:
##


Review Comment:
   At the output of the window function the table has the ordering:



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub


comphead commented on PR #1456:
URL: 
https://github.com/apache/datafusion-comet/pull/1456#issuecomment-2734862629

   Thanks  everyone


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub


comphead merged PR #1456:
URL: https://github.com/apache/datafusion-comet/pull/1456


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] docs: Use a shallow clone for Spark SQL test instructions [datafusion-comet]

2025-03-18 Thread via GitHub


andygrove merged PR #1547:
URL: https://github.com/apache/datafusion-comet/pull/1547


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Blog for DataFusion 46.0.0 [datafusion]

2025-03-18 Thread via GitHub


alamb commented on issue #15053:
URL: https://github.com/apache/datafusion/issues/15053#issuecomment-2734564305

   Thanks @berkaysynnada 
   
   In general I suggest emphasizing things that many users of the crate will 
see / appreciate and mentioning, but not too deeply, things that developers of 
the crate care more about
   
   I also suggest that until we enable a feature by default we should not make 
a big deal about it. The rationale is that most users won't use it until it is 
on by default, and we can make a big deal about it when it is available
   
   I suggest emphasizing:
   - https://github.com/apache/datafusion/pull/14579
   - GSOC for sure
   
   I would also mention that we are working on statistics / monotonicity, but 
until there is something major we can point at that is user facing (e.g. an 
important SQL query that now goes faster with the monotonicity changes) I 
probably wouldn't emphasize them
   
   I think these ones are worth mentioning but don't make a huge deal:
   - https://github.com/apache/datafusion/pull/14224
   
   
   
   These ones are in progress so I would suggest we wait t:
   - https://github.com/apache/datafusion/pull/13664
   - https://github.com/apache/datafusion/pull/14699
   
   I would suggesting highlighting the performance improvements 
   - perf: Improve median with no grouping by 2X 
https://github.com/apache/datafusion/pull/14399 (2010YOUY01)
   - Improve performance 10%-100% in FIRST_VALUE / LAST_VALUE by not sort rows 
in FirstValueAccumulator https://github.com/apache/datafusion/pull/14402 
(blaginin)
   - Speed up uuid UDF (40x faster) 
https://github.com/apache/datafusion/pull/14675 (simonvandel)
   - perf: Drop RowConverter from GroupOrderingPartial 
https://github.com/apache/datafusion/pull/14566 (ctsk)
   Speedup to_hex (~2x faster) https://github.com/apache/datafusion/pull/14686 
(simonvandel)
   - Implement predicate pruning for not like expressions 
https://github.com/apache/datafusion/pull/14567 (UBarney)
   - Speed up chr UDF (~4x faster) 
https://github.com/apache/datafusion/pull/14700 (simonvandel)
   
   Also I suggest pointing people at the new upgrade guide: 
https://github.com/apache/datafusion/blob/main/docs/source/library-user-guide/upgrading.md#datafusion-4600
   
   And maybe we can refer people to  
https://github.com/apache/datafusion/blob/main/dev/changelog/46.0.0.md  for 
more details


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub


xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002299377


##
datafusion/expr-common/src/statistics.rs:
##
@@ -857,6 +857,143 @@ pub fn compute_variance(
 ScalarValue::try_from(target_type)
 }
 
+/// Merges two distributions into a single distribution that represents their 
combined statistics.
+/// This creates a more general distribution that approximates the mixture of 
the input distributions.
+pub fn merge_distributions(a: &Distribution, b: &Distribution) -> 
Result {
+let range_a = a.range()?;
+let range_b = b.range()?;
+
+// Determine data type and create combined range
+let combined_range = if range_a.is_unbounded() || range_b.is_unbounded() {

Review Comment:
   Great, I found the `Interval::union` works with intervals of the same data 
type.
   
   I seems that we can loose the requirement, such as, `Int64` with `Int32`, 
`int` with `float`, etc also can be unioned. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub


xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002309255


##
datafusion/expr-common/src/statistics.rs:
##
@@ -857,6 +857,143 @@ pub fn compute_variance(
 ScalarValue::try_from(target_type)
 }
 
+/// Merges two distributions into a single distribution that represents their 
combined statistics.
+/// This creates a more general distribution that approximates the mixture of 
the input distributions.

Review Comment:
   Maybe we can add some notes to `GenericDistribution`
   
   ```
   /// # Range Guarantees
   /// The provided range is assumed to be conservative - all possible 
values of the
   /// distribution must lie within this range. However, the range itself 
might be
   /// an approximation, as the actual distribution could occupy only a 
subset of the range.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Support `merge` for `Distribution` [datafusion]

2025-03-18 Thread via GitHub


xudong963 commented on issue #15290:
URL: https://github.com/apache/datafusion/issues/15290#issuecomment-2732391652

   There is a proposal: https://github.com/apache/datafusion/pull/15296


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub


UBarney commented on code in PR #15266:
URL: https://github.com/apache/datafusion/pull/15266#discussion_r2000992780


##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -179,6 +292,423 @@ impl AggregateUDFImpl for FirstValue {
 }
 }
 
+struct FirstGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+//  state ===
+vals: Vec,
+// Stores ordering values, of the aggregator requirement corresponding to 
first value
+// of the aggregator.
+// The `orderings` are stored row-wise, meaning that `orderings[group_idx]`
+// represents the ordering values corresponding to the `group_idx`-th 
group.
+orderings: Vec>,
+// At the beginning, `is_sets[group_idx]` is false, which means `first` is 
not seen yet.
+// Once we see the first value, we set the `is_sets[group_idx]` flag
+is_sets: BooleanBufferBuilder,
+// null_builder[group_idx] == false => vals[group_idx] is null
+null_builder: BooleanBufferBuilder,
+// size of `self.orderings`
+// Calculating the memory usage of `self.orderings` using 
`ScalarValue::size_of_vec` is quite costly.
+// Therefore, we cache it and compute `size_of` only after each update
+// to avoid calling `ScalarValue::size_of_vec` by Self.size.
+size_of_orderings: usize,
+
+// === option 
+
+// Stores the applicable ordering requirement.
+ordering_req: LexOrdering,
+// derived from `ordering_req`.
+sort_options: Vec,
+// Stores whether incoming data already satisfies the ordering requirement.
+input_requirement_satisfied: bool,
+// Ignore null values.
+ignore_nulls: bool,
+/// The output type
+data_type: DataType,
+default_orderings: Vec,
+}
+
+impl FirstGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+fn try_new(
+ordering_req: LexOrdering,
+ignore_nulls: bool,
+data_type: &DataType,
+ordering_dtypes: &[DataType],
+) -> Result {
+let requirement_satisfied = ordering_req.is_empty();
+
+let default_orderings = ordering_dtypes
+.iter()
+.map(ScalarValue::try_from)
+.collect::>>()?;
+
+let sort_options = get_sort_options(ordering_req.as_ref());
+
+Ok(Self {
+null_builder: BooleanBufferBuilder::new(0),
+ordering_req,
+sort_options,
+input_requirement_satisfied: requirement_satisfied,
+ignore_nulls,
+default_orderings,
+data_type: data_type.clone(),
+vals: Vec::new(),
+orderings: Vec::new(),
+is_sets: BooleanBufferBuilder::new(0),
+size_of_orderings: 0,
+})
+}
+
+fn need_update(&self, group_idx: usize) -> bool {
+if !self.is_sets.get_bit(group_idx) {
+return true;
+}
+
+if self.ignore_nulls && !self.null_builder.get_bit(group_idx) {
+return true;
+}
+
+!self.input_requirement_satisfied
+}
+
+fn should_update_state(
+&self,
+group_idx: usize,
+new_ordering_values: &[ScalarValue],
+) -> Result {
+if !self.is_sets.get_bit(group_idx) {
+return Ok(true);
+}
+
+assert!(new_ordering_values.len() == self.ordering_req.len());
+let current_ordering = &self.orderings[group_idx];
+compare_rows(current_ordering, new_ordering_values, &self.sort_options)
+.map(|x| x.is_gt())
+}
+
+fn take_orderings(&mut self, emit_to: EmitTo) -> Vec> {
+let result = emit_to.take_needed(&mut self.orderings);
+
+match emit_to {
+EmitTo::All => self.size_of_orderings = 0,
+EmitTo::First(_) => {
+self.size_of_orderings -=
+result.iter().map(ScalarValue::size_of_vec).sum::()
+}
+}
+
+result
+}
+
+fn take_need(
+bool_buf_builder: &mut BooleanBufferBuilder,
+emit_to: EmitTo,
+) -> BooleanBuffer {
+let bool_buf = bool_buf_builder.finish();
+match emit_to {
+EmitTo::All => bool_buf,
+EmitTo::First(n) => {
+// split off the first N values in seen_values
+//
+// TODO make this more efficient rather than two
+// copies and bitwise manipulation
+let first_n: BooleanBuffer = bool_buf.iter().take(n).collect();
+// reset the existing buffer
+for b in bool_buf.iter().skip(n) {
+bool_buf_builder.append(b);
+}
+first_n
+}
+}
+}
+
+fn resize_states(&mut self, new_size: usize) {
+self.vals.resize(new_size, T::default_value());
+
+if self.null_builder.len() < new_size {
+self.null_builder
+.append_n(new_size - self.null_bu

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub


alamb commented on issue #15291:
URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2732103208

   Thanks @aectaan  -- what is the error message that you get?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub


UBarney commented on code in PR #15266:
URL: https://github.com/apache/datafusion/pull/15266#discussion_r2000992780


##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -179,6 +292,423 @@ impl AggregateUDFImpl for FirstValue {
 }
 }
 
+struct FirstGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+//  state ===
+vals: Vec,
+// Stores ordering values, of the aggregator requirement corresponding to 
first value
+// of the aggregator.
+// The `orderings` are stored row-wise, meaning that `orderings[group_idx]`
+// represents the ordering values corresponding to the `group_idx`-th 
group.
+orderings: Vec>,
+// At the beginning, `is_sets[group_idx]` is false, which means `first` is 
not seen yet.
+// Once we see the first value, we set the `is_sets[group_idx]` flag
+is_sets: BooleanBufferBuilder,
+// null_builder[group_idx] == false => vals[group_idx] is null
+null_builder: BooleanBufferBuilder,
+// size of `self.orderings`
+// Calculating the memory usage of `self.orderings` using 
`ScalarValue::size_of_vec` is quite costly.
+// Therefore, we cache it and compute `size_of` only after each update
+// to avoid calling `ScalarValue::size_of_vec` by Self.size.
+size_of_orderings: usize,
+
+// === option 
+
+// Stores the applicable ordering requirement.
+ordering_req: LexOrdering,
+// derived from `ordering_req`.
+sort_options: Vec,
+// Stores whether incoming data already satisfies the ordering requirement.
+input_requirement_satisfied: bool,
+// Ignore null values.
+ignore_nulls: bool,
+/// The output type
+data_type: DataType,
+default_orderings: Vec,
+}
+
+impl FirstGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+fn try_new(
+ordering_req: LexOrdering,
+ignore_nulls: bool,
+data_type: &DataType,
+ordering_dtypes: &[DataType],
+) -> Result {
+let requirement_satisfied = ordering_req.is_empty();
+
+let default_orderings = ordering_dtypes
+.iter()
+.map(ScalarValue::try_from)
+.collect::>>()?;
+
+let sort_options = get_sort_options(ordering_req.as_ref());
+
+Ok(Self {
+null_builder: BooleanBufferBuilder::new(0),
+ordering_req,
+sort_options,
+input_requirement_satisfied: requirement_satisfied,
+ignore_nulls,
+default_orderings,
+data_type: data_type.clone(),
+vals: Vec::new(),
+orderings: Vec::new(),
+is_sets: BooleanBufferBuilder::new(0),
+size_of_orderings: 0,
+})
+}
+
+fn need_update(&self, group_idx: usize) -> bool {
+if !self.is_sets.get_bit(group_idx) {
+return true;
+}
+
+if self.ignore_nulls && !self.null_builder.get_bit(group_idx) {
+return true;
+}
+
+!self.input_requirement_satisfied
+}
+
+fn should_update_state(
+&self,
+group_idx: usize,
+new_ordering_values: &[ScalarValue],
+) -> Result {
+if !self.is_sets.get_bit(group_idx) {
+return Ok(true);
+}
+
+assert!(new_ordering_values.len() == self.ordering_req.len());
+let current_ordering = &self.orderings[group_idx];
+compare_rows(current_ordering, new_ordering_values, &self.sort_options)
+.map(|x| x.is_gt())
+}
+
+fn take_orderings(&mut self, emit_to: EmitTo) -> Vec> {
+let result = emit_to.take_needed(&mut self.orderings);
+
+match emit_to {
+EmitTo::All => self.size_of_orderings = 0,
+EmitTo::First(_) => {
+self.size_of_orderings -=
+result.iter().map(ScalarValue::size_of_vec).sum::()
+}
+}
+
+result
+}
+
+fn take_need(
+bool_buf_builder: &mut BooleanBufferBuilder,
+emit_to: EmitTo,
+) -> BooleanBuffer {
+let bool_buf = bool_buf_builder.finish();
+match emit_to {
+EmitTo::All => bool_buf,
+EmitTo::First(n) => {
+// split off the first N values in seen_values
+//
+// TODO make this more efficient rather than two
+// copies and bitwise manipulation
+let first_n: BooleanBuffer = bool_buf.iter().take(n).collect();
+// reset the existing buffer
+for b in bool_buf.iter().skip(n) {
+bool_buf_builder.append(b);
+}
+first_n
+}
+}
+}
+
+fn resize_states(&mut self, new_size: usize) {
+self.vals.resize(new_size, T::default_value());
+
+if self.null_builder.len() < new_size {
+self.null_builder
+.append_n(new_size - self.null_bu

Re: [PR] fix: Unconditionally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-18 Thread via GitHub


Omega359 commented on code in PR #15242:
URL: https://github.com/apache/datafusion/pull/15242#discussion_r2001009802


##
datafusion/sqllogictest/test_files/union_by_name.slt:
##
@@ -287,3 +287,137 @@ SELECT '0' as c UNION ALL BY NAME SELECT 0 as c;
 
 0
 0
+
+# Regression tests for https://github.com/apache/datafusion/issues/15236
+# Ensure that the correct output is produced even if the width of an input 
node's
+# schema is the same as the resulting schema width after the union is applied.
+
+statement ok
+create table t3 (x varchar(255), y varchar(255), z varchar(255));
+
+statement ok
+create table t4 (x varchar(255), y varchar(255), z varchar(255));
+
+statement ok
+insert into t3 values ('a', 'b', 'c');
+
+statement ok
+insert into t4 values ('a', 'b', 'c');
+
+query  rowsort
+select t3.x, t3.y, t3.z from t3 union by name select t3.z, t3.y, t3.x, 'd' as 
zz from t3;
+
+a b c NULL
+a b c d
+
+query  rowsort
+select t3.x, t3.y, t3.z from t3 union by name select t4.z, t4.y, t4.x, 'd' as 
zz from t4;
+
+a b c NULL
+a b c d
+
+query TTT rowsort
+select x, y, z from t3 union all by name select z, y, x from t3;
+
+a b c
+a b c
+
+query TTT rowsort
+select x, y, z from t3 union all by name select z, y, x from t4;
+
+a b c
+a b c
+
+query TTT
+select x, y, z from t3 union all by name select z, y, x from t4 order by x;
+
+a b c
+a b c
+
+
+# FIXME: The following should pass without error, but currently it is failing
+# due to differing record batch schemas when the SLT runner collects results in
+# normalize::convert_batches.
+#
+# More context can be found here: 
https://github.com/apache/datafusion/pull/15242#issuecomment-2730402547
+query error
+select x, y, z from t3 union all by name select z, y, x, 'd' as zz from t3;
+
+DataFusion error: Internal error: Schema mismatch. Previously had
+Schema {
+fields: [
+Field {
+name: "x",
+data_type: Utf8,
+nullable: true,
+dict_id: 0,
+dict_is_ordered: false,
+metadata: {},
+},
+Field {
+name: "y",
+data_type: Utf8,
+nullable: true,
+dict_id: 0,
+dict_is_ordered: false,
+metadata: {},
+},
+Field {
+name: "z",
+data_type: Utf8,
+nullable: true,
+dict_id: 0,
+dict_is_ordered: false,
+metadata: {},
+},
+Field {
+name: "zz",
+data_type: Utf8,
+nullable: false,
+dict_id: 0,
+dict_is_ordered: false,
+metadata: {},
+},
+],
+metadata: {},
+}
+
+Got:
+Schema {
+fields: [
+Field {
+name: "x",
+data_type: Utf8,
+nullable: true,
+dict_id: 0,
+dict_is_ordered: false,
+metadata: {},
+},
+Field {
+name: "y",
+data_type: Utf8,
+nullable: true,
+dict_id: 0,
+dict_is_ordered: false,
+metadata: {},
+},
+Field {
+name: "z",
+data_type: Utf8,
+nullable: true,
+dict_id: 0,
+dict_is_ordered: false,
+metadata: {},
+},
+Field {
+name: "zz",
+data_type: Utf8,
+nullable: true,

Review Comment:
   Interesting that the nullable changed from false to true here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate datasource tests to insta [datafusion]

2025-03-18 Thread via GitHub


blaginin commented on code in PR #15258:
URL: https://github.com/apache/datafusion/pull/15258#discussion_r2001416176


##
datafusion/core/Cargo.toml:
##
@@ -126,6 +126,7 @@ datafusion-physical-plan = { workspace = true }
 datafusion-sql = { workspace = true }
 flate2 = { version = "1.1.0", optional = true }
 futures = { workspace = true }
+insta = "1.42.2"

Review Comment:
   you should be use to use the workspace version 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001379803


##
datafusion/core/tests/parquet/schema.rs:
##
@@ -153,7 +151,15 @@ async fn schema_merge_can_preserve_metadata() {
 
 let actual = df.collect().await.unwrap();
 
-assert_batches_sorted_eq!(expected, &actual);
+assert_snapshot!(batches_to_sort_string(&actual), @r"

Review Comment:
   change to batches_to_string



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001384434


##
datafusion/core/tests/sql/path_partition.rs:
##
@@ -145,16 +146,7 @@ async fn parquet_distinct_partition_col() -> Result<()> {
 .collect()
 .await?;
 
-let expected = [
-"+--+---+-+",
-"| year | month | day |",
-"+--+---+-+",
-"| 2021 | 09| 09  |",
-"| 2021 | 10| 09  |",
-"| 2021 | 10| 28  |",
-"+--+---+-+",
-];
-assert_batches_sorted_eq!(expected, &result);
+assert_snapshot!(batches_to_sort_string(&result));

Review Comment:
   missing inline snapshot



##
datafusion/core/tests/sql/path_partition.rs:
##
@@ -275,18 +267,7 @@ async fn csv_filter_with_file_col() -> Result<()> {
 .collect()
 .await?;
 
-let expected = [
-"+++",
-"| c1 | c2 |",
-"+++",
-"| a  | 1  |",
-"| b  | 1  |",
-"| b  | 5  |",
-"| c  | 2  |",
-"| d  | 5  |",
-"+++",
-];
-assert_batches_sorted_eq!(expected, &result);
+assert_snapshot!(batches_to_sort_string(&result));

Review Comment:
   missing inline snapshot



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] minor: fix `data/sqlite` link [datafusion]

2025-03-18 Thread via GitHub


sdht0 commented on PR #15286:
URL: https://github.com/apache/datafusion/pull/15286#issuecomment-2733877823

   Ah fixed it thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub


xudong963 commented on code in PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#discussion_r2001108185


##
datafusion/sqllogictest/test_files/explain_tree.slt:
##
@@ -202,53 +202,48 @@ physical_plan
 02)│   AggregateExec   │
 03)│   │
 04)│   aggr: sum(bigint_col)   │
-05)│   │
-06)│ group_by: │
-07)│ string_col@0 as string_col│

Review Comment:
   got it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Refactor file schema type coercions [datafusion]

2025-03-18 Thread via GitHub


xudong963 commented on PR #15268:
URL: https://github.com/apache/datafusion/pull/15268#issuecomment-2733344214

   Thank you all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Refactor file schema type coercions [datafusion]

2025-03-18 Thread via GitHub


xudong963 merged PR #15268:
URL: https://github.com/apache/datafusion/pull/15268


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Support `merge` for `Distribution` [datafusion]

2025-03-18 Thread via GitHub


xudong963 commented on issue #15290:
URL: https://github.com/apache/datafusion/issues/15290#issuecomment-2732132955

   > Will require an accurate distribution (not just an approximation
   
   Yes, it depends on whether each distribution is accurate, if they're, the 
merged distribution should be accurate, or we should merge them conservatively


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub


Dandandan commented on code in PR #15266:
URL: https://github.com/apache/datafusion/pull/15266#discussion_r2000574008


##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -179,6 +292,423 @@ impl AggregateUDFImpl for FirstValue {
 }
 }
 
+struct FirstGroupsAccumulator

Review Comment:
   ```suggestion
   struct FirstPrimitiveGroupsAccumulator
   ```
   ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub


xudong963 commented on code in PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#discussion_r2000832270


##
datafusion/sqllogictest/test_files/explain_tree.slt:
##
@@ -202,53 +202,48 @@ physical_plan
 02)│   AggregateExec   │
 03)│   │
 04)│   aggr: sum(bigint_col)   │
-05)│   │
-06)│ group_by: │
-07)│ string_col@0 as string_col│

Review Comment:
   I think it'll be helpful to keep the index `@0`, what do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-03-18 Thread via GitHub


alamb commented on PR #14286:
URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2733529188

   Converting to a draft until we hav spawn service
   - https://github.com/apache/arrow-rs/pull/7253


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Remove inline table scan analyzer rule [datafusion]

2025-03-18 Thread via GitHub


jayzhan211 commented on code in PR #15201:
URL: https://github.com/apache/datafusion/pull/15201#discussion_r2000904176


##
datafusion/core/tests/dataframe/mod.rs:
##
@@ -1571,14 +1571,18 @@ async fn with_column_join_same_columns() -> Result<()> {
 
 assert_snapshot!(
 df_with_column.logical_plan(),
-@r###"
+@r"
 Projection: t1.c1, t2.c1, Boolean(true) AS new_column
   Limit: skip=0, fetch=1
 Sort: t1.c1 ASC NULLS FIRST
   Inner Join: t1.c1 = t2.c1
-TableScan: t1
-TableScan: t2
-"###
+SubqueryAlias: t1

Review Comment:
   I explain it in the above 
https://github.com/apache/datafusion/pull/15201#discussion_r1996487546



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Triggering extended tests through PR comment [datafusion]

2025-03-18 Thread via GitHub


Omega359 commented on PR #15101:
URL: https://github.com/apache/datafusion/pull/15101#issuecomment-2733251482

   Is this ready for review or is there something outstanding for it to be 
still in draft?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub


irenjj commented on PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2733266739

   Thanks @alamb, @jayzhan211 and @xudong963 for your review, here are two 
points that remain unclear:
   1. For GROUP BY, is it necessary to preserve the row index -- for more 
information, additional functionality, or just better aesthetics?
   2. Should the default Display implementation of Expr be modified to replace 
SqlDisplay?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] minor: fix `data/sqlite` link [datafusion]

2025-03-18 Thread via GitHub


sdht0 commented on PR #15286:
URL: https://github.com/apache/datafusion/pull/15286#issuecomment-2733900724

   Weird that I received an email about another comment from Weijun-H but can't 
see it here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub


irenjj commented on PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2732945218

   > Is it possible to modify `Display` for Expr for explain statement?
   
   I haven't tried it, not sure how it will affect other logic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Fix multi-lines printing issue for datafusion-cli and add the streaming printing feature back [datafusion]

2025-03-18 Thread via GitHub


blaginin commented on code in PR #14954:
URL: https://github.com/apache/datafusion/pull/14954#discussion_r2001448048


##
datafusion-cli/tests/cli_integration.rs:
##
@@ -51,6 +51,163 @@ fn init() {
 ["--command", "show datafusion.execution.batch_size", "--format", "json", 
"-q", "-b", "1"],
 "[{\"name\":\"datafusion.execution.batch_size\",\"value\":\"1\"}]\n"
 )]
+

Review Comment:
   you should be able to use snapshots now (merge latest main to see the 
changes in that file). lmk if you need help



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Require Comet 0.6 Docker image for Spark 3.5.5 [datafusion-comet]

2025-03-18 Thread via GitHub


RaghavendraGanesh commented on issue #1509:
URL: 
https://github.com/apache/datafusion-comet/issues/1509#issuecomment-2733053103

   Thanks @andygrove , will give it a try.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [D] enum Expr extension on logical level [datafusion]

2025-03-18 Thread via GitHub


GitHub user bertvermeiren edited a discussion: enum Expr extension on logical 
level

Hi,

In order to write some additional logical and physical plan implementations, we 
do have to create some kind of "composite" logical expression. 

Nowadays in the code base you have already existing expression implementations 
with such composite behaviour like the `ScalarFunction` expression containing 
other expressions (the arguments) ; the `AggregateFunction` containing 
_filtering_ and _order by_ expressions, etc.

Lot's of interfaces/API/traits are working on the `enum Expr` directly :

`UserLogicalNode::with_exprs_and_inputs` and related 
`UserLogicalNode::expressions`
Applying optimisation rules on those logical plans by analysing and potentially 
rewriting expressions.

By not having some customisation, we loose the internal semantics  of 
“composite” logical expressions if those have to be “decomposed” again into 
existing expression implementations to be used throughout the codebase.

Would it be opportune the have an extension point on logical expression as 
well, like it has been introduced on the `LogicalPlan::Extension(Extension)`?

What are the opinions on this ? Are there other alternatives ? How would such 
an extension look like ? 

Regards, Bert.


GitHub link: https://github.com/apache/datafusion/discussions/15297


This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: 
github-unsubscr...@datafusion.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001378037


##
datafusion/core/tests/parquet/schema.rs:
##
@@ -82,7 +69,18 @@ async fn schema_merge_ignores_metadata_by_default() {
 .unwrap();
 let actual = df.collect().await.unwrap();
 
-assert_batches_sorted_eq!(expected, &actual);
+assert_snapshot!(batches_to_sort_string(&actual), @r"

Review Comment:
   change to batches_to_string



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001380462


##
datafusion/core/tests/parquet/schema.rs:
##
@@ -167,7 +173,15 @@ async fn schema_merge_can_preserve_metadata() {
 assert_eq!(actual.clone(), expected_metadata);
 
 let actual = df.collect().await.unwrap();
-assert_batches_sorted_eq!(expected, &actual);
+assert_snapshot!(batches_to_sort_string(&actual), @r"

Review Comment:
   change to batch_to_string



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001378793


##
datafusion/core/tests/parquet/schema.rs:
##
@@ -97,7 +95,18 @@ async fn schema_merge_ignores_metadata_by_default() {
 .collect()
 .await
 .unwrap();
-assert_batches_sorted_eq!(expected, &actual);
+assert_snapshot!(batches_to_sort_string(&actual), @r"

Review Comment:
   change to batches_to_string



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001386538


##
datafusion/core/tests/sql/path_partition.rs:
##
@@ -430,21 +381,7 @@ async fn parquet_multiple_partitions() -> Result<()> {
 .collect()
 .await?;
 
-let expected = [
-"++-+",
-"| id | day |",
-"++-+",
-"| 0  | 09  |",
-"| 1  | 09  |",
-"| 2  | 09  |",
-"| 3  | 09  |",
-"| 4  | 09  |",
-"| 5  | 09  |",
-"| 6  | 09  |",
-"| 7  | 09  |",
-"++-+",
-];
-assert_batches_sorted_eq!(expected, &result);
+assert_snapshot!(batches_to_sort_string(&result));

Review Comment:
   missing inline snapshot



##
datafusion/core/tests/sql/path_partition.rs:
##
@@ -476,21 +413,7 @@ async fn parquet_multiple_nonstring_partitions() -> 
Result<()> {
 .collect()
 .await?;
 
-let expected = [
-"++-+",
-"| id | day |",
-"++-+",
-"| 0  | 9   |",
-"| 1  | 9   |",
-"| 2  | 9   |",
-"| 3  | 9   |",
-"| 4  | 9   |",
-"| 5  | 9   |",
-"| 6  | 9   |",
-"| 7  | 9   |",
-"++-+",
-];
-assert_batches_sorted_eq!(expected, &result);
+assert_snapshot!(batches_to_sort_string(&result));

Review Comment:
   missing inline snapshot



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001387282


##
datafusion/core/tests/sql/select.rs:
##
@@ -30,23 +30,7 @@ async fn test_list_query_parameters() -> Result<()> {
 .with_param_values(vec![ScalarValue::from(3i32)])?
 .collect()
 .await?;
-let expected = vec![
-"+++---+",
-"| c1 | c2 | c3|",
-"+++---+",
-"| 3  | 1  | false |",
-"| 3  | 10 | true  |",
-"| 3  | 2  | true  |",
-"| 3  | 3  | false |",
-"| 3  | 4  | true  |",
-"| 3  | 5  | false |",
-"| 3  | 6  | true  |",
-"| 3  | 7  | false |",
-"| 3  | 8  | true  |",
-"| 3  | 9  | false |",
-"+++---+",
-];
-assert_batches_sorted_eq!(expected, &results);
+assert_snapshot!(batches_to_sort_string(&results));

Review Comment:
   missing inline snapshot



##
datafusion/core/tests/sql/select.rs:
##
@@ -66,33 +50,7 @@ async fn test_named_query_parameters() -> Result<()> {
 ])?
 .collect()
 .await?;
-let expected = vec![
-"+++",
-"| c1 | c2 |",
-"+++",
-"| 1  | 1  |",
-"| 1  | 2  |",
-"| 1  | 3  |",
-"| 1  | 4  |",
-"| 1  | 5  |",
-"| 1  | 6  |",
-"| 1  | 7  |",
-"| 1  | 8  |",
-"| 1  | 9  |",
-"| 1  | 10 |",
-"| 2  | 1  |",
-"| 2  | 2  |",
-"| 2  | 3  |",
-"| 2  | 4  |",
-"| 2  | 5  |",
-"| 2  | 6  |",
-"| 2  | 7  |",
-"| 2  | 8  |",
-"| 2  | 9  |",
-"| 2  | 10 |",
-"+++",
-];
-assert_batches_sorted_eq!(expected, &results);
+assert_snapshot!(batches_to_sort_string(&results));

Review Comment:
   missing inline snapshot



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001385323


##
datafusion/core/tests/sql/path_partition.rs:
##
@@ -313,18 +294,7 @@ async fn csv_filter_with_file_nonstring_col() -> 
Result<()> {
 .collect()
 .await?;
 
-let expected = [
-"++++",
-"| c1 | c2 | date   |",
-"++++",
-"| a  | 1  | 2021-10-28 |",
-"| b  | 1  | 2021-10-28 |",
-"| b  | 5  | 2021-10-28 |",
-"| c  | 2  | 2021-10-28 |",
-"| d  | 5  | 2021-10-28 |",
-"++++",
-];
-assert_batches_sorted_eq!(expected, &result);
+assert_snapshot!(batches_to_sort_string(&result));

Review Comment:
   missing inline snapshot



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001388172


##
datafusion/core/tests/sql/select.rs:
##
@@ -114,33 +72,7 @@ async fn test_prepare_statement() -> Result<()> {
 let dataframe = dataframe.with_param_values(param_values)?;
 let results = dataframe.collect().await?;
 
-let expected = vec![
-"+++",
-"| c1 | c2 |",
-"+++",
-"| 1  | 1  |",
-"| 1  | 10 |",
-"| 1  | 2  |",
-"| 1  | 3  |",
-"| 1  | 4  |",
-"| 1  | 5  |",
-"| 1  | 6  |",
-"| 1  | 7  |",
-"| 1  | 8  |",
-"| 1  | 9  |",
-"| 2  | 1  |",
-"| 2  | 10 |",
-"| 2  | 2  |",
-"| 2  | 3  |",
-"| 2  | 4  |",
-"| 2  | 5  |",
-"| 2  | 6  |",
-"| 2  | 7  |",
-"| 2  | 8  |",
-"| 2  | 9  |",
-"+++",
-];
-assert_batches_sorted_eq!(expected, &results);
+assert_snapshot!(batches_to_sort_string(&results));

Review Comment:
   missing inline snapshot



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub


jsai28 commented on code in PR #15288:
URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001376856


##
datafusion/core/tests/parquet/custom_reader.rs:
##
@@ -96,17 +97,15 @@ async fn 
route_data_access_ops_to_parquet_file_reader_factory() {
 let task_ctx = session_ctx.task_ctx();
 let read = collect(parquet_exec, task_ctx).await.unwrap();
 
-let expected = [
-"+-+++",
-"| c1  | c2 | c3 |",
-"+-+++",
-"| Foo | 1  | 10 |",
-"| | 2  | 20 |",
-"| bar |||",
-"+-+++",
-];
-
-assert_batches_sorted_eq!(expected, &read);
+assert_snapshot!(batches_to_sort_string(&read), @r"

Review Comment:
   double check this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub


blaginin commented on code in PR #15255:
URL: https://github.com/apache/datafusion/pull/15255#discussion_r2000955601


##
datafusion/core/tests/user_defined/expr_planner.rs:
##
@@ -73,52 +73,62 @@ async fn plan_and_collect(sql: &str) -> 
Result> {
 ctx.sql(sql).await?.collect().await
 }
 
+fn fmt_batches(batches: &[RecordBatch]) -> String {
+use arrow::util::pretty::pretty_format_batches;
+match pretty_format_batches(batches) {
+Ok(formatted) => formatted.to_string(),
+Err(e) => format!("Error formatting record batches: {}", e),
+}
+}

Review Comment:
   can we use `batches_to_string` for that? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-18 Thread via GitHub


alamb commented on PR #15280:
URL: https://github.com/apache/datafusion/pull/15280#issuecomment-2732820890

   🎉 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub


blaginin commented on code in PR #15255:
URL: https://github.com/apache/datafusion/pull/15255#discussion_r2000968071


##
datafusion/core/tests/user_defined/user_defined_table_functions.rs:
##
@@ -34,11 +34,19 @@ use datafusion::physical_plan::{collect, ExecutionPlan};
 use datafusion::prelude::SessionContext;
 use datafusion_catalog::Session;
 use datafusion_catalog::TableFunctionImpl;
-use datafusion_common::{assert_batches_eq, DFSchema, ScalarValue};
+use datafusion_common::{DFSchema, ScalarValue};
 use datafusion_expr::{EmptyRelation, Expr, LogicalPlan, Projection, TableType};
 
 use async_trait::async_trait;
 
+fn fmt_batches(batches: &[RecordBatch]) -> String {
+use arrow::util::pretty::pretty_format_batches;
+match pretty_format_batches(batches) {
+Ok(formatted) => formatted.to_string(),
+Err(e) => format!("Error formatting record batches: {}", e),
+}
+}
+

Review Comment:
   and this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub


blaginin commented on code in PR #15255:
URL: https://github.com/apache/datafusion/pull/15255#discussion_r2000967075


##
datafusion/core/tests/user_defined/user_defined_window_functions.rs:
##
@@ -57,30 +57,38 @@ const BOUNDED_WINDOW_QUERY:  &str  =
  odd_counter(val) OVER (PARTITION BY x ORDER BY y ROWS BETWEEN 1 PRECEDING 
AND 1 FOLLOWING) \
  from t ORDER BY x, y";
 
-/// Test to show the contents of the setup
+fn fmt_batches(batches: &[RecordBatch]) -> String {
+use arrow::util::pretty::pretty_format_batches;
+match pretty_format_batches(batches) {
+Ok(formatted) => formatted.to_string(),
+Err(e) => format!("Error formatting record batches: {}", e),
+}
+}
+

Review Comment:
   i think this can be removed as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Update all github workflow to use actions tied to sha hashes [datafusion]

2025-03-18 Thread via GitHub


Omega359 opened a new issue, #15298:
URL: https://github.com/apache/datafusion/issues/15298

   ### Is your feature request related to a problem or challenge?
   
   A recent [supply chain 
attack](https://arstechnica.com/information-technology/2025/03/supply-chain-attack-exposing-credentials-affects-23k-users-of-tj-actions/)
 has made it extremely apparent that github workflows should only use actions 
that are tied to a specific hash, not a version. This applies to any 
non-github, non-apache action of which there seems to be a few:
   
   - 
[dev.yml](https://github.com/apache/datafusion/blob/main/.github/workflows/dev.yml)
 -> - uses: korandoru/hawkeye@v6
   - 
[rust.yml](https://github.com/apache/datafusion/blob/main/.github/workflows/rust.yml)
 -> - uses: korandoru/hawkeye@v6
   - 
[setup-macos-aarch64-builder/action.yaml](https://github.com/apache/datafusion/blob/main/.github/actions/setup-macos-aarch64-builder/action.yaml)
 -> uses: Swatinem/rust-cache@v2
   - 
[setup-rust-runtime/action.yaml](https://github.com/apache/datafusion/blob/main/.github/actions/setup-rust-runtime/action.yaml)
 -> uses: mozilla-actions/sccache-action@v0.0.4
   
   
   an example of how to use a sha hash instead of a version can be seen in the 
extended.yml file:
   
   `uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be`
   
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub


MohamedAbdeen21 commented on code in PR #1767:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2000753514


##
src/ast/mod.rs:
##
@@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier {
 write!(f, "")
 }
 Self::Local => {
-write!(f, " LOCAL")
+write!(f, "LOCAL ")
 }
 Self::Session => {
-write!(f, " SESSION")
+write!(f, "SESSION ")
 }
+Self::Global => {
+write!(f, "GLOBAL ")
+}
+}
+}
+}
+
+impl From> for ContextModifier {
+fn from(kw: Option) -> Self {
+match kw {
+Some(kw) => match kw {
+Keyword::LOCAL => Self::Local,
+Keyword::SESSION => Self::Session,
+Keyword::GLOBAL => Self::Global,
+_ => Self::None,
+},
+None => Self::None,
 }
 }

Review Comment:
   I'm not following. Instead of an `impl From> for 
ContextModifier` you'd prefer a `fn into_modifier(k: Option) -> 
ContextModifier`?e



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Update all github workflow to use actions tied to sha hashes [datafusion]

2025-03-18 Thread via GitHub


alamb commented on issue #15298:
URL: https://github.com/apache/datafusion/issues/15298#issuecomment-2734055350

   Thank you @Omega359  -- I agree this is very important


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-03-18 Thread via GitHub


alamb commented on code in PR #15263:
URL: https://github.com/apache/datafusion/pull/15263#discussion_r200126


##
datafusion/core/src/datasource/physical_plan/parquet.rs:
##
@@ -224,6 +224,327 @@ mod tests {
 )
 }
 
+#[tokio::test]
+async fn test_pushdown_with_missing_column_in_file() {
+let c1: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
+
+let file_schema =
+Arc::new(Schema::new(vec![Field::new("c1", DataType::Int32, 
true)]));
+
+let table_schema = Arc::new(Schema::new(vec![
+Field::new("c1", DataType::Int32, true),
+Field::new("c2", DataType::Int32, true),
+]));
+
+let batch = RecordBatch::try_new(file_schema.clone(), 
vec![c1]).unwrap();
+
+// Since c2 is missing from the file and we didn't supply a custom 
`SchemaAdapterFactory`,
+// the default behavior is to fill in missing columns with nulls.
+// Thus this predicate will come back as false.
+let filter = col("c2").eq(lit(1_i32));
+let rt = RoundTrip::new()
+.with_schema(table_schema.clone())
+.with_predicate(filter.clone())
+.with_pushdown_predicate()
+.round_trip(vec![batch.clone()])
+.await;
+let total_rows = rt
+.batches
+.unwrap()
+.iter()
+.map(|b| b.num_rows())
+.sum::();
+assert_eq!(total_rows, 0, "Expected no rows to match the predicate");
+let metrics = rt.parquet_exec.metrics().unwrap();
+let metric = get_value(&metrics, "pushdown_rows_pruned");
+assert_eq!(metric, 3, "Expected all rows to be pruned");
+
+// If we excplicitly allow nulls the rest of the predicate should work
+let filter = col("c2").is_null().and(col("c1").eq(lit(1_i32)));
+let rt = RoundTrip::new()
+.with_schema(table_schema.clone())
+.with_predicate(filter.clone())
+.with_pushdown_predicate()
+.round_trip(vec![batch.clone()])
+.await;
+let batches = rt.batches.unwrap();
+#[rustfmt::skip]
+let expected = [
+"+++",
+"| c1 | c2 |",
+"+++",
+"| 1  ||",
+"+++",
+];
+assert_batches_sorted_eq!(expected, &batches);
+let metrics = rt.parquet_exec.metrics().unwrap();
+let metric = get_value(&metrics, "pushdown_rows_pruned");
+assert_eq!(metric, 2, "Expected all rows to be pruned");
+}
+
+#[tokio::test]
+async fn test_pushdown_with_missing_column_in_file_multiple_types() {
+let c1: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
+
+let file_schema =
+Arc::new(Schema::new(vec![Field::new("c1", DataType::Int32, 
true)]));
+
+let table_schema = Arc::new(Schema::new(vec![
+Field::new("c1", DataType::Int32, true),
+Field::new("c2", DataType::Utf8, true),
+]));
+
+let batch = RecordBatch::try_new(file_schema.clone(), 
vec![c1]).unwrap();
+
+// Since c2 is missing from the file and we didn't supply a custom 
`SchemaAdapterFactory`,
+// the default behavior is to fill in missing columns with nulls.
+// Thus this predicate will come back as false.
+let filter = col("c2").eq(lit("abc"));
+let rt = RoundTrip::new()
+.with_schema(table_schema.clone())
+.with_predicate(filter.clone())
+.with_pushdown_predicate()
+.round_trip(vec![batch.clone()])
+.await;
+let total_rows = rt
+.batches
+.unwrap()
+.iter()
+.map(|b| b.num_rows())
+.sum::();
+assert_eq!(total_rows, 0, "Expected no rows to match the predicate");
+let metrics = rt.parquet_exec.metrics().unwrap();
+let metric = get_value(&metrics, "pushdown_rows_pruned");
+assert_eq!(metric, 3, "Expected all rows to be pruned");
+
+// If we excplicitly allow nulls the rest of the predicate should work
+let filter = col("c2").is_null().and(col("c1").eq(lit(1_i32)));
+let rt = RoundTrip::new()
+.with_schema(table_schema.clone())
+.with_predicate(filter.clone())
+.with_pushdown_predicate()
+.round_trip(vec![batch.clone()])
+.await;
+let batches = rt.batches.unwrap();
+#[rustfmt::skip]
+let expected = [
+"+++",
+"| c1 | c2 |",
+"+++",
+"| 1  ||",
+"+++",
+];
+assert_batches_sorted_eq!(expected, &batches);
+let metrics = rt.parquet_exec.metrics().unwrap();
+let metric = get_value(&metrics, "pushdown_rows_pruned");
+assert_eq!(metric, 2, "Expected all rows to be pruned");
+}
+

Re: [I] Add support for S3 Object Store in default binaries [datafusion-ballista]

2025-03-18 Thread via GitHub


milenkovicm commented on issue #1205:
URL: 
https://github.com/apache/datafusion-ballista/issues/1205#issuecomment-2734056479

   once we have s3 support it should work with minio. would you be interested 
in contributing @fithisux ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Update all github workflow to use actions tied to sha hashes [datafusion]

2025-03-18 Thread via GitHub


alamb commented on issue #15298:
URL: https://github.com/apache/datafusion/issues/15298#issuecomment-2734058015

   I think this is a good first issue as the write up is clear and there is an 
example to follow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub


alamb commented on issue #15291:
URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2734059396

   Thanks @aectaan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub


alamb commented on issue #15291:
URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2734063672

   I wonder if you can get a pure SQL (`datafusion-cli` based) reproducer? Or 
does it require creating and configuring a custom context / optimizer rules 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub


xudong963 opened a new pull request, #15296:
URL: https://github.com/apache/datafusion/pull/15296

   ## Which issue does this PR close?
   
   
   
   - Closes https://github.com/apache/datafusion/issues/15290
   
   ## Rationale for this change
   
   
   
   See issue #15290 
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   
   
   
   
   I'll do it after we are consistent.
   
   ## Are there any user-facing changes?
   
   
   
   
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub


2010YOUY01 commented on code in PR #15266:
URL: https://github.com/apache/datafusion/pull/15266#discussion_r2000593852


##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -179,6 +292,423 @@ impl AggregateUDFImpl for FirstValue {
 }
 }
 
+struct FirstGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+//  state ===
+vals: Vec,
+// Stores ordering values, of the aggregator requirement corresponding to 
first value
+// of the aggregator.
+// The `orderings` are stored row-wise, meaning that `orderings[group_idx]`
+// represents the ordering values corresponding to the `group_idx`-th 
group.
+orderings: Vec>,
+// At the beginning, `is_sets[group_idx]` is false, which means `first` is 
not seen yet.
+// Once we see the first value, we set the `is_sets[group_idx]` flag
+is_sets: BooleanBufferBuilder,
+// null_builder[group_idx] == false => vals[group_idx] is null
+null_builder: BooleanBufferBuilder,
+// size of `self.orderings`
+// Calculating the memory usage of `self.orderings` using 
`ScalarValue::size_of_vec` is quite costly.
+// Therefore, we cache it and compute `size_of` only after each update
+// to avoid calling `ScalarValue::size_of_vec` by Self.size.
+size_of_orderings: usize,
+
+// === option 
+
+// Stores the applicable ordering requirement.
+ordering_req: LexOrdering,
+// derived from `ordering_req`.
+sort_options: Vec,
+// Stores whether incoming data already satisfies the ordering requirement.
+input_requirement_satisfied: bool,
+// Ignore null values.
+ignore_nulls: bool,
+/// The output type
+data_type: DataType,
+default_orderings: Vec,
+}
+
+impl FirstGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+fn try_new(
+ordering_req: LexOrdering,
+ignore_nulls: bool,
+data_type: &DataType,
+ordering_dtypes: &[DataType],
+) -> Result {
+let requirement_satisfied = ordering_req.is_empty();
+
+let default_orderings = ordering_dtypes
+.iter()
+.map(ScalarValue::try_from)
+.collect::>>()?;
+
+let sort_options = get_sort_options(ordering_req.as_ref());
+
+Ok(Self {
+null_builder: BooleanBufferBuilder::new(0),
+ordering_req,
+sort_options,
+input_requirement_satisfied: requirement_satisfied,
+ignore_nulls,
+default_orderings,
+data_type: data_type.clone(),
+vals: Vec::new(),
+orderings: Vec::new(),
+is_sets: BooleanBufferBuilder::new(0),
+size_of_orderings: 0,
+})
+}
+
+fn need_update(&self, group_idx: usize) -> bool {
+if !self.is_sets.get_bit(group_idx) {
+return true;
+}
+
+if self.ignore_nulls && !self.null_builder.get_bit(group_idx) {
+return true;
+}
+
+!self.input_requirement_satisfied
+}
+
+fn should_update_state(
+&self,
+group_idx: usize,
+new_ordering_values: &[ScalarValue],
+) -> Result {
+if !self.is_sets.get_bit(group_idx) {
+return Ok(true);
+}
+
+assert!(new_ordering_values.len() == self.ordering_req.len());
+let current_ordering = &self.orderings[group_idx];
+compare_rows(current_ordering, new_ordering_values, &self.sort_options)
+.map(|x| x.is_gt())
+}
+
+fn take_orderings(&mut self, emit_to: EmitTo) -> Vec> {
+let result = emit_to.take_needed(&mut self.orderings);
+
+match emit_to {
+EmitTo::All => self.size_of_orderings = 0,
+EmitTo::First(_) => {
+self.size_of_orderings -=
+result.iter().map(ScalarValue::size_of_vec).sum::()
+}
+}
+
+result
+}
+
+fn take_need(
+bool_buf_builder: &mut BooleanBufferBuilder,
+emit_to: EmitTo,
+) -> BooleanBuffer {
+let bool_buf = bool_buf_builder.finish();
+match emit_to {
+EmitTo::All => bool_buf,
+EmitTo::First(n) => {
+// split off the first N values in seen_values
+//
+// TODO make this more efficient rather than two
+// copies and bitwise manipulation
+let first_n: BooleanBuffer = bool_buf.iter().take(n).collect();
+// reset the existing buffer
+for b in bool_buf.iter().skip(n) {
+bool_buf_builder.append(b);
+}
+first_n
+}
+}
+}
+
+fn resize_states(&mut self, new_size: usize) {
+self.vals.resize(new_size, T::default_value());
+
+if self.null_builder.len() < new_size {
+self.null_builder
+.append_n(new_size - self.null

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub


irenjj commented on code in PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#discussion_r2000877388


##
datafusion/sqllogictest/test_files/explain_tree.slt:
##
@@ -202,53 +202,48 @@ physical_plan
 02)│   AggregateExec   │
 03)│   │
 04)│   aggr: sum(bigint_col)   │
-05)│   │
-06)│ group_by: │
-07)│ string_col@0 as string_col│

Review Comment:
   It's true that column indices provide precise tracking and tracing 
capabilities for complex query analysis, but I think tree explain should be as 
simple as possible🤔, as doc says:
   ```
   /// TreeRender, displayed in the `tree` explain type.
   ///
   /// This format is inspired by DuckDB's explain plans. The information
   /// presented should be "user friendly", and contain only the most 
relevant
   /// information for understanding a plan. It should NOT contain the same 
level
   /// of detail information as the  [`Self::Default`] format.
```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore(deps): bump async-trait from 0.1.87 to 0.1.88 [datafusion]

2025-03-18 Thread via GitHub


xudong963 merged PR #15294:
URL: https://github.com/apache/datafusion/pull/15294


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [D] Does DataFusion Support JSON Path Filtering Like `jsonb_path_exists` in PostgreSQL? [datafusion]

2025-03-18 Thread via GitHub


GitHub user dadepo added a comment to the discussion: Does DataFusion Support 
JSON Path Filtering Like `jsonb_path_exists` in PostgreSQL?

> You could create a function called `jsonb_path_exists` that takes a binary 
> column and a json path string perhaps?

I think what I am missing is how this can be used via the expression API. Using 
sql, I know I can do something like:

```
SessionContext.sql("select jsonb_path_exists(d_col, path)")
```

But via the expression API, how will that look? ie

```
df.filter(col('d_col')...)
```

How will that look?

GitHub link: 
https://github.com/apache/datafusion/discussions/15264#discussioncomment-12534943


This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: 
github-unsubscr...@datafusion.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] [EPIC] A collection of tickets for improved WASM support in DataFusion [datafusion]

2025-03-18 Thread via GitHub


savaliyabhargav commented on issue #13815:
URL: https://github.com/apache/datafusion/issues/13815#issuecomment-2732178094

   @matthewmturner  yes sure i am interested can you please give me more detail 
about it
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Support `merge` for `Distribution` [datafusion]

2025-03-18 Thread via GitHub


xudong963 opened a new issue, #15290:
URL: https://github.com/apache/datafusion/issues/15290

   ### Is your feature request related to a problem or challenge?
   
   I'm working on the ticket: https://github.com/apache/datafusion/issues/10316.
   
   Given that, we'll replace all `Precision` with `Distribution`: 
https://github.com/synnada-ai/datafusion-upstream/pull/63. So, while I make the 
design for #10316, I presumably use `Distribution` in statistics.
   
   There is a spot where I'll do the `merge` for statistics, and it'll be 
spread to the `Distribution`.
   
   The specific case is that I need to compute the partition-level statistics, 
aka, files will be grouped as the filegroup, each file group will be treated as 
a partition, and different partitions will be processed in parallel. So, the 
partition-level statistics will be from the merge of the files in a filegroup.
   
   ### Describe the solution you'd like
   
   Create a function that combines their statistical properties into a new 
distribution. The most appropriate approach is to create a GenericDistribution 
that approximates the mixture of the two input distributions.
   
   ```rust
   pub fn merge_distributions(a: &Distribution, b: &Distribution) -> 
Result {
   ...
   }
   ```
   ---
   
   I'll open a PR and we can do more discussions based on the PR.
   
   ### Describe alternatives you've considered
   
   No
   
   ### Additional context
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: Update links for released version [datafusion-comet]

2025-03-18 Thread via GitHub


andygrove commented on code in PR #1540:
URL: https://github.com/apache/datafusion-comet/pull/1540#discussion_r2001302504


##
docs/source/user-guide/kubernetes.md:
##
@@ -65,31 +65,31 @@ metadata:
 spec:
   type: Scala
   mode: cluster
-  image: ghcr.io/apache/datafusion-comet:spark-3.4-scala-2.12-0.5.0
+  image: apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11

Review Comment:
   I have not actually tested the changes in this doc. @comphead does this look 
correct to you?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Add support for S3 Object Store in default binaries [datafusion-ballista]

2025-03-18 Thread via GitHub


fithisux commented on issue #1205:
URL: 
https://github.com/apache/datafusion-ballista/issues/1205#issuecomment-2733646600

   I would rather have minio support because they provide full working docker 
images that you incorporate in a full fledge docker compose educational stack.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub


aectaan commented on issue #15291:
URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2732112476

   @alamb it's related to common subexpr eliminate:
   ```
   called `Result::unwrap()` on an `Err` value: Context("Optimizer rule 
'common_sub_expression_eliminate' failed", SchemaError(FieldNotFound { field: 
Column { relation: None, name: "count(test.col_utf8) FILTER (WHERE $1 - 
Int64(1) <= test.col_int64)" }, valid_fields: [Column { relation: None, name: 
"count(test.col_utf8) FILTER (WHERE __common_expr_1 AS $1 - Int64(1) <= 
test.col_int64)" }, Column { relation: None, name: "count(test.col_utf8) FILTER 
(WHERE $1 - Int64(2) <= test.col_int64 AND test.col_uint32 >= Int64(0))" }, 
Column { relation: None, name: "count(test.col_utf8) FILTER (WHERE $1 - 
Int64(2) <= test.col_int64 AND test.col_uint32 >= Int64(0) AND test.col_uint32 
>= Int64(0))" }] }, Some("")))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-03-18 Thread via GitHub


kosiew opened a new pull request, #15295:
URL: https://github.com/apache/datafusion/pull/15295

   ## Which issue does this PR close?
   
   - Closes #14757.
   
   ## Rationale for this change
   
   This PR introduces a `NestedStructSchemaAdapter` to improve schema evolution 
handling in DataFusion when dealing with nested struct types. Currently, schema 
evolution primarily supports flat schemas, but evolving nested structures (such 
as adding new fields to existing structs) requires special handling. This 
change ensures better compatibility and adaptability for evolving datasets.
   
   ## What changes are included in this PR?
   
   - Introduces `NestedStructSchemaAdapter` to handle schema evolution for 
nested struct fields.
   - Implements `NestedStructSchemaAdapterFactory` to determine whether the 
specialized adapter is needed based on schema characteristics.
   - Enhances `SchemaMapping` with a new constructor for improved usability.
   - Updates `schema_adapter.rs` and integrates the new adapter into the 
`datafusion_datasource` module.
   - Adds comprehensive unit tests to verify the correctness of schema 
adaptation, including nested struct evolution scenarios.
   
   ## Are these changes tested?
   
   Yes, extensive unit tests have been added to verify:
   - Proper mapping of fields, including added and missing nested struct fields.
   - Correct adaptation from flat schemas to nested schemas.
   - Validation of different adapter selection logic based on schema 
characteristics.
   
   ## Are there any user-facing changes?
   
   No breaking changes.  
   However, users working with evolving nested struct schemas will benefit from 
improved support for automatic schema adaptation. This enhances compatibility 
with sources like Parquet, where schemas may change over time.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-03-18 Thread via GitHub


kosiew commented on code in PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#discussion_r2000486806


##
datafusion/datasource/src/nested_schema_adapter.rs:
##
@@ -0,0 +1,582 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! [`SchemaAdapter`] and [`SchemaAdapterFactory`] to adapt file-level record 
batches to a table schema.
+//!
+//! Adapter provides a method of translating the RecordBatches that come out 
of the
+//! physical format into how they should be used by DataFusion.  For instance, 
a schema
+//! can be stored external to a parquet file that maps parquet logical types 
to arrow types.
+
+use arrow::datatypes::{DataType, Field, Fields, Schema, SchemaRef};
+use datafusion_common::Result;
+use std::collections::HashMap;
+use std::sync::Arc;
+
+use crate::schema_adapter::DefaultSchemaAdapterFactory;
+use crate::schema_adapter::SchemaAdapter;
+use crate::schema_adapter::SchemaAdapterFactory;
+use crate::schema_adapter::SchemaMapper;
+use crate::schema_adapter::SchemaMapping;
+
+/// Factory for creating [`NestedStructSchemaAdapter`]
+///
+/// This factory creates schema adapters that properly handle schema evolution
+/// for nested struct fields, allowing new fields to be added to struct columns
+/// over time.
+#[derive(Debug, Clone, Default)]
+pub struct NestedStructSchemaAdapterFactory;
+
+impl SchemaAdapterFactory for NestedStructSchemaAdapterFactory {
+fn create(
+&self,
+projected_table_schema: SchemaRef,
+table_schema: SchemaRef,
+) -> Box {
+Box::new(NestedStructSchemaAdapter::new(
+projected_table_schema,
+table_schema,
+))
+}
+}
+
+impl NestedStructSchemaAdapterFactory {
+/// Create a new factory for mapping batches from a file schema to a table
+/// schema with support for nested struct evolution.
+///
+/// This is a convenience method that handles nested struct fields 
properly.
+pub fn from_schema(table_schema: SchemaRef) -> Box {
+Self.create(Arc::clone(&table_schema), table_schema)
+}
+
+/// Determines if a schema contains nested struct fields that would benefit
+/// from special handling during schema evolution
+pub fn has_nested_structs(schema: &Schema) -> bool {
+schema
+.fields()
+.iter()
+.any(|field| matches!(field.data_type(), DataType::Struct(_)))
+}
+
+/// Create an appropriate schema adapter based on schema characteristics.
+/// Returns a NestedStructSchemaAdapter if the projected schema contains 
nested structs,
+/// otherwise returns a DefaultSchemaAdapter.
+pub fn create_appropriate_adapter(
+projected_table_schema: SchemaRef,
+table_schema: SchemaRef,
+) -> Box {
+// Use nested adapter if target has nested structs
+if Self::has_nested_structs(table_schema.as_ref()) {
+NestedStructSchemaAdapterFactory.create(projected_table_schema, 
table_schema)
+} else {
+// Default case for simple schemas
+DefaultSchemaAdapterFactory.create(projected_table_schema, 
table_schema)
+}
+}
+}
+
+/// A SchemaAdapter that handles schema evolution for nested struct types
+#[derive(Debug, Clone)]
+pub struct NestedStructSchemaAdapter {
+/// The schema for the table, projected to include only the fields being 
output (projected) by the
+/// associated ParquetSource
+projected_table_schema: SchemaRef,
+/// The entire table schema for the table we're using this to adapt.
+///
+/// This is used to evaluate any filters pushed down into the scan
+/// which may refer to columns that are not referred to anywhere
+/// else in the plan.
+table_schema: SchemaRef,
+}
+
+impl NestedStructSchemaAdapter {
+/// Create a new NestedStructSchemaAdapter with the target schema
+pub fn new(projected_table_schema: SchemaRef, table_schema: SchemaRef) -> 
Self {
+Self {
+projected_table_schema,
+table_schema,
+}
+}
+
+pub fn projected_table_schema(&self) -> &Schema {
+self.projected_table_schema.as_ref()
+}
+
+pub fn table_schema(&self) -> &Schema {

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-03-18 Thread via GitHub


kosiew commented on code in PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#discussion_r2000486806


##
datafusion/datasource/src/nested_schema_adapter.rs:
##
@@ -0,0 +1,582 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! [`SchemaAdapter`] and [`SchemaAdapterFactory`] to adapt file-level record 
batches to a table schema.
+//!
+//! Adapter provides a method of translating the RecordBatches that come out 
of the
+//! physical format into how they should be used by DataFusion.  For instance, 
a schema
+//! can be stored external to a parquet file that maps parquet logical types 
to arrow types.
+
+use arrow::datatypes::{DataType, Field, Fields, Schema, SchemaRef};
+use datafusion_common::Result;
+use std::collections::HashMap;
+use std::sync::Arc;
+
+use crate::schema_adapter::DefaultSchemaAdapterFactory;
+use crate::schema_adapter::SchemaAdapter;
+use crate::schema_adapter::SchemaAdapterFactory;
+use crate::schema_adapter::SchemaMapper;
+use crate::schema_adapter::SchemaMapping;
+
+/// Factory for creating [`NestedStructSchemaAdapter`]
+///
+/// This factory creates schema adapters that properly handle schema evolution
+/// for nested struct fields, allowing new fields to be added to struct columns
+/// over time.
+#[derive(Debug, Clone, Default)]
+pub struct NestedStructSchemaAdapterFactory;
+
+impl SchemaAdapterFactory for NestedStructSchemaAdapterFactory {
+fn create(
+&self,
+projected_table_schema: SchemaRef,
+table_schema: SchemaRef,
+) -> Box {
+Box::new(NestedStructSchemaAdapter::new(
+projected_table_schema,
+table_schema,
+))
+}
+}
+
+impl NestedStructSchemaAdapterFactory {
+/// Create a new factory for mapping batches from a file schema to a table
+/// schema with support for nested struct evolution.
+///
+/// This is a convenience method that handles nested struct fields 
properly.
+pub fn from_schema(table_schema: SchemaRef) -> Box {
+Self.create(Arc::clone(&table_schema), table_schema)
+}
+
+/// Determines if a schema contains nested struct fields that would benefit
+/// from special handling during schema evolution
+pub fn has_nested_structs(schema: &Schema) -> bool {
+schema
+.fields()
+.iter()
+.any(|field| matches!(field.data_type(), DataType::Struct(_)))
+}
+
+/// Create an appropriate schema adapter based on schema characteristics.
+/// Returns a NestedStructSchemaAdapter if the projected schema contains 
nested structs,
+/// otherwise returns a DefaultSchemaAdapter.
+pub fn create_appropriate_adapter(
+projected_table_schema: SchemaRef,
+table_schema: SchemaRef,
+) -> Box {
+// Use nested adapter if target has nested structs
+if Self::has_nested_structs(table_schema.as_ref()) {
+NestedStructSchemaAdapterFactory.create(projected_table_schema, 
table_schema)
+} else {
+// Default case for simple schemas
+DefaultSchemaAdapterFactory.create(projected_table_schema, 
table_schema)
+}
+}
+}
+
+/// A SchemaAdapter that handles schema evolution for nested struct types
+#[derive(Debug, Clone)]
+pub struct NestedStructSchemaAdapter {
+/// The schema for the table, projected to include only the fields being 
output (projected) by the
+/// associated ParquetSource
+projected_table_schema: SchemaRef,
+/// The entire table schema for the table we're using this to adapt.
+///
+/// This is used to evaluate any filters pushed down into the scan
+/// which may refer to columns that are not referred to anywhere
+/// else in the plan.
+table_schema: SchemaRef,
+}
+
+impl NestedStructSchemaAdapter {
+/// Create a new NestedStructSchemaAdapter with the target schema
+pub fn new(projected_table_schema: SchemaRef, table_schema: SchemaRef) -> 
Self {
+Self {
+projected_table_schema,
+table_schema,
+}
+}
+
+pub fn projected_table_schema(&self) -> &Schema {
+self.projected_table_schema.as_ref()
+}
+
+pub fn table_schema(&self) -> &Schema {

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub


irenjj commented on code in PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#discussion_r2001153997


##
datafusion/physical-plan/src/aggregates/mod.rs:
##
@@ -801,6 +803,16 @@ impl DisplayAs for AggregateExec {
 }
 }
 DisplayFormatType::TreeRender => {
+let format_expr_with_alias =
+|(e, alias): &(Arc, String)| -> String {
+let expr_sql = fmt_sql(e.as_ref()).to_string();

Review Comment:
   We have to construct a string to compare with alias.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-18 Thread via GitHub


onlyjackfrost commented on code in PR #15209:
URL: https://github.com/apache/datafusion/pull/15209#discussion_r2001158621


##
datafusion/sql/src/expr/unary_op.rs:
##
@@ -45,7 +45,18 @@ impl SqlToRel<'_, S> {
 {
 Ok(operand)
 } else {
-plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types")
+plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types").map_err(|e| {
+let span = operand.spans().and_then(|s| s.first());
+let mut diagnostic = Diagnostic::new_error(
+format!("+ cannot be used with {data_type}"), 
+span
+);
+if span.is_none() {
+diagnostic.add_note("+ can only be used with 
numbers, intervals, and timestamps", None);
+diagnostic.add_help(format!("perhaps you need to 
cast {operand}"), None);
+}

Review Comment:
   ```
   # from the reply above
   note (without span) = "+ can only be used with numbers, intervals, and 
timestamps"
   help (without span) = "perhaps you need to cast {expression}"
   ```
   oops...  I might have misunderstood your reply.
   I'll remove the condition from here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add support for `RAISE` statement [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub


iffyio commented on code in PR #1766:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1766#discussion_r2001170116


##
src/ast/mod.rs:
##
@@ -2256,6 +2256,57 @@ impl fmt::Display for ConditionalStatements {
 }
 }
 
+/// A `RAISE` statement.
+///
+/// Examples:
+/// ```sql
+/// RAISE USING MESSAGE = 'error';
+///
+/// RAISE myerror;
+/// ```
+///
+/// 
[BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#raise)
+/// 
[Snowflake](https://docs.snowflake.com/en/sql-reference/snowflake-scripting/raise)
+#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
+#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
+#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
+pub struct RaiseStatement {
+pub value: Option,
+}
+
+impl fmt::Display for RaiseStatement {
+fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+let RaiseStatement { value } = self;
+
+write!(f, "RAISE")?;
+if let Some(value) = value {
+write!(f, " {value}")?;
+}
+
+Ok(())
+}
+}
+
+/// Represents the error value of a [RaiseStatement].
+#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
+#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
+#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
+pub enum RaiseStatementValue {
+/// `RAISE USING MESSAGE = 'error'`
+UsingMessage(Expr),

Review Comment:
   Yeah they can be arbitrary expressions, bigquery for example would accept 
this sql
   ```sql
   begin
select 1;
exception when error then
  raise using message = 2+3;
end;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub


irenjj commented on code in PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#discussion_r2001176277


##
datafusion/expr/src/expr.rs:
##
@@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> {
 }
 }
 
+struct SqlDisplay<'a>(&'a Expr);
+impl Display for SqlDisplay<'_> {

Review Comment:
   Maybe we should take nested expr into consideration, like `aggr(case aggr() 
when...)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-18 Thread via GitHub


onlyjackfrost commented on code in PR #15209:
URL: https://github.com/apache/datafusion/pull/15209#discussion_r2001158621


##
datafusion/sql/src/expr/unary_op.rs:
##
@@ -45,7 +45,18 @@ impl SqlToRel<'_, S> {
 {
 Ok(operand)
 } else {
-plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types")
+plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types").map_err(|e| {
+let span = operand.spans().and_then(|s| s.first());
+let mut diagnostic = Diagnostic::new_error(
+format!("+ cannot be used with {data_type}"), 
+span
+);
+if span.is_none() {
+diagnostic.add_note("+ can only be used with 
numbers, intervals, and timestamps", None);
+diagnostic.add_help(format!("perhaps you need to 
cast {operand}"), None);
+}

Review Comment:
   
   # from the reply above
   > note (without span) = "+ can only be used with numbers, intervals, and 
timestamps"
   > help (without span) = "perhaps you need to cast {expression}"
   
   oops...  I might have misunderstood your reply.
   I'll remove the condition from here.



##
datafusion/sql/src/expr/unary_op.rs:
##
@@ -45,7 +45,18 @@ impl SqlToRel<'_, S> {
 {
 Ok(operand)
 } else {
-plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types")
+plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types").map_err(|e| {
+let span = operand.spans().and_then(|s| s.first());
+let mut diagnostic = Diagnostic::new_error(
+format!("+ cannot be used with {data_type}"), 
+span
+);
+if span.is_none() {
+diagnostic.add_note("+ can only be used with 
numbers, intervals, and timestamps", None);
+diagnostic.add_help(format!("perhaps you need to 
cast {operand}"), None);
+}

Review Comment:
   
   from the reply above
   > note (without span) = "+ can only be used with numbers, intervals, and 
timestamps"
   > help (without span) = "perhaps you need to cast {expression}"
   
   oops...  I might have misunderstood your reply.
   I'll remove the condition from here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add support for `RAISE` statement [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub


iffyio merged PR #1766:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1766


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-18 Thread via GitHub


eliaperantoni commented on code in PR #15209:
URL: https://github.com/apache/datafusion/pull/15209#discussion_r2001166961


##
datafusion/sql/src/expr/unary_op.rs:
##
@@ -45,7 +45,18 @@ impl SqlToRel<'_, S> {
 {
 Ok(operand)
 } else {
-plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types")
+plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types").map_err(|e| {
+let span = operand.spans().and_then(|s| s.first());
+let mut diagnostic = Diagnostic::new_error(
+format!("+ cannot be used with {data_type}"), 
+span
+);
+if span.is_none() {
+diagnostic.add_note("+ can only be used with 
numbers, intervals, and timestamps", None);
+diagnostic.add_help(format!("perhaps you need to 
cast {operand}"), None);
+}

Review Comment:
   Ahh sorry! Hahah. Yeah I meant to say that the note and the help should have 
no `Span` attached to them, in my opinion. But they should always be there.



##
datafusion/sql/src/expr/unary_op.rs:
##
@@ -45,7 +45,18 @@ impl SqlToRel<'_, S> {
 {
 Ok(operand)
 } else {
-plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types")
+plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types").map_err(|e| {
+let span = operand.spans().and_then(|s| s.first());
+let mut diagnostic = Diagnostic::new_error(
+format!("+ cannot be used with {data_type}"), 
+span
+);
+if span.is_none() {
+diagnostic.add_note("+ can only be used with 
numbers, intervals, and timestamps", None);
+diagnostic.add_help(format!("perhaps you need to 
cast {operand}"), None);
+}

Review Comment:
   Ahh sorry! Hahah. Yeah I meant to say that the note and the help should have 
no `Span` attached to them, in my opinion. But they (the note and the help) 
should always be there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-18 Thread via GitHub


onlyjackfrost commented on code in PR #15209:
URL: https://github.com/apache/datafusion/pull/15209#discussion_r2001158621


##
datafusion/sql/src/expr/unary_op.rs:
##
@@ -45,7 +45,18 @@ impl SqlToRel<'_, S> {
 {
 Ok(operand)
 } else {
-plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types")
+plan_err!("Unary operator '+' only supports numeric, 
interval and timestamp types").map_err(|e| {
+let span = operand.spans().and_then(|s| s.first());
+let mut diagnostic = Diagnostic::new_error(
+format!("+ cannot be used with {data_type}"), 
+span
+);
+if span.is_none() {
+diagnostic.add_note("+ can only be used with 
numbers, intervals, and timestamps", None);
+diagnostic.add_help(format!("perhaps you need to 
cast {operand}"), None);
+}

Review Comment:
   
   > note (without span) = "+ can only be used with numbers, intervals, and 
timestamps"
   > help (without span) = "perhaps you need to cast {expression}"
   
   oops...  I might have misunderstood your reply.
   I'll remove the condition from here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub


blaginin commented on code in PR #15266:
URL: https://github.com/apache/datafusion/pull/15266#discussion_r2001206529


##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue {
 }
 }
 
+struct FirstPrimitiveGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+//  state ===
+vals: Vec,
+// Stores ordering values, of the aggregator requirement corresponding to 
first value
+// of the aggregator.
+// The `orderings` are stored row-wise, meaning that `orderings[group_idx]`
+// represents the ordering values corresponding to the `group_idx`-th 
group.
+orderings: Vec>,
+// At the beginning, `is_sets[group_idx]` is false, which means `first` is 
not seen yet.
+// Once we see the first value, we set the `is_sets[group_idx]` flag
+is_sets: BooleanBufferBuilder,
+// null_builder[group_idx] == false => vals[group_idx] is null
+null_builder: BooleanBufferBuilder,

Review Comment:
   should we use `NullState` for that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub


blaginin commented on code in PR #15266:
URL: https://github.com/apache/datafusion/pull/15266#discussion_r2001210201


##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue {
 }
 }
 
+struct FirstPrimitiveGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+//  state ===
+vals: Vec,
+// Stores ordering values, of the aggregator requirement corresponding to 
first value
+// of the aggregator.
+// The `orderings` are stored row-wise, meaning that `orderings[group_idx]`
+// represents the ordering values corresponding to the `group_idx`-th 
group.
+orderings: Vec>,
+// At the beginning, `is_sets[group_idx]` is false, which means `first` is 
not seen yet.
+// Once we see the first value, we set the `is_sets[group_idx]` flag
+is_sets: BooleanBufferBuilder,
+// null_builder[group_idx] == false => vals[group_idx] is null
+null_builder: BooleanBufferBuilder,
+// size of `self.orderings`
+// Calculating the memory usage of `self.orderings` using 
`ScalarValue::size_of_vec` is quite costly.
+// Therefore, we cache it and compute `size_of` only after each update
+// to avoid calling `ScalarValue::size_of_vec` by Self.size.
+size_of_orderings: usize,
+
+// === option 
+
+// Stores the applicable ordering requirement.
+ordering_req: LexOrdering,
+// derived from `ordering_req`.
+sort_options: Vec,
+// Stores whether incoming data already satisfies the ordering requirement.
+input_requirement_satisfied: bool,
+// Ignore null values.
+ignore_nulls: bool,
+/// The output type
+data_type: DataType,
+default_orderings: Vec,
+}
+
+impl FirstPrimitiveGroupsAccumulator
+where
+T: ArrowPrimitiveType + Send,
+{
+fn try_new(
+ordering_req: LexOrdering,
+ignore_nulls: bool,
+data_type: &DataType,
+ordering_dtypes: &[DataType],
+) -> Result {
+let requirement_satisfied = ordering_req.is_empty();
+
+let default_orderings = ordering_dtypes
+.iter()
+.map(ScalarValue::try_from)
+.collect::>>()?;
+
+let sort_options = get_sort_options(ordering_req.as_ref());
+
+Ok(Self {
+null_builder: BooleanBufferBuilder::new(0),
+ordering_req,
+sort_options,
+input_requirement_satisfied: requirement_satisfied,
+ignore_nulls,
+default_orderings,
+data_type: data_type.clone(),
+vals: Vec::new(),
+orderings: Vec::new(),
+is_sets: BooleanBufferBuilder::new(0),
+size_of_orderings: 0,
+})
+}
+
+fn need_update(&self, group_idx: usize) -> bool {
+if !self.is_sets.get_bit(group_idx) {
+return true;
+}
+
+if self.ignore_nulls && !self.null_builder.get_bit(group_idx) {
+return true;
+}
+
+!self.input_requirement_satisfied
+}
+
+fn should_update_state(
+&self,
+group_idx: usize,
+new_ordering_values: &[ScalarValue],
+) -> Result {
+if !self.is_sets.get_bit(group_idx) {
+return Ok(true);
+}
+
+assert!(new_ordering_values.len() == self.ordering_req.len());
+let current_ordering = &self.orderings[group_idx];
+compare_rows(current_ordering, new_ordering_values, &self.sort_options)
+.map(|x| x.is_gt())
+}
+
+fn take_orderings(&mut self, emit_to: EmitTo) -> Vec> {
+let result = emit_to.take_needed(&mut self.orderings);
+
+match emit_to {
+EmitTo::All => self.size_of_orderings = 0,
+EmitTo::First(_) => {
+self.size_of_orderings -=
+result.iter().map(ScalarValue::size_of_vec).sum::()
+}
+}
+
+result
+}
+
+fn take_need(
+bool_buf_builder: &mut BooleanBufferBuilder,
+emit_to: EmitTo,
+) -> BooleanBuffer {
+let bool_buf = bool_buf_builder.finish();
+match emit_to {
+EmitTo::All => bool_buf,
+EmitTo::First(n) => {
+// split off the first N values in seen_values
+//
+// TODO make this more efficient rather than two
+// copies and bitwise manipulation
+let first_n: BooleanBuffer = bool_buf.iter().take(n).collect();
+// reset the existing buffer
+for b in bool_buf.iter().skip(n) {
+bool_buf_builder.append(b);
+}
+first_n
+}
+}
+}
+
+fn resize_states(&mut self, new_size: usize) {
+self.vals.resize(new_size, T::default_value());
+
+if self.null_builder.len() < new_size {
+self.null_builder
+.append_n(new_

Re: [I] Timeouts reading "large" files from object stores over "slow" connections [datafusion]

2025-03-18 Thread via GitHub


alamb commented on issue #15067:
URL: https://github.com/apache/datafusion/issues/15067#issuecomment-2733546979

   I am convinced this issue would be solved with automatic retries
   - https://github.com/apache/arrow-rs/issues/7242


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub


iffyio commented on code in PR #1767:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2000290321


##
src/ast/mod.rs:
##
@@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier {
 write!(f, "")
 }
 Self::Local => {
-write!(f, " LOCAL")
+write!(f, "LOCAL ")
 }
 Self::Session => {
-write!(f, " SESSION")
+write!(f, "SESSION ")
 }
+Self::Global => {
+write!(f, "GLOBAL ")
+}
+}
+}
+}
+
+impl From> for ContextModifier {
+fn from(kw: Option) -> Self {
+match kw {
+Some(kw) => match kw {
+Keyword::LOCAL => Self::Local,
+Keyword::SESSION => Self::Session,
+Keyword::GLOBAL => Self::Global,
+_ => Self::None,
+},
+None => Self::None,
 }
 }

Review Comment:
   I think we can instead turn this into a regular function if the goal is to 
reuse it, since its not expected to be able to turn an arbitrary keyword into a 
ContextModifier. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] chore(deps): bump uuid from 1.15.1 to 1.16.0 [datafusion]

2025-03-18 Thread via GitHub


dependabot[bot] opened a new pull request, #15292:
URL: https://github.com/apache/datafusion/pull/15292

   Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.15.1 to 1.16.0.
   
   Release notes
   Sourced from https://github.com/uuid-rs/uuid/releases";>uuid's releases.
   
   v1.16.0
   What's Changed
   
   Mark Uuid::new_v8 const by https://github.com/tguichaoua";>@​tguichaoua in https://redirect.github.com/uuid-rs/uuid/pull/815";>uuid-rs/uuid#815
   Prepare for 1.16.0 release by https://github.com/KodrAus";>@​KodrAus in https://redirect.github.com/uuid-rs/uuid/pull/817";>uuid-rs/uuid#817
   
   New Contributors
   
   https://github.com/tguichaoua";>@​tguichaoua 
made their first contribution in https://redirect.github.com/uuid-rs/uuid/pull/815";>uuid-rs/uuid#815
   
   Full Changelog: https://github.com/uuid-rs/uuid/compare/v1.15.1...v1.16.0";>https://github.com/uuid-rs/uuid/compare/v1.15.1...v1.16.0
   
   
   
   Commits
   
   https://github.com/uuid-rs/uuid/commit/c36beb14d50f835c1f1220117ca51aae64860a3e";>c36beb1
 Merge pull request https://redirect.github.com/uuid-rs/uuid/issues/817";>#817 from 
uuid-rs/cargo/v1.16.0
   https://github.com/uuid-rs/uuid/commit/5338b246b7a8244cab3cfaa85b14fe1d1bcdcd96";>5338b24
 prepare for 1.16.0 release
   https://github.com/uuid-rs/uuid/commit/420f6279aeff48f0e12b0b39af43a5c149963382";>420f627
 Merge pull request https://redirect.github.com/uuid-rs/uuid/issues/815";>#815 from 
tguichaoua/new_v8_const
   https://github.com/uuid-rs/uuid/commit/254258c8c7c7d6c41aaf6f573dc1731549d519b2";>254258c
 mark Uuid::new_v8 const
   See full diff in https://github.com/uuid-rs/uuid/compare/v1.15.1...v1.16.0";>compare 
view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=uuid&package-manager=cargo&previous-version=1.15.1&new-version=1.16.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show  ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] chore(deps): bump rust_decimal from 1.36.0 to 1.37.0 [datafusion]

2025-03-18 Thread via GitHub


dependabot[bot] opened a new pull request, #15293:
URL: https://github.com/apache/datafusion/pull/15293

   Bumps [rust_decimal](https://github.com/paupino/rust-decimal) from 1.36.0 to 
1.37.0.
   
   Release notes
   Sourced from https://github.com/paupino/rust-decimal/releases";>rust_decimal's 
releases.
   
   1.37.0
   What's Changed
   
   Bumps diesel version by https://github.com/Dylan-DPC";>@​Dylan-DPC in https://redirect.github.com/paupino/rust-decimal/pull/677";>paupino/rust-decimal#677
   Fix f64 dec limit edge case by https://github.com/finnbear";>@​finnbear in https://redirect.github.com/paupino/rust-decimal/pull/678";>paupino/rust-decimal#678
   Rkyv 0.8 example by https://github.com/Tony-Samuels";>@​Tony-Samuels in https://redirect.github.com/paupino/rust-decimal/pull/683";>paupino/rust-decimal#683
   Add explanation of how to enable maths feature by https://github.com/olgabot";>@​olgabot in https://redirect.github.com/paupino/rust-decimal/pull/681";>paupino/rust-decimal#681
   Create pub constant Decimal::MAX_SCALE by https://github.com/Tony-Samuels";>@​Tony-Samuels in https://redirect.github.com/paupino/rust-decimal/pull/685";>paupino/rust-decimal#685
   Fix panic when parsing improper scientific notation. by https://github.com/schungx";>@​schungx in https://redirect.github.com/paupino/rust-decimal/pull/700";>paupino/rust-decimal#700
   Clean up tests to use Ok/Err format by https://github.com/paupino";>@​paupino in https://redirect.github.com/paupino/rust-decimal/pull/703";>paupino/rust-decimal#703
   Add dec!() macro by example by https://github.com/daniel-pfeiffer";>@​daniel-pfeiffer in 
https://redirect.github.com/paupino/rust-decimal/pull/692";>paupino/rust-decimal#692
   feat: add rand-0_9 crate feature by https://github.com/robjtede";>@​robjtede in https://redirect.github.com/paupino/rust-decimal/pull/702";>paupino/rust-decimal#702
   Cargo.toml:  Only enable mysql_backend on diesel, not the whole mysql 
feature by https://github.com/mcronce";>@​mcronce in 
https://redirect.github.com/paupino/rust-decimal/pull/707";>paupino/rust-decimal#707
   Backwards compatibility for dec!() macro by https://github.com/paupino";>@​paupino in https://redirect.github.com/paupino/rust-decimal/pull/711";>paupino/rust-decimal#711
   Specify fixed version for macros dependency by https://github.com/paupino";>@​paupino in https://redirect.github.com/paupino/rust-decimal/pull/712";>paupino/rust-decimal#712
   
   New Contributors
   
   https://github.com/Dylan-DPC";>@​Dylan-DPC made 
their first contribution in https://redirect.github.com/paupino/rust-decimal/pull/677";>paupino/rust-decimal#677
   https://github.com/finnbear";>@​finnbear made 
their first contribution in https://redirect.github.com/paupino/rust-decimal/pull/678";>paupino/rust-decimal#678
   https://github.com/Tony-Samuels";>@​Tony-Samuels made 
their first contribution in https://redirect.github.com/paupino/rust-decimal/pull/683";>paupino/rust-decimal#683
   https://github.com/olgabot";>@​olgabot made 
their first contribution in https://redirect.github.com/paupino/rust-decimal/pull/681";>paupino/rust-decimal#681
   https://github.com/daniel-pfeiffer";>@​daniel-pfeiffer 
made their first contribution in https://redirect.github.com/paupino/rust-decimal/pull/692";>paupino/rust-decimal#692
   https://github.com/mcronce";>@​mcronce made 
their first contribution in https://redirect.github.com/paupino/rust-decimal/pull/707";>paupino/rust-decimal#707
   
   Full Changelog: https://github.com/paupino/rust-decimal/compare/1.36.0...1.37.0";>https://github.com/paupino/rust-decimal/compare/1.36.0...1.37.0
   
   
   
   Changelog
   Sourced from https://github.com/paupino/rust-decimal/blob/master/CHANGELOG.md";>rust_decimal's
 changelog.
   
   Version History
   Please see https://github.com/paupino/rust-decimal/releases";>Github Releases for 
version history going forward.
   
   
   
   Commits
   
   https://github.com/paupino/rust-decimal/commit/ffe9495d7734d8021718050f98e3f48f39c428b2";>ffe9495
 Specify fixed version for macros dependency
   https://github.com/paupino/rust-decimal/commit/39fe37eca9d78aad65d9be2b1bdb74c2ad037c37";>39fe37e
 Specify fixed version for macros dependency (https://redirect.github.com/paupino/rust-decimal/issues/712";>#712)
   https://github.com/paupino/rust-decimal/commit/ad974543184f7047fa97acae8a4a15201d3bafdf";>ad97454
 Refactor macro for backwards compatability  (https://redirect.github.com/paupino/rust-decimal/issues/711";>#711)
   https://github.com/paupino/rust-decimal/commit/f34cefc1127d358acbd6eb73c3bdd1de088e552a";>f34cefc
 Reduce required mysql dependencies for diesel (https://redirect.github.com/paupino/rust-decimal/issues/707";>#707)
   https://github.com/paupino/rust-decimal/commit/41ce632f0f84a62f187359f9dacafa86c0b223bd";>41ce632
 feat: add rand-0_9 crate feature (https://redirect.github.com/paupino/rust-decimal/issues/702";>#702)
   https://github.com/paupino/rust-decimal/commit/968997115a1d031e2

[PR] chore(deps): bump async-trait from 0.1.87 to 0.1.88 [datafusion]

2025-03-18 Thread via GitHub


dependabot[bot] opened a new pull request, #15294:
URL: https://github.com/apache/datafusion/pull/15294

   Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.87 to 
0.1.88.
   
   Release notes
   Sourced from https://github.com/dtolnay/async-trait/releases";>async-trait's 
releases.
   
   0.1.88
   
   Fix lifetime bounding on generic parameters that have cfg (https://redirect.github.com/dtolnay/async-trait/issues/289";>#289)
   
   
   
   
   Commits
   
   https://github.com/dtolnay/async-trait/commit/b3a59195c29c5b336490cec1bac23cff8d3e4483";>b3a5919
 Release 0.1.88
   https://github.com/dtolnay/async-trait/commit/a306be84ec998f46acc700e8b24a3b68b77a873a";>a306be8
 Merge pull request https://redirect.github.com/dtolnay/async-trait/issues/289";>#289 from 
dtolnay/cfg
   https://github.com/dtolnay/async-trait/commit/d3059849a4024425f80f0713bc802d8959290d96";>d305984
 Fix lifetime bounding on generic parameters that have cfg
   https://github.com/dtolnay/async-trait/commit/78506f17149e08594c1a120f1df828411772a0b8";>78506f1
 Add regression test for issue 288
   https://github.com/dtolnay/async-trait/commit/a11384eec60634098f66a3d6ac89c23beccdbbc8";>a11384e
 Add issue 283 link in test
   See full diff in https://github.com/dtolnay/async-trait/compare/0.1.87...0.1.88";>compare 
view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=async-trait&package-manager=cargo&previous-version=0.1.87&new-version=0.1.88)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show  ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-18 Thread via GitHub


suibianwanwank commented on PR #15281:
URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2731893665

   @alamb Hi, I have updated the PR title to start with "fix," but it seems 
that the "bug" label has not been added. And the PR is now ready for review, 
Please take a look at your convenience~ Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Minor: consistently apply `clippy::clone_on_ref_ptr` in all crates [datafusion]

2025-03-18 Thread via GitHub


berkaysynnada merged PR #15284:
URL: https://github.com/apache/datafusion/pull/15284


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Minor: consistently apply `clippy::clone_on_ref_ptr` in all crates [datafusion]

2025-03-18 Thread via GitHub


berkaysynnada commented on PR #15284:
URL: https://github.com/apache/datafusion/pull/15284#issuecomment-2731952749

   Thank you @alamb and @comphead 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Upgrade Guide for DataFusion 46 does not include the array signatures change [datafusion]

2025-03-18 Thread via GitHub


alamb closed issue #15105: Upgrade Guide for DataFusion 46 does not include the 
array signatures change
URL: https://github.com/apache/datafusion/issues/15105


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



  1   2   3   >