Re: [PR] feat: metadata columns [datafusion]
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2621372943

I also have another approach:

```rust
struct ColumnIndex {
    pub index: usize,
    pub is_metadata_column: bool,
}

impl Into<usize> for ColumnIndex { }

impl From<usize> for ColumnIndex { }
```

Then we can hide the METADATA_OFFSET logic without performance concerns. What do you think about this?
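A fuller sketch of how such a wrapper could hide the offset logic (the sentinel value and the impl bodies below are assumptions; the comment above elides them, and implementing `From<ColumnIndex> for usize` gives the `Into<usize>` conversion for free):

```rust
/// Assumed sentinel: raw indices at or above this offset denote metadata columns.
const METADATA_OFFSET: usize = usize::MAX >> 1;

pub struct ColumnIndex {
    pub index: usize,
    pub is_metadata_column: bool,
}

impl From<ColumnIndex> for usize {
    fn from(ci: ColumnIndex) -> usize {
        // Encode the flag into the raw index, so call sites that store a
        // plain usize keep working without a separate metadata code path.
        if ci.is_metadata_column {
            METADATA_OFFSET + ci.index
        } else {
            ci.index
        }
    }
}

impl From<usize> for ColumnIndex {
    fn from(raw: usize) -> ColumnIndex {
        // Decode: subtract the sentinel offset for metadata columns.
        if raw >= METADATA_OFFSET {
            ColumnIndex { index: raw - METADATA_OFFSET, is_metadata_column: true }
        } else {
            ColumnIndex { index: raw, is_metadata_column: false }
        }
    }
}
```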
[PR] Fix UNION field nullability tracking [datafusion]
findepi opened a new pull request, #14356: URL: https://github.com/apache/datafusion/pull/14356

This commit fixes two bugs related to UNION handling:

- when constructing the union plan, the nullability of the other union branch was ignored, so the resulting field could easily have incorrect nullability
- when pruning/simplifying projections, the `recompute_schema` function contained similar logic, thus losing nullability information even for a correctly constructed Union plan node

As a result, other optimizer logic (e.g. `expr_simplifier.rs`) could draw incorrect conclusions and thus lead to incorrect query results, as demonstrated by the attached SLT test.

- fixes https://github.com/apache/datafusion/issues/14352
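A minimal sketch of the intended nullability rule (a hypothetical helper, not the PR's actual code; the real fix lives in union plan construction and in `recompute_schema`): the output field must be nullable if the corresponding field in either branch is nullable.

```rust
use arrow::datatypes::Field;

/// Hypothetical helper: merge one pair of corresponding UNION branch fields.
fn union_field(left: &Field, right: &Field) -> Field {
    Field::new(
        left.name(),
        left.data_type().clone(),
        // Ignoring the other branch here is exactly the bug described above.
        left.is_nullable() || right.is_nullable(),
    )
}
```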
Re: [PR] Feature: Monotonic Sets [datafusion]
ozankabak commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1933977485

## datafusion/physical-expr/src/window/standard.rs:
@@ -65,33 +65,19 @@ impl StandardWindowExpr {
         &self.expr
     }

-    /// Adds any equivalent orderings generated by the `self.expr`
-    /// to `builder`.
+    /// Adds any equivalent orderings generated by the `self.expr` to `builder`.

Review Comment:
```suggestion
    /// Adds any equivalent orderings generated by `self.expr` to `builder`.
```

## datafusion/physical-expr/src/window/aggregate.rs:
@@ -21,16 +21,17 @@
 use std::any::Any;
 use std::ops::Range;
 use std::sync::Arc;

-use arrow::array::Array;
-use arrow::record_batch::RecordBatch;
-use arrow::{array::ArrayRef, datatypes::Field};
-
 use crate::aggregate::AggregateFunctionExpr;
+use crate::window::standard::add_new_ordering_expr_with_partition_by;
 use crate::window::window_expr::AggregateWindowExpr;
 use crate::window::{
     PartitionBatches, PartitionWindowAggStates, SlidingAggregateWindowExpr, WindowExpr,
 };
-use crate::{reverse_order_bys, PhysicalExpr};
+use crate::{reverse_order_bys, EquivalenceProperties, PhysicalExpr};
+
+use arrow::array::Array;
+use arrow::record_batch::RecordBatch;
+use arrow::{array::ArrayRef, datatypes::Field};

 use datafusion_common::ScalarValue;
 use datafusion_common::{DataFusionError, Result};

Review Comment:
```suggestion
use datafusion_common::{DataFusionError, ScalarValue, Result};
```

## datafusion/physical-expr/src/aggregate.rs:
@@ -533,6 +536,29 @@ impl AggregateFunctionExpr {
     pub fn default_value(&self, data_type: &DataType) -> Result {
         self.fun.default_value(data_type)
     }
+
+    /// Indicates whether the aggregation function is monotonic as a set function. A set
+    /// function is monotonically increasing if its value increases as its argument grows
+    /// (as a set). Formally, `f` is a monotonically increasing set function if `f(S) >= f(T)`
+    /// whenever `S` is a superset of `T`.
+    pub fn set_monotonicity(&self) -> AggregateExprSetMonotonicity {
+        let field = self.field();
+        let data_type = field.data_type();
+        self.fun.inner().set_monotonicity(data_type)
+    }
+
+    /// Returns PhysicalSortExpr based on monotonicity of the function

Review Comment:
```suggestion
    /// Returns `PhysicalSortExpr` based on the set monotonicity of the function.
```

## datafusion/physical-expr/src/aggregate.rs:
@@ -35,24 +35,27 @@ pub mod utils {
     };
 }

+use std::fmt::Debug;
+use std::sync::Arc;
+
+use crate::expressions::Column;
+
 use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use arrow_schema::SortOptions;

 use datafusion_common::ScalarValue;
 use datafusion_common::{internal_err, not_impl_err, Result};

Review Comment:
```suggestion
use datafusion_common::{internal_err, not_impl_err, ScalarValue, Result};
```

## datafusion/physical-expr/src/window/standard.rs:
@@ -283,3 +269,27 @@ impl WindowExpr for StandardWindowExpr {
         }
     }
 }
+
+/// Adds new ordering expression into the existing ordering equivalence class based on partition by information.

Review Comment:
```suggestion
/// Adds a new ordering expression into existing ordering equivalence class(es) based on
/// PARTITION BY information (if it exists).
```

## datafusion/physical-expr/src/aggregate.rs:
@@ -35,24 +35,27 @@ pub mod utils {
     };
 }

+use std::fmt::Debug;
+use std::sync::Arc;
+
+use crate::expressions::Column;
+
 use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use arrow_schema::SortOptions;

 use datafusion_common::ScalarValue;
 use datafusion_common::{internal_err, not_impl_err, Result};
-use datafusion_expr::AggregateUDF;
 use datafusion_expr::ReversedUDAF;
+use datafusion_expr::{AggregateExprSetMonotonicity, AggregateUDF};

Review Comment:
```suggestion
use datafusion_expr::{AggregateExprSetMonotonicity, AggregateUDF, ReversedUDAF};
```

## datafusion/physical-plan/src/aggregates/mod.rs:
@@ -648,11 +649,32 @@ impl AggregateExec {
     group_expr_mapping: &ProjectionMapping,
     mode: &AggregateMode,
     input_order_mode: &InputOrderMode,
+    aggr_exprs: Vec>,

Review Comment: A cursory look suggests this can be a slice. If that is true, let's avoid copying.
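To make the set-monotonicity notion above concrete, here is a small self-contained sketch (the enum name and variants are illustrative assumptions, not the PR's actual `AggregateExprSetMonotonicity` definition): `MAX` can only grow when rows are added to its input set, which is what lets the optimizer derive an ordering from it.

```rust
/// Illustrative stand-in for the PR's set-monotonicity classification.
#[derive(Debug, PartialEq)]
enum SetMonotonicity {
    NotMonotonic, // e.g. AVG: adding rows can move the result either way
    Increasing,   // e.g. MAX, COUNT: f(S) >= f(T) whenever S is a superset of T
    Decreasing,   // e.g. MIN: f(S) <= f(T) whenever S is a superset of T
}

fn classify(func: &str) -> SetMonotonicity {
    match func {
        "max" | "count" => SetMonotonicity::Increasing,
        "min" => SetMonotonicity::Decreasing,
        _ => SetMonotonicity::NotMonotonic,
    }
}

fn main() {
    let t = vec![3, 7];
    let mut s = t.clone();
    s.push(42); // S is a superset of T

    // MAX is monotonically increasing as a set function.
    assert!(s.iter().max() >= t.iter().max());
    // MIN is monotonically decreasing.
    assert!(s.iter().min() <= t.iter().min());
    assert_eq!(classify("max"), SetMonotonicity::Increasing);
}
```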
Re: [PR] Fix incorrect searched CASE optimization [datafusion]
findepi closed pull request #14349: Fix incorrect searched CASE optimization URL: https://github.com/apache/datafusion/pull/14349
Re: [I] Implement physical optimizer rule to apply type coercion to physical plans [datafusion]
andygrove commented on issue #14324: URL: https://github.com/apache/datafusion/issues/14324#issuecomment-2621810595

> BTW this might be fairly straightforward to add as an extension in comet (maybe it doesn't have to be in the datafusion core if it is only used by systems that use the physical plan)

We will probably do this in the short term, but don't we want DataFusion to be supported for the execution engine use case, similar to Velox?
Re: [PR] add manual trigger for extended tests in pull requests [datafusion]
Omega359 commented on code in PR #14331: URL: https://github.com/apache/datafusion/pull/14331#discussion_r1934000529

## .github/workflows/extended.yml:
@@ -31,14 +31,38 @@ on:
   push:
     branches:
       - main
+  issue_comment:
+    types: [created]
+
+permissions:
+  pull-requests: write

 jobs:
+  # Check issue comment and notify that extended tests are running
+  check_issue_comment:
+    name: Check issue comment
+    runs-on: ubuntu-latest
+    if: github.event.issue.pull_request && github.event.comment.body == 'run extended tests'
+    steps:
+      - uses: actions/github-script@v7
+        with:
+          github-token: ${{secrets.GITHUB_TOKEN}}
+          script: |
+            github.rest.issues.createComment({

Review Comment: Using other actions in an apache repo is problematic, as I previously found out. However, it's not hard to have an action post a comment and then edit it to show progress.
[I] Decimal promotion is not applied consistently [datafusion-comet]
andygrove opened a new issue, #1350: URL: https://github.com/apache/datafusion-comet/issues/1350

### Describe the bug

In `QueryPlanSerde`, we have code in `exprToProto` to ensure the decimal precision is the same for both children of a binary expression.

```scala
def exprToProto(
    expr: Expression,
    inputs: Seq[Attribute],
    binding: Boolean = true): Option[Expr] = {
  val conf = SQLConf.get
  val newExpr =
    DecimalPrecision.promote(conf.decimalOperationsAllowPrecisionLoss, expr, !conf.ansiEnabled)
  exprToProtoInternal(newExpr, inputs, binding)
}
```

However, this calls `exprToProtoInternal`, which is a recursive method. For example, when creating a binary expression:

```scala
val leftExpr = exprToProtoInternal(left, inputs, binding)
val rightExpr = exprToProtoInternal(right, inputs, binding)
```

The decimal promotion logic is not applied in this case. Given that `exprToProto` contains a single line of code and then delegates to `exprToProtoInternal`, I propose that we simply combine these two methods.

### Steps to reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_
Re: [PR] Add related source code locations to errors [datafusion]
alamb commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1933723286

## datafusion/common/src/diagnostic.rs:
@@ -0,0 +1,112 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use sqlparser::tokenizer::Span;
+
+/// Additional contextual information intended for end users, to help them
+/// understand what went wrong by providing human-readable messages, and
+/// locations in the source query that relate to the error in some way.
+///
+/// You can think of a single [`Diagnostic`] as a single "block" of output from
+/// rustc. i.e. either an error or a warning, optionally with some notes and
+/// help messages.
+///
+/// If the diagnostic, a note, or a help message doesn't need to point to a
+/// specific location in the original SQL query (or the [`Span`] is not
+/// available), use [`Span::empty`].
+///
+/// Example:
+///
+/// ```rust
+/// # use datafusion_common::Diagnostic;
+/// # use sqlparser::tokenizer::Span;
+/// # let span = Span::empty();
+/// let diagnostic = Diagnostic::new_error("Something went wrong", span)
+///     .with_help("Have you tried turning it on and off again?", Span::empty());
+/// ```
+#[derive(Debug, Clone)]
+pub struct Diagnostic {
+    pub kind: DiagnosticKind,
+    pub message: String,
+    pub span: Span,
+    pub notes: Vec<DiagnosticNote>,
+    pub helps: Vec<DiagnosticHelp>,
+}
+
+#[derive(Debug, Clone)]
+pub struct DiagnosticNote {
+    pub message: String,
+    pub span: Span,
+}
+
+#[derive(Debug, Clone)]
+pub struct DiagnosticHelp {

Review Comment: perhaps that could be added in comments as it may be non obvious to other readers
```suggestion
/// Suggestions as to how to solve the error
///
/// This is different than [`DiagnosticNote`] which explains *why* the error occurred.
/// For example that a non aggregate expression is not in the `GROUP BY` clause.
/// `DiagnosticHelp` contains a suggestion to solve the error, for example
/// add the expression to the `GROUP BY`.
pub struct DiagnosticHelp {
```
Re: [PR] Add related source code locations to errors [datafusion]
alamb commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1933725543

## datafusion/common/src/diagnostic.rs:

Review Comment: NM -- I see you already did that
Re: [PR] add manual trigger for extended tests in pull requests [datafusion]
edmondop commented on code in PR #14331: URL: https://github.com/apache/datafusion/pull/14331#discussion_r1933841345

## .github/workflows/extended.yml:

Review Comment: I wonder if we can create a check in the PR to show that running the tests is in progress. We can use https://octokit.github.io/rest.js/v21/#checks to create a check when the comment is posted, and when the pipeline runs, we can update the check:

```js
octokit.rest.checks.update({
  owner,
  repo,
  check_run_id,
  output.summary,
  output.annotations[].path,
  output.annotations[].start_line,
  output.annotations[].end_line,
  output.annotations[].annotation_level,
  output.annotations[].message,
  output.images[].alt,
  output.images[].image_url,
  actions[].label,
  actions[].description,
  actions[].identifier
})
```

See "Update a check run" under the Checks API in the docs I have linked. wdyt @alamb ?
Re: [I] Implement physical optimizer rule to apply type coercion to physical plans [datafusion]
edmondop commented on issue #14324: URL: https://github.com/apache/datafusion/issues/14324#issuecomment-2621615427

Is it possible to do type coercion only at the physical layer? Or does it bring benefits at the logical layer, so that we need to do type coercion twice: once at the logical layer for users of the "datafusion frontend", and once at the physical layer for applications like Comet that "skip" the datafusion frontend?
Re: [PR] fix: LimitPushdown rule incorrectly removes some GlobalLimitExec [datafusion]
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1934090147

## datafusion/physical-optimizer/src/limit_pushdown.rs:
@@ -248,7 +247,15 @@ pub fn pushdown_limit_helper(
         }
     } else {
         // Add fetch or a `LimitExec`:
-        global_state.satisfied = true;
+        // If the plan's children have limit and the child's limit < parent's limit, we shouldn't change the global state to true,
+        // because the children limit will be overridden if the global state is changed.
+        if pushdown_plan
+            .children()
+            .iter()
+            .any(|&child| extract_limit(child).is_some())
+        {
+            global_state.satisfied = false;
+        }

Review Comment: Got it @xudong963, the previous logic would always set it to true inside the `global_state.satisfied == false` branch. This logic keeps it false for some cases. I changed the logic to make it clearer that we only set it to true outside of the above case. Thanks!
Re: [PR] fix: LimitPushdown rule incorrectly removes some GlobalLimitExec [datafusion]
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1934092407

## datafusion/physical-optimizer/src/limit_pushdown.rs:
@@ -248,7 +247,15 @@ pub fn pushdown_limit_helper(
         }
     } else {
         // Add fetch or a `LimitExec`:
-        global_state.satisfied = true;

Review Comment: Here is the original logic for setting back to true.
Re: [I] Implement Common Subexpression Elimination optimizer rule [datafusion-comet]
andygrove commented on issue #942: URL: https://github.com/apache/datafusion-comet/issues/942#issuecomment-2621952623

The DataFusion PR https://github.com/apache/datafusion/pull/13046 is still waiting for a review. I am adding this issue back onto the 0.6 milestone as a reminder.
[PR] Fix DDL generation in case of an empty arguments FUNCTION. [datafusion-sqlparser-rs]
remysaissy opened a new pull request, #1690: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1690

Functions with an empty argument list are properly parsed into the AST, but the string representation of such a function omits the parentheses required after the function name, causing an error at the database level. For example:

CREATE OR REPLACE FUNCTION no_arg() RETURNS VOID LANGUAGE plpgsql AS $$ BEGIN DELETE FROM my_table; END; $$

becomes:

CREATE OR REPLACE FUNCTION no_arg RETURNS VOID LANGUAGE plpgsql AS $$ BEGIN DELETE FROM my_table; END; $$

which causes Postgres to reject the query. This PR fixes the string representation in this specific case.
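A sketch of the kind of fix involved (the struct and field names below are illustrative assumptions, not sqlparser-rs's actual AST types): when an argument list is present, print the parentheses even if the list is empty.

```rust
use std::fmt;

// Hypothetical simplified AST node for illustration only.
struct CreateFunction {
    name: String,
    args: Option<Vec<String>>, // None = the user wrote no parentheses at all
}

impl fmt::Display for CreateFunction {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "CREATE FUNCTION {}", self.name)?;
        if let Some(args) = &self.args {
            // Previously an empty list printed nothing, dropping the
            // required `()`; print the parentheses unconditionally instead.
            write!(f, "({})", args.join(", "))?;
        }
        Ok(())
    }
}
```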
Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]
alamb commented on PR #14207: URL: https://github.com/apache/datafusion/pull/14207#issuecomment-2621967843

I merged up from main to resolve a conflict on this PR as part of my review
[PR] refactor: switch `BooleanBufferBuilder` to `NullBufferBuilder` in single_group_by [datafusion]
Chen-Yuan-Lai opened a new pull request, #14360: URL: https://github.com/apache/datafusion/pull/14360

## Which issue does this PR close?

Closes #14115.

## Rationale for this change

As mentioned in #14115, several places in the DataFusion codebase still use `BooleanBufferBuilder` rather than `NullBufferBuilder`; they should be replaced with `NullBufferBuilder` as an optimization.

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?
Re: [I] Decimal promotion is not applied consistently [datafusion-comet]
andygrove closed issue #1350: Decimal promotion is not applied consistently URL: https://github.com/apache/datafusion-comet/issues/1350
Re: [PR] bug: Fix NULL handling in array_slice [datafusion]
jkosh44 commented on code in PR #14289: URL: https://github.com/apache/datafusion/pull/14289#discussion_r1933570686

## datafusion/physical-expr/src/scalar_function.rs:
@@ -186,6 +187,15 @@ impl PhysicalExpr for ScalarFunctionExpr {
     .map(|e| e.evaluate(batch))
     .collect::<Result<Vec<_>>>()?;

+        if self.fun.signature().null_handling == NullHandling::Propagate
+            && args.iter().any(
+                |arg| matches!(arg, ColumnarValue::Scalar(scalar) if scalar.is_null()),

Review Comment: I'm not super confident about this check; how should `ColumnarValue::Array`s be treated?
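One possible answer to that question, as a sketch (an assumption about desirable behavior, not what the PR does): a null `Scalar` argument can short-circuit the whole call, while an `Array` argument cannot, because only some of its rows may be null.

```rust
use datafusion_expr::ColumnarValue;

// Sketch: should this argument trigger an early NULL return for the call?
fn short_circuits_to_null(arg: &ColumnarValue) -> bool {
    match arg {
        ColumnarValue::Scalar(scalar) => scalar.is_null(),
        // An array may mix null and non-null rows, so it cannot short-circuit
        // the whole batch; those rows would need per-row null propagation.
        ColumnarValue::Array(_) => false,
    }
}
```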
Re: [PR] Fix Float and Decimal coercion [datafusion]
findepi commented on PR #14273: URL: https://github.com/apache/datafusion/pull/14273#issuecomment-2621162538

I am OK with having DF coercions extensible (https://github.com/apache/datafusion/issues/14296), but first and foremost I want them to be **_optional_**. At ~SDF~ dbt, we create fully resolved and coerced plans, so DF's coercion rules cannot do any good. For our needs, the best would be a clear separation between the SQL analysis layer, where coercions should exist, and the formal plan processing layer, where coercions should no longer exist. This was outlined in https://github.com/apache/datafusion/issues/12723.

Regardless of future plans and potential customizability via new or existing extension points, we need to decide what the "out of the box" behavior is. We have a rule to follow PostgreSQL in principle, and that is what this PR does. I consider it non-controversial. The existing comments revolve around "let's make sure it's properly reviewed", and by now it obviously is. I will merge tomorrow unless there are actionable objections.
Re: [PR] bug: Fix NULL handling in array_slice [datafusion]
jkosh44 commented on code in PR #14289: URL: https://github.com/apache/datafusion/pull/14289#discussion_r1933573232

## datafusion/functions-nested/src/extract.rs:
@@ -330,7 +330,8 @@ pub(super) struct ArraySlice {
 impl ArraySlice {
     pub fn new() -> Self {
         Self {
-            signature: Signature::variadic_any(Volatility::Immutable),
+            // TODO: This signature should use the actual accepted types, not variadic_any.

Review Comment: Just pushed an update to do this.
[PR] Add hook for sharing join state in distributed execution [datafusion]
thinkharderdev opened a new pull request, #12523: URL: https://github.com/apache/datafusion/pull/12523

## Which issue does this PR close?

Closes #12454

## Rationale for this change

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?
Re: [I] support: Date +/plus Int or date_add function [datafusion]
DanCodedThis commented on issue #6876: URL: https://github.com/apache/datafusion/issues/6876#issuecomment-2621860544

@edmondop I think I agree with your statement.

- Also, does the new `TypeSignatureClass` + `logical_string()` allow parsing `month` not as a column/keyword, but as `Utf8` without the single quotes? In our own implementation, there is preprocessing that fixes that in a hacky way. If we later add the Snowflake way to DF, I don't think I will be able to reproduce it here.
- I saw the new `date_part` a week and a half or two weeks ago in the Apache docs, and they were using `month`, not `'month'`, in the example; perhaps it was a mistake in the docs that was fixed, or I didn't perceive it correctly, but I could swear I saw it.
- I tried doing something similar for the aforementioned `date_diff`, but the example I was referring to was gone by the time I wrote and tested it (it didn't work).
Re: [PR] equivalence classes: use normalized mapping for projection [datafusion]
ozankabak commented on PR #14327: URL: https://github.com/apache/datafusion/pull/14327#issuecomment-2621860123

I am guessing what was meant is writing an SLT test whose output plan would change if this logic/fix weren't there. Maybe an optimization that wouldn't take place because the normalization is not done properly.
[PR] Minor: include the number of files run in sqllogictest display [datafusion]
alamb opened a new pull request, #14359: URL: https://github.com/apache/datafusion/pull/14359

## Which issue does this PR close?

- Follow on to https://github.com/apache/datafusion/pull/14355
- Follow on to

## Rationale for this change

While testing https://github.com/apache/datafusion/pull/14355 from @findepi I found it hard to see how many files were run.

## What changes are included in this PR?

Before this PR:

```shell
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo test --test sqllogictests -- union
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.16s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-3d953efd822e1e41)
Completed in 0 seconds
```

After this PR, it reports `Completed 2 tests`:

```shell
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo test --test sqllogictests -- union
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.16s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-3d953efd822e1e41)
Completed 2 tests in 0 seconds
```

## Are these changes tested?

I tested them manually.

## Are there any user-facing changes?
Re: [PR] Minor: include the number of files run in sqllogictest display [datafusion]
alamb commented on code in PR #14359: URL: https://github.com/apache/datafusion/pull/14359#discussion_r1934030696

## datafusion/sqllogictest/bin/sqllogictests.rs:
@@ -491,9 +497,7 @@ impl TestFile {
     }
 }

-fn read_test_files<'a>(
-    options: &'a Options,
-) -> Result<Box<dyn Iterator<Item = TestFile> + 'a>> {
+fn read_test_files(options: &Options) -> Result<Vec<TestFile>> {

Review Comment: This function was already basically returning a `Vec` so there didn't seem to be any need to have the iterator indirection
Re: [PR] Restore ability to run single SLT file [datafusion]
alamb commented on PR #14355: URL: https://github.com/apache/datafusion/pull/14355#issuecomment-2621874234

> The functionality was probably accidentally lost in #13936

BTW I think it is currently possible to run a single file. For example, this will run both `union.slt` and `pg_compat_union.slt`:

```shell
cargo test --test sqllogictests -- union.slt
```

However, it only matches on the file name (not the entire path), so using `test_files/union.slt` does not work, as you observe.

> I made a small PR to your branch to show the # of files tested, verifying that

LOL we did the same thing @Omega359 - https://github.com/apache/datafusion/pull/14359
Re: [PR] test: attempt to analyze boundaries for select columns [datafusion]
ozankabak commented on PR #14308: URL: https://github.com/apache/datafusion/pull/14308#issuecomment-2621875650

Seems like we need to make certain view types comparable with their parent types.
Re: [PR] chore(deps): bump serde_json from 1.0.137 to 1.0.138 in /datafusion-cli [datafusion]
alamb merged PR #14351: URL: https://github.com/apache/datafusion/pull/14351
Re: [PR] Fix UNION field nullability tracking [datafusion]
alamb commented on PR #14356: URL: https://github.com/apache/datafusion/pull/14356#issuecomment-2621877094

@wiedld can you also please review this PR?
Re: [PR] Feature Unifying source execution plans [datafusion]
ozankabak commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2621880797

Thanks for all the review. We will focus on this again next week, improve the PR, and ask for more thoughts from the community.
Re: [PR] chore(deps): bump tempfile from 3.15.0 to 3.16.0 in /datafusion-cli [datafusion]
alamb merged PR #14350: URL: https://github.com/apache/datafusion/pull/14350
Re: [I] Decimal promotion is not applied consistently [datafusion-comet]
andygrove commented on issue #1350: URL: https://github.com/apache/datafusion-comet/issues/1350#issuecomment-2621887848

nm, I see now that `DecimalPrecision.promote` calls `expr.transformUp` and transforms the whole expression tree.
Re: [PR] fix: LimitPushdown rule incorrectly removes some GlobalLimitExec [datafusion]
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1933509296

## datafusion/core/tests/dataframe/mod.rs:
@@ -5182,3 +5177,32 @@ async fn register_non_parquet_file() {
         "1.json' does not match the expected extension '.parquet'"
     );
 }
+
+// Test issue: https://github.com/apache/datafusion/issues/14204

Review Comment: Thank you @xudong963 for the review; it makes sense to remove this UT. Addressed in the latest PR.
Re: [PR] bug: Fix NULL handling in array_slice [datafusion]
jkosh44 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2621158643

> ```rust
> /// Any Null input causes the function to return Null.
> Propogate,
> ```

I've updated the PR to use this. I just realized though that window and aggregate functions (and maybe other function types?) completely ignore this field in the `Signature`. That feels wrong, but I'll have to think about what the right behavior should be.
Re: [I] Add `DataFrame::map` utility .map function for DataFrame for modifying internal LogicalPlan [datafusion]
phisn commented on issue #14317: URL: https://github.com/apache/datafusion/issues/14317#issuecomment-2621740237

@timsaucer My specific use case is what @alamb described. The problem with the transform approach is that I am forced to call `.logical_plan().clone()` and create a new `DataFrame` each time, since the goal is to directly manipulate the `LogicalPlan`. The problem arises when applying extensions and using functions that are not always called with a `DataFrame`.
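A sketch of what such a utility could look like, written as a free function (the name and exact signature are assumptions; `DataFrame::into_parts` and `DataFrame::new` are existing DataFusion APIs as far as I know):

```rust
use datafusion::dataframe::DataFrame;
use datafusion::error::Result;
use datafusion::logical_expr::LogicalPlan;

/// Hypothetical stand-in for the requested `DataFrame::map`: rewrite the
/// plan without manually cloning it and rebuilding the DataFrame by hand.
fn map_plan(
    df: DataFrame,
    f: impl FnOnce(LogicalPlan) -> Result<LogicalPlan>,
) -> Result<DataFrame> {
    // Take ownership of the plan instead of cloning it via `.logical_plan()`.
    let (state, plan) = df.into_parts();
    Ok(DataFrame::new(state, f(plan)?))
}
```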
Re: [I] Ensure `to_timestamp` behaves consistently with PostgreSQL [datafusion]
logan-keede commented on issue #13351: URL: https://github.com/apache/datafusion/issues/13351#issuecomment-2621744869

> That is a very good question. There must have been a reason I didn't do that when I coded this up but right now I can't recall why it would have been. Let me think about that for a day and see if I can recall a good reason.

I think I found the reason. According to the documentation:

> Optional Chrono format strings to use to parse the expression. Formats will be tried in the order they appear with the first successful one being returned. If none of the formats successfully parse the expression an error will be returned.

So even if a format is wrong, it should not throw an error until it has tried all the formats, and if none match it just returns the error "no format matches". So I think we should leave that as it is; however, if we really want to improve UX, we can generate a warning that `xyz` is not a valid Chrono format. Let me know if that makes sense.
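A minimal sketch of the try-in-order behavior being described (plain chrono, not DataFusion's actual implementation):

```rust
use chrono::NaiveDateTime;

/// Try each format in order; only error if none matches. Note that an
/// invalid format string simply never matches -- it is not rejected up
/// front, which is the behavior discussed above.
fn parse_with_formats(s: &str, formats: &[&str]) -> Result<NaiveDateTime, String> {
    for fmt in formats {
        if let Ok(ts) = NaiveDateTime::parse_from_str(s, fmt) {
            return Ok(ts);
        }
    }
    Err(format!("Error parsing timestamp from '{s}': no format matches"))
}
```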
[I] Enable TableScan to return multiple arbitrary table references [datafusion]
phisn opened a new issue, #14358: URL: https://github.com/apache/datafusion/issues/14358

### Is your feature request related to a problem or challenge?

`TableProvider` has a type which indicates which kind of table it contains. This can be a view, so it makes sense to encode materialised views using `TableProvider` and access them via `TableScan`. Currently `TableScan` forces the use of one or no `TableReference`. Since a view can be derived from multiple tables having the same column names, it can happen that two columns can only be distinguished by their respective `TableReference`.

### Describe the solution you'd like

`TableScan` should contain a unique `table_name: TableReference` for each column.

### Describe alternatives you've considered

Even though `TableScan` contains a `DFSchema`, which holds a list of `TableReference`s, it is not possible to use it since some optimisation rules recreate the `TableScan` with the wrong schema.

1. We change the schema in `TableProvider` to `DFSchema` instead of `Schema`. I did not test whether having multiple identical column names in `TableProvider` causes issues, which would force this option. As only the column position matters during execution, this seems unlikely.
2. In my current implementation I am inlining table names using `.flat_name()` and outlining them again using an analyzer rule. Since I can't use `TableScan`, I am using a custom Extension which is resolved by the analyzer too.
3. Leaving this to the user or creating some new alternative `LogicalPlan` node is also possible.

### Additional context

This feature is slightly related to #14310.
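To illustrate the ambiguity, here is a sketch of a view schema where two `id` columns are distinguishable only by their table reference, which a single-reference `TableScan` cannot express today (the `DFSchema::new_with_metadata` usage here is from memory and may differ across DataFusion versions):

```rust
use std::collections::HashMap;
use std::sync::Arc;

use arrow::datatypes::{DataType, Field};
use datafusion::common::{DFSchema, Result, TableReference};

/// Sketch: a view derived from two tables that both expose an `id` column.
fn view_schema() -> Result<DFSchema> {
    DFSchema::new_with_metadata(
        vec![
            (
                Some(TableReference::bare("orders")),
                Arc::new(Field::new("id", DataType::Int64, false)),
            ),
            (
                // Same column name; only the qualifier disambiguates it.
                Some(TableReference::bare("customers")),
                Arc::new(Field::new("id", DataType::Int64, false)),
            ),
        ],
        HashMap::new(),
    )
}
```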
Re: [PR] Fix UNION field nullability tracking [datafusion]
findepi commented on PR #14356: URL: https://github.com/apache/datafusion/pull/14356#issuecomment-2621752435

cc @felipecrv, @Omega359
Re: [PR] chore: Move all array_* serde to new framework, use correct INCOMPAT config [datafusion-comet]
andygrove commented on code in PR #1349: URL: https://github.com/apache/datafusion-comet/pull/1349#discussion_r1933951486

## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
@@ -3490,3 +3437,6 @@ trait CometExpressionSerde {
     inputs: Seq[Attribute],
     binding: Boolean): Option[ExprOuterClass.Expr]
 }
+
+/** Marker trait for an expression that is not guaranteed to be 100% compatible with Spark */
+trait IncompatExpr {}

Review Comment: Later, we could add a method to this trait to provide the reason why an expression is incompatible, and this could be used to generate documentation.
Re: [PR] Restore ability to run single SLT file [datafusion]
Omega359 commented on PR #14355: URL: https://github.com/apache/datafusion/pull/14355#issuecomment-2621782574

I made a small PR to your branch to show the # of files tested, verifying that `cargo test --test sqllogictests -- union.slt` runs 2 files whereas `cargo test --test sqllogictests -- test_files/union.slt` runs 1 file. I also verified that the sqlite tests ran without issues.
Re: [PR] feat: metadata columns [datafusion]
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2621231579

> > Then we cannot make sure metadata columns always have the same column index in different tables
>
> Not quite understand the reason; with modified metadata, a meta column could be considered the same kind of column as a normal one inside `Schema`. The **field index** for accessing a field in `Schema` is contiguous like before; indices are changed if the schema is changed.

On the question of whether to treat a metadata column like a normal column, I want to hear more ideas. By the way, at the beginning I didn't use an offset to separate metadata columns from normal columns, and you said that was error-prone. I think that when we do schema migration, a non-stable column index is really error-prone. Did I misunderstand something?
[PR] Add RETURNS TABLE() support for CREATE FUNCTION in Postgresql [datafusion-sqlparser-rs]
remysaissy opened a new pull request, #1687: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1687

Currently, CREATE FUNCTION doesn't support the RETURNS TABLE datatype (https://www.postgresql.org/docs/15/sql-createfunction.html). This PR adds it for the PostgreSQL and Generic dialects.
Re: [PR] BigQuery: Fix column identifier reserved keywords list [datafusion-sqlparser-rs]
alamb merged PR #1678: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1678
Re: [I] support: Date +/plus Int or date_add function [datafusion]
edmondop commented on issue #6876: URL: https://github.com/apache/datafusion/issues/6876#issuecomment-2621668103

@DanCodedThis and @alamb I looked at the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/functions/dateadd), [Spark documentation](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.date_add.html) and [Postgres documentation](https://www.postgresql.org/docs/current/functions-datetime.html):

- Postgres provides a generic API that is based on intervals
- Snowflake provides a simpler API where you provide the interval as a combination of time unit * number of units
- Spark supports only days as a time unit

Implementing the Postgres approach seems the most flexible solution; we could then implement a Snowflake dialect to parse a Snowflake-like UDF invocation into the Postgres style.
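As a small illustration of why interval-based semantics are more than day arithmetic (plain chrono, purely for illustration): adding one calendar month uses calendar rules, including clamping to the end of a shorter month, which a fixed day count cannot express.

```rust
use chrono::{Months, NaiveDate};

fn main() {
    let d = NaiveDate::from_ymd_opt(2024, 1, 31).unwrap();
    // Adding one calendar month clamps Jan 31 to the last day of February.
    let plus_one_month = d.checked_add_months(Months::new(1)).unwrap();
    assert_eq!(plus_one_month, NaiveDate::from_ymd_opt(2024, 2, 29).unwrap());
}
```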
Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1932661673

## tests/sqlparser_bigquery.rs:
@@ -2214,6 +2214,30 @@ fn test_select_as_value() {
     assert_eq!(Some(ValueTableMode::AsValue), select.value_table_mode);
 }

+#[test]
+fn test_typed_strings() {
+    let expr = bigquery().verified_expr(r#"JSON """{"foo":"bar's"}""""#);
+    assert_eq!(
+        Expr::TypedString {
+            data_type: DataType::JSON,
+            value: Value::TripleDoubleQuotedString(r#"{"foo":"bar's"}"#.into())
+        },
+        expr
+    );
+
+    let expr = bigquery().verified_expr(r#"JSON '''{"foo":"bar's"}'''"#);
+    if let Expr::TypedString { data_type, value } = expr {
+        assert_eq!(DataType::JSON, data_type);
+        let string_value: String = value.into();
+        assert_eq!(r#"{"foo":"bar's"}"#, string_value);
+    }
+
+    // SingleQuotedString and DoubleQuotedString are currently not correctly formatted by `fmt::Display for Value`.
+    // BigQuery does not support double escaping, should be \' or \" instead.
+    //bigquery().verified_expr(r#"JSON '{"foo":"bar\'s"}'"#);
+    //bigquery().verified_expr(r#"JSON "{\"foo\":\"bar's\"}""#);

Review Comment: The issue I ran into is this SQL `SELECT JSON '{"foo":"bar\'s"}'` gets parsed and then re-printed as `SELECT JSON '{"foo":"bar''s"}'` (note the `''` instead of `\'` – a syntax error in BQ). But let me do some more tests – for now I'll take this out of this PR, will create a separate issue/PR.
Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]
xudong963 commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1933897333

## datafusion/physical-optimizer/src/limit_pushdown.rs: ##

@@ -248,7 +247,15 @@ pub fn pushdown_limit_helper(
         }
     } else {
         // Add fetch or a `LimitExec`:
-        global_state.satisfied = true;
+        // If the plan's children have limit and the child's limit < parent's limit, we shouldn't change the global state to true,
+        // because the children limit will be overridden if the global state is changed.
+        if pushdown_plan
+            .children()
+            .iter()
+            .any(|&child| extract_limit(child).is_some())
+        {
+            global_state.satisfied = false;
+        }

Review Comment:
   I think the logic is strange: if we reach the else branch (line 248), it means `global_state.satisfied == false`, so setting `global_state.satisfied = false;` again here (line 257) is redundant.
Re: [PR] fix: pass scale to DF round in spark_round [datafusion-comet]
cht42 commented on code in PR #1341: URL: https://github.com/apache/datafusion-comet/pull/1341#discussion_r1933876335

## native/spark-expr/src/math_funcs/round.rs: ##

@@ -85,9 +85,10 @@ pub fn spark_round(
         let (precision, scale) = get_precision_scale(data_type);
         make_decimal_array(array, precision, scale, &f)
     }
-    DataType::Float32 | DataType::Float64 => {
-        Ok(ColumnarValue::Array(round(&[Arc::clone(array)])?))
-    }
+    DataType::Float32 | DataType::Float64 => Ok(ColumnarValue::Array(round(&[

Review Comment:
   added tests for floats
Re: [PR] chore: Remove redundant processing from exprToProtoInternal [datafusion-comet]
codecov-commenter commented on PR #1351: URL: https://github.com/apache/datafusion-comet/pull/1351#issuecomment-2622069957

## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1351?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report

All modified and coverable lines are covered by tests :white_check_mark:

> Project coverage is 57.29%. Comparing base [(`f09f8af`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) to head [(`a51967a`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/a51967aab4d3c322f1aa0fb97e395fe83dbc2ff2?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
> Report is 11 commits behind head on main.

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##               main    #1351      +/-   ##
============================================
+ Coverage     56.12%   57.29%    +1.16%
  Complexity      976      976
============================================
  Files           119      120        +1
  Lines         11743    11896      +153
  Branches       2251     2252        +1
============================================
+ Hits           6591     6816      +225
+ Misses         4012     3918       -94
- Partials       1140     1162       +22
```

[:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1351?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).

:loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
Re: [PR] moving memory.rs out of datafusion/core [datafusion]
comphead commented on PR #14332: URL: https://github.com/apache/datafusion/pull/14332#issuecomment-2622072090

> if we go with that we should change the name of every similar file in a similar manner (at least `async.rs`).

That is true; to be consistent it needs to be fixed for all such files, which is certainly not part of this PR.
Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]
alamb commented on code in PR #14207: URL: https://github.com/apache/datafusion/pull/14207#discussion_r1934089417

## datafusion/physical-optimizer/src/enforce_distribution.rs: ##

@@ -932,12 +932,16 @@ fn add_hash_on_top(
 /// # Arguments
 ///
 /// * `input`: Current node.
+/// * `fetch`: Possible fetch value

Review Comment:
   I think it would help here to describe what is happening with the `fetch` argument as well, and when it is updated. Maybe something like the following:
   ```suggestion
   /// * `fetch`: Possible fetch value. If a `SortPreservingMerge` is created
   ///   its fetch is set to this value and `fetch` is set to `None`
   ```

## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ##

@@ -3173,3 +3182,78 @@ fn optimize_away_unnecessary_repartition2() -> Result<()> {
     Ok(())
 }
+
+#[tokio::test]
+async fn apply_enforce_distribution_multiple_times() -> Result<()> {
+    // Create a configuration
+    let config = SessionConfig::new();
+    let ctx = SessionContext::new_with_config(config);
+
+    // Create table schema and data
+    let sql = "CREATE EXTERNAL TABLE aggregate_test_100 (
+        c1 VARCHAR NOT NULL,
+        c2 TINYINT NOT NULL,
+        c3 SMALLINT NOT NULL,
+        c4 SMALLINT,
+        c5 INT,
+        c6 BIGINT NOT NULL,
+        c7 SMALLINT NOT NULL,
+        c8 INT NOT NULL,
+        c9 BIGINT UNSIGNED NOT NULL,
+        c10 VARCHAR NOT NULL,
+        c11 FLOAT NOT NULL,
+        c12 DOUBLE NOT NULL,
+        c13 VARCHAR NOT NULL
+    )
+    STORED AS CSV
+    LOCATION '../../testing/data/csv/aggregate_test_100.csv'
+    OPTIONS ('format.has_header' 'true')";
+
+    ctx.sql(sql).await?;
+
+    let df = ctx.sql("SELECT * FROM(SELECT * FROM aggregate_test_100 UNION ALL SELECT * FROM aggregate_test_100) ORDER BY c13 LIMIT 5").await?;
+    let logical_plan = df.logical_plan().clone();
+    let analyzed_logical_plan = ctx.state().analyzer().execute_and_check(
+        logical_plan,
+        ctx.state().config_options(),
+        |_, _| (),
+    )?;
+
+    let optimized_logical_plan = ctx.state().optimizer().optimize(
+        analyzed_logical_plan,
+        &ctx.state(),
+        |_, _| (),
+    )?;
+
+    let optimizers: Vec<Arc<dyn PhysicalOptimizerRule + Send + Sync>> = vec![
+        Arc::new(OutputRequirements::new_add_mode()),
+        Arc::new(EnforceDistribution::new()),
+        Arc::new(EnforceSorting::new()),
+        Arc::new(ProjectionPushdown::new()),
+        Arc::new(CoalesceBatches::new()),
+        Arc::new(EnforceDistribution::new()), // -- Add enforce distribution rule again
+        Arc::new(OutputRequirements::new_remove_mode()),
+        Arc::new(ProjectionPushdown::new()),
+        Arc::new(LimitPushdown::new()),
+        Arc::new(SanityCheckPlan::new()),
+    ];
+
+    let planner = DefaultPhysicalPlanner::default();
+    let session_state = SessionStateBuilder::new()
+        .with_config(ctx.copied_config())
+        .with_default_features()
+        .with_physical_optimizer_rules(optimizers)
+        .build();
+    let optimized_physical_plan = planner
+        .create_physical_plan(&optimized_logical_plan, &session_state)
+        .await?;
+
+    let mut results = optimized_physical_plan
+        .execute(0, ctx.task_ctx().clone())
+        .unwrap();
+
+    let batch = results.next().await.unwrap()?;
+    // With the fix of https://github.com/apache/datafusion/pull/14207, the number of rows will be 10

Review Comment:
   ```suggestion
       // Without the fix of https://github.com/apache/datafusion/pull/14207, the number of rows will be 10
   ```

## datafusion/physical-optimizer/src/enforce_distribution.rs: ##

@@ -1362,6 +1383,21 @@ pub fn ensure_distribution(
         plan.with_new_children(children_plans)?
     };

+    // If `fetch` was not consumed, it means that there was `SortPreservingMergeExec` with fetch before.
+    // It was removed by `remove_dist_changing_operators`
+    // and we need to add it back.
+    if fetch.is_some() {
+        plan = Arc::new(
+            SortPreservingMergeExec::new(
+                plan.output_ordering()
+                    .unwrap_or(&LexOrdering::default())
+                    .clone(),
+                plan,
+            )
+            .with_fetch(fetch.take()),
+        )

Review Comment:
   Why does this add back a SortPreservingMerge without sort exprs? Wouldn't it be better to use a `GlobalLimitExec` or something?

## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ##

@@ -3172,3 +3181,78 @@ fn optimize_away_unnecessary_repartition2() -> Result<()> {
     Ok(())
 }
+
+#[tokio::test]
+async fn apply_enforce_distribution_multiple_times() -> Result<()> {

Review Comment:
   I think it is nice to have this "end to end" style test, but given the amount of code changed I think it is important to have more "unit style" tests, otherwise it is hard to understand how general this fix is (or if it just works for this specific case).
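On that last point about `GlobalLimitExec`, a minimal, hypothetical sketch (illustrative only, not the PR's code) of applying a preserved `fetch` with a plain limit, which carries the fetch without implying any sort semantics:

```rust
use std::sync::Arc;

use datafusion_physical_plan::limit::GlobalLimitExec;
use datafusion_physical_plan::ExecutionPlan;

/// Wrap `plan` in a limit if a fetch value still needs to be applied.
/// Unlike a `SortPreservingMergeExec`, this does not assume any ordering.
fn apply_fetch(
    plan: Arc<dyn ExecutionPlan>,
    fetch: Option<usize>,
) -> Arc<dyn ExecutionPlan> {
    match fetch {
        // skip = 0: we only want to cap the number of produced rows
        Some(n) => Arc::new(GlobalLimitExec::new(plan, 0, Some(n))),
        None => plan,
    }
}
```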
Re: [PR] moving memory.rs out of datafusion/core [datafusion]
comphead merged PR #14332: URL: https://github.com/apache/datafusion/pull/14332
Re: [PR] Restore ability to run single SLT file [datafusion]
alamb merged PR #14355: URL: https://github.com/apache/datafusion/pull/14355
Re: [PR] Restore ability to run single SLT file [datafusion]
alamb commented on PR #14355: URL: https://github.com/apache/datafusion/pull/14355#issuecomment-2622078034

Thanks again @findepi
Re: [PR] refactor: switch `BooleanBufferBuilder` to `NullBufferBuilder` in binary_map [datafusion]
comphead merged PR #14341: URL: https://github.com/apache/datafusion/pull/14341
Re: [PR] chore(deps): bump home from 0.5.9 to 0.5.11 in /datafusion-cli [datafusion]
alamb merged PR #14257: URL: https://github.com/apache/datafusion/pull/14257
Re: [PR] moving memory.rs out of datafusion/core [datafusion]
alamb commented on PR #14332: URL: https://github.com/apache/datafusion/pull/14332#issuecomment-2622083005

😍
Re: [PR] chore: Move all array_* serde to new framework, use correct INCOMPAT config [datafusion-comet]
comphead commented on code in PR #1349: URL: https://github.com/apache/datafusion-comet/pull/1349#discussion_r1934180368

## common/src/main/scala/org/apache/comet/CometConf.scala: ##

@@ -605,6 +605,15 @@ object CometConf extends ShimCometConf {
     .booleanConf
     .createWithDefault(false)

+  val COMET_EXPR_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
+    conf("spark.comet.expression.allowIncompatible")
+      .doc(
+        "Comet is not currently fully compatible with Spark for all expressions. " +
+          "Set this config to true to allow them anyway. See compatibility guide " +

Review Comment:
   ```suggestion
             "Set this config to true to allow them anyway. See Compatibility Guide " +
   ```
Re: [PR] chore: Move all array_* serde to new framework, use correct INCOMPAT config [datafusion-comet]
comphead commented on code in PR #1349: URL: https://github.com/apache/datafusion-comet/pull/1349#discussion_r1934182431

## docs/source/user-guide/configs.md: ##

@@ -64,6 +64,7 @@ Comet provides the following configuration settings.

 | spark.comet.explain.native.enabled | When this setting is enabled, Comet will provide a tree representation of the native query plan before execution and again after execution, with metrics. | false |
 | spark.comet.explain.verbose.enabled | When this setting is enabled, Comet will provide a verbose tree representation of the extended information. | false |
 | spark.comet.explainFallback.enabled | When this setting is enabled, Comet will provide logging explaining the reason(s) why a query stage cannot be executed natively. Set this to false to reduce the amount of logging. | false |
+| spark.comet.expression.allowIncompatible | Comet is not currently fully compatible with Spark for all expressions. Set this config to true to allow them anyway. See compatibility guide for more information. | false |

Review Comment:
   ```suggestion
   | spark.comet.expression.allowIncompatible | Comet is not currently fully compatible with Spark for all expressions. Set this config to true to allow them anyway. See Compatibility Guide for more information. (docs/user-guide/compatibility.md) | false |
   ```
Re: [PR] chore: Move all array_* serde to new framework, use correct INCOMPAT config [datafusion-comet]
comphead commented on code in PR #1349: URL: https://github.com/apache/datafusion-comet/pull/1349#discussion_r1934181473

## common/src/main/scala/org/apache/comet/CometConf.scala: ##

@@ -605,6 +605,15 @@ object CometConf extends ShimCometConf {
     .booleanConf
     .createWithDefault(false)

+  val COMET_EXPR_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
+    conf("spark.comet.expression.allowIncompatible")
+      .doc(
+        "Comet is not currently fully compatible with Spark for all expressions. " +
+          "Set this config to true to allow them anyway. See compatibility guide " +
+          "for more information.")

Review Comment:
   ```suggestion
             "for more information. (docs/user-guide/compatibility.md)")
   ```
Re: [PR] moving memory.rs out of datafusion/core [datafusion]
alamb commented on code in PR #14332: URL: https://github.com/apache/datafusion/pull/14332#discussion_r1934183938

## datafusion/catalog/src/lib.rs: ##

@@ -15,6 +15,16 @@
 // specific language governing permissions and limitations
 // under the License.

+//! Interfaces and default implementations of catalogs and schemas.
+//!
+//! Implementations
+//! * Simple memory based catalog: [`MemoryCatalogProviderList`], [`MemoryCatalogProvider`], [`MemorySchemaProvider`]

Review Comment:
   I think this looks good. One way to review this is to run `cargo doc --open` and then look at how it is rendered.
Re: [PR] chore: Move all array_* serde to new framework, use correct INCOMPAT config [datafusion-comet]
comphead commented on code in PR #1349: URL: https://github.com/apache/datafusion-comet/pull/1349#discussion_r1934186350

## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ##

@@ -929,6 +929,19 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim
       binding: Boolean): Option[Expr] = {
     SQLConf.get

+    def convert(handler: CometExpressionSerde): Option[Expr] = {
+      handler match {
+        case _: IncompatExpr if !CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.get() =>
+          withInfo(
+            expr,
+            s"$expr is not fully compatible with Spark. To enable it anyway, set " +

Review Comment:
   ```suggestion
               s"$expr is not fully compatible with Spark, see details in Compatibility Guide. To enable it anyway, set " +
   ```
Re: [PR] chore: Move all array_* serde to new framework, use correct INCOMPAT config [datafusion-comet]
comphead commented on code in PR #1349: URL: https://github.com/apache/datafusion-comet/pull/1349#discussion_r1934192543

## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ##

@@ -2371,83 +2384,17 @@
           withInfo(expr, "unsupported arguments for GetArrayStructFields", child)
           None
       }
-      case expr: ArrayRemove => CometArrayRemove.convert(expr, inputs, binding)
-      case expr if expr.prettyName == "array_contains" =>
-        createBinaryExpr(
-          expr,
-          expr.children(0),
-          expr.children(1),
-          inputs,
-          binding,
-          (builder, binaryExpr) => builder.setArrayContains(binaryExpr))
-      case _ if expr.prettyName == "array_append" =>
-        createBinaryExpr(
-          expr,
-          expr.children(0),
-          expr.children(1),
-          inputs,
-          binding,
-          (builder, binaryExpr) => builder.setArrayAppend(binaryExpr))
-      case _ if expr.prettyName == "array_intersect" =>
-        createBinaryExpr(
-          expr,
-          expr.children(0),
-          expr.children(1),
-          inputs,
-          binding,
-          (builder, binaryExpr) => builder.setArrayIntersect(binaryExpr))
-      case ArrayJoin(arrayExpr, delimiterExpr, nullReplacementExpr) =>
-        val arrayExprProto = exprToProto(arrayExpr, inputs, binding)
-        val delimiterExprProto = exprToProto(delimiterExpr, inputs, binding)
-
-        if (arrayExprProto.isDefined && delimiterExprProto.isDefined) {
-          val arrayJoinBuilder = nullReplacementExpr match {
-            case Some(nrExpr) =>
-              val nullReplacementExprProto = exprToProto(nrExpr, inputs, binding)
-              ExprOuterClass.ArrayJoin
-                .newBuilder()
-                .setArrayExpr(arrayExprProto.get)
-                .setDelimiterExpr(delimiterExprProto.get)
-                .setNullReplacementExpr(nullReplacementExprProto.get)
-            case None =>
-              ExprOuterClass.ArrayJoin
-                .newBuilder()
-                .setArrayExpr(arrayExprProto.get)
-                .setDelimiterExpr(delimiterExprProto.get)
-          }
-          Some(
-            ExprOuterClass.Expr
-              .newBuilder()
-              .setArrayJoin(arrayJoinBuilder)
-              .build())
-        } else {
-          val exprs: List[Expression] = nullReplacementExpr match {
-            case Some(nrExpr) => List(arrayExpr, delimiterExpr, nrExpr)
-            case None => List(arrayExpr, delimiterExpr)
-          }
-          withInfo(expr, "unsupported arguments for ArrayJoin", exprs: _*)
-          None
-        }
-      case ArraysOverlap(leftArrayExpr, rightArrayExpr) =>
-        if (CometConf.COMET_CAST_ALLOW_INCOMPATIBLE.get()) {
-          createBinaryExpr(
-            expr,
-            leftArrayExpr,
-            rightArrayExpr,
-            inputs,
-            binding,
-            (builder, binaryExpr) => builder.setArraysOverlap(binaryExpr))
-        } else {
-          withInfo(
-            expr,
-            s"$expr is not supported yet. To enable all incompatible casts, set " +
-              s"${CometConf.COMET_CAST_ALLOW_INCOMPATIBLE.key}=true")
-          None
-        }
+      case _: ArrayRemove => convert(CometArrayRemove)
+      case _: ArrayContains => convert(CometArrayContains)
+      case _ if expr.prettyName == "array_append" => convert(CometArrayAppend)

Review Comment:
   ```suggestion
         // Function introduced in 3.4.0. Refer by name to provide compatibility with older Spark builds
         case _ if expr.prettyName == "array_append" => convert(CometArrayAppend)
   ```
Re: [PR] feat: metadata columns [datafusion]
adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2620931040

Yep, from an initial impression I agree with you @jayzhan211 that this does seem like it should work. Then you just need to hook into wherever SELECT / projections are evaluated and make sure that, if a column isn't explicitly mentioned by name and it has that metadata, it is not included in the output.
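A minimal Rust sketch of that projection rule (the metadata key name is hypothetical; the discussion does not pin one down): fields carrying the marker are skipped by the default `SELECT *` projection and appear only when named explicitly.

```rust
use arrow_schema::Schema;

// Hypothetical metadata key marking a field as a metadata/system column.
const METADATA_COLUMN_KEY: &str = "datafusion.metadata_column";

/// Default projection for `SELECT *`: keep only fields that are not marked
/// as metadata columns; marked fields must be requested by name.
fn default_projection(schema: &Schema) -> Vec<usize> {
    schema
        .fields()
        .iter()
        .enumerate()
        .filter(|(_, field)| !field.metadata().contains_key(METADATA_COLUMN_KEY))
        .map(|(idx, _)| idx)
        .collect()
}
```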
Re: [PR] Fix incorrect searched CASE optimization [datafusion]
findepi commented on code in PR #14349: URL: https://github.com/apache/datafusion/pull/14349#discussion_r1933432132

## datafusion/sqllogictest/test_files/case.slt: ##

@@ -289,12 +289,22 @@
 query B
 select case when a=1 then false end from foo;

 false
-false
-false
-false
-false
-false

Review Comment:
   These results were not correct.
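For readers following along, the corrected semantics in a small Rust analogy (illustrative, not DataFusion code): a CASE expression with no matching WHEN branch and no ELSE evaluates to NULL, which `Option::None` models here.

```rust
// SQL: SELECT CASE WHEN a = 1 THEN false END FROM foo
// `None` models SQL NULL: with no matching WHEN and no ELSE, the result is NULL.
fn case_when(a: i64) -> Option<bool> {
    if a == 1 {
        Some(false)
    } else {
        None
    }
}

fn main() {
    assert_eq!(case_when(1), Some(false));
    assert_eq!(case_when(2), None); // previously mis-optimized to `false`
}
```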
Re: [PR] Improve speed of `median` by implementing special `GroupsAccumulator` [datafusion]
Rachelint commented on code in PR #13681: URL: https://github.com/apache/datafusion/pull/13681#discussion_r1933435620

## datafusion/functions-aggregate/src/median.rs: ##

@@ -230,6 +276,212 @@ impl Accumulator for MedianAccumulator {
     }
 }

+/// The median groups accumulator accumulates the raw input values
+///
+/// For calculating the accurate medians of groups, we need to store all values
+/// of groups before final evaluation.
+/// So values in each group will be stored in a `Vec<T>`, and the total group values
+/// will be actually organized as a `Vec<Vec<T>>`.
+///
+#[derive(Debug)]
+struct MedianGroupsAccumulator<T: ArrowNumericType + Send> {
+    data_type: DataType,
+    group_values: Vec<Vec<T::Native>>,
+}
+
+impl<T: ArrowNumericType + Send> MedianGroupsAccumulator<T> {
+    pub fn new(data_type: DataType) -> Self {
+        Self {
+            data_type,
+            group_values: Vec::new(),
+        }
+    }
+}
+
+impl<T: ArrowNumericType + Send> GroupsAccumulator for MedianGroupsAccumulator<T> {
+    fn update_batch(
+        &mut self,
+        values: &[ArrayRef],
+        group_indices: &[usize],
+        opt_filter: Option<&BooleanArray>,
+        total_num_groups: usize,
+    ) -> Result<()> {
+        assert_eq!(values.len(), 1, "single argument to update_batch");
+        let values = values[0].as_primitive::<T>();
+
+        // Push the `not nulls + not filtered` row into its group
+        self.group_values.resize(total_num_groups, Vec::new());
+        accumulate(
+            group_indices,
+            values,
+            opt_filter,
+            |group_index, new_value| {
+                self.group_values[group_index].push(new_value);
+            },
+        );
+
+        Ok(())
+    }
+
+    fn merge_batch(
+        &mut self,
+        values: &[ArrayRef],
+        group_indices: &[usize],
+        // Since aggregate filter should be applied in partial stage, in final stage there should be no filter
+        _opt_filter: Option<&BooleanArray>,
+        total_num_groups: usize,
+    ) -> Result<()> {
+        assert_eq!(values.len(), 1, "one argument to merge_batch");
+
+        // The merged values should be organized like as a `ListArray` which is nullable
+        // (input with nulls usually generated from `convert_to_state`), but the `inner array` of
+        // `ListArray` is `non-nullable`.
+        //
+        // Following is the possible and impossible input `values`:
+        //
+        // # Possible values
+        // ```text
+        //   group 0: [1, 2, 3]
+        //   group 1: null (list array is nullable)
+        //   group 2: [6, 7, 8]
+        //   ...
+        //   group n: [...]
+        // ```
+        //
+        // # Impossible values
+        // ```text
+        //   group x: [1, 2, null] (values in list array is non-nullable)
+        // ```
+        //
+        let input_group_values = values[0].as_list::<i32>();
+
+        // Ensure group values big enough
+        self.group_values.resize(total_num_groups, Vec::new());
+
+        // Extend values to related groups
+        // TODO: avoid using iterator of the `ListArray`, this will lead to
+        // many calls of `slice` of its `inner array`, and `slice` is not
+        // so efficient (due to the calculation of `null_count` for each `slice`).
+        group_indices
+            .iter()
+            .zip(input_group_values.iter())
+            .for_each(|(&group_index, values_opt)| {
+                if let Some(values) = values_opt {
+                    let values = values.as_primitive::<T>();
+                    self.group_values[group_index].extend(values.values().iter());
+                }
+            });
+
+        Ok(())
+    }
+
+    fn state(&mut self, emit_to: EmitTo) -> Result<Vec<ArrayRef>> {
+        // Emit values
+        let emit_group_values = emit_to.take_needed(&mut self.group_values);
+
+        // Build offsets
+        let mut offsets = Vec::with_capacity(self.group_values.len() + 1);
+        offsets.push(0);
+        let mut cur_len = 0_i32;
+        for group_value in &emit_group_values {
+            cur_len += group_value.len() as i32;
+            offsets.push(cur_len);
+        }
+        // TODO: maybe we can use `OffsetBuffer::new_unchecked` like what in `convert_to_state`,
+        // but safety should be considered more carefully here (and I am not sure if it can get
+        // performance improvement when we introduce checks to keep the safety...).
+        //
+        // Can see more details in:
+        // https://github.com/apache/datafusion/pull/13681#discussion_r1931209791
+        //
+        let offsets = OffsetBuffer::new(ScalarBuffer::from(offsets));
+
+        // Build inner array
+        let flatten_group_values =
+            emit_group_values.into_iter().flatten().collect::<Vec<_>>();
+        let group_values_array =
+            PrimitiveArray::<T>::new(ScalarBuffer::from(flatten_group_values), None)
+                .with_data_type(self.data_type.clone());
+
+        // Build the result list array
+        let result_list_array = ListArray::new(
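As background for the TODO about `OffsetBuffer::new_unchecked` quoted above, a small self-contained sketch of the two `arrow-buffer` constructors and the invariant involved (illustrative only, not the PR's code):

```rust
use arrow_buffer::{OffsetBuffer, ScalarBuffer};

fn main() {
    // Offsets for three lists: [0..3), [3..3) (empty), [3..6).
    let offsets: Vec<i32> = vec![0, 3, 3, 6];

    // `new` validates that the offsets are non-negative and monotonically
    // non-decreasing, panicking otherwise.
    let checked = OffsetBuffer::new(ScalarBuffer::from(offsets.clone()));

    // `new_unchecked` skips that validation; the caller must uphold the
    // invariant themselves, which is why the constructor is `unsafe`.
    let unchecked = unsafe { OffsetBuffer::new_unchecked(ScalarBuffer::from(offsets)) };

    assert_eq!(&checked[..], &unchecked[..]);
}
```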
Re: [PR] Extend lambda support for ClickHouse, DuckDB and Generic dialects [datafusion-sqlparser-rs]
gstvg commented on code in PR #1686: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1686#discussion_r1933397382

## src/dialect/mod.rs: ##

@@ -340,12 +340,21 @@ pub trait Dialect: Debug + Any {
     /// Returns true if the dialect supports lambda functions, for example:
     ///
     /// ```sql
-    /// SELECT transform(array(1, 2, 3), x -> x + 1); -- returns [2,3,4]
+    /// SELECT transform(array(1, 2, 3), (x) -> x + 1); -- returns [2,3,4]
     /// ```
     fn supports_lambda_functions(&self) -> bool {
         false
     }

+    /// Returns true if the dialect supports lambda functions without parentheses for a single argument, for example:

Review Comment:
   cc @jmhain
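For context, a minimal sketch of how a custom dialect would opt into flags like these (the dialect itself is hypothetical; the `Dialect` trait requires only the two identifier predicates, and everything else has defaults):

```rust
use sqlparser::dialect::Dialect;

// Hypothetical dialect opting into lambda syntax.
#[derive(Debug)]
struct MyDialect;

impl Dialect for MyDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        ch.is_ascii_alphabetic() || ch == '_'
    }

    fn is_identifier_part(&self, ch: char) -> bool {
        ch.is_ascii_alphanumeric() || ch == '_'
    }

    // Provided method with a `false` default; overriding it enables
    // parsing of lambda expressions for this dialect.
    fn supports_lambda_functions(&self) -> bool {
        true
    }
}
```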
[PR] chore(deps): bump tempfile from 3.15.0 to 3.16.0 in /datafusion-cli [datafusion]
dependabot[bot] opened a new pull request, #14350: URL: https://github.com/apache/datafusion/pull/14350

Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.15.0 to 3.16.0.

Changelog (from tempfile's CHANGELOG.md):

3.16.0
- Update getrandom to 0.3.0 (thanks to @paolobarbolini).
- Allow windows-sys versions 0.59.x in addition to 0.59.0 (thanks @ErichDonGubler).
- Improved security documentation (thanks to @n0toose for collaborating with me on this).

Commits
- 6ecd9f1 chore: release 3.16.0
- 3693c01 build: upgrade getrandom to v0.3.0 (#320)
- a4596e5 build: allow windows-sys 0.59 (#319)
- 3ac6d4e chore: fix clippy lifetime warning (#318)
- 3a722e8 docs: improve & expand security documentation (#317)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
[PR] chore(deps): bump serde_json from 1.0.137 to 1.0.138 in /datafusion-cli [datafusion]
dependabot[bot] opened a new pull request, #14351: URL: https://github.com/apache/datafusion/pull/14351

Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.137 to 1.0.138.

Release notes (from serde_json's releases):

v1.0.138
- Documentation improvements

Commits
- c916099 Release 1.0.138
- dc29e48 Move BufReader to caller
- 29122f9 Sort imports from PR 1237
- d33c1b5 Merge pull request #1237 from JonathanBrouwer/master
- 8c2d800 Add more warnings to apply buffering on docs of affected functions
- 65bbd1a Fix example of from_reader not applying buffering when it should
- 87f78da More precise gitignore patterns
- 4134f11 Remove *.sw[po] from gitignore
- c7626db Remove **/*.rs.bk from project-specific gitignore

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
[I] Incorrect result for IS NOT NULL predicate over UNION ALL query [datafusion]
findepi opened a new issue, #14352: URL: https://github.com/apache/datafusion/issues/14352

### Describe the bug

Incorrect result for IS NOT NULL predicate over UNION ALL query

### To Reproduce

```sql
SELECT
    a,
    a IS NOT NULL
FROM (
    SELECT 'foo' AS a, 3 AS b
    UNION ALL
    SELECT NULL AS a, 4 AS b
)
```

DF

```
+------+---------------+
| a    | a IS NOT NULL |
+------+---------------+
| NULL | true          |  😱
| foo  | true          |
+------+---------------+
```

### Expected behavior

A `NULL` value checked with `IS NOT NULL` should return `false`:

```
trino> SELECT
    ->     a,
    ->     a IS NOT NULL
    -> FROM (
    ->     SELECT 'foo' AS a, 3 AS b
    ->     UNION ALL
    ->     SELECT NULL AS a, 4 AS b
    -> );
  a   | _col1
------+-------
 foo  | true
 NULL | false
```

### Additional context

_No response_
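One way such a result can arise is when the union's output field is marked non-nullable even though one of the branches produces NULL values, letting the optimizer fold `a IS NOT NULL` to `true`. A minimal sketch of the nullability-merging rule a UNION schema needs, using `arrow-schema` types purely for illustration:

```rust
use arrow_schema::Field;

/// The unioned output field must be nullable if the corresponding field in
/// *either* branch is nullable; ignoring one branch yields exactly the kind
/// of wrong `IS NOT NULL` folding shown above.
fn union_field(left: &Field, right: &Field) -> Field {
    Field::new(
        left.name().clone(),
        left.data_type().clone(),
        left.is_nullable() || right.is_nullable(),
    )
}
```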
[PR] chore(deps): update getrandom requirement from 0.2.8 to 0.3.1 [datafusion]
dependabot[bot] opened a new pull request, #14353: URL: https://github.com/apache/datafusion/pull/14353

Updates the requirements on [getrandom](https://github.com/rust-random/getrandom) to permit the latest version.

Changelog (from getrandom's CHANGELOG.md):

[0.3.1] - 2025-01-28

Fixed
- Build error on Android (#588)

[0.3.0] - 2025-01-25

Breaking Changes

Changed
- Bump MSRV to 1.63 (#542)
- Rename getrandom and getrandom_uninit functions to fill and fill_uninit respectively (#532)

Removed
- wasm32-wasi target support (use wasm32-wasip1 or wasm32-wasip2 instead) (#499)
- linux_disable_fallback, rdrand, js, test-in-browser, and custom crate features in favor of configuration flags (#504)
- register_custom_getrandom! macro (#504)
- Implementation of From for Error and Error::code method (#507)
- Internet Explorer 11 support (#554)
- Target-specific associated Error constants (#562)

Changed
- Use ProcessPrng on Windows 10 and up, and use RtlGenRandom on older Windows versions (#415)
- Do not use locale-specific strerror_r for retrieving error code descriptions (#440)
- Avoid assuming usize is the native word size in the rdrand backend (#442)
- Do not read from errno when libc did not indicate error on Solaris (#448)
- Switch from libpthread's mutex to futex on Linux and to nanosleep-based wait loop on other targets in the use_file backend (#490)
- Do not retry on EAGAIN while polling /dev/random on Linux (#522)
- Remove separate codepath for Node.js in the wasm_js backend (bumps minimum supported Node.js version to v19) (#557)
- Use js_namespace in the wasm_js backend (#559)

Added
- wasm32-wasip1 and wasm32-wasip2 support (#499)
- getrandom_backend configuration flag for selection of opt-in backends (#504)
- Error::new_custom method (#507)
- rndr opt-in backend (#512)
- Automatic MemorySanitizer support (#521, #571)
- u32 and u64 functions for generating random values of the respective type (#544)
- wasm32v1-none support in the wasm_js backend (#560)
- wasm_js crate feature which allows users to enable the wasm_js opt-in backend (#574)

Fixed
- NetBSD fallback code based on KERN_ARND (#555)

... (truncated)

Commits
- 2bb3959 Release v0.3.1 (#589)
- ac379da Fix build error on Android (#588)
- aa96363 Add benchmarks for u32/u64 functions (#582)
- ... (truncated)
Re: [PR] chore(deps): update getrandom requirement from 0.2.8 to 0.3.0 [datafusion]
dependabot[bot] closed pull request #14315: chore(deps): update getrandom requirement from 0.2.8 to 0.3.0 URL: https://github.com/apache/datafusion/pull/14315
Re: [PR] chore(deps): update getrandom requirement from 0.2.8 to 0.3.0 [datafusion]
dependabot[bot] commented on PR #14315: URL: https://github.com/apache/datafusion/pull/14315#issuecomment-2621010404

Superseded by #14353.
Re: [PR] fix: FULL OUTER JOIN and LIMIT produces wrong results [datafusion]
zhuqi-lucas commented on PR #14338: URL: https://github.com/apache/datafusion/pull/14338#issuecomment-2621010452

Thank you @alamb and @xudong963!
[PR] chore(deps): bump korandoru/hawkeye from 5 to 6 [datafusion]
dependabot[bot] opened a new pull request, #14354: URL: https://github.com/apache/datafusion/pull/14354

Bumps [korandoru/hawkeye](https://github.com/korandoru/hawkeye) from 5 to 6.

Release notes (from korandoru/hawkeye's releases, 6.0.0, 2025-01-28):

Breaking changes
- Now, HawkEye uses MiniJinja as the template engine. All the properties configured will be passed to the template engine as the props value, and thus:
  - Previous ${property} should be replaced with {{ props["property"] }}.
  - Previous built-in variable hawkeye.core.filename is now attrs.filename.
  - Previous built-in variable hawkeye.git.fileCreatedYear is now attrs.git_file_created_year.
  - Previous built-in variable hawkeye.git.fileModifiedYear is now attrs.git_file_modified_year.
- New properties: attrs.git_authors is a collection of authors of the file. You can join them with , to get a string by {{ attrs.git_authors | join(", ") }}.

Notable changes
- Now, HawkEye would detect a leading BOM (Byte Order Mark) and remove it if it exists (#166). I tend to treat this as a bug fix, but it may affect the output of the header.

Commits
- 479470f chore: release 6.0.0
- 1467c23 feat: render by minijinja and support git authors (#169)
- 223367e fix: skip leading bom if exist (#166)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

> How do implementers define a signature for a UDF that takes one or more lambdas? Do we just skip type checking of lambda arguments? How does this affect the ordering?

I added a new [ScalarUDF method invoke_with_lambda_args](https://github.com/apache/datafusion/compare/main...gstvg:arrow-datafusion:lambda#diff-ce24057bb984fe97516c33a9b1e690e20c85a5943503caa65db40d78ca3fdc7aR128) that receives ScalarFunctionArgs instead of ScalarFunctionArgs, and has a [default implementation](https://github.com/apache/datafusion/compare/main...gstvg:arrow-datafusion:lambda#diff-8a327db2db945bcf6ca2b4229885532feae127e94a450600d3fac6ecdc0eeb3fR702) calling invoke_with_args, returning an error if any arg is a lambda. This should preserve the ordering of any arg. But maybe we could add a `lambda_arguments` to `ScalarFunctionArgs`, similar to [here](https://github.com/apache/datafusion/compare/main...gstvg:arrow-datafusion:lambda#diff-8a327db2db945bcf6ca2b4229885532feae127e94a450600d3fac6ecdc0eeb3fR413); what do you think?

> How should one map lambda function arguments to concrete values in the body? If the function only operates on a list, this becomes easier to express.

The lambda implementation should [build a RecordBatch](https://github.com/apache/datafusion/compare/main...gstvg:arrow-datafusion:lambda#diff-ce24057bb984fe97516c33a9b1e690e20c85a5943503caa65db40d78ca3fdc7aR147), likely using some other concrete argument, and pass it to the lambda [PhysicalExpr evaluate](https://github.com/apache/datafusion/compare/main...gstvg:arrow-datafusion:lambda#diff-ce24057bb984fe97516c33a9b1e690e20c85a5943503caa65db40d78ca3fdc7aR147) method.

That said, if you have any thoughts on the [current interface](https://github.com/apache/datafusion/compare/main...gstvg:arrow-datafusion:lambda#diff-ce24057bb984fe97516c33a9b1e690e20c85a5943503caa65db40d78ca3fdc7aR128), I'd love to hear your feedback. It's a `list_map([1,2], x -> x*2)` udf; you can see the [executed output here](https://github.com/apache/datafusion/compare/main...gstvg:arrow-datafusion:lambda#diff-53634af9d48017f60cb545dfe2aaab20d1f7fa3435947c161ef22515fa62ed8dR26). Thanks! 🙏
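To make that mapping step concrete, here is a small sketch of evaluating a lambda body that has been planned as an ordinary `PhysicalExpr` against a synthesized single-column `RecordBatch`. This is illustrative only; it approximates the `list_map([1,2], x -> x*2)` example rather than reproducing the branch's actual code:

```rust
use std::sync::Arc;

use arrow::array::Int64Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use datafusion_expr::Operator;
use datafusion_physical_expr::expressions::{binary, col, lit};
use datafusion_physical_expr::PhysicalExpr;

fn main() -> datafusion_common::Result<()> {
    // Schema of the synthesized batch: one column per lambda parameter.
    let schema = Arc::new(Schema::new(vec![Field::new("x", DataType::Int64, false)]));

    // The lambda body `x * 2`, planned as an ordinary PhysicalExpr
    // against that one-column schema.
    let body: Arc<dyn PhysicalExpr> =
        binary(col("x", &schema)?, Operator::Multiply, lit(2i64), &schema)?;

    // The list elements [1, 2] become the `x` column of the batch.
    let batch = RecordBatch::try_new(
        Arc::clone(&schema),
        vec![Arc::new(Int64Array::from(vec![1, 2]))],
    )?;

    // Evaluating the body yields the mapped elements [2, 4].
    let mapped = body.evaluate(&batch)?;
    println!("{:?}", mapped.into_array(batch.num_rows())?);
    Ok(())
}
```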
Re: [I] Add support for lambda/higher order functions [datafusion]
gstvg commented on issue #14205: URL: https://github.com/apache/datafusion/issues/14205#issuecomment-2621070075 Lambda syntax only works with the Databricks dialect; proposed fix here: https://github.com/apache/datafusion-sqlparser-rs/pull/1686 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
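As a quick illustration of the dialect point, here is a sketch assuming a recent sqlparser-rs release that ships `DatabricksDialect`:

```rust
use sqlparser::dialect::{DatabricksDialect, GenericDialect};
use sqlparser::parser::Parser;

fn main() {
    let sql = "SELECT transform(array(1, 2), x -> x * 2)";

    // DatabricksDialect accepts the `x -> expr` lambda syntax.
    println!("databricks parses: {}", Parser::parse_sql(&DatabricksDialect {}, sql).is_ok());

    // At the time of this thread, GenericDialect rejects lambda syntax,
    // which is what the linked sqlparser PR addresses.
    println!("generic parses:    {}", Parser::parse_sql(&GenericDialect {}, sql).is_ok());
}
```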
Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1933513054 ## datafusion/physical-optimizer/src/limit_pushdown.rs: ## @@ -247,7 +246,15 @@ pub fn pushdown_limit_helper( } } else { // Add fetch or a `LimitExec`: -global_state.satisfied = true; +// If the plan's children have a limit, we shouldn't change the global state to true, Review Comment: Good question @xudong963. Added an slt test for the case where the children's limit is >= the global limit; the limit should still be pushed down, consistent with the current behaviour. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
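For readers following the discussion, a minimal sketch of the two query shapes involved (table and column names are made up):

```sql
-- Child limit (10) >= outer limit (5): pushing the outer LIMIT down is safe,
-- and this pushdown should keep happening (current behaviour).
SELECT * FROM (SELECT * FROM t LIMIT 10) LIMIT 5;

-- Child limit (3) < outer limit (5): the child's LIMIT already bounds the rows,
-- so the rule must not mark the global limit as satisfied and remove it.
SELECT * FROM (SELECT * FROM t LIMIT 3) LIMIT 5;
```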
Re: [PR] Script and documentation for regenerating sqlite test files [datafusion]
Omega359 commented on code in PR #14290: URL: https://github.com/apache/datafusion/pull/14290#discussion_r1934713735

## datafusion/sqllogictest/regenerate_sqlite_files.sh: ##

@@ -0,0 +1,179 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+echo "This script is experimental! Please read the following completely to understand
+what this script does before running it.
+
+This script is designed to regenerate the .slt files in datafusion-testing/data/sqlite/
+from source files obtained from a git repository. To do that the following steps are
+performed:
+
+- Verify required commands are installed
+- Clone the remote git repository into /tmp/sqlitetesting
+- Delete all existing .slt files in the datafusion-testing/data/sqlite/ folder
+- Copy the .test files from the cloned git repo into datafusion-testing
+- Remove a few directories and files from the copied files (not relevant to DataFusion)
+- Rename the .test files to .slt and cleanse the files. Cleansing involves:
+  - dos2unix
+  - removing all references to mysql, mssql and postgresql
+  - adding in a new 'control resultmode valuewise' statement at the top of all files
+  - updating the files to change 'AS REAL' to 'AS FLOAT8'
+- Replace the sqllogictest-rs dependency in the Cargo.toml with a version pointing to
+  a git repository that has been custom written to properly complete the files
+  with comparison of datafusion results with postgresql
+- Replace the sqllogictest.rs file with a customized version from a github gist
+  that will work with the customized sqllogictest-rs dependency
+- Run the sqlite test with completion (takes > 1 hr)
+- Update a few results to ignore known issues
+- Run sqlite test to verify results
+- Perform cleanup to revert changes to the Cargo.toml file/sqllogictest.rs file
+- Delete backup files and the /tmp/sqlitetesting directory
+"
+read -r -p "Do you understand and accept the risk? (yes/no): " acknowledgement
+
+if [ "${acknowledgement,,}" != "yes" ]; then
+  exit 0
+else
+  echo "Ok, Proceeding..."
+fi
+
+if [ ! -x "$(command -v sd)" ]; then
+  echo "This script requires 'sd' to be installed. Install via 'cargo install sd' or using your local package manager"
+  exit 0
+else
+  echo "sd command is installed, proceeding..."
+fi
+
+DF_HOME=$(realpath "../../")
+
+# location where we'll clone sql-logic-test repos
+if [ ! -d "/tmp/sqlitetesting" ]; then
+  mkdir /tmp/sqlitetesting
+fi
+
+if [ ! -d "/tmp/sqlitetesting/sql-logic-test" ]; then
+  echo "Cloning sql-logic-test into /tmp/sqlitetesting/"
+  cd /tmp/sqlitetesting/ || exit;
+  git clone https://github.com/hydromatic/sql-logic-test.git
+fi
+
+echo "Removing all existing .slt files from datafusion-testing/data/sqlite/ directory"
+
+cd "$DF_HOME/datafusion-testing/data/sqlite/" || exit;
+find ./ -type f -name "*.slt" -exec rm {} \;
+
+echo "Copying .test files from sql-logic-test to datafusion-testing/data/sqlite/"
+
+cp -r /tmp/sqlitetesting/sql-logic-test/src/main/resources/test/* ./
+
+echo "Removing 'evidence/*' and 'index/delete/*' directories from datafusion-testing"
+
+find ./evidence/ -type f -name "*.test" -exec rm {} \;
+rm -rf ./index/delete/1
+rm -rf ./index/delete/10
+rm -rf ./index/delete/100
+rm -rf ./index/delete/1000
+rm -rf ./index/delete/1
+# this file is empty and causes the sqllogictest-rs code to fail
+rm ./index/view/10/slt_good_0.test
+
+echo "Renaming .test files to .slt and cleansing the files ..."
+
+# add hash-threshold lines into these 3 files as they were missing
+sed -i '1s/^/hash-threshold 8\n\n/' ./select1.test
+sed -i '1s/^/hash-threshold 8\n\n/' ./select4.test
+sed -i '1s/^/hash-threshold 8\n\n/' ./select5.test
+# rename
+find ./ -type f -name "*.test" -exec rename -f 's/\.test/\.slt/' {} \;
+# gotta love windows :/
+find ./ -type f -name "*.slt" -exec dos2unix --quiet {} \;
+# add in control resultmode
+find ./ -type f -name "*.slt" -exec sd -f i 'hash-threshold 8\n' 'hash-threshold 8\ncontrol resultmode valuewise\n' {} \;
+# remove mysql tests and skipif lines
+find ./ -type f -name "*.slt" -exec sd -f i 'onlyif mysql.+\n.+\n.+\n.+\n.+\n.+\n.+\n.+\n.+\n.+\n.+\n.+\n\n' '' {} \;
+find ./ -type f -name "*.slt" -exec sd -f i 'onlyif mysql.+\n.+\n.+\n.+\n.+\
Re: [I] Type Coercion fails for List with inner type struct which has large/view types [datafusion]
alamb commented on issue #14154: URL: https://github.com/apache/datafusion/issues/14154#issuecomment-2622957118 Possibly related: - https://github.com/apache/datafusion/pull/12490 (released in DataFusion 44.0.0) - https://github.com/apache/datafusion/pull/13452 (also released in DataFusion 44.0.0) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Type Coercion fails for List with inner type struct which has large/view types [datafusion]
alamb commented on issue #14154: URL: https://github.com/apache/datafusion/issues/14154#issuecomment-2622959604 So the next step for this PR is to find a DataFusion-only reproducer that works in DF 43 but not in DF 44. I will try to do so tomorrow. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2622977789 As described in the Spark documentation (https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/sql/connector/catalog/SupportsMetadataColumns.html): "If a table column and a metadata column have the same name, the conflict is resolved by either renaming or suppressing the metadata column." I think we should also guarantee that the metadata column is suppressed regardless of the field order. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] chore: Prepare for DataFusion 45 (bump to DataFusion rev 5592834 + Arrow 54.0.0) [datafusion-comet]
andygrove commented on code in PR #1332: URL: https://github.com/apache/datafusion-comet/pull/1332#discussion_r1934746178 ## native/core/src/execution/operators/filter.rs: ## @@ -62,6 +65,8 @@ pub struct FilterExec { default_selectivity: u8, /// Properties equivalence properties, partitioning, etc. cache: PlanProperties, +/// The projection indices of the columns in the output schema of join +projection: Option<Vec<usize>>, Review Comment: We currently maintain a copy of DataFusion's `FilterExec` with one small change, so I copied over the latest version to keep it in sync and then re-applied the change that we need (for memory safety, because of the way we re-use buffers). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]
rohitrastogi commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1934929386 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! [DedicatedExecutor] for running CPU-bound tasks on a separate tokio runtime. + +use crate::SendableRecordBatchStream; +use async_trait::async_trait; +use datafusion::physical_plan::stream::RecordBatchStreamAdapter; +use datafusion_common::error::GenericError; +use datafusion_common::DataFusionError; +use futures::stream::BoxStream; +use futures::{ +future::{BoxFuture, Shared}, +Future, FutureExt, Stream, StreamExt, TryFutureExt, +}; +use log::{info, warn}; +use object_store::path::Path; +use object_store::{ +GetOptions, GetResult, GetResultPayload, ListResult, MultipartUpload, ObjectMeta, +ObjectStore, PutMultipartOpts, PutOptions, PutPayload, PutResult, UploadPart, +}; +use std::cell::RefCell; +use std::pin::Pin; +use std::sync::RwLock; +use std::task::{Context, Poll}; +use std::{fmt::Display, sync::Arc, time::Duration}; +use tokio::runtime::Builder; +use tokio::task::JoinHandle; +use tokio::{ +runtime::Handle, +sync::{oneshot::error::RecvError, Notify}, +task::JoinSet, +}; +use tokio_stream::wrappers::ReceiverStream; + +/// Create a [`DedicatedExecutorBuilder`] from a tokio [`Builder`] +impl From<Builder> for DedicatedExecutorBuilder { +fn from(value: Builder) -> Self { +Self::new_from_builder(value) +} +} + +/// Manages a separate tokio [`Runtime`] (thread pool) for executing CPU bound +/// tasks such as DataFusion `ExecutionPlans`. +/// +/// See [`DedicatedExecutorBuilder`] for creating a new instance. +/// +/// A `DedicatedExecutor` can help avoid issues when running IO and CPU bound tasks on the +/// same thread pool by running futures (and any `tasks` that are +/// `tokio::task::spawned` by them) on a separate tokio [`Executor`]. +/// +/// `DedicatedExecutor`s can be `clone`ed and all clones share the same thread pool. +/// +/// Since the primary use for a `DedicatedExecutor` is offloading CPU bound +/// work, IO work can not be performed on tasks launched in the Executor. +/// +/// To perform IO, see: +/// - [`Self::spawn_io`] +/// - [`Self::wrap_object_store`] +/// +/// When [`DedicatedExecutorBuilder::build`] is called, a reference to the +/// "current" tokio runtime will be stored and used, via [`register_io_runtime`] by all +/// threads spawned by the executor. Any I/O done by threads in this +/// [`DedicatedExecutor`] should use [`spawn_io`], which will run them on the +/// I/O runtime. +/// +/// # TODO examples +/// +/// # Background +/// +/// Tokio has the notion of the "current" runtime, which runs the current future +/// and any tasks spawned by it.
Typically, this is the runtime created by +/// `tokio::main` and is used for the main application logic and I/O handling +/// +/// For CPU bound work, such as DataFusion plan execution, it is important to +/// run on a separate thread pool to avoid blocking the I/O handling for extended +/// periods of time in order to avoid long poll latencies (which decreases the +/// throughput of small requests under concurrent load). +/// +/// # IO Scheduling +/// +/// I/O, such as network calls, should not be performed on the runtime managed +/// by [`DedicatedExecutor`]. As tokio is a cooperative scheduler, long-running +/// CPU tasks will not be preempted and can therefore starve servicing of other +/// tasks. This manifests in long poll-latencies, where a task is ready to run +/// but isn't being scheduled to run. For CPU-bound work this isn't a problem as +/// there is no external party waiting on a response, however, for I/O tasks, +/// long poll latencies can prevent timely servicing of IO, which can have a +/// significant detrimental effect. +/// +/// # Details +/// +/// The worker thread priority is set to low so that such tasks do +/// not starve other more important tasks (such as answering health checks) +/// +/// Follows the example from stack overflow and spawns a new +/// thread to install a Tokio runtime "context" +///
Re: [PR] move information_schema to datafusion-catalog [datafusion]
alamb commented on code in PR #14364: URL: https://github.com/apache/datafusion/pull/14364#discussion_r1934520683 ## datafusion/substrait/Cargo.toml: ## @@ -46,6 +46,7 @@ url = { workspace = true } [dev-dependencies] datafusion = { workspace = true, features = ["nested_expressions"] } +datafusion-catalog = { workspace = true } Review Comment: I think since datafusion re-exports `datafusion_catalog` we should be able to avoid this dependency: https://github.com/apache/datafusion/blob/9ca6c2557488c1c608182542c1be96889b64fe29/datafusion/core/src/lib.rs#L731-L730 ## datafusion/catalog/src/lib.rs: ## @@ -18,23 +18,264 @@ //! Interfaces and default implementations of catalogs and schemas. //! //! Implementations //! * Information schema: [`information_schema`] //! * Simple memory based catalog: [`MemoryCatalogProviderList`], [`MemoryCatalogProvider`], [`MemorySchemaProvider`] pub mod memory; +pub use datafusion_sql::{ResolvedTableReference, TableReference}; pub use memory::{ MemoryCatalogProvider, MemoryCatalogProviderList, MemorySchemaProvider, }; +use std::collections::BTreeSet; +use std::ops::ControlFlow; mod r#async; mod catalog; mod dynamic_file; +pub mod information_schema; mod schema; mod session; mod table; - pub use catalog::*; pub use dynamic_file::catalog::*; pub use r#async::*; pub use schema::*; pub use session::*; pub use table::*; +pub mod streaming; + +/// Collects all tables and views referenced in the SQL statement. CTEs are collected separately. +/// This can be used to determine which tables need to be in the catalog for a query to be planned. +/// +/// # Returns +/// +/// A `(table_refs, ctes)` tuple, the first element contains table and view references and the second +/// element contains any CTE aliases that were defined and possibly referenced. +/// +/// ## Example +/// +/// ``` +/// # use datafusion_sql::parser::DFParser; +/// # use datafusion_catalog::resolve_table_references; +/// let query = "SELECT a FROM foo where x IN (SELECT y FROM bar)"; +/// let statement = DFParser::parse_sql(query).unwrap().pop_back().unwrap(); +/// let (table_refs, ctes) = resolve_table_references(&statement, true).unwrap(); +/// assert_eq!(table_refs.len(), 2); +/// assert_eq!(table_refs[0].to_string(), "bar"); +/// assert_eq!(table_refs[1].to_string(), "foo"); +/// assert_eq!(ctes.len(), 0); +/// ``` +/// +/// ## Example with CTEs +/// +/// ``` +/// # use datafusion_sql::parser::DFParser; +/// # use datafusion_catalog::resolve_table_references; +/// let query = "with my_cte as (values (1), (2)) SELECT * from my_cte;"; +/// let statement = DFParser::parse_sql(query).unwrap().pop_back().unwrap(); +/// let (table_refs, ctes) = resolve_table_references(&statement, true).unwrap(); +/// assert_eq!(table_refs.len(), 0); +/// assert_eq!(ctes.len(), 1); +/// assert_eq!(ctes[0].to_string(), "my_cte"); +/// ``` +pub fn resolve_table_references( +statement: &datafusion_sql::parser::Statement, Review Comment: I think this function belongs in `datafusion-sql`, since it walks over the SQL parse tree; perhaps we can put it in https://github.com/apache/datafusion/tree/main/datafusion/sql/src/resolve.rs or something and then remove the dependency of `datafusion-catalog` on `datafusion-sql`. I think we could do this in a follow-on PR as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] move information_schema to datafusion-catalog [datafusion]
alamb commented on PR #14364: URL: https://github.com/apache/datafusion/pull/14364#issuecomment-2622752446 I am checking this one out -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Build time regression [datafusion]
alamb commented on issue #14256: URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2622826545 > > After removing the WildcardOptions (by replacing it with an empty structure) I can see the build time drops. Removing the rule itself and the change in core doesn't help. It looks like the change to Expr is the root cause, though I don't know the reason... > > I wonder if the issue is that `WildCardOptions` is a large struct that is not `Box`ed -- so it means that every `Expr` needs to be large enough to hold it which maybe increased the size of all `Expr`s 🤔 🤔 it seems we have already Box'd it: https://github.com/apache/datafusion/blob/2797cf7221e5492f14157e461d8066f5223650ce/datafusion/expr/src/expr.rs#L316-L315 Let me see if I can find ways to make Expr smaller though -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
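For context on why boxing helps here, a generic illustration of the technique (this is not DataFusion's actual `Expr` definition): every variant of a Rust enum pays for the largest variant's inline payload, so boxing the big payload shrinks the whole enum.

```rust
use std::mem::size_of;

// A large inline payload makes every variant of the enum this big.
enum Unboxed {
    Small(u8),
    Big([u64; 8]), // 64 bytes stored inline
}

// Boxing the large payload shrinks the enum to roughly tag + pointer.
enum Boxed {
    Small(u8),
    Big(Box<[u64; 8]>),
}

fn main() {
    // On a typical 64-bit target: 72 vs 16 bytes.
    println!("unboxed: {} bytes", size_of::<Unboxed>());
    println!("boxed:   {} bytes", size_of::<Boxed>());
    assert!(size_of::<Boxed>() < size_of::<Unboxed>());
}
```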
Re: [PR] move information_schema to datafusion-catalog [datafusion]
logan-keede commented on code in PR #14364: URL: https://github.com/apache/datafusion/pull/14364#discussion_r1934571539 ## datafusion/substrait/Cargo.toml: ## @@ -46,6 +46,7 @@ [dev-dependencies] datafusion = { workspace = true, features = ["nested_expressions"] } +datafusion-catalog = { workspace = true } Review Comment: I knew there was some chance of removing this dependency, but I wasn't sure how to go about it. So, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] move information_schema to datafusion-catalog [datafusion]
logan-keede commented on PR #14364: URL: https://github.com/apache/datafusion/pull/14364#issuecomment-2622850087 > If you are feeling like some more refactoring projects, any chance you are interested in working to split out the data sources (aka make `datafusion-datasource-parquet`, `datafusion-datasource-csv`, etc)? Well, I guess I know what I might be doing when I get back. I will definitely look into it; please let me know if there has been any discussion (or any relevant resources) on it in the past, if that's not too much trouble. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Simple Functions [datafusion]
Omega359 commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2622912627 https://github.com/apache/datafusion/blob/main/datafusion/functions-nested/src/string.rs is a good example of what can result from supporting many types and args. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Reduce size of `Expr` struct [datafusion]
findepi commented on code in PR #14366: URL: https://github.com/apache/datafusion/pull/14366#discussion_r1934669162 ## datafusion/expr/src/expr.rs: ## @@ -3067,4 +3078,19 @@ mod test { rename: opt_rename, } } + +#[test] +fn test_size_of_expr() { +// because Expr is such a widely used struct in DataFusion +// it is important to keep its size as small as possible +// +// If this test fails when you change `Expr`, please try +// `Box`ing the fields to make `Expr` smaller +// See https://github.com/apache/datafusion/issues/14256 for details +assert_eq!(size_of::<Expr>(), 112); +assert_eq!(size_of::<ScalarValue>(), 64); +assert_eq!(size_of::<DataType>(), 24); // 3 ptrs Review Comment: does this test pass in debug and release modes? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
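For context on the debug/release question: `size_of` reflects the type's layout, which the compiler chooses independently of the optimization level, so such assertions behave the same under both profiles. A quick check:

```rust
use std::mem::size_of;

fn main() {
    // Layout, and hence size_of, does not change between debug and release;
    // run with `cargo run` and `cargo run --release` to confirm.
    println!("Vec<u8>: {} bytes", size_of::<Vec<u8>>()); // 24 on 64-bit: 3 ptrs
    println!("Box<u8>: {} bytes", size_of::<Box<u8>>()); // 8 on 64-bit
}
```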
Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623096983 Something like this:

```rust
#[tokio::test]
async fn test_name_conflict() {
    let batch = record_batch!(
        ("_rowid", UInt32, [0, 1, 2]),
        ("_rowid", Utf8, ["file-0", "file-1", "file-2"])
    )
    .unwrap();
    let batch = batch
        .with_schema(Arc::new(Schema::new(vec![
            Field::new("_rowid", DataType::UInt32, true).to_system_column(),
            Field::new("_rowid", DataType::Utf8, true),
        ])))
        .unwrap();

    let ctx = SessionContext::new_with_config(
        SessionConfig::new().with_information_schema(true),
    );
    let _ = ctx.register_batch("test", batch);
    let select = "SELECT _rowid FROM test";
    let df = ctx.sql(select).await.unwrap();
    let batches = df.collect().await.unwrap();
    let expected = [
        "+--------+",
        "| _rowid |",
        "+--------+",
        "| file-0 |",
        "| file-1 |",
        "| file-2 |",
        "+--------+",
    ];
    assert_batches_sorted_eq!(expected, &batches);
}
```

Currently a DuplicateQualifiedField error is thrown, but this is valid in Spark. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623148014 I will still take a shot at adding that feature tomorrow, but I'm not sold on the behavior being ideal even if that's what Spark does. Besides, if it's an error now, we can always make it not error in the future. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: add expression array_size [datafusion-comet]
parthchandra commented on code in PR #1122: URL: https://github.com/apache/datafusion-comet/pull/1122#discussion_r1934793909

## native/spark-expr/src/list.rs: ##

@@ -708,6 +708,92 @@ impl PartialEq<dyn Any> for ArrayInsert
     }
 }
 
+#[derive(Debug, Hash)]
+pub struct ArraySize {
+    src_array_expr: Arc<dyn PhysicalExpr>,
+}
+
+impl ArraySize {
+    pub fn new(src_array_expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self { src_array_expr }
+    }
+}
+
+impl Display for ArraySize {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        write!(f, "ArraySize [array: {:?}]", self.src_array_expr)
+    }
+}
+
+impl PartialEq<dyn Any> for ArraySize {
+    fn eq(&self, other: &dyn Any) -> bool {
+        down_cast_any_ref(other)
+            .downcast_ref::<Self>()
+            .map(|x| self.src_array_expr.eq(&x.src_array_expr))
+            .unwrap_or(false)
+    }
+}
+
+impl PhysicalExpr for ArraySize {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+
+    fn data_type(&self, _input_schema: &Schema) -> DataFusionResult<DataType> {
+        Ok(DataType::Int32)
+    }
+
+    fn nullable(&self, input_schema: &Schema) -> DataFusionResult<bool> {
+        self.src_array_expr.nullable(input_schema)
+    }
+
+    fn evaluate(&self, batch: &RecordBatch) -> DataFusionResult<ColumnarValue> {
+        let array_value = self
+            .src_array_expr
+            .evaluate(batch)?
+            .into_array(batch.num_rows())?;
+        match array_value.data_type() {
+            DataType::List(_) => {
+                let list_array = as_list_array(&array_value)?;
+                let mut builder = Int32Array::builder(list_array.len());
+                for i in 0..list_array.len() {
+                    if list_array.is_null(i) {
+                        builder.append_null();
+                    } else {
+                        builder.append_value(list_array.value_length(i));

Review Comment: Will `value_length(i)` include nulls in the array at index `i`?

## native/spark-expr/src/list.rs: ##

@@ -708,6 +708,92 @@ impl PartialEq<dyn Any> for ArrayInsert
     }
 }
 
+#[derive(Debug, Hash)]
+pub struct ArraySize {
+    src_array_expr: Arc<dyn PhysicalExpr>,
+}
+
+impl ArraySize {
+    pub fn new(src_array_expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self { src_array_expr }
+    }
+}
+
+impl Display for ArraySize {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        write!(f, "ArraySize [array: {:?}]", self.src_array_expr)
+    }
+}
+
+impl PartialEq<dyn Any> for ArraySize {
+    fn eq(&self, other: &dyn Any) -> bool {
+        down_cast_any_ref(other)
+            .downcast_ref::<Self>()
+            .map(|x| self.src_array_expr.eq(&x.src_array_expr))
+            .unwrap_or(false)
+    }
+}
+
+impl PhysicalExpr for ArraySize {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+
+    fn data_type(&self, _input_schema: &Schema) -> DataFusionResult<DataType> {
+        Ok(DataType::Int32)
+    }
+
+    fn nullable(&self, input_schema: &Schema) -> DataFusionResult<bool> {
+        self.src_array_expr.nullable(input_schema)
+    }
+
+    fn evaluate(&self, batch: &RecordBatch) -> DataFusionResult<ColumnarValue> {
+        let array_value = self
+            .src_array_expr
+            .evaluate(batch)?
+            .into_array(batch.num_rows())?;
+        match array_value.data_type() {
+            DataType::List(_) => {
+                let list_array = as_list_array(&array_value)?;
+                let mut builder = Int32Array::builder(list_array.len());
+                for i in 0..list_array.len() {
+                    if list_array.is_null(i) {

Review Comment: You might want to handle `legacySizeOfNull` here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
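For reference, a sketch of what handling `legacySizeOfNull` could look like at the null branch flagged above. Here `legacy_size_of_null` is a hypothetical field on `ArraySize` mirroring Spark's `spark.sql.legacy.sizeOfNull` setting (when enabled, `size(NULL)` returns -1 rather than NULL); it is not part of the code in the PR.

```rust
// Hypothetical: `legacy_size_of_null: bool` carried on ArraySize.
if list_array.is_null(i) {
    if self.legacy_size_of_null {
        // Spark legacy behavior: size(NULL) == -1
        builder.append_value(-1);
    } else {
        builder.append_null();
    }
} else {
    builder.append_value(list_array.value_length(i));
}
```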
Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623174593

> Is this that important to support? The example seems a bit contrived, I think it'd be more reasonable if it occurred naturally as part of a join or something where a user could unexpectedly run into it. Otherwise it seems to me like something very spark specific.
>
> Postgres for example doesn't let you duplicate system columns:
>
> ```sql
> create table t (ctid int);
> ERROR: column name "ctid" conflicts with a system column name
> ```

Because Spark is not a database, it's a compute engine. A database such as Postgres manages its schema and data by itself, so it can simply disallow conflicts between system and normal column names. But for Spark, the data comes from other systems; it cannot guarantee that the data doesn't contain such conflicting fields. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [I] Implement xxhash algorithms as part of the expression API [datafusion]
Spaarsh commented on issue #14044: URL: https://github.com/apache/datafusion/issues/14044#issuecomment-2623464462 Thanks @HectorPascual! I'm opening PRs from here on for transparency's sake! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] 14044/enhancement/add xxhash algorithms in expression api [datafusion]
Spaarsh opened a new pull request, #14367: URL: https://github.com/apache/datafusion/pull/14367 ## Which issue does this PR close? Closes #14044. ## Rationale for this change DataFusion currently lacks xxhash (a fast, non-cryptographic hashing algorithm) functions. ## What changes are included in this PR? ## Are these changes tested? Yes. ## Are there any user-facing changes? Users will now be able to use the xxhash32 and xxhash64 functions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
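A sketch of the intended usage once the PR lands, assuming the functions keep the names given in the PR description (the argument shape shown is illustrative):

```sql
-- Hash a value with the new non-cryptographic hash functions.
SELECT xxhash32('hello'), xxhash64('hello');
```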
Re: [PR] Allow plain JOIN without turning it into INNER [datafusion-sqlparser-rs]
iffyio merged PR #1692: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1692 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org