Re: [PR] Add `insta` / snapshot testing to CLI & set up AWS mock [datafusion]

2025-03-08 Thread via GitHub
findepi commented on code in PR #13672: URL: https://github.com/apache/datafusion/pull/13672#discussion_r1986025313 ## datafusion-cli/CONTRIBUTING.md: ## @@ -0,0 +1,75 @@ + + +# Development instructions + +## Running Tests + +Tests can be run using `cargo` + +```shell +cargo tes

[PR] build(deps): bump datafusion-ffi from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-08 Thread via GitHub
dependabot[bot] opened a new pull request, #1050: URL: https://github.com/apache/datafusion-python/pull/1050 Bumps [datafusion-ffi](https://github.com/apache/datafusion) from 45.0.0 to 46.0.0. Commits https://github.com/apache/datafusion/commit/d5ca8307940c1a6345419a2c8d91ef877

Re: [PR] Document guidelines for physical operator yielding [datafusion]

2025-03-08 Thread via GitHub
ozankabak commented on PR #15030: URL: https://github.com/apache/datafusion/pull/15030#issuecomment-2704990515 Thanks for improving the docs, left my suggestions inline -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] feat: reads using global ctx [datafusion-python]

2025-03-08 Thread via GitHub
Spaarsh commented on PR #982: URL: https://github.com/apache/datafusion-python/pull/982#issuecomment-2708491224 Key points: 1. The `|` operator is not supported for Python < 3.10, so anyone pulling main after the merge will not be able to use `SessionContext` at all. 2. `global_ctx` a

Re: [I] Expose global context [datafusion-python]

2025-03-08 Thread via GitHub
Spaarsh commented on issue #1045: URL: https://github.com/apache/datafusion-python/issues/1045#issuecomment-2708491652 Key points: 1. The `|` operator is not supported for Python < 3.10, so anyone pulling main will not be able to use `SessionContext` at all. 2. `global_ctx` alrea

Re: [I] Add udf / udaf decorators [datafusion-python]

2025-03-08 Thread via GitHub
timsaucer closed issue #806: Add udf / udaf decorators URL: https://github.com/apache/datafusion-python/issues/806

Re: [PR] Expand wildcard to actual expressions in `prepare_select_exprs` [datafusion]

2025-03-08 Thread via GitHub
jayzhan211 commented on code in PR #15090: URL: https://github.com/apache/datafusion/pull/15090#discussion_r1986194642 ## datafusion/sqllogictest/test_files/order.slt: ## @@ -985,13 +985,20 @@ drop table ambiguity_test; statement ok create table t(a0 int, a int, b int, c int)

Re: [PR] Order Requirement Analysis [datafusion-site]

2025-03-08 Thread via GitHub
akurmustafa commented on code in PR #58: URL: https://github.com/apache/datafusion-site/pull/58#discussion_r1986190590 ## content/blog/2025-03-05-ordering-analysis.md: ## @@ -0,0 +1,176 @@ +--- +layout: post +title: Analysis of Ordering for Better Plans +date: 2025-03-05 +author

Re: [I] Internal error: Non Panic Task error: task 113 was cancelled. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker [datafu

2025-03-08 Thread via GitHub
chenquan commented on issue #15065: URL: https://github.com/apache/datafusion/issues/15065#issuecomment-2708618865 > > [@alamb](https://github.com/alamb) Hello, thank you very much! In addition, I want to know how to control the cancellation of tasks by myself. Currently, I have observed th

Re: [PR] Order Requirement Analysis [datafusion-site]

2025-03-08 Thread via GitHub
akurmustafa commented on PR #58: URL: https://github.com/apache/datafusion-site/pull/58#issuecomment-2708620029 Hi @alamb, I have addressed the points you mentioned. I also reordered some sections to make the post clearer. I am happy to address reviews from the community.

[PR] Fix wasm32 build on version 46 [datafusion]

2025-03-08 Thread via GitHub
XiangpengHao opened a new pull request, #15102: URL: https://github.com/apache/datafusion/pull/15102 ## Which issue does this PR close? - Closes #. ## Rationale for this change I encountered a compile error when trying to upgrade DataFusion to V46 for [parquet vi

Re: [I] Upgrade to sqlparser 0.55.0 [datafusion]

2025-03-08 Thread via GitHub
PokIsemaine commented on issue #15071: URL: https://github.com/apache/datafusion/issues/15071#issuecomment-2708645397 I want to discuss `JoinType`: JoinOperator - https://github.com/apache/datafusion-sqlparser-rs/pull/1692 - https://github.com/apache/datafusion-sqlparser-rs/pul

Re: [PR] Fix wasm32 build on version 46 [datafusion]

2025-03-08 Thread via GitHub
XiangpengHao commented on code in PR #15102: URL: https://github.com/apache/datafusion/pull/15102#discussion_r1986212420 ## .github/workflows/rust.yml: ## @@ -259,6 +259,10 @@ jobs: uses: ./.github/actions/setup-builder with: rust-version: stable +

Re: [PR] refactor: use TypeSignature::Coercible for crypto functions [datafusion]

2025-03-08 Thread via GitHub
Chen-Yuan-Lai commented on PR #14826: URL: https://github.com/apache/datafusion/pull/14826#issuecomment-2708661495 Hi @jayzhan211, it seems all the CI checks passed (including sqllogictest), but when I created and printed a table with datafusion-cli, I got an empty result ```

Re: [PR] Introducing mutation testing [datafusion]

2025-03-08 Thread via GitHub
Omega359 commented on PR #14590: URL: https://github.com/apache/datafusion/pull/14590#issuecomment-2708589256 I am wondering if this test is just too strenuous for the CI runners. If it could be narrowed down to just the modified code, I could see this working; otherwise perhaps it'll either

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-03-08 Thread via GitHub
clflushopt commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2708663987 Hey @alamb @lmwnshn, I've actually been following the CMU 15-799 course (mostly nights and weekends) and started working on a Rust port of the benchbase Java implementation a

Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]

2025-03-08 Thread via GitHub
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1986214497 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlan

Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]

2025-03-08 Thread via GitHub
himadripal commented on PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#issuecomment-2708663771 > I left a comment about a missing assertion @andygrove I did not see this comment, although I reverted previous changes related to the spark3.3 assertion and added a chec

Re: [PR] Refactor EnforceDistribution test cases to demonstrate dependencies across optimizer runs. [datafusion]

2025-03-08 Thread via GitHub
wiedld commented on code in PR #15074: URL: https://github.com/apache/datafusion/pull/15074#discussion_r1986108634 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -442,40 +445,30 @@ impl TestConfig { self.config.execution.target_partitions = t

Re: [PR] docs: Add README to tpch directory [datafusion-ray]

2025-03-08 Thread via GitHub
andygrove commented on PR #79: URL: https://github.com/apache/datafusion-ray/pull/79#issuecomment-2708384249 @robtandy This is now ready for review. I was able to run the benchmarks this morning based on these instructions.

[PR] Triggering extended tests through PR comment [datafusion]

2025-03-08 Thread via GitHub
danila-b opened a new pull request, #15101: URL: https://github.com/apache/datafusion/pull/15101 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/14319 ## Rationale for this change Allows running extended tests on some PRs if needed, wh

Re: [PR] Triggering extended tests through PR comment [datafusion]

2025-03-08 Thread via GitHub
Omega359 commented on PR #15101: URL: https://github.com/apache/datafusion/pull/15101#issuecomment-2708410057 Looks like it ran checks on the forked branch; I think that is what we want ![image](https://github.com/user-attachments/assets/de280fc4-b351-4141-920d-b62012b889ff)

Re: [PR] chore: Add `native_iceberg_compat` CI checks [datafusion-comet]

2025-03-08 Thread via GitHub
andygrove commented on PR #1487: URL: https://github.com/apache/datafusion-comet/pull/1487#issuecomment-2708417311 failure: ``` Error: Errors: Error: ParquetEncryptionITCase>SparkFunSuite.run:69->SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run:69->AnyFunSuite.org$

[I] `ParquetEncryptionITCase` fails with `native_iceberg_compat` [datafusion-comet]

2025-03-08 Thread via GitHub
andygrove opened a new issue, #1488: URL: https://github.com/apache/datafusion-comet/issues/1488 ### Describe the bug `ParquetEncryptionITCase` fails with `native_iceberg_compat` ### Steps to reproduce _No response_ ### Expected behavior _No response_

[PR] Address a TODO about simplify Ray stages collection [datafusion-ray]

2025-03-08 Thread via GitHub
vmingchen opened a new pull request, #80: URL: https://github.com/apache/datafusion-ray/pull/80 (no comment)

Re: [PR] feat: Implementation of udf and udaf decorator [datafusion-python]

2025-03-08 Thread via GitHub
timsaucer merged PR #1040: URL: https://github.com/apache/datafusion-python/pull/1040

Re: [PR] docs: Add README to tpch directory [datafusion-ray]

2025-03-08 Thread via GitHub
robtandy commented on code in PR #79: URL: https://github.com/apache/datafusion-ray/pull/79#discussion_r1986161055 ## tpch/README.md: ## @@ -0,0 +1,120 @@ + + +# Benchmarking DataFusion Ray on Kubernetes + +This is a rough guide to deploying and benchmarking DataFusion Ray on K

Re: [PR] docs: Add README to tpch directory [datafusion-ray]

2025-03-08 Thread via GitHub
robtandy commented on code in PR #79: URL: https://github.com/apache/datafusion-ray/pull/79#discussion_r1986161334 ## tpch/README.md: ## @@ -0,0 +1,120 @@ + + +# Benchmarking DataFusion Ray on Kubernetes + +This is a rough guide to deploying and benchmarking DataFusion Ray on K

Re: [PR] feat: reads using global ctx [datafusion-python]

2025-03-08 Thread via GitHub
kylebarron commented on PR #982: URL: https://github.com/apache/datafusion-python/pull/982#issuecomment-2708499224 There needs to be an initial import `from __future__ import annotations` on the first line to be able to use `|` typing syntax. There are ruff rules to check this and we
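To illustrate the fix described above: with `from __future__ import annotations` as the first statement of a module, annotations are stored as strings and not evaluated when the function is defined (PEP 563), so `X | None` unions in signatures are accepted on Python 3.7 to 3.9 as well as 3.10+. The function `read_csv` below is hypothetical and only carries the annotations. This covers annotations only; a runtime expression such as `isinstance(x, int | str)` still needs Python 3.10.

```python
from __future__ import annotations  # must precede every other statement in the module


# Hypothetical helper: the name and parameters are illustrative, not the datafusion-python API.
def read_csv(path: str, schema: dict[str, str] | None = None) -> list[str] | None:
    """Return the path wrapped in a list (placeholder body; schema is ignored)."""
    return [path] if path else None


# Under postponed evaluation the annotation is kept as a plain string,
# so the `|` union is never evaluated on interpreters older than 3.10.
print(read_csv.__annotations__["schema"])  # prints: dict[str, str] | None
```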

Re: [PR] implement tree explain for GlobalLimitExec [datafusion]

2025-03-08 Thread via GitHub
irenjj commented on code in PR #15100: URL: https://github.com/apache/datafusion/pull/15100#discussion_r1986188763 ## datafusion/physical-plan/src/limit.rs: ## @@ -109,8 +109,12 @@ impl DisplayAs for GlobalLimitExec { ) } DisplayFormatT

Re: [PR] feat: reads using global ctx [datafusion-python]

2025-03-08 Thread via GitHub
timsaucer commented on PR #982: URL: https://github.com/apache/datafusion-python/pull/982#issuecomment-2708552064 Thanks, Kyle. More generally, I’ll see about the impact of turning on all the rules and then removing a few specifically as needed.

Re: [PR] chore: Reduce number of runs of Rust unit tests in CI [datafusion-comet]

2025-03-08 Thread via GitHub
kazuyukitanimura commented on code in PR #1481: URL: https://github.com/apache/datafusion-comet/pull/1481#discussion_r1986199136 ## .github/workflows/pr_build.yml: ## @@ -40,12 +40,31 @@ env: RUST_VERSION: stable jobs: + linux-test-rust: +strategy: + matrix: +

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-03-08 Thread via GitHub
clflushopt commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2708672112 take

Re: [PR] docs: Improve docs on AggregateFunctionExpr construction [datafusion]

2025-03-08 Thread via GitHub
ctsk commented on code in PR #15044: URL: https://github.com/apache/datafusion/pull/15044#discussion_r1983234072 ## datafusion/physical-expr/src/aggregate.rs: ## @@ -91,6 +91,9 @@ impl AggregateExprBuilder { } } +/// Constructs an `AggregateFunctionExpr` from

Re: [PR] SET with a list of comma separated assignments [datafusion-sqlparser-rs]

2025-03-08 Thread via GitHub
mvzink commented on code in PR #1757: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1757#discussion_r1986222154 ## src/ast/mod.rs: ## @@ -2947,6 +2947,17 @@ pub enum Statement { variables: OneOrManyWithParens, value: Vec, }, + +/// ```sq

[PR] Implement tree explain for AggregateExec [datafusion]

2025-03-08 Thread via GitHub
zebsme opened a new pull request, #15103: URL: https://github.com/apache/datafusion/pull/15103 ## Which issue does this PR close? - Close #15024 - Part of #14914 ## What changes are included in this PR? 1. Implement AggregateExec 2. Update related tes

Re: [PR] SET with a list of comma separated assignments [datafusion-sqlparser-rs]

2025-03-08 Thread via GitHub
iffyio commented on code in PR #1757: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1757#discussion_r1986233456 ## src/ast/mod.rs: ## @@ -2947,6 +2947,17 @@ pub enum Statement { variables: OneOrManyWithParens, value: Vec, }, + +/// ```sq

Re: [PR] refactor: rm `single_distinct_to_groupby` optimizer pass [datafusion]

2025-03-08 Thread via GitHub
qazxcdswe123 closed pull request #15099: refactor: rm `single_distinct_to_groupby` optimizer pass URL: https://github.com/apache/datafusion/pull/15099

Re: [I] [EPIC] ClickBench Improvements (Vanity Benchmark) [datafusion]

2025-03-08 Thread via GitHub
qazxcdswe123 commented on issue #14586: URL: https://github.com/apache/datafusion/issues/14586#issuecomment-2708702633 > Queries like `avg(distinct a)` rely on this rule, and without it, they cannot be executed anymore. > > The following query is available on the main branch, but not
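For context, `single_distinct_to_groupby` rewrites a query with a single `DISTINCT` aggregate into a two-level aggregation in which an inner `GROUP BY` performs the de-duplication. The sketch below uses Python's built-in `sqlite3` (not DataFusion) and a made-up table `t` only to show that the two query shapes agree; the alias `alias1` is illustrative and the rule's actual plan may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, a INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 20), (2, 5), (2, 5), (2, None)],
)

# Original shape: one DISTINCT aggregate per group.
original = conn.execute(
    "SELECT k, avg(DISTINCT a) FROM t GROUP BY k ORDER BY k"
).fetchall()

# Rewritten shape: de-duplicate with an inner GROUP BY (k, a), then apply the
# plain aggregate in the outer GROUP BY k. avg() skips the NULL group, just as
# avg(DISTINCT a) skips NULL values.
rewritten = conn.execute(
    """
    SELECT k, avg(alias1)
    FROM (SELECT k, a AS alias1 FROM t GROUP BY k, a)
    GROUP BY k
    ORDER BY k
    """
).fetchall()

print(original)   # prints: [(1, 15.0), (2, 5.0)]
print(rewritten)  # prints: [(1, 15.0), (2, 5.0)]
```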

[PR] Config: Add support default sql varchar to view types [datafusion]

2025-03-08 Thread via GitHub
zhuqi-lucas opened a new pull request, #15104: URL: https://github.com/apache/datafusion/pull/15104 ## Which issue does this PR close? This is the first step of our incremental work for this issue: https://github.com/apache/datafusion/issues/15096 ## Rationale for this ch

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-03-08 Thread via GitHub
zhuqi-lucas commented on issue #15096: URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2708703567 - [ ] Config: Add support default sql varchar to view types Submitted a PR for the first step of our incremental work: https://github.com/apache/datafusion/pull/1510

Re: [PR] Reject `RESPECT NULLS` and `IGNORE NULLS` for aggregate functions [datafusion]

2025-03-08 Thread via GitHub
qazxcdswe123 commented on code in PR #15014: URL: https://github.com/apache/datafusion/pull/15014#discussion_r1986032404 ## datafusion/sql/src/expr/function.rs: ## @@ -349,6 +349,12 @@ impl SqlToRel<'_, S> { } else { // User defined aggregate functions (UDA

Re: [I] Implement tree rendering for StreamingTableExec [datafusion]

2025-03-08 Thread via GitHub
Standing-Man commented on issue #15086: URL: https://github.com/apache/datafusion/issues/15086#issuecomment-2708093826 take

[I] Sort query won't get round-robin repartitioned if input is `MemTable` [datafusion]

2025-03-08 Thread via GitHub
2010YOUY01 opened a new issue, #15088: URL: https://github.com/apache/datafusion/issues/15088 ### Is your feature request related to a problem or challenge? Now sort executor and aggregate executor both support executing in parallel, and the degree of parallelism is specified in confi
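As a rough illustration of what round-robin repartitioning means (this is not DataFusion's `RepartitionExec` code), the sketch below deals input batches out to `target_partitions` output partitions in turn, so that downstream sort or aggregate work can run on that many partitions in parallel. The function name `round_robin_repartition` and the use of plain strings as "batches" are assumptions made for the example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def round_robin_repartition(batches: Sequence[T], target_partitions: int) -> List[List[T]]:
    """Assign batch i to output partition i % target_partitions (hypothetical helper)."""
    if target_partitions < 1:
        raise ValueError("target_partitions must be >= 1")
    partitions: List[List[T]] = [[] for _ in range(target_partitions)]
    for i, batch in enumerate(batches):
        partitions[i % target_partitions].append(batch)
    return partitions


# Five batches spread over three output partitions.
print(round_robin_repartition(["b0", "b1", "b2", "b3", "b4"], 3))
# prints: [['b0', 'b3'], ['b1', 'b4'], ['b2']]
```

Round-robin keeps partitions balanced by batch count without looking at the data, which is why it suits spreading work for parallel sorts; hash partitioning would be needed only when rows with equal keys must land together.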

[I] Implement `tree` explain for `JsonSink` [datafusion]

2025-03-08 Thread via GitHub
Shreyaskr1409 opened a new issue, #15089: URL: https://github.com/apache/datafusion/issues/15089 ### Is your feature request related to a problem or challenge? a part of #14914 ### Describe the solution you'd like No response ### Describe alternatives you've considered No resp

[I] stack overflow on `PhysicalPlanNode::try_from_physical_plan` [datafusion]

2025-03-08 Thread via GitHub
milenkovicm opened a new issue, #15087: URL: https://github.com/apache/datafusion/issues/15087 ### Describe the bug There might be a regression in v46. After updating to 46.0.0 there is a `stack overflow` calling `PhysicalPlanNode::try_from_physical_plan` on a relatively simple

Re: [I] Implement `tree` explain for `JsonSink` [datafusion]

2025-03-08 Thread via GitHub
Shreyaskr1409 commented on issue #15089: URL: https://github.com/apache/datafusion/issues/15089#issuecomment-2708158548 take

Re: [I] Internal error: Non Panic Task error: task 113 was cancelled. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker [datafu

2025-03-08 Thread via GitHub
alamb commented on issue #15065: URL: https://github.com/apache/datafusion/issues/15065#issuecomment-2708162324 > [@alamb](https://github.com/alamb) Hello, have you made any progress? Hi @chenquan -- I am not actively working on this issue and don't think I will have any plans to do

[PR] doc: Correct benchmark command [datafusion]

2025-03-08 Thread via GitHub
qazxcdswe123 opened a new pull request, #15094: URL: https://github.com/apache/datafusion/pull/15094 `-o` takes a file path, not a folder. Otherwise it shows: `Error: IoError(Os { code: 21, kind: IsADirectory, message: "Is a directory" })`
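The error above is the OS refusing to open a directory for writing (`EISDIR`, errno 21 on Linux and macOS). A hypothetical pre-check that a benchmark runner could perform before writing results is sketched below; `check_output_path` is an illustrative name, not an existing DataFusion function or flag handler.

```python
import os
import tempfile


def check_output_path(path: str) -> str:
    """Reject a directory passed where an output *file* path is expected (hypothetical helper)."""
    if os.path.isdir(path):
        raise ValueError(f"-o expects a file path, but {path!r} is a directory")
    return path


with tempfile.TemporaryDirectory() as out_dir:
    # Passing the folder itself fails early with a clear message...
    try:
        check_output_path(out_dir)
        rejected = False
    except ValueError as err:
        rejected = True
        print(err)
    # ...while a file path inside that folder is accepted.
    accepted = check_output_path(os.path.join(out_dir, "results.json"))
    print(accepted)
```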

Re: [I] Internal error: Non Panic Task error: task 113 was cancelled. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker [datafu

2025-03-08 Thread via GitHub
alamb commented on issue #15065: URL: https://github.com/apache/datafusion/issues/15065#issuecomment-2708201014 > [@alamb](https://github.com/alamb) Hello, thank you very much! In addition, I want to know how to control the cancellation of tasks by myself. Currently, I have observed that th

Re: [PR] Implement tree explain for PartialSortExec [datafusion]

2025-03-08 Thread via GitHub
alamb commented on code in PR #15066: URL: https://github.com/apache/datafusion/pull/15066#discussion_r1986065671 ## datafusion/physical-plan/src/sorts/partial_sort.rs: ## @@ -226,10 +226,15 @@ impl DisplayAs for PartialSortExec { None => write!(f, "PartialS

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-03-08 Thread via GitHub
tustvold commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2708240200 FWIW the new HttpClient abstraction, introduced in ObjectStore 0.12, provides a potentially nicer way to spawn IO on to a separate runtime - https://github.com/apache/arrow-rs/pull/

Re: [PR] fix: nested window function [datafusion]

2025-03-08 Thread via GitHub
chenkovsky commented on PR #15033: URL: https://github.com/apache/datafusion/pull/15033#issuecomment-2708243765 By the way, if I switch tokio to a single-threaded runtime, there is also no stack overflow.

Re: [PR] feat: implement tree explain for ProjectionExec [datafusion]

2025-03-08 Thread via GitHub
alamb commented on code in PR #15082: URL: https://github.com/apache/datafusion/pull/15082#discussion_r1986071622 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -519,6 +519,150 @@ physical_plan 17)│ format: arrow │ 18)└───┘

[PR] chore: Add `native_iceberg_compat` CI checks [datafusion-comet]

2025-03-08 Thread via GitHub
andygrove opened a new pull request, #1487: URL: https://github.com/apache/datafusion-comet/pull/1487 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] refactor: rm `single_distinct_to_groupby` optimizer pass [datafusion]

2025-03-08 Thread via GitHub
jonahgao commented on code in PR #15099: URL: https://github.com/apache/datafusion/pull/15099#discussion_r1986094870 ## datafusion/optimizer/src/optimizer.rs: ## @@ -240,7 +239,6 @@ impl Optimizer { // Filters can't be pushed down past Limits, we should do PushDown

Re: [PR] fix: mark ScalarUDFImpl::invoke_batch as deprecated [datafusion]

2025-03-08 Thread via GitHub
Weijun-H merged PR #15049: URL: https://github.com/apache/datafusion/pull/15049

[PR] implement tree explain for GlobalLimitExec [datafusion]

2025-03-08 Thread via GitHub
zjregee opened a new pull request, #15100: URL: https://github.com/apache/datafusion/pull/15100 ## Which issue does this PR close? - Closes #15026. ## Rationale for this change ## What changes are included in this PR? Implement tree explain for `GlobalLimitExec`. ## Are these c

Re: [PR] Enable Dataframe to be converted into views which can be used in register_table [datafusion-python]

2025-03-08 Thread via GitHub
timsaucer commented on code in PR #1016: URL: https://github.com/apache/datafusion-python/pull/1016#discussion_r1983239116 ## src/dataframe.rs: ## @@ -50,6 +52,22 @@ use crate::{ expr::{sort_expr::PySortExpr, PyExpr}, }; +#[pyclass(name = "TableProvider", module = "data

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-03-08 Thread via GitHub
danila-b commented on code in PR #14331: URL: https://github.com/apache/datafusion/pull/14331#discussion_r1986110748 ## .github/workflows/extended.yml: ## @@ -33,16 +33,46 @@ on: push: branches: - main + issue_comment: +types: [created] + +permissions: + pul

Re: [I] Implement `tree` explain for `AggregateExec` [datafusion]

2025-03-08 Thread via GitHub
zebsme commented on issue #15024: URL: https://github.com/apache/datafusion/issues/15024#issuecomment-2708407584 Hi @alamb, I noticed that DuckDB provides full expressions for aggregations but only indices for group_by ``` ┌─┴─┐ HASH_GROUP_BY │

[PR] Simplify Array related functions impl [datafusion-comet]

2025-03-08 Thread via GitHub
kazantsev-maksim opened a new pull request, #1490: URL: https://github.com/apache/datafusion-comet/pull/1490 ## Which issue does this PR close? Related to issue: https://github.com/apache/datafusion-comet/issues/1459 Closes #. Defined under Issue: https://github.com/apach

Re: [I] [EPIC] ClickBench Improvements (Vanity Benchmark) [datafusion]

2025-03-08 Thread via GitHub
qazxcdswe123 commented on issue #14586: URL: https://github.com/apache/datafusion/issues/14586#issuecomment-2708191051 > On the optimizer side, I am not sure if `single_distinct_to_groupby` can really improve performance in the current version (it is an old rule introduced long ago), maybe

Re: [I] Internal error: Non Panic Task error: task 113 was cancelled. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker [datafu

2025-03-08 Thread via GitHub
chenquan commented on issue #15065: URL: https://github.com/apache/datafusion/issues/15065#issuecomment-2708191516 > > [@alamb](https://github.com/alamb) Hello, have you made any progress? > > Hi [@chenquan](https://github.com/chenquan) -- I am not actively working on this issue and d

Re: [PR] Refactor EnforceDistribution test cases to demonstrate dependencies across optimizer runs. [datafusion]

2025-03-08 Thread via GitHub
alamb commented on code in PR #15074: URL: https://github.com/apache/datafusion/pull/15074#discussion_r1986048918 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -442,40 +445,30 @@ impl TestConfig { self.config.execution.target_partitions = ta

[PR] minor: fix repo and homepage url in `cargo.toml` [datafusion-ballista]

2025-03-08 Thread via GitHub
milenkovicm opened a new pull request, #1196: URL: https://github.com/apache/datafusion-ballista/pull/1196 # Which issue does this PR close? Closes #1193. # Rationale for this change # What changes are included in this PR? # Are there any user-faci

Re: [I] Implement `tree` explain for `AggregateExec` [datafusion]

2025-03-08 Thread via GitHub
alamb commented on issue #15024: URL: https://github.com/apache/datafusion/issues/15024#issuecomment-2708194671 > > [@zebsme](https://github.com/zebsme) how is this going? I noticed you took this and [#15025](https://github.com/apache/datafusion/issues/15025) and [#15026](https://github.co

[I] Auto run docker containers needed for tests [datafusion]

2025-03-08 Thread via GitHub
blaginin opened a new issue, #15092: URL: https://github.com/apache/datafusion/issues/15092 ### Is your feature request related to a problem or challenge? https://github.com/apache/datafusion/pull/13672 adds new CLI integration tests against external storage, using Minio. Curre

Re: [PR] feat: implement tree explain for ProjectionExec [datafusion]

2025-03-08 Thread via GitHub
alamb commented on code in PR #15082: URL: https://github.com/apache/datafusion/pull/15082#discussion_r1986051520 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -519,6 +519,150 @@ physical_plan 17)│ format: arrow │ 18)└───┘

Re: [I] Implement `tree` explain for `ValuesExec` [datafusion]

2025-03-08 Thread via GitHub
Shreyaskr1409 commented on issue #15093: URL: https://github.com/apache/datafusion/issues/15093#issuecomment-2708197604 take

Re: [PR] Implement `tree explain for `BoundedWindowAggExec` and `WindowAggExec` [datafusion]

2025-03-08 Thread via GitHub
alamb commented on code in PR #15084: URL: https://github.com/apache/datafusion/pull/15084#discussion_r1986064733 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -519,6 +519,36 @@ physical_plan 17)│ format: arrow │ 18)└───┘ +

Re: [PR] Implement `tree` explain for `HashJoinExec` [datafusion]

2025-03-08 Thread via GitHub
alamb commented on code in PR #15079: URL: https://github.com/apache/datafusion/pull/15079#discussion_r1986065410 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -519,6 +534,51 @@ physical_plan 17)│ format: arrow │ 18)└───┘ +

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-03-08 Thread via GitHub
alamb commented on issue #15096: URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2708225278 Please add comments if you find other needed items / issues.

Re: [PR] Implement tree explain for `NestedLoopJoinExec`, `CrossJoinExec`, `So… [datafusion]

2025-03-08 Thread via GitHub
alamb commented on code in PR #15081: URL: https://github.com/apache/datafusion/pull/15081#discussion_r1986065534 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -370,8 +370,15 @@ impl DisplayAs for SortMergeJoinExec { ) }

Re: [PR] fix: Support datatype cast for insert api same as insert into sql [datafusion]

2025-03-08 Thread via GitHub
alamb commented on code in PR #15091: URL: https://github.com/apache/datafusion/pull/15091#discussion_r1986065791 ## datafusion/common/src/dfschema.rs: ## @@ -1034,7 +1034,7 @@ impl SchemaExt for Schema { .iter() .zip(other.fields().iter())

Re: [PR] Add `insta` / snapshot testing to CLI & set up AWS mock [datafusion]

2025-03-08 Thread via GitHub
blaginin commented on code in PR #13672: URL: https://github.com/apache/datafusion/pull/13672#discussion_r1984937074 ## datafusion-cli/CONTRIBUTING.md: ## @@ -0,0 +1,72 @@ + + +# Development instructions + +## Running Tests + +Tests can be run using `cargo` + +```shell +cargo te

Re: [PR] fix: mark ScalarUDFImpl::invoke_batch as deprecated [datafusion]

2025-03-08 Thread via GitHub
Weijun-H commented on PR #15049: URL: https://github.com/apache/datafusion/pull/15049#issuecomment-2708311998 Thanks @Blizzara and @alamb 👍

Re: [I] Deprecate `ScalarUDFImpl::invoke_batch` and move everything to `ScalarUDFImpl::invoke_with_args` [datafusion]

2025-03-08 Thread via GitHub
Weijun-H closed issue #13515: Deprecate `ScalarUDFImpl::invoke_batch` and move everything to `ScalarUDFImpl::invoke_with_args` URL: https://github.com/apache/datafusion/issues/13515

Re: [I] Experimental native scan test failures [datafusion-comet]

2025-03-08 Thread via GitHub
andygrove commented on issue #1441: URL: https://github.com/apache/datafusion-comet/issues/1441#issuecomment-2708452642 One more for the list: https://github.com/apache/datafusion-comet/issues/1488

[PR] build(deps): bump datafusion-substrait from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-08 Thread via GitHub
dependabot[bot] opened a new pull request, #1048: URL: https://github.com/apache/datafusion-python/pull/1048 Bumps [datafusion-substrait](https://github.com/apache/datafusion) from 45.0.0 to 46.0.0. Commits https://github.com/apache/datafusion/commit/d5ca8307940c1a6345419a2c8d9

[PR] build(deps): bump datafusion from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-08 Thread via GitHub
dependabot[bot] opened a new pull request, #1051: URL: https://github.com/apache/datafusion-python/pull/1051 Bumps [datafusion](https://github.com/apache/datafusion) from 45.0.0 to 46.0.0. Commits https://github.com/apache/datafusion/commit/d5ca8307940c1a6345419a2c8d91ef8770465

[PR] build(deps): bump datafusion-proto from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-08 Thread via GitHub
dependabot[bot] opened a new pull request, #1049: URL: https://github.com/apache/datafusion-python/pull/1049 Bumps [datafusion-proto](https://github.com/apache/datafusion) from 45.0.0 to 46.0.0. Commits https://github.com/apache/datafusion/commit/d5ca8307940c1a6345419a2c8d91ef8

[PR] build(deps): bump tokio from 1.43.0 to 1.44.0 [datafusion-python]

2025-03-08 Thread via GitHub
dependabot[bot] opened a new pull request, #1047: URL: https://github.com/apache/datafusion-python/pull/1047 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.43.0 to 1.44.0. Release notes Sourced from tokio's releases (https://github.com/tokio-rs/tokio/releases). Tokio

[PR] build(deps): bump async-trait from 0.1.86 to 0.1.87 [datafusion-python]

2025-03-08 Thread via GitHub
dependabot[bot] opened a new pull request, #1046: URL: https://github.com/apache/datafusion-python/pull/1046 Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.86 to 0.1.87. Release notes Sourced from https://github.com/dtolnay/async-trait/releases (async-trait's

Re: [PR] Fail on optimization cycles [datafusion]

2025-03-08 Thread via GitHub
github-actions[bot] commented on PR #11288: URL: https://github.com/apache/datafusion/pull/11288#issuecomment-2705347408 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Implement `tree` explain for `AggregateExec` [datafusion]

2025-03-08 Thread via GitHub
zebsme commented on issue #15024: URL: https://github.com/apache/datafusion/issues/15024#issuecomment-2708135023 > @zebsme how is this going? I noticed you took this and https://github.com/apache/datafusion/issues/15025 and https://github.com/apache/datafusion/issues/15026 > > If yo

Re: [PR] Expand wildcard to actual expressions in `prepare_select_exprs` [datafusion]

2025-03-08 Thread via GitHub
jayzhan211 commented on code in PR #15090: URL: https://github.com/apache/datafusion/pull/15090#discussion_r1986038458 ## datafusion/sql/tests/sql_integration.rs: ## @@ -54,15 +54,6 @@ use sqlparser::dialect::{Dialect, GenericDialect, HiveDialect, MySqlDialect}; mod cases; mo

Re: [PR] doc: update RecordBatchReceiverStreamBuilder::spawn_blocking task behaviour [datafusion]

2025-03-08 Thread via GitHub
alamb merged PR #14995: URL: https://github.com/apache/datafusion/pull/14995

Re: [PR] doc: update RecordBatchReceiverStreamBuilder::spawn_blocking task behaviour [datafusion]

2025-03-08 Thread via GitHub
alamb commented on PR #14995: URL: https://github.com/apache/datafusion/pull/14995#issuecomment-2708164015 Thanks again @shruti2522 and @comphead

Re: [I] doc: RecordBatchReceiverStreamBuilder::spawn_blocking does not abort threads [datafusion]

2025-03-08 Thread via GitHub
alamb closed issue #9152: doc: RecordBatchReceiverStreamBuilder::spawn_blocking does not abort threads URL: https://github.com/apache/datafusion/issues/9152

[PR] chore: update datafusion to 46 [datafusion-ballista]

2025-03-08 Thread via GitHub
milenkovicm opened a new pull request, #1201: URL: https://github.com/apache/datafusion-ballista/pull/1201 # Which issue does this PR close? Closes #1164 # Rationale for this change regular update to latest datafusion which brings support for `INSERT INTO` # W

Re: [I] Auto run docker containers needed for tests [datafusion]

2025-03-08 Thread via GitHub
alamb commented on issue #15092: URL: https://github.com/apache/datafusion/issues/15092#issuecomment-2708204616 I think @Omega359 did something similar here with sqllogictests for starting postgres: https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/bin/postgres_c

Re: [PR] Add `insta` / snapshot testing to CLI & set up AWS mock [datafusion]

2025-03-08 Thread via GitHub
blaginin commented on code in PR #13672: URL: https://github.com/apache/datafusion/pull/13672#discussion_r1986051734 ## datafusion-cli/CONTRIBUTING.md: ## @@ -0,0 +1,75 @@ + + +# Development instructions + +## Running Tests + +Tests can be run using `cargo` + +```shell +cargo te

Re: [PR] Add DataFrame fill_nan/fill_null [datafusion-python]

2025-03-08 Thread via GitHub
timsaucer commented on PR #1019: URL: https://github.com/apache/datafusion-python/pull/1019#issuecomment-2708285597 Hi @kosiew I moved this to draft since it looks like you're doing a good job on the upstream work which would change how we would want to handle this.

Re: [I] Implement `tree` explain for `ValuesExec` [datafusion]

2025-03-08 Thread via GitHub
irenjj commented on issue #15093: URL: https://github.com/apache/datafusion/issues/15093#issuecomment-2708287445 > #14032 Hi, @Shreyaskr1409 , maybe we don't need to implement the explain format since it's deprecated. cc @jonathanc-n . I've noticed you've picked other tasks, maybe

Re: [PR] feat: reads using global ctx [datafusion-python]

2025-03-08 Thread via GitHub
timsaucer commented on PR #982: URL: https://github.com/apache/datafusion-python/pull/982#issuecomment-2708291557 I added a few lines to the documentation, rebased, and applied updated ruff formatting.

[I] Expose global context [datafusion-python]

2025-03-08 Thread via GitHub
timsaucer opened a new issue, #1045: URL: https://github.com/apache/datafusion-python/issues/1045 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** #982 added convenience functions to use a global context for operations like `re
