[PR] Minor: use FileScanConfig builder API in some tests [datafusion]

2025-03-01 Thread via GitHub
alamb opened a new pull request, #14938: URL: https://github.com/apache/datafusion/pull/14938 - Draft as it builds on https://github.com/apache/datafusion/pull/14685 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/14685 ## R

Re: [PR] Parse SET NAMES syntax in Postgres [datafusion-sqlparser-rs]

2025-03-01 Thread via GitHub
iffyio merged PR #1752: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1752 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [I] Should pruningpredicate coerce? [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #14944: URL: https://github.com/apache/datafusion/issues/14944#issuecomment-2691336051 I think the conclusion is that PruningPredicate expects a properly coerced expression already. You can use `SessionContext::create_physical_expr` to automatically perform c

Re: [PR] chore: forbide `with_default_features` override existing information [datafusion]

2025-03-01 Thread via GitHub
milenkovicm commented on PR #14935: URL: https://github.com/apache/datafusion/pull/14935#issuecomment-2692244166 I just wonder if we should provide `with_default_features` or `Default::default` implementation for this case as well? wdyt @alamb ?

Re: [PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-03-01 Thread via GitHub
logan-keede commented on code in PR #14951: URL: https://github.com/apache/datafusion/pull/14951#discussion_r1976406045 ## datafusion/core/src/execution/session_state.rs: ## @@ -832,17 +836,16 @@ impl SessionState { self.config.options() } -/// return the Tab

Re: [I] Should pruningpredicate coerce? [datafusion]

2025-03-01 Thread via GitHub
ion-elgreco commented on issue #14944: URL: https://github.com/apache/datafusion/issues/14944#issuecomment-2691341269 @alamb this is actually already done like that, but it still doesn't prune properly: `.map(|expr| context.create_physical_expr(expr, &df_schema).unwrap())

Re: [PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-03-01 Thread via GitHub
logan-keede commented on code in PR #14951: URL: https://github.com/apache/datafusion/pull/14951#discussion_r1976494633 ## datafusion/core/src/execution/session_state.rs: ## @@ -832,17 +836,16 @@ impl SessionState { self.config.options() } -/// return the Tab

[PR] build(deps): bump uuid from 1.13.1 to 1.15.1 [datafusion-python]

2025-03-01 Thread via GitHub
dependabot[bot] opened a new pull request, #1039: URL: https://github.com/apache/datafusion-python/pull/1039 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.13.1 to 1.15.1. Release notes: sourced from uuid's releases (https://github.com/uuid-rs/uuid/releases). v1.15.1

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-03-01 Thread via GitHub
zhuqi-lucas commented on PR #14922: URL: https://github.com/apache/datafusion/pull/14922#issuecomment-2692495354 Testing result: ```rust > CREATE EXTERNAL TABLE IF NOT EXISTS lineitem ( l_orderkey BIGINT, l_partkey BIGINT, l_suppkey BIGINT,

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-03-01 Thread via GitHub
zhuqi-lucas commented on code in PR #14922: URL: https://github.com/apache/datafusion/pull/14922#discussion_r1976514610 ## datafusion-examples/examples/dataframe.rs: ## @@ -59,7 +59,8 @@ use tempfile::tempdir; #[tokio::main] async fn main() -> Result<()> { // The SessionC

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-03-01 Thread via GitHub
zhuqi-lucas commented on code in PR #14922: URL: https://github.com/apache/datafusion/pull/14922#discussion_r1976514485 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -2400,7 +2400,8 @@ async fn write_json_with_order() -> Result<()> { #[tokio::test] async fn write_table_with

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-03-01 Thread via GitHub
zhuqi-lucas commented on code in PR #14922: URL: https://github.com/apache/datafusion/pull/14922#discussion_r1976514971 ## datafusion/sqllogictest/test_files/insert_to_external.slt: ## @@ -456,13 +456,16 @@ explain insert into table_without_values select c1 from aggregate_test_

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-03-01 Thread via GitHub
zhuqi-lucas commented on code in PR #14922: URL: https://github.com/apache/datafusion/pull/14922#discussion_r1976515014 ## datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt: ## @@ -144,7 +144,7 @@ EXPLAIN select b from t_pushdown where a = 'bar' order by b; l

Re: [PR] Slightly faster keyword lookups [datafusion-sqlparser-rs]

2025-03-01 Thread via GitHub
github-actions[bot] closed pull request #1591: Slightly faster keyword lookups URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1591

Re: [PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-03-01 Thread via GitHub
AdamGS commented on code in PR #14951: URL: https://github.com/apache/datafusion/pull/14951#discussion_r1976488608 ## datafusion/core/src/execution/session_state.rs: ## @@ -832,17 +836,16 @@ impl SessionState { self.config.options() } -/// return the TableOpt

Re: [PR] Do not swap with projection when file is partitioned [datafusion]

2025-03-01 Thread via GitHub
blaginin commented on code in PR #14956: URL: https://github.com/apache/datafusion/pull/14956#discussion_r1976506356 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -266,7 +266,10 @@ impl DataSource for FileScanConfig { ) -> Result>> { // If there is any no

Re: [PR] Allow setting the recursion limit for sql parsing [datafusion]

2025-03-01 Thread via GitHub
alamb merged PR #14756: URL: https://github.com/apache/datafusion/pull/14756

Re: [I] Statistics: Migrate to `Distribution` from `Precision` [datafusion]

2025-03-01 Thread via GitHub
clflushopt commented on issue #14896: URL: https://github.com/apache/datafusion/issues/14896#issuecomment-2692564535 Following our discussion from before and since I recently documented some parts of the old statistics API I would be happy to take this if it's not taken cc @alamb for vis.

[PR] fix(docs+minor): set the proper link for dev-env setup in contrib guide [datafusion]

2025-03-01 Thread via GitHub
clflushopt opened a new pull request, #14960: URL: https://github.com/apache/datafusion/pull/14960 ## Which issue does this PR close? I was setting up a new workstation and while checking the docs I was redirected to the user guide instead of the development environment page.

Re: [PR] fix(docs+minor): set the proper link for dev-env setup in contrib guide [datafusion]

2025-03-01 Thread via GitHub
clflushopt commented on PR #14960: URL: https://github.com/apache/datafusion/pull/14960#issuecomment-2692567333 cc @dentiny @alamb minor patch following #14694 and #14890

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2025-03-01 Thread via GitHub
zero-element commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2692570732 I agree with @faucct that this approach could be beneficial for building Online ML Feature Store engines. This is because such systems often involve numerous element-wise cal

Re: [PR] Find keywords using perfect hashing [datafusion-sqlparser-rs]

2025-03-01 Thread via GitHub
github-actions[bot] closed pull request #1590: Find keywords using perfect hashing URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1590

Re: [I] Add Spark-like scheduling mode with shuffle files between stages [datafusion-ray]

2025-03-01 Thread via GitHub
andygrove commented on issue #69: URL: https://github.com/apache/datafusion-ray/issues/69#issuecomment-2692500974 > I think, ideally, this new mode of execution diverges as little as possible from the existing mode of execution and we can use the exact same physical plans in both cases.

Re: [I] Planning error for compound expressions involving window functions [datafusion]

2025-03-01 Thread via GitHub
2010YOUY01 commented on issue #14910: URL: https://github.com/apache/datafusion/issues/14910#issuecomment-2692543687 > I dont think I can fix this lol, I tried tweaking `rebase_expr` and `check_columns_satisfy_exprs` in `datafusion/sql/src/utils.rs` but all failed. The planner is little too

Re: [I] datafusion-cli regression: explain plan output looks bad (error rendering multi-lines) [datafusion]

2025-03-01 Thread via GitHub
xudong963 closed issue #14947: datafusion-cli regression: explain plan output looks bad (error rendering multi-lines) URL: https://github.com/apache/datafusion/issues/14947

[PR] Remove redundant statistics from FileScanConfig [datafusion]

2025-03-01 Thread via GitHub
Standing-Man opened a new pull request, #14955: URL: https://github.com/apache/datafusion/pull/14955 ## Which issue does this PR close? - Closes #14937. ## Rationale for this change Both `FileScanConfig` and `DataSource` have the same statistics, it make that stat

Re: [PR] Minor: improve documentation of `AggregateMode` [datafusion]

2025-03-01 Thread via GitHub
2010YOUY01 commented on code in PR #14946: URL: https://github.com/apache/datafusion/pull/14946#discussion_r1976265670 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -57,41 +57,53 @@ mod row_hash; mod topk; mod topk_stream; -/// Hash aggregate modes +/// Aggregatio

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-03-01 Thread via GitHub
Weijun-H commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2692083297 I actually anticipate higher memory consumption, particularly in systems where the upstream portion of RepartitionExec generates results faster than the downstream component process

Re: [PR] Add Upgrade Guide for DataFusion 46.0.0 [datafusion]

2025-03-01 Thread via GitHub
xudong963 commented on PR #14891: URL: https://github.com/apache/datafusion/pull/14891#issuecomment-2692102278 After the PR is merged, I'll update the changelog PR and make the branch-46.

Re: [I] Support SQL pipe operator [datafusion]

2025-03-01 Thread via GitHub
simonvandel commented on issue #14660: URL: https://github.com/apache/datafusion/issues/14660#issuecomment-2692109389 Here's a short video presentation by a Google engineer https://www.hytradboi.com/2025/f8582cd3-1e39-43a8-8749-46817b2910cf-pipe-syntax-in-sql-its-time

Re: [I] Should pruningpredicate coerce? [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #14944: URL: https://github.com/apache/datafusion/issues/14944#issuecomment-2691384188 🤔 What are the coerced expressions? If it is like this: ``` cast(month_id, 'utf8') = '202502' ``` That is not going to prune because the cast is happeni

Re: [PR] re-add support for nested comments in mssql [datafusion-sqlparser-rs]

2025-03-01 Thread via GitHub
iffyio merged PR #1754: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1754

Re: [PR] Feat: support array_compact function [datafusion-comet]

2025-03-01 Thread via GitHub
kazuyukitanimura commented on PR #1321: URL: https://github.com/apache/datafusion-comet/pull/1321#issuecomment-2691343558 Looks like there are formatting issues

Re: [PR] Remove invalid bug reproducer. [datafusion]

2025-03-01 Thread via GitHub
alamb merged PR #14950: URL: https://github.com/apache/datafusion/pull/14950

Re: [PR] Bug: Fix multi-lines printing issue for datafusion-cli [datafusion]

2025-03-01 Thread via GitHub
zhuqi-lucas commented on PR #14954: URL: https://github.com/apache/datafusion/pull/14954#issuecomment-2692165153 Updated testing result: ```rust DataFusion CLI v45.0.0 > create table foo(x int, y int) as values (1,2), (3,4); 0 row(s) fetched. Elapsed 0.026 seconds. > exp

Re: [I] Planning error for compound expressions involving window functions [datafusion]

2025-03-01 Thread via GitHub
qazxcdswe123 commented on issue #14910: URL: https://github.com/apache/datafusion/issues/14910#issuecomment-2692166597 untake

Re: [I] Planning error for compound expressions involving window functions [datafusion]

2025-03-01 Thread via GitHub
qazxcdswe123 commented on issue #14910: URL: https://github.com/apache/datafusion/issues/14910#issuecomment-2692166110 I dont think I can fix this lol, I tried tweaking `rebase_expr` and `check_columns_satisfy_exprs` in `datafusion/sql/src/utils.rs` but all failed. The planner is little too

Re: [I] March 2025 ASF Board Report [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #13713: URL: https://github.com/apache/datafusion/issues/13713#issuecomment-2692172346 Here is a google doc to coordinate board reporting: https://docs.google.com/document/d/11b2GEmPh5gblWWegeZi3G38e97vRqHSRElkLTwZHrjY/edit?tab=t.0 Please feel free to post com

Re: [I] March 2025 ASF Board Report [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #13713: URL: https://github.com/apache/datafusion/issues/13713#issuecomment-2692172863 Mailing list announcement: https://lists.apache.org/thread/7g8b66wdhpdj9tn77ptzy2790bj3l47d

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-03-01 Thread via GitHub
geoffreyclaude commented on code in PR #14547: URL: https://github.com/apache/datafusion/pull/14547#discussion_r1976361324 ## datafusion/common-runtime/src/join_set.rs: ## @@ -0,0 +1,207 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [I] Should pruningpredicate coerce? [datafusion]

2025-03-01 Thread via GitHub
ion-elgreco commented on issue #14944: URL: https://github.com/apache/datafusion/issues/14944#issuecomment-2692138942 It's like cast(month_id, 'utf8') = '202502', see below: ``` [crates/core/src/delta_datafusion/mod.rs:551:9] self.filter.clone() = Some( BinaryExpr(

[PR] Do not swap with projection when file is partitioned [datafusion]

2025-03-01 Thread via GitHub
blaginin opened a new pull request, #14956: URL: https://github.com/apache/datafusion/pull/14956 ## Which issue does this PR close? Related to https://github.com/delta-io/delta-rs/pull/3261#issuecomment-2691373678 ## Rationale for this change I feel like there s

Re: [PR] Do not swap with projection when file is partitioned [datafusion]

2025-03-01 Thread via GitHub
blaginin commented on PR #14956: URL: https://github.com/apache/datafusion/pull/14956#issuecomment-2692137880 ``` uv run pytest == 472 passed, 4 skipped, 47 deselected, 90 warnings in 23.16s =

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-03-01 Thread via GitHub
wiedld commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1975887998 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()

[I] index out of bounds: the len is 2 but the index is 2 in some data sources [datafusion]

2025-03-01 Thread via GitHub
alamb opened a new issue, #14957: URL: https://github.com/apache/datafusion/issues/14957 ### Describe the bug As part of testing DataFusion 46.0.0 on Delta.rs in - https://github.com/delta-io/delta-rs/pull/3261 Here is an example failure: https://github.com/delta-io/d

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2692195887 > +1 for marking `Expr::Wildcard` as deprecated. > > > Expr::Wildcard is still used in other cases not only count(*). We only remove the count wildcard case, so we can't dep

Re: [I] index out of bounds: the len is 2 but the index is 2 in some data sources [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #14957: URL: https://github.com/apache/datafusion/issues/14957#issuecomment-2692196135 @blaginin has a proposed fix for this: - https://github.com/apache/datafusion/pull/14956 ❤

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-03-01 Thread via GitHub
Garamda commented on code in PR #13511: URL: https://github.com/apache/datafusion/pull/13511#discussion_r1976295147 ## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ## @@ -51,29 +52,43 @@ create_func!(ApproxPercentileCont, approx_percentile_cont_udaf); /// Co

Re: [PR] Minor: improve documentation of `AggregateMode` [datafusion]

2025-03-01 Thread via GitHub
alamb commented on code in PR #14946: URL: https://github.com/apache/datafusion/pull/14946#discussion_r1976405417 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -57,41 +57,53 @@ mod row_hash; mod topk; mod topk_stream; -/// Hash aggregate modes +/// Aggregation mod

Re: [I] Further improve datafusion-cli memory usage if we setting huge number for maxrow size. [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #14810: URL: https://github.com/apache/datafusion/issues/14810#issuecomment-2692197186 We had to revert this change temporarily to get the 46 release out - https://github.com/apache/datafusion/pull/14948 So reopening the PR

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2692196440 We found one more issue while testing 46 in delta-rs: - https://github.com/apache/datafusion/issues/14957 @blaginin has a PR up to address it: - https://github.com/apac

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-01 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2692260440 @alamb Make sense, made the branch: https://github.com/apache/datafusion/tree/branch-46

[I] Table function supports non-literal args [datafusion]

2025-03-01 Thread via GitHub
jonahgao opened a new issue, #14958: URL: https://github.com/apache/datafusion/issues/14958 ### Is your feature request related to a problem or challenge? Currently, table functions like `range` only support literal arguments. ```sh DataFusion CLI v45.0.0 > select * from range

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-03-01 Thread via GitHub
jamxia155 commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1976451999 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1327,19 +1327,37 @@ pub async fn from_read_rel( table_ref: TableReference, schema:

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-03-01 Thread via GitHub
jamxia155 commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1976452057 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1327,19 +1327,37 @@ pub async fn from_read_rel( table_ref: TableReference, schema:

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-03-01 Thread via GitHub
wiedld commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1976065383 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-03-01 Thread via GitHub
jamxia155 commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1976452084 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1327,19 +1327,37 @@ pub async fn from_read_rel( table_ref: TableReference, schema:

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-03-01 Thread via GitHub
jamxia155 commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1976452271 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_r

Re: [PR] Scrollable python notebook table rendering [datafusion-python]

2025-03-01 Thread via GitHub
timsaucer commented on PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#issuecomment-2692325079 I need to update the unit test. I don't think just validating that it hasn't changed is exactly what we need, so I may think about it some more.

Re: [PR] Revert Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-03-01 Thread via GitHub
alamb commented on PR #14948: URL: https://github.com/apache/datafusion/pull/14948#issuecomment-2692198056 > Thank you @alamb , it makes sense to me. > > So next step is, after we releasing 46.0.0, we revert this PR. > > And i created a follow-up ticket to fix the remaining mult

Re: [PR] Do not swap with projection when file is partitioned [datafusion]

2025-03-01 Thread via GitHub
alamb commented on code in PR #14956: URL: https://github.com/apache/datafusion/pull/14956#discussion_r1976407254 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -266,7 +266,10 @@ impl DataSource for FileScanConfig { ) -> Result>> { // If there is any non-c

Re: [I] March 2025 ASF Board Report [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #13713: URL: https://github.com/apache/datafusion/issues/13713#issuecomment-2692172475 FYI @iffyio and @robtandy in case you wanted to suggest any additions for datafusion ray / sqlparser

Re: [PR] Bug: Fix multi-lines printing issue for datafusion-cli [datafusion]

2025-03-01 Thread via GitHub
alamb commented on PR #14954: URL: https://github.com/apache/datafusion/pull/14954#issuecomment-2692197495 Thanks @zhuqi-lucas -- I'll try and check this out soon (likely tomorrow)

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-03-01 Thread via GitHub
alamb commented on PR #14919: URL: https://github.com/apache/datafusion/pull/14919#issuecomment-2692197860 Reverted in - https://github.com/apache/datafusion/pull/14950

Re: [PR] Prepare for 46.0.0 release: Version and Changelog [datafusion]

2025-03-01 Thread via GitHub
alamb merged PR #14903: URL: https://github.com/apache/datafusion/pull/14903

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-01 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2692200337 @xudong963 I just merged - https://github.com/apache/datafusion/pull/14903 Can you now make a `branch-46` branch? That will then allow us to merge additiona

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-03-01 Thread via GitHub
AdamGS commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2692199949 The PR is out, but I suspect it's too big. I tried to get it in the 46 release, but now that I probably missed that, @alamb (or any other reviewers) would you prefer I split it u

Re: [PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-03-01 Thread via GitHub
logan-keede commented on code in PR #14951: URL: https://github.com/apache/datafusion/pull/14951#discussion_r1976405859 ## datafusion/core/src/execution/session_state.rs: ## @@ -832,17 +836,16 @@ impl SessionState { self.config.options() } -/// return the Tab

Re: [PR] chore: commit `Cargo.lock` file to make builds more predictable [datafusion-ballista]

2025-03-01 Thread via GitHub
andygrove merged PR #1190: URL: https://github.com/apache/datafusion-ballista/pull/1190

[PR] chore: Update changelog for 44.0.0 [datafusion-ballista]

2025-03-01 Thread via GitHub
andygrove opened a new pull request, #1191: URL: https://github.com/apache/datafusion-ballista/pull/1191 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing cha

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-03-01 Thread via GitHub
berkaysynnada commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2692276290 I got a bit off-topic, but we will focus on this promising work to drive it to completion in the coming week, together with @mertak-synnada.

[PR] chore: minor release script fix [datafusion-ballista]

2025-03-01 Thread via GitHub
andygrove opened a new pull request, #1192: URL: https://github.com/apache/datafusion-ballista/pull/1192 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing cha

Re: [PR] chore: minor release script fix [datafusion-ballista]

2025-03-01 Thread via GitHub
andygrove commented on code in PR #1192: URL: https://github.com/apache/datafusion-ballista/pull/1192#discussion_r1976437957 ## dev/release/create-tarball.sh: ## @@ -53,11 +53,6 @@ if [ "$#" -ne 2 ]; then exit fi -if [[ -z "${GH_TOKEN}" ]]; then -echo "Please set pe

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-01 Thread via GitHub
linhr commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2692282881 > Can someone make a PR soon? We are in the final phases of getting ready for 46 release I've created a PR to deprecate `Expr::Wildcard`: #14959.

[PR] Deprecate `Expr::Wildcard` [datafusion]

2025-03-01 Thread via GitHub
linhr opened a new pull request, #14959: URL: https://github.com/apache/datafusion/pull/14959 ## Which issue does this PR close? N/A ## Rationale for this change This is discussed as part of #14123. ## What changes are included in this PR? `Expr::Wildcard` i

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-03-01 Thread via GitHub
berkaysynnada commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1976434591 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Re

[PR] Scrollable python notebook table rendering [datafusion-python]

2025-03-01 Thread via GitHub
timsaucer opened a new pull request, #1036: URL: https://github.com/apache/datafusion-python/pull/1036 # Which issue does this PR close? None. # Rationale for this change The notebook rendering of DataFrames is very useful, but it can be enhanced. This PR adds quality o

Re: [I] Fix scalability limitations of current implementation [datafusion-ray]

2025-03-01 Thread via GitHub
andygrove commented on issue #46: URL: https://github.com/apache/datafusion-ray/issues/46#issuecomment-2692353922 I think that we can close this particular issue now. I am able to run benchmarks in a multi node setup. There is still more scalability work to do, but we can file specific issu

Re: [I] Fix scalability limitations of current implementation [datafusion-ray]

2025-03-01 Thread via GitHub
andygrove closed issue #46: Fix scalability limitations of current implementation URL: https://github.com/apache/datafusion-ray/issues/46

[I] Improve error handling [datafusion-ray]

2025-03-01 Thread via GitHub
andygrove opened a new issue, #68: URL: https://github.com/apache/datafusion-ray/issues/68 Sometimes I run into resource issues and queries fail. This is to be expected, but it would be nice if we could provide a more useful error message on the client. Example error: ``` E

[I] Add Spark-like scheduling mode with shuffle files between stages [datafusion-ray]

2025-03-01 Thread via GitHub
andygrove opened a new issue, #69: URL: https://github.com/apache/datafusion-ray/issues/69 DFRay currently creates "stage processors" for each stage in a query plan and eagerly executes queries. This is good for low-latency use cases but can require a large amount of memory. I would

Re: [I] March 2025 ASF Board Report [datafusion]

2025-03-01 Thread via GitHub
kevinjqliu commented on issue #13713: URL: https://github.com/apache/datafusion/issues/13713#issuecomment-2692377147 I feel like there are a few items from the [blog post](https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0/) that would be great to include in the report.

Re: [PR] Revert Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-03-01 Thread via GitHub
xudong963 merged PR #14948: URL: https://github.com/apache/datafusion/pull/14948

[PR] build(deps): bump pyo3 from 0.23.4 to 0.23.5 [datafusion-python]

2025-03-01 Thread via GitHub
dependabot[bot] opened a new pull request, #1037: URL: https://github.com/apache/datafusion-python/pull/1037 Bumps [pyo3](https://github.com/pyo3/pyo3) from 0.23.4 to 0.23.5. Release notes Sourced from [pyo3's releases](https://github.com/pyo3/pyo3/releases). PyO3 0.23.5 T

[PR] Repo Housekeeping, updating naming remove unused files [datafusion-ray]

2025-03-01 Thread via GitHub
robtandy opened a new pull request, #70: URL: https://github.com/apache/datafusion-ray/pull/70 This PR updates names in the repo to be more consistent and also to remove ambiguity. Classes are prefixed with `DFRay` to differentiate from `Ray`. Most importantly, `RayStage` is re

Re: [PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-03-01 Thread via GitHub
logan-keede commented on code in PR #14951: URL: https://github.com/apache/datafusion/pull/14951#discussion_r1976486683 ## datafusion/core/src/execution/session_state.rs: ## @@ -832,17 +836,16 @@ impl SessionState { self.config.options() } -/// return the Tab

Re: [PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-03-01 Thread via GitHub
AdamGS commented on code in PR #14951: URL: https://github.com/apache/datafusion/pull/14951#discussion_r1976481607 ## datafusion/core/Cargo.toml: ## @@ -54,14 +60,15 @@ default = [ "string_expressions", "unicode_expressions", "compression", +"avro", Review Co

Re: [I] Add Spark-like scheduling mode with shuffle files between stages [datafusion-ray]

2025-03-01 Thread via GitHub
robtandy commented on issue #69: URL: https://github.com/apache/datafusion-ray/issues/69#issuecomment-2692393525 This makes sense to me. DFRay as written tries to lean into low latency execution at all costs, though it does look like it will provide a lot of utility for longer running dis

Re: [I] Add Spark-like scheduling mode with shuffle files between stages [datafusion-ray]

2025-03-01 Thread via GitHub
robtandy commented on issue #69: URL: https://github.com/apache/datafusion-ray/issues/69#issuecomment-2692394032 This would allow for a user to specify an exact number of `DFRayProcessors` to allocate and the stages would execute on those as available, otherwise they wait.

Re: [PR] build(deps): bump uuid from 1.13.1 to 1.14.0 [datafusion-python]

2025-03-01 Thread via GitHub
dependabot[bot] closed pull request #1034: build(deps): bump uuid from 1.13.1 to 1.14.0 URL: https://github.com/apache/datafusion-python/pull/1034

Re: [PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-03-01 Thread via GitHub
AdamGS commented on code in PR #14951: URL: https://github.com/apache/datafusion/pull/14951#discussion_r1976484614 ## datafusion/core/src/execution/session_state.rs: ## @@ -832,17 +836,16 @@ impl SessionState { self.config.options() } -/// return the TableOpt

Re: [PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-03-01 Thread via GitHub
AdamGS commented on code in PR #14951: URL: https://github.com/apache/datafusion/pull/14951#discussion_r1976481579 ## datafusion/core/src/execution/session_state.rs: ## @@ -832,17 +836,16 @@ impl SessionState { self.config.options() } -/// return the TableOpt

Re: [PR] build(deps): bump uuid from 1.13.1 to 1.14.0 [datafusion-python]

2025-03-01 Thread via GitHub
dependabot[bot] commented on PR #1034: URL: https://github.com/apache/datafusion-python/pull/1034#issuecomment-2692384363 Superseded by #1039.

[PR] build(deps): bump arrow from 54.2.0 to 54.2.1 [datafusion-python]

2025-03-01 Thread via GitHub
dependabot[bot] opened a new pull request, #1038: URL: https://github.com/apache/datafusion-python/pull/1038 Bumps [arrow](https://github.com/apache/arrow-rs) from 54.2.0 to 54.2.1. Release notes Sourced from [arrow's releases](https://github.com/apache/arrow-rs/releases). ar

Re: [I] Datafusion can't seem to cast evolving structs [datafusion]

2025-03-01 Thread via GitHub
TheBuilderJR commented on issue #14757: URL: https://github.com/apache/datafusion/issues/14757#issuecomment-2692420607 @alamb I should have some free cycles soon to do this. Any chance you can give me some code points or reference PRs that would help with implementation? Thanks in advance!