Re: [PR] Refactor signatures for lpad, rpad, left, and right [datafusion]

2025-02-22 Thread via GitHub
github-actions[bot] closed pull request #13420: Refactor signatures for lpad, rpad, left, and right URL: https://github.com/apache/datafusion/pull/13420 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Ignore escaped LIKE wildcards in MySQL [datafusion-sqlparser-rs]

2025-02-22 Thread via GitHub
mvzink commented on code in PR #1735: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1735#discussion_r1966627492 ## tests/sqlparser_mysql.rs: ## @@ -2530,6 +2530,16 @@ fn parse_rlike_and_regexp() { } } +#[test] +fn parse_like_with_escape() { +mysql().ve

[PR] Minor fixes to README [datafusion-ray]

2025-02-22 Thread via GitHub
vmingchen opened a new pull request, #64: URL: https://github.com/apache/datafusion-ray/pull/64 1. Remove a duplicate sentence. 2. Replace backquote of the sql argument with single quote (backquote in bash is for command substitution). 3. Since https://github.com/apache/datafusion-ray/

Re: [I] Cargo bench is eating up all the memory [datafusion]

2025-02-22 Thread via GitHub
jayzhan211 commented on issue #14833: URL: https://github.com/apache/datafusion/issues/14833#issuecomment-2676556435 I see. The problem occurs when I run with `benches`; it should be `--bench` -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [I] Cargo bench is eating up all the memory [datafusion]

2025-02-22 Thread via GitHub
jayzhan211 closed issue #14833: Cargo bench is eating up all the memory URL: https://github.com/apache/datafusion/issues/14833 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Consolidate feature flags into configuration guide [datafusion]

2025-02-22 Thread via GitHub
alamb commented on code in PR #14657: URL: https://github.com/apache/datafusion/pull/14657#discussion_r1966506454 ## docs/source/user-guide/crate-configuration.md: ## @@ -25,7 +25,47 @@ control DataFusion's behavior. [configuration settings]: configs.md -## Add latest non p

Re: [PR] Revert "fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers (#14223)" [datafusion]

2025-02-22 Thread via GitHub
alamb closed pull request #14292: Revert "fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers (#14223)" URL: https://github.com/apache/datafusion/pull/14292 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] WIP: Update to `arrow`/`parquet` 54.2.0 [datafusion]

2025-02-22 Thread via GitHub
alamb closed pull request #14628: WIP: Update to `arrow`/`parquet` 54.2.0 URL: https://github.com/apache/datafusion/pull/14628 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] WIP: Update to `arrow`/`parquet` 54.2.0 [datafusion]

2025-02-22 Thread via GitHub
alamb commented on PR #14628: URL: https://github.com/apache/datafusion/pull/14628#issuecomment-2676177352 Superseded by actual upgrades. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966506841 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966507366 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-22 Thread via GitHub
AdamGS commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2676158047 @logan-keede as part of moving avro functionality into `datafusion-datasource-avro`, WDYT about putting all of it behind a feature flag, the same way `parquet` is now? Seems like

Re: [PR] Revert "fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers (#14223)" [datafusion]

2025-02-22 Thread via GitHub
alamb commented on PR #14292: URL: https://github.com/apache/datafusion/pull/14292#issuecomment-2676177582 I think we decided we don't need to do this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[PR] refactor: make SqlToRel::new derive the parser options from the context provider [datafusion]

2025-02-22 Thread via GitHub
niebayes opened a new pull request, #14822: URL: https://github.com/apache/datafusion/pull/14822 ## Which issue does this PR close? - Closes #13700. ## Rationale for this change ## What changes are included in this PR? Previously, the `SqlTo

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-22 Thread via GitHub
iffyio commented on PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#issuecomment-2676137037 Sorry, I'm not sure I understood the question. Could you clarify with a SQL example if that's possible? -- This is an automated message from the Apache Git Service. To res

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-22 Thread via GitHub
wiedld commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1966451587 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -2203,7 +2203,7 @@ fn repartition_transitively_past_sort_with_projection() -> Result<()>

Re: [I] [DISCUSSION] Lowering the barrier to new users (Lessons from-799 CMU Optimizer Class) [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #14373: URL: https://github.com/apache/datafusion/issues/14373#issuecomment-2676455981 I filed a ticket about substrait from Calcite not working here: - https://github.com/apache/datafusion/issues/14831 -- This is an automated message from the Apache Git Service

[I] Substrait may not respect identifier normalization flags [datafusion]

2025-02-22 Thread via GitHub
alamb opened a new issue, #14832: URL: https://github.com/apache/datafusion/issues/14832 ### Describe the bug This is a report from @lmwnshn and is part of - https://github.com/apache/datafusion/issues/14373 enable_ident_normalization: I think there may be some extra complic

Re: [I] [DISCUSSION] Lowering the barrier to new users (Lessons from-799 CMU Optimizer Class) [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #14373: URL: https://github.com/apache/datafusion/issues/14373#issuecomment-2676456576 I also filed a ticket to track a potential issue with ident normalization - https://github.com/apache/datafusion/issues/14832 -- This is an automated message from the Apache G

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-22 Thread via GitHub
zhuqi-lucas commented on code in PR #14823: URL: https://github.com/apache/datafusion/pull/14823#discussion_r1966670496 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -446,21 +511,9 @@ impl ExternalSorter { None => { let sorted_size =

Re: [PR] docs: Add instruction to build [datafusion]

2025-02-22 Thread via GitHub
dentiny commented on code in PR #14694: URL: https://github.com/apache/datafusion/pull/14694#discussion_r1966653578 ## docs/source/contributor-guide/index.md: ## @@ -216,3 +216,23 @@ The good thing about open code and open development is that any issues in one ch Pull reques

Re: [PR] docs: Add instruction to build [datafusion]

2025-02-22 Thread via GitHub
dentiny commented on code in PR #14694: URL: https://github.com/apache/datafusion/pull/14694#discussion_r1966652960 ## docs/source/contributor-guide/development_environment.md: ## @@ -37,6 +37,22 @@ developing DataFusion in an isolated environment either locally or remote if de

[I] Cargo bench eating up all the memory [datafusion]

2025-02-22 Thread via GitHub
jayzhan211 opened a new issue, #14833: URL: https://github.com/apache/datafusion/issues/14833 ### Describe the bug When I run the `cargo bench` command, more than 20 GB of memory is consumed and it cannot run on my MacBook. This was not the case before. ### To Reproduce Run these commands `

Re: [I] substrait generated by Apache Calcite does not run in DataFusion [datafusion]

2025-02-22 Thread via GitHub
lmwnshn commented on issue #14831: URL: https://github.com/apache/datafusion/issues/14831#issuecomment-2676528540 I think people can look directly at Substrait's consumer-testing repo for DataFusion if they want to fix this :) My repo is just a stripped-down rewrite for students. Som

Re: [PR] chore: Re-organize shuffle writer code [datafusion-comet]

2025-02-22 Thread via GitHub
codecov-commenter commented on PR #1439: URL: https://github.com/apache/datafusion-comet/pull/1439#issuecomment-2676356680 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1439?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
comphead commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966592342 ## datafusion/expr-common/src/statistics.rs: ## @@ -0,0 +1,1610 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Minor: comment in Cargo.toml about MSRV [datafusion]

2025-02-22 Thread via GitHub
comphead merged PR #14809: URL: https://github.com/apache/datafusion/pull/14809 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

[I] Support User-Defined Sorting [datafusion]

2025-02-22 Thread via GitHub
tobixdev opened a new issue, #14828: URL: https://github.com/apache/datafusion/issues/14828 ### Is your feature request related to a problem or challenge? In our system we are working heavily with tagged unions. Basically every column in our results is a `DataType::Union`. However, c

[PR] fix(physical-expr): Remove empty constants check when ordering is satisfied [datafusion]

2025-02-22 Thread via GitHub
rkrishn7 opened a new pull request, #14829: URL: https://github.com/apache/datafusion/pull/14829 ## Which issue does this PR close? - Closes #14806 ## Rationale for this change This PR removes the condition that an input's constants be empty in order to validate its order

Re: [PR] fix(physical-expr): Remove empty constants check when ordering is satisfied [datafusion]

2025-02-22 Thread via GitHub
rkrishn7 commented on PR #14829: URL: https://github.com/apache/datafusion/pull/14829#issuecomment-2676428302 cc @alamb Let me know if this change makes sense to you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [I] Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries [datafusion]

2025-02-22 Thread via GitHub
Omega359 commented on issue #13748: URL: https://github.com/apache/datafusion/issues/13748#issuecomment-2676369298 For anyone looking, I think [this](https://github.com/influxdata/arrow-datafusion/pull/55) is the workaround Influx has for this issue. -- This is an automated message from

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-22 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1966587874 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] Dataframe with_column and with_column_renamed performance improvements [datafusion]

2025-02-22 Thread via GitHub
Omega359 commented on PR #14653: URL: https://github.com/apache/datafusion/pull/14653#issuecomment-2676370277 This should be ready for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [I] ci: Investigate running `cargo udeps` in ci. [datafusion]

2025-02-22 Thread via GitHub
findepi commented on issue #9881: URL: https://github.com/apache/datafusion/issues/9881#issuecomment-2676375161 `cargo +nightly udeps` has false positives in wasm tests ``` unused dependencies: `datafusion-wasmtest v45.0.0 (/Users/findepi/repos/datafusion/datafusion/wasmtest)` └─

[PR] Remove unused crate dependencies [datafusion]

2025-02-22 Thread via GitHub
findepi opened a new pull request, #14827: URL: https://github.com/apache/datafusion/pull/14827 Found by `cargo udeps`. Unfortunately there were false positives too (https://github.com/apache/datafusion/issues/9881#issuecomment-2676375161). -- This is an automated message from the

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-22 Thread via GitHub
LucaCappelletti94 commented on PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#issuecomment-2676086002 While things may change in the future, to the best of my knowledge some indices can have operator classes and some cannot. Since we model the different indices,

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966507949 ## datafusion/expr-common/src/statistics.rs: ## @@ -0,0 +1,1610 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] fix: normalize column names in table constraints [datafusion]

2025-02-22 Thread via GitHub
jonahgao commented on code in PR #14794: URL: https://github.com/apache/datafusion/pull/14794#discussion_r1966477882 ## datafusion/sqllogictest/test_files/ddl.slt: ## @@ -828,3 +828,39 @@ drop table table_with_pk; statement ok set datafusion.catalog.information_schema = false;

Re: [PR] fix: normalize column names in table constraints [datafusion]

2025-02-22 Thread via GitHub
jonahgao merged PR #14794: URL: https://github.com/apache/datafusion/pull/14794 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] Column for primary key not found in schema if constraint column in uppercase [datafusion]

2025-02-22 Thread via GitHub
jonahgao closed issue #14340: Column for primary key not found in schema if constraint column in uppercase URL: https://github.com/apache/datafusion/issues/14340 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-02-22 Thread via GitHub
danila-b commented on code in PR #14331: URL: https://github.com/apache/datafusion/pull/14331#discussion_r1966524077 ## .github/workflows/extended.yml: ## @@ -33,16 +33,46 @@ on: push: branches: - main + issue_comment: +types: [created] + +permissions: + pul

Re: [I] Datafusion can't seem to cast evolving structs [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #14757: URL: https://github.com/apache/datafusion/issues/14757#issuecomment-2676189897 > [@alamb](https://github.com/alamb) how do y'all handle this at influx? This one comes as quite a shocker to me. Does no one else using datafusion support struct evolution?

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
alamb commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966502438 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
alamb commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966502387 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

[PR] Fix duplicated schema name of count wildcard issue [datafusion]

2025-02-22 Thread via GitHub
jayzhan211 opened a new pull request, #14824: URL: https://github.com/apache/datafusion/pull/14824 ## Which issue does this PR close? A previous PR converts count(constant) to count(*), so `select count(1) * count(2)` produces a duplicated schema name error - Closes #.

Re: [I] [EPIC] DuckDB-Inspired Feature Enhancements [datafusion]

2025-02-22 Thread via GitHub
PokIsemaine commented on issue #14514: URL: https://github.com/apache/datafusion/issues/14514#issuecomment-2676170116 > > I noticed [#7622](https://github.com/apache/datafusion/pull/7622), if the syntax was originally not supported by [sqlparser](https://github.com/apache/datafusion-sqlpars

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
alamb commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966503758 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-22 Thread via GitHub
jayzhan211 commented on PR #14689: URL: https://github.com/apache/datafusion/pull/14689#issuecomment-2676170935 Making extended tests optional seems like a better approach. This way, for minor changes or cases where we're confident in the outcome, we can choose to skip the tests. https:/

Re: [I] Create more user friendly aliases from `col` [datafusion-python]

2025-02-22 Thread via GitHub
Spaarsh commented on issue #754: URL: https://github.com/apache/datafusion-python/issues/754#issuecomment-2676172533 @timsaucer I was thinking of going down the route of writing some workaround to this in our python interface since the issue originally mentioned it. Changing the logic upst

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
alamb commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966501158 ## datafusion/expr-common/src/statistics.rs: ## @@ -0,0 +1,1610 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-22 Thread via GitHub
ozankabak commented on PR #14689: URL: https://github.com/apache/datafusion/pull/14689#issuecomment-2676173794 > Making extended tests optional BUT easily visible and run it before merge (maybe github supports such UI?) seems like a better approach. If this is possible, certainly. If

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-22 Thread via GitHub
alamb commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1966505166 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -244,13 +244,15 @@ SELECT x, y FROM t1 UNION BY NAME (SELECT y, z FROM t2 INTERSECT SELECT 2, 2 as 3

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-22 Thread via GitHub
LucaCappelletti94 commented on PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#issuecomment-2676141068 Sure, so: In a GIN index, there MAY be an operator class from a provided limited set, in the following example `gin_trgm_ops` ```sql CREATE INDE

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-22 Thread via GitHub
logan-keede commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2676185427 > [@logan-keede](https://github.com/logan-keede) as part of moving avro functionality into `datafusion-datasource-avro`, WDYT about putting all of it behind a feature flag,

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-22 Thread via GitHub
zhuqi-lucas commented on code in PR #14823: URL: https://github.com/apache/datafusion/pull/14823#discussion_r1966528251 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -446,21 +511,9 @@ impl ExternalSorter { None => { let sorted_size =

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-22 Thread via GitHub
alamb commented on PR #14689: URL: https://github.com/apache/datafusion/pull/14689#issuecomment-2676180834 > > Making extended tests optional BUT easily visible and run it before merge (maybe github supports such UI?) seems like a better approach. > > If this is possible, certainl

Re: [I] Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #13748: URL: https://github.com/apache/datafusion/issues/13748#issuecomment-2676183097 I am basically stalled with this code -- I have some ideas but we have a workaround downstream in InfluxData and a bunch of other priorities have come up since then -- This is

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-22 Thread via GitHub
iffyio commented on PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#issuecomment-2676183454 Ah I see thanks for clarifying! 1. I think introducing a `Custom` variant would be reasonable, it would be similar to the existing pattern of `DataType::Custom`, `Bi

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
alamb commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966510247 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [I] Release sqlparser-rs version `0.55.0` [datafusion-sqlparser-rs]

2025-02-22 Thread via GitHub
alamb commented on issue #1671: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1671#issuecomment-2676181725 It is likely about time to make another release of sqlparser. I plan to do so next week -- I am a bit behind this week -- This is an automated message from the Apach

Re: [I] Add a way to trigger the `extended` test suite from a PR [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #14319: URL: https://github.com/apache/datafusion/issues/14319#issuecomment-2676181434 This came up again in the context of - https://github.com/apache/datafusion/pull/14689 It would be great to figure out how to get this working -- This is an automated m

Re: [I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2676186699 > Are there existing mechanisms in DataFusion to handle external iterators or row sources? There is a PR we are currently working on related to metadata columns (which coul

Re: [I] Datafusion-cli: when the max rows setting inf, we are missing the unlimited case for bounded streaming. [datafusion]

2025-02-22 Thread via GitHub
alamb closed issue #14814: Datafusion-cli: when the max rows setting inf, we are missing the unlimited case for bounded streaming. URL: https://github.com/apache/datafusion/issues/14814 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] WIP : create `datafusion-datasource-avro` crate [datafusion]

2025-02-22 Thread via GitHub
alamb commented on PR #14651: URL: https://github.com/apache/datafusion/pull/14651#issuecomment-2676189108 > I'll reopen when the datafusion-datasource crate is arranged. @logan-keede has completed that PR and it is now merged! -- This is an automated message from the Apache Git Se

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2676190385 > From what I see in current code, this struct `PullUpCorrelatedExpr` is applied for scalar subquery as well as predicate subquery. > > For that paper implementation, I'll

Re: [I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2676188552 > We are using DataFusion to query Parquet files and wondering if the result of the query can be represented as a bit set of the document position (example below). Bit sets from t

Re: [PR] fix: we are missing the unlimited case for bounded streaming when usi… [datafusion]

2025-02-22 Thread via GitHub
alamb merged PR #14815: URL: https://github.com/apache/datafusion/pull/14815 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[PR] Add `range` table function [datafusion]

2025-02-22 Thread via GitHub
simonvandel opened a new pull request, #14830: URL: https://github.com/apache/datafusion/pull/14830 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10177. It does not close it, as there is no support for timestamp arguments. ## Rat

Re: [I] DuplicateQualifiedField With Paritioned Data [datafusion-python]

2025-02-22 Thread via GitHub
cfis commented on issue #1018: URL: https://github.com/apache/datafusion-python/issues/1018#issuecomment-2676440635 Thanks for looking into this @kosiew. Yes, I understand that the duplicate fields come from the combination of hive partition fields and the parquet fields. However, I

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
Omega359 commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966565413 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +categories: [

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
Omega359 commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966515761 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +categories: [

Re: [I] Incorrect backslash treatment in string literals in DataFusion CLI [datafusion]

2025-02-22 Thread via GitHub
Lordworms commented on issue #13286: URL: https://github.com/apache/datafusion/issues/13286#issuecomment-2676330289 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] chore: Re-organize shuffle writer code [datafusion-comet]

2025-02-22 Thread via GitHub
andygrove opened a new pull request, #1439: URL: https://github.com/apache/datafusion-comet/pull/1439 ## Which issue does this PR close? N/A ## Rationale for this change In preparation for improving the shuffle writer, this PR simply moves some code out o

Re: [PR] DF 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
alamb commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966493967 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +categories: [rel

[PR] refactor: use TypeSignature::Coercible for crypto functions [datafusion]

2025-02-22 Thread via GitHub
Chen-Yuan-Lai opened a new pull request, #14826: URL: https://github.com/apache/datafusion/pull/14826 ## Which issue does this PR close? - Closes #14762 . ## Rationale for this change ## What changes are included in this PR? ## Are these cha

[PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-22 Thread via GitHub
2010YOUY01 opened a new pull request, #14823: URL: https://github.com/apache/datafusion/pull/14823 ## Which issue does this PR close? Follow up to https://github.com/apache/datafusion/pull/14644, this PR fixes an unsolved failing case for external sort. It's found by https://

Re: [PR] DF 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
alamb commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966495105 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 Review Comment:

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-22 Thread via GitHub
2010YOUY01 commented on code in PR #14823: URL: https://github.com/apache/datafusion/pull/14823#discussion_r1966496849 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -446,21 +511,9 @@ impl ExternalSorter { None => { let sorted_size =

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
alamb commented on PR #57: URL: https://github.com/apache/datafusion-site/pull/57#issuecomment-2676158225 @ozankabak and @comphead and @andygrove and @Dandandan , Are there other things / features you think we should highlight in the DataFusion 45 blog post (that covers the last seve

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966456466 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -1119,11 +1180,11 @@ fn next_value_helper(value: ScalarValue) -> ScalarValue { match value {

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2676188953 > > [@logan-keede](https://github.com/logan-keede) as part of moving avro functionality into `datafusion-datasource-avro`, WDYT about putting all of it behind a feature flag, the

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-22 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966514940 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
Omega359 commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966515761 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +categories: [

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
Omega359 commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966516600 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +categories: [

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
Omega359 commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966516700 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +categories: [

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-22 Thread via GitHub
Omega359 commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966516624 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +categories: [

Re: [PR] Fix ref head for issue comment [datafusion]

2025-02-22 Thread via GitHub
danila-b closed pull request #14825: Fix ref head for issue comment URL: https://github.com/apache/datafusion/pull/14825

[PR] Fix ref head for issue comment [datafusion]

2025-02-22 Thread via GitHub
danila-b opened a new pull request, #14825: URL: https://github.com/apache/datafusion/pull/14825 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] Fix duplicated schema name error from count wildcard [datafusion]

2025-02-22 Thread via GitHub
jayzhan211 commented on code in PR #14824: URL: https://github.com/apache/datafusion/pull/14824#discussion_r1966519181 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -1231,6 +1233,13 @@ fn evaluate( expr: &[Arc], batch: &RecordBatch, ) -> Result> { +// h

[PR] build(deps): bump arrow from 54.1.0 to 54.2.0 [datafusion-python]

2025-02-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1035: URL: https://github.com/apache/datafusion-python/pull/1035 Bumps [arrow](https://github.com/apache/arrow-rs) from 54.1.0 to 54.2.0. Release notes Sourced from [arrow's releases](https://github.com/apache/arrow-rs/releases). ar

[PR] build(deps): bump uuid from 1.13.1 to 1.14.0 [datafusion-python]

2025-02-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1034: URL: https://github.com/apache/datafusion-python/pull/1034 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.13.1 to 1.14.0. Release notes Sourced from [uuid's releases](https://github.com/uuid-rs/uuid/releases). v1.14.0

Re: [I] [EPIC] Substrait: Add producer and consumer for physical plans [datafusion]

2025-02-22 Thread via GitHub
alamb commented on issue #5173: URL: https://github.com/apache/datafusion/issues/5173#issuecomment-2676454727 Hi @niebayes, this PR from @robtandy may be quite relevant to your work building a distributed execution engine: - https://github.com/apache/datafusion-ray/pull/60 The larg

[I] substrait generated by Apache Calcite does not run in DataFusion [datafusion]

2025-02-22 Thread via GitHub
alamb opened a new issue, #14831: URL: https://github.com/apache/datafusion/issues/14831 ### Describe the bug This is a report from @lmwnshn As part of a series of issues that were discovered at CMU while working on DataFusion - https://github.com/apache/datafusion/issues/14

Re: [PR] [POC] Try to plan ast::Expr::CompoundFieldAccess syntax [datafusion]

2025-02-22 Thread via GitHub
github-actions[bot] closed pull request #13734: [POC] Try to plan ast::Expr::CompoundFieldAccess syntax URL: https://github.com/apache/datafusion/pull/13734

Re: [PR] Find keywords using perfect hashing [datafusion-sqlparser-rs]

2025-02-22 Thread via GitHub
github-actions[bot] commented on PR #1590: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1590#issuecomment-2676503628 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

Re: [PR] Slightly faster keyword lookups [datafusion-sqlparser-rs]

2025-02-22 Thread via GitHub
github-actions[bot] commented on PR #1591: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1591#issuecomment-2676503607 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

Re: [I] Cargo bench is eating up all the memory [datafusion]

2025-02-22 Thread via GitHub
jayzhan211 commented on issue #14833: URL: https://github.com/apache/datafusion/issues/14833#issuecomment-2676535170 Btw this happens at compile time
