Re: [PR] Impl intermeidate result blocked approach sketch [datafusion]

2025-04-23 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2823308363 I added a query to `extened.sql`; as expected, the blocked approach gives an obvious improvement: - sql: ```sql SELECT "WatchID", MIN("ResolutionWidth"), MAX("ResolutionW

Re: [PR] Impl intermeidate result blocked approach sketch [datafusion]

2025-04-23 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2823308375 I added a query to `extened.sql`; as expected, the blocked approach gives an obvious improvement: - sql: ```sql SELECT "WatchID", MIN("ResolutionWidth"), MAX("ResolutionW

Re: [PR] feat: Add option to adjust writer buffer size for query output [datafusion]

2025-04-23 Thread via GitHub
m09526 commented on code in PR #15747: URL: https://github.com/apache/datafusion/pull/15747#discussion_r2055434583 ## datafusion/datasource/src/write/mod.rs: ## @@ -88,6 +91,21 @@ pub async fn create_writer( file_compression_type.convert_async_writer(buf_writer) } +/// R

[PR] Add `DECLARE ... CURSOR FOR` support for SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc opened a new pull request, #1821: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1821 This PR adds support for declaring cursors on queries for SQL Server ([docs](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/declare-cursor-transact-sql)) Eg, thi

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2825528849 Here's another example case this PR should parse properly, before merging (on my todo list...) ``` USE some_database; GO ;WITH cte AS ( SELECT 1

Re: [I] Support integration with Parquet modular encryption [datafusion]

2025-04-23 Thread via GitHub
adamreeve commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2826229093 With the KMS API not being included in arrow-rs but being built as a third-party crate (https://github.com/apache/arrow-rs/pull/7387#issuecomment-2819908130), I would assume

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056434400 ## src/parser/mod.rs: ## @@ -484,8 +488,18 @@ impl<'a> Parser<'a> { } let statement = self.parse_statement()?; +

[I] Sorting is not maintained after using a window function [datafusion]

2025-04-23 Thread via GitHub
daphnenhuch-at opened a new issue, #15833: URL: https://github.com/apache/datafusion/issues/15833 ### Describe the bug I have a query which sorts the data by a column called "userPrimaryKey" and then uses a window function to add a row number column to the data frame. I've set `t

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056886926 ## src/dialect/mssql.rs: ## @@ -116,7 +116,17 @@ impl Dialect for MsSqlDialect { true } -fn is_column_alias(&self, kw: &Keyword,

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056903407 ## src/parser/mod.rs: ## @@ -4055,6 +4090,38 @@ impl<'a> Parser<'a> { ) } +/// Look backwards in the token stream and expect that

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056908951 ## src/dialect/mssql.rs: ## @@ -116,7 +116,17 @@ impl Dialect for MsSqlDialect { true } -fn is_column_alias(&self, kw: &Keyword,

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
codecov-commenter commented on PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#issuecomment-2825234249 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1677?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] ignore: explore jemalloc and snmalloc instead of mimalloc [datafusion-comet]

2025-04-23 Thread via GitHub
mbutrovich opened a new pull request, #1679: URL: https://github.com/apache/datafusion-comet/pull/1679 ## Which issue does this PR close? Closes #. ## Rationale for this change Comet currently supports mimalloc as its memory allocator (`make release COMET_FEAT

Re: [I] Support integration with Parquet modular encryption [datafusion]

2025-04-23 Thread via GitHub
corwinjoy commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2825947274 @alamb @adamreeve With the modular encryption essentially complete in arrow-rs, we are interested in beginning to move forward with adding support for this feature in datafus

[PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
xudong963 opened a new pull request, #15822: URL: https://github.com/apache/datafusion/pull/15822 ## Which issue does this PR close? - Closes #. ## Rationale for this change fetch is missing in `EnforceSorting` optimizer (two places) ## What changes are

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
vadimpiven commented on PR #15811: URL: https://github.com/apache/datafusion/pull/15811#issuecomment-2823619778 @xudong963 I do not have a merge button even after the review. Should I wait for another review, or can you merge the change? In the latter case I want to highlight that this chan

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2823637363 I will send the conflict fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
xudong963 commented on PR #15811: URL: https://github.com/apache/datafusion/pull/15811#issuecomment-2823656330 > Should I wait for another review, or can you merge the change? I'll wait for another review; if there is none, I'll merge tomorrow. > In the latter case I want to hi

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
xudong963 commented on PR #15811: URL: https://github.com/apache/datafusion/pull/15811#issuecomment-2823660336 > In the latter case I want to highlight that this change is blocking me from updating to version 47.0.0, and I would highly appreciate a patch release with this fix. If it'

[I] Build failure when default features are disabled [datafusion-ballista]

2025-04-23 Thread via GitHub
milenkovicm opened a new issue, #1254: URL: https://github.com/apache/datafusion-ballista/issues/1254 **Describe the bug** when default features are disabled ```toml ballista = { version = "44", default-features = false } ballista-scheduler = { version = "44", default-fea

Re: [I] Bring ordering information for grouped aggregation [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on issue #15818: URL: https://github.com/apache/datafusion/issues/15818#issuecomment-2823740436 > There's no `SortExec` in the query plan. Maybe it was removed by the optimizer? In this query plan, the `agg.child` is not sorted by the GROUP BY key. > > > >

Re: [PR] Impl intermeidate result blocked approach sketch [datafusion]

2025-04-23 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2823851220 ClickBench results: there is no regression when `blocked_groups` is disabled (it is still disabled in all queries for now, because the blocked approach is supported in only a few `group values` and `ac

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
Garamda commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2823913238 @jayzhan211 Thank you for reviewing! However, I have one concern. Is it okay to merge this PR right away, considering https://github.com/apache/datafusion/pull/13511#pul

Re: [PR] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on code in PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647#discussion_r2055896317 ## docs/source/contributor-guide/benchmarking_macos.md: ## @@ -0,0 +1,145 @@ + + +# Comet Benchmarking on macOS + +This guide is for setting up TPC-H benchma

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 merged PR #13511: URL: https://github.com/apache/datafusion/pull/13511

Re: [I] Standardize APPROX_PERCENTILE_CONT / PERCENTILE_CONT and similar aggregation functions [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 closed issue #11732: Standardize APPROX_PERCENTILE_CONT / PERCENTILE_CONT and similar aggregation functions URL: https://github.com/apache/datafusion/issues/11732

Re: [I] Adjust sizeInBytes estimation for Comet exchanges to avoid join strategy regressions [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove closed issue #1671: Adjust sizeInBytes estimation for Comet exchanges to avoid join strategy regressions URL: https://github.com/apache/datafusion-comet/issues/1671

Re: [PR] Support unparsing `UNION` for distinct results [datafusion]

2025-04-23 Thread via GitHub
goldmedal commented on PR #15814: URL: https://github.com/apache/datafusion/pull/15814#issuecomment-2824780534 Thanks @phillipleblanc and @sgrebnov for reviewing

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-04-23 Thread via GitHub
acking-you commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2824790924 > I tried the rewrite into a Semi join and indeed it is over 2x slower (5.3sec vs 12sec) > > > SELECT * from 'hits_partitioned' WHERE "URL" LIKE '%google%' ORDER BY "E

Re: [I] The SQL Unparser does not correctly handle `UNION` [datafusion]

2025-04-23 Thread via GitHub
goldmedal closed issue #15813: The SQL Unparser does not correctly handle `UNION` URL: https://github.com/apache/datafusion/issues/15813

Re: [PR] Support unparsing `UNION` for distinct results [datafusion]

2025-04-23 Thread via GitHub
goldmedal merged PR #15814: URL: https://github.com/apache/datafusion/pull/15814

[PR] Minor: fix potential flaky test in aggregate.slt [datafusion]

2025-04-23 Thread via GitHub
bikbov opened a new pull request, #15829: URL: https://github.com/apache/datafusion/pull/15829 ## Which issue does this PR close? - Closes #15789. ## Rationale for this change Tests improvement ## What changes are included in this PR? Fix potential flaky

[PR] Add `MemoryPool::memory_limit` to expose setting memory usage limit [datafusion]

2025-04-23 Thread via GitHub
Rachelint opened a new pull request, #15828: URL: https://github.com/apache/datafusion/pull/15828 …fusion. ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are the

Re: [PR] Add support for `XMLTABLE` [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio merged PR #1817: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1817

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio merged PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808

[I] Support exposing setting memory limit of memory pool [datafusion]

2025-04-23 Thread via GitHub
Rachelint opened a new issue, #15830: URL: https://github.com/apache/datafusion/issues/15830 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [D] Should ExecutionPlan spawn tasks in `execute` function [datafusion]

2025-04-23 Thread via GitHub
GitHub user pepijnve added a comment to the discussion: Should ExecutionPlan spawn tasks in `execute` function I can't give you an authoritative answer on this one, but FWIW `CoalescePartitionsExec::execute` also requires a current/active Tokio context since it spawns a task for each partitio

Re: [I] xmltable(...) function support [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio closed issue #1816: xmltable(...) function support URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1816

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15811: URL: https://github.com/apache/datafusion/pull/15811#discussion_r2055814059 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -498,7 +498,7 @@ impl LogicalPlanBuilder { TableScan::try_new(table_name, table_source, proje

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
EmilyMatt commented on code in PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#discussion_r2056074078 ## .github/workflows/miri.yml: ## @@ -38,6 +38,12 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 + - name: Install

Re: [PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2824970974 Hi @xudong963, I want to ask whether we have to rewrite the part of the code at https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L2

[PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
Adez017 opened a new pull request, #15832: URL: https://github.com/apache/datafusion/pull/15832 ## Which issue does this PR close? - Closes #15774 ## Rationale for this change updated the extending-operators.md file

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-04-23 Thread via GitHub
acking-you commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2824964842 > Relevant: https://clickhouse.com/blog/clickhouse-gets-lazier-and-faster-introducing-lazy-materialization Thank you so much for sharing this blog link—it’s truly an ex

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-23 Thread via GitHub
Blizzara commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2824982349 Thanks! This has indeed been a long-time todo :) also cc @vbarua I think personally I'd prefer somewhat fewer files, but that's just a suggestion: I'd probably do something like:

[PR] Project inline [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 opened a new pull request, #15825: URL: https://github.com/apache/datafusion/pull/15825 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #15811: URL: https://github.com/apache/datafusion/pull/15811#issuecomment-2823998323 https://github.com/apache/datafusion/pull/15825 I couldn't open a PR to your repo. Here is the fix I did

[PR] Implement min max for dictionary types [datafusion]

2025-04-23 Thread via GitHub
XiangpengHao opened a new pull request, #15827: URL: https://github.com/apache/datafusion/pull/15827 ## Which issue does this PR close? - Closes #. ## Rationale for this change I hit a runtime error when passing a dictionary type to the min/max aggregation.

Re: [PR] Implement min max for dictionary types [datafusion]

2025-04-23 Thread via GitHub
XiangpengHao commented on code in PR #15827: URL: https://github.com/apache/datafusion/pull/15827#discussion_r2056278172 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -1854,9 +1866,31 @@ mod tests { #[test] fn test_get_min_max_return_type_coerce_dictionary()

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
vadimpiven commented on code in PR #15825: URL: https://github.com/apache/datafusion/pull/15825#discussion_r2056286726 ## datafusion/core/tests/execution/logical_plan.rs: ## @@ -96,3 +100,37 @@ where }; element } + +#[test] +fn inline_scan_projection_test() -> Result<

[PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
EmilyMatt opened a new pull request, #1677: URL: https://github.com/apache/datafusion-comet/pull/1677 ## Rationale for this change Reduce the amount of duplicate crates due to crates that use outdated versions, thereby improving compile times and reducing binary size. Some

Re: [PR] chore(deps): bump env_logger from 0.11.7 to 0.11.8 [datafusion]

2025-04-23 Thread via GitHub
xudong963 merged PR #15823: URL: https://github.com/apache/datafusion/pull/15823

Re: [PR] Add `MemoryPool::memory_limit` to expose setting memory usage limit [datafusion]

2025-04-23 Thread via GitHub
waynexia commented on code in PR #15828: URL: https://github.com/apache/datafusion/pull/15828#discussion_r2056500052 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -141,6 +141,24 @@ pub trait MemoryPool: Send + Sync + std::fmt::Debug { /// Return the total amount o

Re: [PR] Add `OR ALTER` support for `CREATE VIEW` [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio merged PR #1818: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1818

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2823874921 Thanks @Garamda and @vbarua!

[PR] Fix `CoalescePartitionsExec` proto serialization [datafusion]

2025-04-23 Thread via GitHub
lewiszlw opened a new pull request, #15824: URL: https://github.com/apache/datafusion/pull/15824 ## Which issue does this PR close? - Closes #. ## Rationale for this change `CoalescePartitionsExec` proto serialization missed `fetch` value. ## What c

Re: [PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
xudong963 commented on code in PR #15822: URL: https://github.com/apache/datafusion/pull/15822#discussion_r2055810839 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -137,6 +137,12 @@ fn plan_with_order_preserving_variants(

Re: [PR] Add support for `XMLTABLE` [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
lovasoa commented on PR #1817: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1817#issuecomment-2824821901 Thanks for merging, @iffyio !

[I] Ensure Substrait producer for `BinaryExpr` includes `output_type` [datafusion]

2025-04-23 Thread via GitHub
kadinrabo opened a new issue, #15831: URL: https://github.com/apache/datafusion/issues/15831 ### Describe the bug When converting `BinaryExpr` expressions to Substrait using `from_binary_expr`, the resulting scalar function omits the `output_type` field. This happens via the `make_bi

[PR] docs: add ArkFlow [datafusion]

2025-04-23 Thread via GitHub
chenquan opened a new pull request, #15826: URL: https://github.com/apache/datafusion/pull/15826 ## Which issue does this PR close? no. ## Rationale for this change ## What changes are included in this PR? Add Arkflow to the document

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
vadimpiven commented on PR #15811: URL: https://github.com/apache/datafusion/pull/15811#issuecomment-2823765003 Thank you! Ok, will discuss making a fork and depending on it instead of upstream with my team today.

Re: [PR] docs: add ArkFlow [datafusion]

2025-04-23 Thread via GitHub
xudong963 merged PR #15826: URL: https://github.com/apache/datafusion/pull/15826

Re: [I] Unnecessary casting in stats & filter evaluation [datafusion]

2025-04-23 Thread via GitHub
leoyvens commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2824716928 To understand how this happens in the logical optimizer, as part of the `SimplifyExpressions` pass, you can look at [unwrap_cast.rs](https://github.com/apache/datafusion/blob/m

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
EmilyMatt commented on code in PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#discussion_r2056071541 ## native/Cargo.toml: ## @@ -38,16 +38,16 @@ arrow = { version = "55.0.0", features = ["prettyprint", "ffi", "chrono-tz"] } async-trait = { version = "0.1"

Re: [PR] chore: Start 0.9.0 development [datafusion-comet]

2025-04-23 Thread via GitHub
codecov-commenter commented on PR #1676: URL: https://github.com/apache/datafusion-comet/pull/1676#issuecomment-2824337650 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1676?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] docs: Add changelog for 0.8.0 [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove opened a new pull request, #1675: URL: https://github.com/apache/datafusion-comet/pull/1675 ## Which issue does this PR close? N/A ## Rationale for this change ## What changes are included in this PR? ## How are these changes teste

[PR] chore: Start 0.9.0 development [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove opened a new pull request, #1676: URL: https://github.com/apache/datafusion-comet/pull/1676 ## Which issue does this PR close? N/A ## Rationale for this change Now that the release branch `branch-0.8` has been created, it is time to switch `main

Re: [PR] chore: Start 0.9.0 development [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on code in PR #1676: URL: https://github.com/apache/datafusion-comet/pull/1676#discussion_r2055934997 ## dev/release/README.md: ## @@ -60,7 +60,6 @@ Create a PR against the main branch to prepare for developing the next release: - Update the Rust crate vers

Re: [PR] Make `Diagnostic` easy/convinient to attach by using macro and avoiding `map_err` [datafusion]

2025-04-23 Thread via GitHub
logan-keede commented on PR #15796: URL: https://github.com/apache/datafusion/pull/15796#issuecomment-2823666019 > @logan-keede please run the planner tests to check if this change affects planner performance, we got some experience in the past #7522 using ```sh cargo bench --bench

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on code in PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#discussion_r2056028940 ## native/Cargo.toml: ## @@ -38,16 +38,16 @@ arrow = { version = "55.0.0", features = ["prettyprint", "ffi", "chrono-tz"] } async-trait = { version = "0.1"

[I] Fix rat check errors during release process [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove opened a new issue, #1678: URL: https://github.com/apache/datafusion-comet/issues/1678 ### Describe the bug The rat exclude list needs updating to ignore these files: ``` NOT APPROVED: docs/source/_static/images/comet-dataflow.excalidraw (apache-datafusion-comet-0.

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on code in PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#discussion_r2056187263 ## native/Cargo.toml: ## @@ -38,16 +38,16 @@ arrow = { version = "55.0.0", features = ["prettyprint", "ffi", "chrono-tz"] } async-trait = { version = "0.1"

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
vadimpiven commented on PR #15811: URL: https://github.com/apache/datafusion/pull/15811#issuecomment-2824581546 You can merge your change and just close my PR, there is no difference to the result for me.

Re: [PR] feat: update datafusion dependency 47 [datafusion-python]

2025-04-23 Thread via GitHub
robtandy commented on code in PR #1107: URL: https://github.com/apache/datafusion-python/pull/1107#discussion_r2056227303 ## src/functions.rs: ## @@ -698,8 +677,22 @@ pub fn approx_percentile_cont_with_weight( add_builder_fns_to_aggregate(agg_fn, None, filter, None, None)

Re: [PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15822: URL: https://github.com/apache/datafusion/pull/15822#discussion_r2055781615 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -137,6 +137,12 @@ fn plan_with_order_preserving_variants(

Re: [I] Join on pandas dataframe from python API fails due to schema metadata [datafusion]

2025-04-23 Thread via GitHub
lesam commented on issue #15754: URL: https://github.com/apache/datafusion/issues/15754#issuecomment-2824080701 https://github.com/apache/datafusion/issues/12736#issuecomment-2613005807 also seems to be the same issue

Re: [PR] perf: Experimental fix to avoid join strategy regression [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on PR #1674: URL: https://github.com/apache/datafusion-comet/pull/1674#issuecomment-2824086340 I will merge this so that I can start the 0.8.0 release process. Thanks for the reviews @comphead and @kazuyukitanimura,

Re: [PR] perf: Experimental fix to avoid join strategy regression [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove merged PR #1674: URL: https://github.com/apache/datafusion-comet/pull/1674

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-04-23 Thread via GitHub
berkaysynnada commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2824434533 > > What's the status of this PR? > > It's ready to review. I'm still waiting for someone to help review it. Thanks @goldmedal. We'll need this as well, so let's re

[PR] bug: build fails with `--no-default-features` [datafusion-ballista]

2025-04-23 Thread via GitHub
milenkovicm opened a new pull request, #1255: URL: https://github.com/apache/datafusion-ballista/pull/1255 # Which issue does this PR close? Closes #1254 # Rationale for this change # What changes are included in this PR? - Bug fix - GitHub action check if

Re: [PR] Fix `ILIKE` expression support in SQL unparser [datafusion]

2025-04-23 Thread via GitHub
phillipleblanc commented on PR #15820: URL: https://github.com/apache/datafusion/pull/15820#issuecomment-2826473113 Correct, DataFusion was correctly handling Like and ILike - but DataFusion stores that as a single `Expr::Like` with a boolean for whether it's case insensitive. When t
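A minimal sketch of the idea in the comment above: a single LIKE node that carries a case-insensitivity flag, and an unparser that picks the SQL keyword from that flag. The `LikeExpr` struct and `unparse_like` function are hypothetical, simplified stand-ins written for illustration; they are not DataFusion's actual types or unparser code.

```rust
/// Hypothetical, simplified LIKE node: one shape for both LIKE and ILIKE,
/// distinguished only by the `case_insensitive` flag.
struct LikeExpr {
    negated: bool,
    case_insensitive: bool,
    expr: String,
    pattern: String,
}

/// Render the node back to SQL text. The keyword must be chosen from the
/// flag; always emitting `LIKE` would silently drop case insensitivity.
fn unparse_like(like: &LikeExpr) -> String {
    let keyword = if like.case_insensitive { "ILIKE" } else { "LIKE" };
    let not = if like.negated { "NOT " } else { "" };
    format!("{} {}{} {}", like.expr, not, keyword, like.pattern)
}
```

In this sketch, a node with `case_insensitive: true` round-trips as `ILIKE`, which is the property an unparser has to preserve.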

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in `min` and `max` [datafusion]

2025-04-23 Thread via GitHub
gabotechs commented on PR #13991: URL: https://github.com/apache/datafusion/pull/13991#issuecomment-2826530936 @rluvaton do you have an estimate of when this might be shipped? We’re currently blocked by this, so we’d be glad to handle it ourselves if that would help ease your workload.

[PR] chore(deps): bump env_logger from 0.11.7 to 0.11.8 [datafusion]

2025-04-23 Thread via GitHub
dependabot[bot] opened a new pull request, #15823: URL: https://github.com/apache/datafusion/pull/15823 Bumps [env_logger](https://github.com/rust-cli/env_logger) from 0.11.7 to 0.11.8. Release notes Sourced from https://github.com/rust-cli/env_logger/releases env_logger's relea

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2823540965 I didn't see the notification of this one. I will find time to review it

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in `min` and `max` [datafusion]

2025-04-23 Thread via GitHub
rluvaton commented on PR #13991: URL: https://github.com/apache/datafusion/pull/13991#issuecomment-2826533597 I'm really sorry, I had a crazy week with the baby, will work on it today

[PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-23 Thread via GitHub
srh opened a new pull request, #15836: URL: https://github.com/apache/datafusion/pull/15836 ## Which issue does this PR close? - Closes #15835. ## Rationale for this change Bugfix ## What changes are included in this PR? Bugfix and unit test cases for the op

Re: [PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2826457369 > > > > I want to ask: did we have to rewrite the part of the code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 after

[I] Cannot use Projection::new_from_schema to set parquet field ids. [datafusion]

2025-04-23 Thread via GitHub
init-js opened a new issue, #15837: URL: https://github.com/apache/datafusion/issues/15837 ### Describe the bug Our goal is to take an existing `DataFrame` and change the parquet field ids (after the fact) of its schema. The function `Projection::new_from_schema` looks promising, in

[I] ILike with no wildcards is mistakenly optimized to string equality [datafusion]

2025-04-23 Thread via GitHub
srh opened a new issue, #15835: URL: https://github.com/apache/datafusion/issues/15835 ### Describe the bug `'a' ILIKE 'A'` ends up evaluating as false. PR incoming. ### To Reproduce _No response_ ### Expected behavior _No response_ ### Additio

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-23 Thread via GitHub
gabotechs commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2825064512 > Thoughts? I have only a very slight preference for smaller pieces, but since I wasn’t on the front lines coding this, I trust your judgment much more. I’ll apply your sugg

Re: [PR] Add `MemoryPool::memory_limit` to expose setting memory usage limit [datafusion]

2025-04-23 Thread via GitHub
Rachelint commented on PR #15828: URL: https://github.com/apache/datafusion/pull/15828#issuecomment-2824818687 @waynexia

Re: [PR] chore: Start 0.9.0 development [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove merged PR #1676: URL: https://github.com/apache/datafusion-comet/pull/1676

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15825: URL: https://github.com/apache/datafusion/pull/15825#discussion_r2057104794 ## datafusion/core/tests/execution/logical_plan.rs: ## @@ -96,3 +100,37 @@ where }; element } + +#[test] +fn inline_scan_projection_test() -> Result<

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2825855767 I think supporting both queries would be confusing. If we plan to end up supporting the new syntax in the end, it is better not to keep the old syntax

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #15825: URL: https://github.com/apache/datafusion/pull/15825#issuecomment-2825860245 You can merge this PR, I had kept the commit from @vadimpiven

Re: [PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15822: URL: https://github.com/apache/datafusion/pull/15822#discussion_r2057125845 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -137,6 +137,12 @@ fn plan_with_order_preserving_variants(

Re: [PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15822: URL: https://github.com/apache/datafusion/pull/15822#discussion_r2057129418 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -137,6 +137,12 @@ fn plan_with_order_preserving_variants(

Re: [PR] feat: implement contextualized ObjectStore [datafusion]

2025-04-23 Thread via GitHub
github-actions[bot] commented on PR #14805: URL: https://github.com/apache/datafusion/pull/14805#issuecomment-2825977607 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Consolidate feature flags into configuration guide [datafusion]

2025-04-23 Thread via GitHub
github-actions[bot] commented on PR #14657: URL: https://github.com/apache/datafusion/pull/14657#issuecomment-2825977721 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] support simple/cross lateral joins [datafusion]

2025-04-23 Thread via GitHub
github-actions[bot] commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-282594 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Add `MemoryPool::memory_limit` to expose setting memory usage limit [datafusion]

2025-04-23 Thread via GitHub
Rachelint merged PR #15828: URL: https://github.com/apache/datafusion/pull/15828