[PR] Add support for Show Objects statement for the Snowflake parser [datafusion-sqlparser-rs]

2025-02-04 Thread via GitHub
DanCodedThis opened a new pull request, #1702: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1702 It's a naive implementation, since `ShowStatementOptions` allow for cases that are not supported by Snowflake `SHOW OBJECTS` like: - `SHOW TERSE OBJECTS WHERE ... ...` - Inst

Re: [I] Attach `Diagnostic` to "invalid function argument types" error [datafusion]

2025-02-04 Thread via GitHub
dentiny commented on issue #14431: URL: https://github.com/apache/datafusion/issues/14431#issuecomment-2634135392 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [I] Project Ideas for GSoC 2025 [datafusion]

2025-02-04 Thread via GitHub
timsaucer commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2634209782 @ozankabak Yes, I will always try to make time for students.

Re: [PR] Add supports for `CREATE/ALTER/DROP CONNECTOR` syntax [datafusion-sqlparser-rs]

2025-02-04 Thread via GitHub
wugeer commented on code in PR #1701: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1701#discussion_r1941338806 ## src/parser/alter.rs: ## @@ -99,6 +99,47 @@ impl Parser<'_> { } } +/// Parse ALTER CONNECTOR statement +/// ```sql +/// AL

Re: [I] Project Ideas for GSoC 2025 [datafusion]

2025-02-04 Thread via GitHub
ozankabak commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2634124815 Thank you @mkarbo! Can I assume @eliaperantoni would help too? Some other CTA: - @timsaucer, would you be willing to mentor a student to help us improve Python bindings?

Re: [I] Project Ideas for GSoC 2025 [datafusion]

2025-02-04 Thread via GitHub
eliaperantoni commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2634129544 > Can I assume @eliaperantoni would help too? Absolutely :) I'm actively working on this or related features e.g. #14439

Re: [PR] Add supports for `CREATE/ALTER/DROP CONNECTOR` syntax [datafusion-sqlparser-rs]

2025-02-04 Thread via GitHub
wugeer commented on code in PR #1701: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1701#discussion_r1941339625 ## src/ast/mod.rs: ## @@ -2646,6 +2646,18 @@ pub enum Statement { with_check: Option, }, /// ```sql +/// CREATE CONNECTOR +//

Re: [I] Optimize `sort_batch` for single column case [datafusion]

2025-02-04 Thread via GitHub
2010YOUY01 commented on issue #14475: URL: https://github.com/apache/datafusion/issues/14475#issuecomment-2633270939 take

Re: [I] Slowdown in ClickBench Q36-Q37 between DataFusion 43.0.0 and 44.0.0 [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14481: URL: https://github.com/apache/datafusion/issues/14481#issuecomment-2633740128 Given how much time is spent decoding ParquetMetadata, maybe it would be good to add some sort of small built in cache for parquet metadata 🤔 I think @Ted-Jiang made hooks to do t

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2633737624 > I will file a separate ticket for looking into them. - Filed https://github.com/apache/datafusion/issues/14481 in case anyone is interested in taking a look

Re: [PR] Add parsing for GRANT ROLE and GRANT DATABASE ROLE in Snowflake dialect [datafusion-sqlparser-rs]

2025-02-04 Thread via GitHub
alamb commented on PR #1689: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1689#issuecomment-2633754004 Rolling along ☸️

Re: [PR] Replace `once_cell::Lazy` with `std::sync::LazyLock` [datafusion]

2025-02-04 Thread via GitHub
alamb merged PR #14480: URL: https://github.com/apache/datafusion/pull/14480

Re: [I] Buildable release builds [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14479: URL: https://github.com/apache/datafusion/issues/14479#issuecomment-2633769479 As @mbrobbel says, I agree it is time to put Cargo.lock in the repo: - https://github.com/apache/datafusion/issues/14135

Re: [I] [DISCUSS] Lower Friction / Lower Ceremony (faster?) releases [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14428: URL: https://github.com/apache/datafusion/issues/14428#issuecomment-2633777593 > What things from release script can be further automated? I think we could automate creating the tarball / tags perhaps. For what it is worth, this [past release](

Re: [PR] Replace `once_cell::Lazy` with `std::sync::LazyLock` [datafusion]

2025-02-04 Thread via GitHub
alamb commented on PR #14480: URL: https://github.com/apache/datafusion/pull/14480#issuecomment-2633759809 FYI @phillipleblanc who tried this a while ago before MSRV was bumped: - https://github.com/apache/datafusion/pull/12612

Re: [I] Slowdown in ClickBench Q36-Q37 between DataFusion 43.0.0 and 44.0.0 [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14481: URL: https://github.com/apache/datafusion/issues/14481#issuecomment-2633735903 I got flamegraphs from them using https://github.com/flamegraph-rs/flamegraph ## Q36 ```shell ./datafusion-cli-44 -c "SELECT \"URL\", COUNT(*) AS PageViews FROM 'h

Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-02-04 Thread via GitHub
Chen-Yuan-Lai commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2634404910 take

Re: [I] Attach `Diagnostic` to "invalid function argument types" error [datafusion]

2025-02-04 Thread via GitHub
eliaperantoni commented on issue #14431: URL: https://github.com/apache/datafusion/issues/14431#issuecomment-2634119892 @dentiny Absolutely! Though I don't have the permissions to assign it to you, you can write `take` and the GitHub bot will

Re: [I] Support "Tracing" / Spans [datafusion]

2025-02-04 Thread via GitHub
erratic-pattern commented on issue #9415: URL: https://github.com/apache/datafusion/issues/9415#issuecomment-2635477969 > I am not clear what additional benefit more direct tracing integration in datafusion would provide, but I may be missing something The `tracing` API is more granul

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-02-04 Thread via GitHub
Garamda commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2635495223 Thank you @berkaysynnada for your review, and for letting me know about my mistake (submodule changes). I have just reverted submodule changes. I will check if I can add more n

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-02-04 Thread via GitHub
wForget commented on code in PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#discussion_r1942138342 ## native/core/src/execution/jni_api.rs: ## @@ -508,9 +505,6 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( let next_

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-04 Thread via GitHub
comphead commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1942084981 ## native/core/Cargo.toml: ## @@ -77,6 +77,7 @@ datafusion-comet-proto = { workspace = true } object_store = { workspace = true } url = { workspace = true }

Re: [I] Support "Tracing" / Spans [datafusion]

2025-02-04 Thread via GitHub
erratic-pattern commented on issue #9415: URL: https://github.com/apache/datafusion/issues/9415#issuecomment-2635459901 > I'd like to propose a very simple change first, before going full tracing everywhere: wrapping all task spawn points with .in_current_span(). I agree with this cha

Re: [PR] disable coercion for unmatched struct type [datafusion]

2025-02-04 Thread via GitHub
Lordworms commented on PR #14409: URL: https://github.com/apache/datafusion/pull/14409#issuecomment-2635465735 @alamb Hi alamb, I have moved the logic in struct_coercion and also fixed one bug in coalesce

Re: [PR] fix: `List` of `FixedSizeList` coercion issue in SQL [datafusion]

2025-02-04 Thread via GitHub
alan910127 commented on PR #14468: URL: https://github.com/apache/datafusion/pull/14468#issuecomment-2635404952 > Thank you for this contribution @alan910127 ❤️ Thank you, @alamb! 😊 Glad to contribute! Let me know if there's anything else I can improve.

[I] Add separate HDFS submodule to Comet [datafusion-comet]

2025-02-04 Thread via GitHub
comphead opened a new issue, #1368: URL: https://github.com/apache/datafusion-comet/issues/1368 (no comment)

Re: [I] Attach `Diagnostic` to "incompatible type in unary expression" error [datafusion]

2025-02-04 Thread via GitHub
alan910127 commented on issue #14433: URL: https://github.com/apache/datafusion/issues/14433#issuecomment-2635450667 take

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-02-04 Thread via GitHub
wForget commented on PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#issuecomment-2635523066 @andygrove @mbutrovich @parthchandra Thank you for your review and sorry for the late reply. I have just finished my Chinese New Year holiday and will continue this work later.

Re: [PR] WIP: fix regression after replacing `Vec` with `HashSet` [datafusion]

2025-02-04 Thread via GitHub
github-actions[bot] commented on PR #13656: URL: https://github.com/apache/datafusion/pull/13656#issuecomment-2635531668 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] `array_slice` can't correctly handle NULL parameters or some edge cases [datafusion]

2025-02-04 Thread via GitHub
jkosh44 commented on issue #10548: URL: https://github.com/apache/datafusion/issues/10548#issuecomment-2634748421 take

Re: [PR] chore(deps): bump clap from 4.5.27 to 4.5.28 in /datafusion-cli [datafusion]

2025-02-04 Thread via GitHub
comphead merged PR #14477: URL: https://github.com/apache/datafusion/pull/14477

Re: [PR] chore(deps): bump bytes from 1.9.0 to 1.10.0 in /datafusion-cli [datafusion]

2025-02-04 Thread via GitHub
comphead merged PR #14476: URL: https://github.com/apache/datafusion/pull/14476

Re: [PR] chore: Fix link to issue and expand comment [datafusion]

2025-02-04 Thread via GitHub
alamb merged PR #14473: URL: https://github.com/apache/datafusion/pull/14473

Re: [PR] make datafusion-catalog-listing and move some implementation of listing out of datafusion/core/datasource/listing [datafusion]

2025-02-04 Thread via GitHub
alamb commented on code in PR #14464: URL: https://github.com/apache/datafusion/pull/14464#discussion_r1941705612 ## datafusion/core/src/datasource/listing/mod.rs: ## @@ -18,263 +18,6 @@ //! A table that uses the `ObjectStore` listing capability //! to get the list of files to

Re: [PR] 14044/enhancement/add xxhash algorithms in expression api [datafusion]

2025-02-04 Thread via GitHub
Spaarsh commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2634910919 I will add support for Null values and write the tests as well!

[PR] Minor: `cargo fmt` to fix CI [datafusion]

2025-02-04 Thread via GitHub
alamb opened a new pull request, #14487: URL: https://github.com/apache/datafusion/pull/14487 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/14473 ## Rationale for this change I accidentally merged a PR that didn't pass CI test

Re: [PR] chore: Fix link to issue and expand comment [datafusion]

2025-02-04 Thread via GitHub
alamb commented on PR #14473: URL: https://github.com/apache/datafusion/pull/14473#issuecomment-2634776421 Ooops -- I merged this with a CI failure. Fix PR: - https://github.com/apache/datafusion/pull/14487

Re: [PR] STRING_AGG missing functionality [datafusion]

2025-02-04 Thread via GitHub
alamb closed pull request #14412: STRING_AGG missing functionality URL: https://github.com/apache/datafusion/pull/14412

Re: [PR] STRING_AGG missing functionality [datafusion]

2025-02-04 Thread via GitHub
alamb commented on PR #14412: URL: https://github.com/apache/datafusion/pull/14412#issuecomment-2634783601 Close/reopen to rerun CI checks

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-02-04 Thread via GitHub
wForget commented on code in PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#discussion_r1942247623 ## native/core/src/execution/metrics/utils.rs: ## @@ -55,60 +64,21 @@ pub fn update_comet_metric( Some(metrics.aggregate_by_name()) }; -upd

Re: [I] Attach `Diagnostic` to "duplicate table name" error [datafusion]

2025-02-04 Thread via GitHub
zjregee commented on issue #14436: URL: https://github.com/apache/datafusion/issues/14436#issuecomment-2635780992 take

[PR] [WIP] checks for build wasm [datafusion]

2025-02-04 Thread via GitHub
Lordworms opened a new pull request, #14493: URL: https://github.com/apache/datafusion/pull/14493 ## Which issue does this PR close? uuid crate wasm-build ci job failed on my pr check if it is related to changes Closes #. ## Rationale for this change ## Wh

[PR] fix(ci): build error with wasm [datafusion]

2025-02-04 Thread via GitHub
Lordworms opened a new pull request, #14494: URL: https://github.com/apache/datafusion/pull/14494 ## Which issue does this PR close? like this https://github.com/apache/datafusion/actions/runs/13151403418/job/36699387339?pr=14493 Closes #. ## Rationale for this chang

Re: [PR] [WIP] checks for build wasm [datafusion]

2025-02-04 Thread via GitHub
Lordworms closed pull request #14493: [WIP] checks for build wasm URL: https://github.com/apache/datafusion/pull/14493

Re: [PR] Require space after -- to start single line comment in MySQL [datafusion-sqlparser-rs]

2025-02-04 Thread via GitHub
iffyio commented on code in PR #1705: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1705#discussion_r1942311180 ## src/dialect/mod.rs: ## @@ -880,6 +880,15 @@ pub trait Dialect: Debug + Any { fn supports_table_hints(&self) -> bool { false } + +

[PR] refactor: replace uses of `arrow_buffer` & `arrow_array` with reexport in arrow [datafusion]

2025-02-04 Thread via GitHub
Chen-Yuan-Lai opened a new pull request, #14495: URL: https://github.com/apache/datafusion/pull/14495 ## Which issue does this PR close? Closes #14115. ## Rationale for this change As [14115 issue comment](https://github.com/apache/datafusion/issues/14115#is

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-04 Thread via GitHub
findepi commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2635850123 > I fully recognize this creates behavior that diverges from PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, there’s a critical distinction: **System contracts

Re: [PR] refactor: remove uses of `arrow_buffer` & `arrow_array` and use reexport in arrow instead [datafusion]

2025-02-04 Thread via GitHub
Chen-Yuan-Lai closed pull request #14495: refactor: remove uses of `arrow_buffer` & `arrow_array` and use reexport in arrow instead URL: https://github.com/apache/datafusion/pull/14495

Re: [I] Project Ideas for GSoC 2025 [datafusion]

2025-02-04 Thread via GitHub
ozankabak commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2635853610 @comphead, would you be willing to mentor a student on a project to study our codebase and dependencies to reduce DF binary size?

Re: [I] Browser-accessible official DataFusion playground / DataFusion fiddle [datafusion]

2025-02-04 Thread via GitHub
gabotechs commented on issue #13818: URL: https://github.com/apache/datafusion/issues/13818#issuecomment-2635854786 Made it a while ago and still use it for quickly trying out stuff, there are lots of low hanging fruits for improving it (local storage based query history, syntax/error highl

[PR] fix: rewrite fetch, skip of the Limit node in correct order [datafusion]

2025-02-04 Thread via GitHub
evenyag opened a new pull request, #14496: URL: https://github.com/apache/datafusion/pull/14496 ## Which issue does this PR close? Closes #. ## Rationale for this change We found a bug related to `with_new_exprs()` for the `Limit` plan in https://github.c

[PR] Fix link to volcano parallelism paper [datafusion]

2025-02-04 Thread via GitHub
lewiszlw opened a new pull request, #14497: URL: https://github.com/apache/datafusion/pull/14497 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] [WIP] checks for build wasm [datafusion]

2025-02-04 Thread via GitHub
xudong963 commented on PR #14493: URL: https://github.com/apache/datafusion/pull/14493#issuecomment-2635936440 duplicate with #14494 ?

Re: [PR] fix(ci): build error with wasm [datafusion]

2025-02-04 Thread via GitHub
Lordworms commented on PR #14494: URL: https://github.com/apache/datafusion/pull/14494#issuecomment-2635927296 cc @alamb

Re: [PR] [WIP] checks for build wasm [datafusion]

2025-02-04 Thread via GitHub
Lordworms commented on PR #14493: URL: https://github.com/apache/datafusion/pull/14493#issuecomment-2635946177 > duplicate with #14494 ? this is a test to prove the issue has nothing to do with my previous PR, I'll close it.

Re: [PR] [WIP] checks for build wasm [datafusion]

2025-02-04 Thread via GitHub
Lordworms closed pull request #14493: [WIP] checks for build wasm URL: https://github.com/apache/datafusion/pull/14493

[PR] Always use `StringViewArray` as output of `substr` [datafusion]

2025-02-04 Thread via GitHub
Kev1n8 opened a new pull request, #14498: URL: https://github.com/apache/datafusion/pull/14498 ## Which issue does this PR close? Closes #12338 ## Rationale for this change Generate `StringViewArray` whatever input type is for efficiency. ## What change

[PR] Feat: Add fetch to CoalescePartitionsExec [datafusion]

2025-02-04 Thread via GitHub
mertak-synnada opened a new pull request, #14499: URL: https://github.com/apache/datafusion/pull/14499 ## Which issue does this PR close? Closes #14446. ## Rationale for this change ## What changes are included in this PR? ## Are these chang

[I] Project Ideas for GSoC 2025 [datafusion]

2025-02-04 Thread via GitHub
ozankabak opened a new issue, #14478: URL: https://github.com/apache/datafusion/issues/14478 ## Context Apache DataFusion is putting together an application for [GSoC 2025](https://summerofcode.withgoogle.com/). For those who do not know, GSoC is a Google-sponsored program for studen

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-02-04 Thread via GitHub
berkaysynnada commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2633502146 Thank you @Garamda, the code is well-written and has good documentation. I couldn't fully think through the details of the implementation, but as an early review, I can suggest ad

Re: [I] Limits are not applied correctly [datafusion]

2025-02-04 Thread via GitHub
berkaysynnada closed issue #14406: Limits are not applied correctly URL: https://github.com/apache/datafusion/issues/14406

Re: [PR] fix: Limits are not applied correctly [datafusion]

2025-02-04 Thread via GitHub
berkaysynnada merged PR #14418: URL: https://github.com/apache/datafusion/pull/14418

[PR] Replace `once_cell::Lazy` with `std::sync::LazyLock` [datafusion]

2025-02-04 Thread via GitHub
mbrobbel opened a new pull request, #14480: URL: https://github.com/apache/datafusion/pull/14480 ## Which issue does this PR close? None. ## Rationale for this change With the MSRV > 1.80 we can use `std::sync::LazyLock` instead of `once_cell::Lazy`. ## What chang

Re: [PR] fix: Limits are not applied correctly [datafusion]

2025-02-04 Thread via GitHub
berkaysynnada commented on PR #14418: URL: https://github.com/apache/datafusion/pull/14418#issuecomment-2633517827 Thank you @zhuqi-lucas, @mertak-synnada and @xudong963. This looks good to me now, and I'm merging it. I guess @mertak-synnada will open a follow-up PR removing the explicit ca

Re: [I] Proper NULL handling in array functions [datafusion]

2025-02-04 Thread via GitHub
alan910127 commented on issue #14451: URL: https://github.com/apache/datafusion/issues/14451#issuecomment-2633515022 take

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-02-04 Thread via GitHub
pmcgleenon commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2634016553 PR raised to update Clickbench results for datafusion `44.0.0` https://github.com/ClickHouse/ClickBench/pull/301

Re: [I] Project Ideas for GSoC 2025 [datafusion]

2025-02-04 Thread via GitHub
ozankabak commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2634437150 @mertak-synnada has been working on benchmarking Synnada's fork, so he may be a good co-mentor (along with someone from InfluxData) for the continuous monitoring project.

Re: [I] [EPIC] Add support for all Map functions [datafusion-comet]

2025-02-04 Thread via GitHub
kazantsev-maksim commented on issue #1044: URL: https://github.com/apache/datafusion-comet/issues/1044#issuecomment-2634495275 I would like to work on the `map_filter` function

Re: [I] Attach `Diagnostic` to "incompatible type in unary expression" error [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14433: URL: https://github.com/apache/datafusion/issues/14433#issuecomment-2634527308 I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cas

Re: [I] Attach `Diagnostic` to "invalid function argument types" error [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14431: URL: https://github.com/apache/datafusion/issues/14431#issuecomment-2634526226 I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cas

Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2634526736 I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cas

Re: [I] Attach `Diagnostic` to "more than one column in subquery" error [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14438: URL: https://github.com/apache/datafusion/issues/14438#issuecomment-2634528655 I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cas

Re: [I] Attach `Diagnostic` to syntax errors [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14437: URL: https://github.com/apache/datafusion/issues/14437#issuecomment-2634528364 I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cas

Re: [I] Attach `Diagnostic` to "duplicate table name" error [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14436: URL: https://github.com/apache/datafusion/issues/14436#issuecomment-2634527995 I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cas

Re: [I] Attach `Diagnostic` to "function x does not exist" error [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14430: URL: https://github.com/apache/datafusion/issues/14430#issuecomment-2634525578 I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cas

Re: [I] Emit warning with attached `Diagnostic` when doing `= NULL` [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #14434: URL: https://github.com/apache/datafusion/issues/14434#issuecomment-2634527624 I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cas

[I] `array_slice` can't correctly handle NULL parameters or some edge cases [datafusion]

2025-02-04 Thread via GitHub
jonahgao opened a new issue, #10548: URL: https://github.com/apache/datafusion/issues/10548 ### Describe the bug These queries will give errors or incorrect results. ### To Reproduce Run queries in CLI: ```sh DataFusion CLI v38.0.0 > select array_slice([1,2,3

Re: [I] `array_slice` can't correctly handle NULL parameters or some edge cases [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #10548: URL: https://github.com/apache/datafusion/issues/10548#issuecomment-2634535126 Reopened per @jkosh44 Thank you

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-02-04 Thread via GitHub
alamb commented on PR #14331: URL: https://github.com/apache/datafusion/pull/14331#issuecomment-2634545461 Thanks for the update @buraksenn -- let's see if someone else can push it forward between now and when you get back. Take care

Re: [I] Optimize SortPreservingMergeExec for single-column merge [datafusion]

2025-02-04 Thread via GitHub
Dandandan closed issue #13642: Optimize SortPreservingMergeExec for single-column merge URL: https://github.com/apache/datafusion/issues/13642

Re: [PR] Feature Unifying source execution plans [datafusion]

2025-02-04 Thread via GitHub
ozankabak commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2633850648 Dear all, this PR is ready for another round of reviews. I think it is in decent shape, and I'm looking forward to getting more feedback. Old source operators are merely wrappe

[PR] Add `Cargo.lock` [datafusion]

2025-02-04 Thread via GitHub
mbrobbel opened a new pull request, #14483: URL: https://github.com/apache/datafusion/pull/14483 ## Which issue does this PR close? Closes #14135. ## Rationale for this change See linked issue. ## What changes are included in this PR? - Remove `Cargo.lock`

[I] datafusion-ray is not published to pypi.org [datafusion-ray]

2025-02-04 Thread via GitHub
sairamkrish opened a new issue, #59: URL: https://github.com/apache/datafusion-ray/issues/59 Hi We are trying to use datafusion-ray. To get started we tried to pip install datafusion-ray. However `datafusion-ray` is not published. Is it still in testing phase ? If someone wants

Re: [PR] feat: Add regexp_split_to_array function [datafusion]

2025-02-04 Thread via GitHub
berkaysynnada commented on PR #13110: URL: https://github.com/apache/datafusion/pull/13110#issuecomment-2633675770 > hey @buraksenn, are you still tracking implementing this feature? You can take it if you wish. He will not be available for approximately 1 month.

Re: [I] Run DataFusion benchmarks regularly and track performance history over time [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #5504: URL: https://github.com/apache/datafusion/issues/5504#issuecomment-2633678177 Here is a suggestion on how to proceed with this project: 1. Create the converter from bench json --> line protocol (e.g. https://github.com/apache/datafusion/issues/6107) 2. W

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2633683712 > If this looks ok, I can create a PR on clickbench to update the results First of all, thank you so much @pmcgleenon -- this is super helpful as always 🙏 I looked

Re: [PR] Support vectorized append and compare for multi group by [datafusion]

2025-02-04 Thread via GitHub
alamb commented on PR #12996: URL: https://github.com/apache/datafusion/pull/12996#issuecomment-2633686313 Update here is that this is looking like it results in some sweet clickbench improvements: - https://github.com/apache/datafusion/issues/13983#issuecomment-2632053433

Re: [PR] Feature Unifying source execution plans [datafusion]

2025-02-04 Thread via GitHub
mertak-synnada commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2633698226 Here are some changes on plans, Old ParquetExec usage: ```rust use std::sync::Arc; use arrow::datatypes::Schema; use datafusion::datasource::physical_plan::{F

[I] Slowdown in ClickBench Q36-Q37 between DataFusion 43.0.0 and 44.0.0 [datafusion]

2025-02-04 Thread via GitHub
alamb opened a new issue, #14481: URL: https://github.com/apache/datafusion/issues/14481 ### Is your feature request related to a problem or challenge? @pmcgleenon ran ClickBench on DataFusion 44 ❤ - https://github.com/apache/datafusion/issues/13983#issuecomment-2632053433 H

[I] [EPIC] A(nother) list of performance improvement tickets [datafusion]

2025-02-04 Thread via GitHub
alamb opened a new issue, #14482: URL: https://github.com/apache/datafusion/issues/14482 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [I] [EPIC] A list of performance improvement tickets [datafusion]

2025-02-04 Thread via GitHub
alamb closed issue #5546: [EPIC] A list of performance improvement tickets URL: https://github.com/apache/datafusion/issues/5546

Re: [I] [EPIC] A list of performance improvement tickets [datafusion]

2025-02-04 Thread via GitHub
alamb commented on issue #5546: URL: https://github.com/apache/datafusion/issues/5546#issuecomment-2633719293 I moved all items not yet completed to https://github.com/apache/datafusion/issues/14482 so we could have a fresher list

[PR] chore(deps): bump bytes from 1.9.0 to 1.10.0 in /datafusion-cli [datafusion]

2025-02-04 Thread via GitHub
dependabot[bot] opened a new pull request, #14476: URL: https://github.com/apache/datafusion/pull/14476 Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.9.0 to 1.10.0. Release notes: sourced from bytes's releases (https://github.com/tokio-rs/bytes/releases). Bytes v1.10

[PR] chore(deps): bump clap from 4.5.27 to 4.5.28 in /datafusion-cli [datafusion]

2025-02-04 Thread via GitHub
dependabot[bot] opened a new pull request, #14477: URL: https://github.com/apache/datafusion/pull/14477 Bumps [clap](https://github.com/clap-rs/clap) from 4.5.27 to 4.5.28. Release notes: sourced from clap's releases (https://github.com/clap-rs/clap/releases). v4.5.28 [4.5.

[I] Buildable release builds [datafusion]

2025-02-04 Thread via GitHub
findepi opened a new issue, #14479: URL: https://github.com/apache/datafusion/issues/14479 ### Problem description From a security standpoint it would be great to have reproducible (byte-for-byte) release builds; however, this issue is not about that (but is a prerequisite thereof).

Re: [I] Project Ideas for GSoC 2025 [datafusion]

2025-02-04 Thread via GitHub
Omega359 commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2633388087 I do not know what level students would be for GSoC; however, @alamb had some ideas for [CMU class projects](https://github.com/apache/datafusion/issues/14373) for university le

Re: [I] Buildable release builds [datafusion]

2025-02-04 Thread via GitHub
findepi commented on issue #14479: URL: https://github.com/apache/datafusion/issues/14479#issuecomment-2633394341 Proposed solution: commit `Cargo.lock` to the repository. Lack of `Cargo.lock` in the repo gives some benefit (checking with latest versions of unpinned deps automatically). T

Re: [PR] 14044/enhancement/add xxhash algorithms in expression api [datafusion]

2025-02-04 Thread via GitHub
Omega359 commented on code in PR #14367: URL: https://github.com/apache/datafusion/pull/14367#discussion_r1940851881 ## datafusion/functions/src/hash/xxhash.rs: ## @@ -0,0 +1,279 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [I] [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) [datafusion]

2025-02-04 Thread via GitHub
findepi commented on issue #13525: URL: https://github.com/apache/datafusion/issues/13525#issuecomment-2633401529 Not sure it was Glare's perspective, but I guess this issue has already outgrown its initial context. From past SDF experience it would be beneficial if it was possible to checkout an

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-02-04 Thread via GitHub
Omega359 commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2633412221 To be honest, I don't think I'll have the bandwidth to test a type coercion fix for UDFs myself this week. I'm about to fire off a full run of my application against the 45 bra
