Re: [I] Use Tokio::block_in_place [datafusion]

2024-12-09 Thread via GitHub
crepererum commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2527658992 Couple of thoughts: - **runtime requirements:** `block_in_place` requires the multi-thread runtime. That's not a blocker, but we should clearly communicate that this m

Re: [I] Use Tokio::block_in_place [datafusion]

2024-12-09 Thread via GitHub
tustvold commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2527556813 Yeah, there are tools like tokio-console and the runtime metrics that could help to find problematic codepaths, but I'm not going to pretend it is trivial. I do think it may en

[I] Make SqlToRel respect parser options from ContextProvider [datafusion]

2024-12-09 Thread via GitHub
niebayes opened a new issue, #13700: URL: https://github.com/apache/datafusion/issues/13700 `SqlToRel` provides two constructors: `new` and `new_with_options`. The former uses default `ParserOptions`, while the latter accepts `ParserOptions` as an input parameter. Since `SqlToRel` al
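
The two-constructor arrangement described in this issue can be sketched with stand-in types. This is only one possible shape for the proposal, not DataFusion's actual code: the `parser_options` hook on `ContextProvider`, the single-field `ParserOptions`, and the `DefaultProvider` / `CustomProvider` types are all hypothetical and exist only for illustration.

```rust
/// Stand-in for the real `ParserOptions`; the single field is illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct ParserOptions {
    pub enable_ident_normalization: bool,
}

impl Default for ParserOptions {
    fn default() -> Self {
        Self { enable_ident_normalization: true }
    }
}

/// Stand-in for `ContextProvider`. The `parser_options` hook is an
/// assumption made for this sketch, not a confirmed DataFusion method.
pub trait ContextProvider {
    fn parser_options(&self) -> ParserOptions {
        ParserOptions::default()
    }
}

/// Stand-in for `SqlToRel`, holding the provider and the resolved options.
pub struct SqlToRel<'a, S: ContextProvider> {
    context_provider: &'a S,
    options: ParserOptions,
}

impl<'a, S: ContextProvider> SqlToRel<'a, S> {
    /// `new` asks the provider for options instead of hard-coding defaults.
    pub fn new(context_provider: &'a S) -> Self {
        let options = context_provider.parser_options();
        Self::new_with_options(context_provider, options)
    }

    /// `new_with_options` keeps accepting explicit options from the caller.
    pub fn new_with_options(context_provider: &'a S, options: ParserOptions) -> Self {
        Self { context_provider, options }
    }

    pub fn options(&self) -> &ParserOptions {
        &self.options
    }

    pub fn context_provider(&self) -> &S {
        self.context_provider
    }
}

/// Provider that relies on the default hook.
pub struct DefaultProvider;
impl ContextProvider for DefaultProvider {}

/// Provider that overrides the hook.
pub struct CustomProvider;
impl ContextProvider for CustomProvider {
    fn parser_options(&self) -> ParserOptions {
        ParserOptions { enable_ident_normalization: false }
    }
}
```

With this shape, callers of `new` pick up whatever the provider reports, while callers of `new_with_options` can still override it explicitly.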

Re: [I] Make SqlToRel respect parser options from ContextProvider [datafusion]

2024-12-09 Thread via GitHub
niebayes commented on issue #13700: URL: https://github.com/apache/datafusion/issues/13700#issuecomment-2527692104 If this modification makes sense, I'd like to file a PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] Optimize performance of `initcap` function (~2x faster) [datafusion]

2024-12-09 Thread via GitHub
Weijun-H commented on code in PR #13691: URL: https://github.com/apache/datafusion/pull/13691#discussion_r1875851633 ## datafusion/functions/src/string/initcap.rs: ## @@ -132,21 +132,22 @@ fn initcap_utf8view(args: &[ArrayRef]) -> Result { Ok(Arc::new(result) as ArrayRef)
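
For readers unfamiliar with `initcap`, a minimal scalar sketch of the behavior follows. It assumes PostgreSQL-style semantics (a word is a run of alphanumeric characters) and works on a plain `&str`. It does not reproduce the array-based `initcap_utf8view` code under review, nor the optimization made in the PR.

```rust
/// Uppercase the first character of each word and lowercase the rest.
/// Assumption: a "word" is a maximal run of alphanumeric characters.
fn initcap(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut start_of_word = true;
    for c in input.chars() {
        if c.is_alphanumeric() {
            if start_of_word {
                // `to_uppercase` may yield more than one char for some inputs.
                out.extend(c.to_uppercase());
            } else {
                out.extend(c.to_lowercase());
            }
            start_of_word = false;
        } else {
            // Separators are copied through and start a new word.
            out.push(c);
            start_of_word = true;
        }
    }
    out
}
```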

Re: [I] doc-gen: Migrate windows functions from code based documentation to attribute based [datafusion]

2024-12-09 Thread via GitHub
zjregee commented on issue #13670: URL: https://github.com/apache/datafusion/issues/13670#issuecomment-2527778327 take

Re: [PR] Optimize performance of `initcap` function (~2x faster) [datafusion]

2024-12-09 Thread via GitHub
Dandandan commented on code in PR #13691: URL: https://github.com/apache/datafusion/pull/13691#discussion_r1875885799 ## datafusion/functions/src/string/initcap.rs: ## @@ -132,21 +132,22 @@ fn initcap_utf8view(args: &[ArrayRef]) -> Result { Ok(Arc::new(result) as ArrayRef)

Re: [I] Remove record_batch! macro once upstream updates [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #13037: URL: https://github.com/apache/datafusion/issues/13037#issuecomment-2527749169 > Should this be done? I saw this #12846. I can delete create_array and record_batch macros to make it use the same as in arrow. Also they are also not identical: > > * Dat

Re: [PR] [comet-parquet-exec] Change path handling to fix URL decoding [datafusion-comet]

2024-12-09 Thread via GitHub
andygrove merged PR #1149: URL: https://github.com/apache/datafusion-comet/pull/1149

[PR] Alamb/clarify deprecation [datafusion]

2024-12-09 Thread via GitHub
alamb opened a new pull request, #13701: URL: https://github.com/apache/datafusion/pull/13701 ## Which issue does this PR close? Closes #. ## Rationale for this change While discussing https://github.com/apache/datafusion/issues/13037#issuecomment-2526291502 with

Re: [I] Home page of Comet not available to access build instructions [datafusion-comet]

2024-12-09 Thread via GitHub
andygrove closed issue #1153: Home page of Comet not available to access build instructions URL: https://github.com/apache/datafusion-comet/issues/1153

Re: [I] Home page of Comet not available to access build instructions [datafusion-comet]

2024-12-09 Thread via GitHub
andygrove commented on issue #1153: URL: https://github.com/apache/datafusion-comet/issues/1153#issuecomment-2527808045 @ajeyabsfujitsu it is back online now - https://datafusion.apache.org/comet/ Thanks for reporting this

Re: [I] Remove record_batch! macro once upstream updates [datafusion]

2024-12-09 Thread via GitHub
buraksenn commented on issue #13037: URL: https://github.com/apache/datafusion/issues/13037#issuecomment-2527809229 > > Should this be done? I saw this #12846. I can delete create_array and record_batch macros to make it use the same as in arrow. Also they are also not identical: > >

Re: [I] Use Tokio::block_in_place [datafusion]

2024-12-09 Thread via GitHub
tustvold commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2527819635 The performance implications definitely concern me, I have a nagging suspicion block_in_place spawns a thread... An arguably better solution would be to instead spawn the

Re: [I] Use Tokio::block_in_place [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2527827318 > It's relatively easy to, at a low level, know where IO calls are happening. In fact Tokio lets you configure the runtime to panic if you do IO at all (which makes it easy to

Re: [I] Update ballista logo [datafusion-ballista]

2024-12-09 Thread via GitHub
milenkovicm commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2527836833 I like option 3 even more. Thanks @pinarbayata

Re: [PR] Optimize performance of `character_length` function [datafusion]

2024-12-09 Thread via GitHub
tlm365 commented on code in PR #13696: URL: https://github.com/apache/datafusion/pull/13696#discussion_r1875613725 ## datafusion/functions/src/unicode/character_length.rs: ## @@ -136,31 +136,40 @@ fn character_length(args: &[ArrayRef]) -> Result { } } -fn character_leng
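
As background for this thread, here is a minimal sketch of one common way to speed up a character count. The ASCII shortcut is an assumption for illustration; the snippet above is truncated and does not confirm what the PR actually changes. The function works on a plain `&str` rather than on Arrow arrays.

```rust
/// Count Unicode scalar values in `s`.
/// Assumed fast path: for pure-ASCII input every character is one byte,
/// so the byte length can be returned without decoding UTF-8.
fn character_length(s: &str) -> usize {
    if s.is_ascii() {
        s.len()
    } else {
        s.chars().count()
    }
}
```

Both branches return the same value for ASCII input; the first only avoids walking the string character by character.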

Re: [PR] Add support for TABLESAMPLE [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
iffyio commented on code in PR #1580: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1580#discussion_r1875615805 ## tests/sqlparser_snowflake.rs: ## @@ -2952,3 +2950,33 @@ fn test_sf_double_dot_notation() { #[test] fn test_parse_double_dot_notation_wrong_positi

[PR] PoC improving RepartitionEx [datafusion]

2024-12-09 Thread via GitHub
Dandandan opened a new pull request, #13699: URL: https://github.com/apache/datafusion/pull/13699 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] refactor: replace `Vec` with `IndexMap` for expression mappings in `ProjectionMapping` and `EquivalenceGroup` [datafusion]

2024-12-09 Thread via GitHub
alamb commented on PR #13675: URL: https://github.com/apache/datafusion/pull/13675#issuecomment-2527546119 When I ran the planning benchmarks like this ```shell cargo bench --bench sql_planner ``` I saw a pretty consistent 2-3% slowdown: ``` ++ critcmp main 8027-

Re: [PR] [minor]: Simplifications [datafusion]

2024-12-09 Thread via GitHub
jayzhan211 merged PR #13697: URL: https://github.com/apache/datafusion/pull/13697

Re: [PR] [minor]: Simplifications [datafusion]

2024-12-09 Thread via GitHub
jayzhan211 commented on PR #13697: URL: https://github.com/apache/datafusion/pull/13697#issuecomment-2527735805 Thanks @akurmustafa @Weijun-H

Re: [I] Add `SessionConfig` reference to `ScalarFunctionArgs` [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #13519: URL: https://github.com/apache/datafusion/issues/13519#issuecomment-2527739183 > Not really, DataType has no nullable info, we have to send nullable to ScalarFunctionArgs As I was thinking about the nullable info that is part of embedded `Field`s

[PR] Update prost-build requirement from =0.13.3 to =0.13.4 [datafusion]

2024-12-09 Thread via GitHub
dependabot[bot] opened a new pull request, #13698: URL: https://github.com/apache/datafusion/pull/13698 Updates the requirements on [prost-build](https://github.com/tokio-rs/prost) to permit the latest version. Changelog Sourced from https://github.com/tokio-rs/prost/blob/master/CH

Re: [I] Add related source code locations to errors [datafusion]

2024-12-09 Thread via GitHub
eliaperantoni commented on issue #13662: URL: https://github.com/apache/datafusion/issues/13662#issuecomment-2527259854 > > ``` > > (line 1, column 8) error: 'users.name' in projection does not appear in GROUP BY clause > > (line 1, column 33) note: GROUP BY clause is here > > (line

[I] Home page of Comet not available to access build instructions [datafusion-comet]

2024-12-09 Thread via GitHub
ajeyabsfujitsu opened a new issue, #1153: URL: https://github.com/apache/datafusion-comet/issues/1153 ### Describe the bug The installation instructions link in README leads to a 404. The Apache Comet page in general shows a 404. I don't see the build/install information anywh

Re: [I] Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable` [datafusion]

2024-12-09 Thread via GitHub
crepererum commented on issue #13433: URL: https://github.com/apache/datafusion/issues/13433#issuecomment-2527472455 TBH the last two batches are rather hard: ## `hashbrown` 0.14 & allocation size `hashbrown` 0.14 doesn't expose the allocation size for `HashTable` which we would ne

Re: [I] Improve RepartitionExec for better query performance [datafusion]

2024-12-09 Thread via GitHub
Dandandan commented on issue #7001: URL: https://github.com/apache/datafusion/issues/7001#issuecomment-2527479510 Doing experiments in https://github.com/apache/datafusion/pull/13699 if anyone is interested, results seem to look very promising (need to double check).

Re: [I] Improve RepartitionExec for better query performance [datafusion]

2024-12-09 Thread via GitHub
Dandandan commented on issue #7001: URL: https://github.com/apache/datafusion/issues/7001#issuecomment-2527559908 This is for TPC-H SF=1: ``` ┏━━┳━━┳━━┳━━━┓ ┃ Query┃ main ┃ repartition_exec_poc ┃Change ┃ ┡

Re: [I] Use Tokio::block_in_place [datafusion]

2024-12-09 Thread via GitHub
tustvold commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2527850275 > I can try to take another shot at https://github.com/apache/datafusion/pull/13690 to more fully annotate IO in DataFusion. IMO the key challenge is async is an abstrac

Re: [I] doc-gen: Migrate builtin scalar functions from code based documentation to attribute based [datafusion]

2024-12-09 Thread via GitHub
Chen-Yuan-Lai commented on issue #13671: URL: https://github.com/apache/datafusion/issues/13671#issuecomment-2527959863 take

Re: [PR] `TypeSignatureClass` for mixed type function signature [datafusion]

2024-12-09 Thread via GitHub
jayzhan211 commented on code in PR #13372: URL: https://github.com/apache/datafusion/pull/13372#discussion_r1875994647 ## datafusion/functions/src/datetime/date_part.rs: ## @@ -60,72 +62,26 @@ impl DatePartFunc { Self { signature: Signature::one_of(

Re: [PR] refactor: replace `Vec` with `IndexMap` for expression mappings in `ProjectionMapping` and `EquivalenceGroup` [datafusion]

2024-12-09 Thread via GitHub
alamb commented on PR #13675: URL: https://github.com/apache/datafusion/pull/13675#issuecomment-2527982335 I see much less difference this time 🤔 (I also ran `cargo update` which might have helped) ``` ++ critcmp main 8027-refactor-hashmap group

Re: [PR] add partitioning scheme for unresolved shuffle and shuffle reader exec [datafusion-ballista]

2024-12-09 Thread via GitHub
milenkovicm commented on code in PR #1144: URL: https://github.com/apache/datafusion-ballista/pull/1144#discussion_r1876003273 ## ballista/core/proto/ballista.proto: ## @@ -50,14 +50,15 @@ message ShuffleWriterExecNode { message UnresolvedShuffleExecNode { uint32 stage_id =

Re: [PR] add partitioning scheme for unresolved shuffle and shuffle reader exec [datafusion-ballista]

2024-12-09 Thread via GitHub
onursatici commented on code in PR #1144: URL: https://github.com/apache/datafusion-ballista/pull/1144#discussion_r1876001066 ## ballista/core/proto/ballista.proto: ## @@ -50,14 +50,15 @@ message ShuffleWriterExecNode { message UnresolvedShuffleExecNode { uint32 stage_id =

Re: [PR] [comet-parquet-exec] Add Native Scan to CometReadBenchmark [datafusion-comet]

2024-12-09 Thread via GitHub
mbutrovich commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2528050451 I'll check if the benchmarks use the settings from CometTestBase or the global defaults. If so, then @parthchandra is right.

Re: [PR] chore: reinstate find_df_window_func [datafusion]

2024-12-09 Thread via GitHub
alamb commented on code in PR #13708: URL: https://github.com/apache/datafusion/pull/13708#discussion_r1876523695 ## datafusion/expr/src/expr.rs: ## @@ -840,6 +840,12 @@ impl WindowFunction { } } +/// Find DataFusion's built-in window function by name. +#[deprecated(sinc

Re: [I] Update ballista logo [datafusion-ballista]

2024-12-09 Thread via GitHub
tbar4 commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2528995986 > @Epicism @liurenjie1024 @ozankabak @cisaacson @tbar4 @alamb would you mind voting on "new 3" vs 4? 3 is amazing and has my vote! So excited for this new logo (what

Re: [PR] chore: Move more expressions from core crate to spark-expr crate [datafusion-comet]

2024-12-09 Thread via GitHub
andygrove merged PR #1152: URL: https://github.com/apache/datafusion-comet/pull/1152

Re: [PR] Document SQL dialect [datafusion]

2024-12-09 Thread via GitHub
alamb commented on PR #13706: URL: https://github.com/apache/datafusion/pull/13706#issuecomment-2529335609 I plan to leave this open for several days to allow additional time for reviews / comments

Re: [I] Write "upgrade guide" for DataFusion 44.0.0 [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13702: URL: https://github.com/apache/datafusion/issues/13702#issuecomment-2528515740 ``` error[E0308]: mismatched types --> core/src/execution/datafusion/planner.rs:1934:13 | 1929 | datafusion::physical_plan::windows::create_window_e

Re: [I] [EPIC] Improve sqlparser performance [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
alamb commented on issue #1557: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1557#issuecomment-2528566898 > Locally, cargo flamegraph --bench sqlparser_bench does not appear to be doing the trick as the flamegraph appears to be at most measuring a single iteration of the b

[PR] chore: reinstate find_df_window_func [datafusion]

2024-12-09 Thread via GitHub
andygrove opened a new pull request, #13708: URL: https://github.com/apache/datafusion/pull/13708 ## Which issue does this PR close? N/A ## Rationale for this change Public API function `find_df_window_func` was removed in https://github.com/apache/datafu

Re: [I] Write "upgrade guide" for DataFusion 44.0.0 [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13702: URL: https://github.com/apache/datafusion/issues/13702#issuecomment-2528349026 `GroupValues` and `new_group_values` were removed from the public API without being deprecated first. It does not impact Comet; I'm just pointing this out.

[PR] Always add round robin repartitioning to leaves [datafusion]

2024-12-09 Thread via GitHub
Dandandan opened a new pull request, #13707: URL: https://github.com/apache/datafusion/pull/13707 ## Which issue does this PR close? Closes #. ## Rationale for this change We have some logic to not introduce RoundRobin repartitioning whenever number of child part

Re: [PR] Document SQL dialect [datafusion]

2024-12-09 Thread via GitHub
Kimahriman commented on code in PR #13706: URL: https://github.com/apache/datafusion/pull/13706#discussion_r1876197893 ## docs/source/user-guide/sql/dialect.md: ## @@ -0,0 +1,38 @@ + + +# SQL Dialect + +By default, DataFusion follows the [PostgreSQL SQL dialect]. +For Array/List

Re: [PR] Consolidate `MapAccess`, and `Subscript` into `CompoundExpr` to handle the complex field access chain [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
goldmedal commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1876302358 ## src/dialect/snowflake.rs: ## @@ -234,6 +234,10 @@ impl Dialect for SnowflakeDialect { RESERVED_FOR_IDENTIFIER.contains(&kw)

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2528609728 Testing before we create the RC makes sense

Re: [I] Retry logic in ParquetSink [datafusion]

2024-12-09 Thread via GitHub
wiedld commented on issue #13679: URL: https://github.com/apache/datafusion/issues/13679#issuecomment-2528810636 Ah, my apologies. I missed that was part of the object store clients. TY!

[PR] fix union serialisation order in proto [datafusion]

2024-12-09 Thread via GitHub
onursatici opened a new pull request, #13709: URL: https://github.com/apache/datafusion/pull/13709 ## Which issue does this PR close? Closes #. ## Rationale for this change Unions with multiple tables result in an optimised plan with an Aggregate Node on top, exa

Re: [I] Update ballista logo [datafusion-ballista]

2024-12-09 Thread via GitHub
andygrove commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2528900780 @Epicism @liurenjie1024 @ozankabak @cisaacson @tbar4 @alamb would you mind voting on "new 3" vs 4?

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2024-12-09 Thread via GitHub
tustvold commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2528975789 That'd be very useful, Influx also had a reproducer but it uses customer data and I no longer have access to it having left, but Marco or Andrew can probably run it as well. Ad

Re: [I] Improve performance of db-benchmark query 8 [datafusion]

2024-12-09 Thread via GitHub
akurmustafa commented on issue #13586: URL: https://github.com/apache/datafusion/issues/13586#issuecomment-2529057249 > Possibly it's used via `SortExec`? This AFAIK also merges larger inputs via `SortPreservingMergeStream`. You are right, I missed this. I think `ExternalSorter` uses

Re: [I] Update ballista logo [datafusion-ballista]

2024-12-09 Thread via GitHub
ozankabak commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2529145880 I still like 4 slightly more, but they are not that different so I'm basically OK with either :)

Re: [I] Improve performance of db-benchmark query 8 [datafusion]

2024-12-09 Thread via GitHub
Dandandan commented on issue #13586: URL: https://github.com/apache/datafusion/issues/13586#issuecomment-2529145431 > > Possibly it's used via `SortExec`? This AFAIK also merges larger inputs via `SortPreservingMergeStream`. > > You are right, I missed this. I think `ExternalSorter` u

[PR] chore: Remove dead code [datafusion-comet]

2024-12-09 Thread via GitHub
andygrove opened a new pull request, #1155: URL: https://github.com/apache/datafusion-comet/pull/1155 ## Which issue does this PR close? N/A ## Rationale for this change We were specifying `#[allow(dead_code)]` for the entire core crate rather than just w

Re: [I] Update ballista logo [datafusion-ballista]

2024-12-09 Thread via GitHub
andygrove commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2529322442 I am also fine with either. I think I slightly prefer 4 now.

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2529361202 > For my part, I'm going to spend the next few hours seeing if I can boil down an initial reproducer per the ask [here](https://github.com/apache/datafusion/pull/13424#issuecommen

Re: [PR] Support INSERT OVERWRITE INTO syntax [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
alamb merged PR #1584: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1584

Re: [I] An error occurred when the sort push down rule pushed sort below join [datafusion]

2024-12-09 Thread via GitHub
alamb closed issue #13559: An error occurred when the sort push down rule pushed sort below join URL: https://github.com/apache/datafusion/issues/13559

Re: [PR] Support INSERT OVERWRITE INTO syntax [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
alamb commented on PR #1584: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1584#issuecomment-2529370833 Thanks @yuval-illumex and @iffyio

Re: [PR] Fix hash join with sort push down [datafusion]

2024-12-09 Thread via GitHub
alamb merged PR #13560: URL: https://github.com/apache/datafusion/pull/13560

Re: [I] Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable` [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #13433: URL: https://github.com/apache/datafusion/issues/13433#issuecomment-2529379554 > This is a bit of a wild interface and I dunno if it's worth the 15% uplift TBH: I think this is something @avantgardnerio and @thinkharderdev may know more about. Perha

Re: [PR] Reorganize the Parser module [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
alamb commented on PR #1581: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1581#issuecomment-2529390710 > Thanks for drafting this @davisp! It does sound reasonable to me, I imagine this could help editors as you mention. cc @alamb for thoughts if something like this could be

Re: [I] Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable` [datafusion]

2024-12-09 Thread via GitHub
avantgardnerio commented on issue #13433: URL: https://github.com/apache/datafusion/issues/13433#issuecomment-2529393059 > I think this is something @avantgardnerio and @thinkharderdev may know more about. The main reason that 15% was so important was to keep it on par with the unopt

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2528100895 I'd like to see an upgrade guide for the 44 release (and am willing to take the lead on this). I am trying to upgrade Comet now and am running into some issues. I am +1

Re: [PR] Add related source code locations to errors [datafusion]

2024-12-09 Thread via GitHub
eliaperantoni commented on PR #13664: URL: https://github.com/apache/datafusion/pull/13664#issuecomment-2528108553 @alamb here's a few examples of what the new diagnostics that I implemented so far can do: ![image](https://github.com/user-attachments/assets/01356e00-0228-44e7-9e49-3d

Re: [I] Write "upgrade guide" for DataFusion 44.0.0 [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13702: URL: https://github.com/apache/datafusion/issues/13702#issuecomment-2528110307 I plan on adding notes to this issue as I encounter issues while upgrading Comet to use latest DF. First couple of issues: - `down_cast_any_ref` was removed from the pub

[I] Write "upgrade guide" for DataFusion 44.0.0 [datafusion]

2024-12-09 Thread via GitHub
andygrove opened a new issue, #13702: URL: https://github.com/apache/datafusion/issues/13702 ### Is your feature request related to a problem or challenge? DataFusion 44.0.0 has breaking changes that require downstream projects to make code changes. Let's document these as we upgrade

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2528105749 I filed https://github.com/apache/datafusion/issues/13702

[PR] Update documentation guidelines for contribution content [datafusion]

2024-12-09 Thread via GitHub
alamb opened a new pull request, #13703: URL: https://github.com/apache/datafusion/pull/13703 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/12357 - related to https://github.com/apache/datafusion/issues/13648 ## Rationale for this

Re: [I] [DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #12357: URL: https://github.com/apache/datafusion/issues/12357#issuecomment-2528116365 Sorry for the delay -- here is a proposal for specific wording: - https://github.com/apache/datafusion/pull/13703

Re: [PR] fix: Implicitly plan `UNNEST` as lateral [datafusion]

2024-12-09 Thread via GitHub
goldmedal commented on code in PR #13695: URL: https://github.com/apache/datafusion/pull/13695#discussion_r1876080057 ## datafusion/sqllogictest/test_files/unnest.slt: ## @@ -860,6 +860,12 @@ select count(*) from (select unnest(range(0, 10)) id) t inner join (select u

Re: [PR] Update documentation guidelines for contribution content [datafusion]

2024-12-09 Thread via GitHub
alamb commented on PR #13703: URL: https://github.com/apache/datafusion/pull/13703#issuecomment-2528119373 Rather than state some hard rule, I went with the approach of guidance on what would need more / less discussion prior to acceptance. I felt this would give us flexibility but still gi

Re: [I] Write "upgrade guide" for DataFusion 44.0.0 [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13702: URL: https://github.com/apache/datafusion/issues/13702#issuecomment-2528140054 next issue: ``` 60 | signature: Signature::coercible(vec![DataType::Float64], Volatility::Immutable), |

[PR] Document SQL dialect [datafusion]

2024-12-09 Thread via GitHub
alamb opened a new pull request, #13706: URL: https://github.com/apache/datafusion/pull/13706 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/13704 ## Rationale for this change Determining what is the "correct behavior"

[PR] chore: reinstate down_cast_any_ref [datafusion]

2024-12-09 Thread via GitHub
andygrove opened a new pull request, #13705: URL: https://github.com/apache/datafusion/pull/13705 ## Which issue does this PR close? Part of ## Rationale for this change The public function `down_cast_any_ref` was removed in https://github.com/apache/dat

Re: [PR] Fix S3 in CLI: Do not normalize options values [datafusion]

2024-12-09 Thread via GitHub
blaginin commented on code in PR #13576: URL: https://github.com/apache/datafusion/pull/13576#discussion_r1876201273 ## datafusion/common/src/config.rs: ## @@ -973,16 +980,24 @@ impl ConfigField for Option { #[macro_export] macro_rules! config_field { -($t:ty) => { +

Re: [PR] Document SQL dialect [datafusion]

2024-12-09 Thread via GitHub
alamb commented on code in PR #13706: URL: https://github.com/apache/datafusion/pull/13706#discussion_r1876204773 ## docs/source/user-guide/sql/dialect.md: ## @@ -0,0 +1,38 @@ + + +# SQL Dialect + +By default, DataFusion follows the [PostgreSQL SQL dialect]. +For Array/List func

Re: [I] [DISCUSS] More extensive pre-release testing [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #13661: URL: https://github.com/apache/datafusion/issues/13661#issuecomment-2528302271 > > It is my understanding that the apache voting / approval process prevents automated builds > > That's my understanding too, but i hope this process isn't nonnegotiable.

Re: [PR] `TypeSignatureClass` for mixed type function signature [datafusion]

2024-12-09 Thread via GitHub
alamb commented on code in PR #13372: URL: https://github.com/apache/datafusion/pull/13372#discussion_r1876234505 ## datafusion/expr-common/src/signature.rs: ## @@ -138,6 +141,48 @@ pub enum TypeSignature { NullAry, } +impl TypeSignature { +#[inline] +pub fn is_o

Re: [I] Write "upgrade guide" for DataFusion 44.0.0 [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13702: URL: https://github.com/apache/datafusion/issues/13702#issuecomment-2528497573 public function `find_df_window_func` was removed in https://github.com/apache/datafusion/pull/13201. It would have been better to mark it deprecated and have it return `None`
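
The "deprecate first, return `None`" approach suggested here can be sketched as follows. `WindowFunctionDefinition` is an empty stand-in type, the `since` value and note text are placeholders, and `lookup_during_migration` is a hypothetical downstream caller. None of these are taken from the actual PR.

```rust
/// Empty stand-in for the real window function type.
#[derive(Debug, PartialEq)]
pub struct WindowFunctionDefinition;

/// Kept only so downstream code still compiles for one release cycle.
/// The `since` version and note below are placeholders, not real values.
#[deprecated(since = "0.0.0", note = "placeholder: point callers at the replacement lookup")]
pub fn find_df_window_func(_name: &str) -> Option<WindowFunctionDefinition> {
    // No built-in functions remain behind this entry point.
    None
}

/// Hypothetical downstream caller: it opts out of the deprecation lint
/// while migrating, and keeps working because the symbol still exists.
#[allow(deprecated)]
pub fn lookup_during_migration(name: &str) -> bool {
    find_df_window_func(name).is_some()
}
```

Downstream crates then see a compiler warning with migration guidance instead of a hard build failure on upgrade.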

Re: [PR] Consolidate `MapAccess`, and `Subscript` into `CompoundExpr` to handle the complex field access chain [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
goldmedal commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1876251286 ## src/ast/mod.rs: ## @@ -624,6 +590,12 @@ pub enum Expr { Identifier(Ident), /// Multi-part identifier, e.g. `table_alias.column` or `sche

Re: [PR] Add support for TABLESAMPLE [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
iffyio commented on code in PR #1580: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1580#discussion_r1876252525 ## tests/sqlparser_snowflake.rs: ## @@ -2952,3 +2950,19 @@ fn test_sf_double_dot_notation() { #[test] fn test_parse_double_dot_notation_wrong_positi

Re: [PR] Consolidate `MapAccess`, and `Subscript` into `CompoundExpr` to handle the complex field access chain [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
goldmedal commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1876266146 ## src/ast/mod.rs: ## @@ -1289,12 +1267,19 @@ impl fmt::Display for Expr { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { match

[PR] chore: Upgrade to latest DataFusion [datafusion-comet]

2024-12-09 Thread via GitHub
andygrove opened a new pull request, #1154: URL: https://github.com/apache/datafusion-comet/pull/1154 ## Which issue does this PR close? N/A Builds on https://github.com/apache/datafusion-comet/pull/1152 ## Rationale for this change Start updating i

Re: [PR] [comet-parquet-exec] Comet parquet exec 2 (copy of Parth's PR) [datafusion-comet]

2024-12-09 Thread via GitHub
parthchandra commented on PR #1138: URL: https://github.com/apache/datafusion-comet/pull/1138#issuecomment-2528772645 Not really. They are pretty much independent. The difference is that POC 1 uses DF ParquetExec operator directly. POC 2 uses arrow reader to replace the existing native col

Re: [I] Retry logic in ParquetSink [datafusion]

2024-12-09 Thread via GitHub
wiedld closed issue #13679: Retry logic in ParquetSink URL: https://github.com/apache/datafusion/issues/13679

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2024-12-09 Thread via GitHub
djanderson commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2528816860 For my part, I'm going to spend the next few hours seeing if I can boil down an initial reproducer per the ask [here](https://github.com/apache/datafusion/pull/13424#issuecom

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-09 Thread via GitHub
andygrove commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2528144687 How would everyone feel about increasing the length of release votes from 3 days to 7 days to give downstream projects more time to test the release? Sometimes the vote

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2528281718 > How would everyone feel about increasing the length of release votes from 3 days to 7 days to give downstream projects more time to test the release? > > Sometimes the vot

Re: [PR] Consolidate `MapAccess`, and `Subscript` into `CompoundExpr` to handle the complex field access chain [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
goldmedal commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1876207258 ## src/parser/mod.rs: ## @@ -1144,53 +1144,52 @@ impl<'a> Parser<'a> { w_span: Span, ) -> Result { match self.peek_token().tok

Re: [PR] Consolidate `MapAccess`, and `Subscript` into `CompoundExpr` to handle the complex field access chain [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
goldmedal commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1876216235 ## src/parser/mod.rs: ## @@ -1144,53 +1144,52 @@ impl<'a> Parser<'a> { w_span: Span, ) -> Result { match self.peek_token().tok

Re: [PR] feat: Expose Ballista Scheduler and Executor in Python [datafusion-ballista]

2024-12-09 Thread via GitHub
andygrove merged PR #1148: URL: https://github.com/apache/datafusion-ballista/pull/1148

Re: [PR] fix: specify roottype in substrait fieldreference [datafusion]

2024-12-09 Thread via GitHub
mbwhite commented on code in PR #13647: URL: https://github.com/apache/datafusion/pull/13647#discussion_r1876330697 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -2422,6 +2424,20 @@ mod test { Ok(()) } +#[test] +fn to_field_reference() -> Re

Re: [I] Expose Ballista Scheduler and Executor in Python [datafusion-ballista]

2024-12-09 Thread via GitHub
andygrove closed issue #1107: Expose Ballista Scheduler and Executor in Python URL: https://github.com/apache/datafusion-ballista/issues/1107

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2024-12-09 Thread via GitHub
alamb commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2528696445 > Ultimately moving the CPU bound work off tokio isn't that complex, and the DF community has successfully pulled off far more complex initiatives (e.g. StringView). This

Re: [PR] Consolidate `MapAccess`, and `Subscript` into `CompoundExpr` to handle the complex field access chain [datafusion-sqlparser-rs]

2024-12-09 Thread via GitHub
goldmedal commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1876341502 ## src/parser/mod.rs: ## @@ -1427,6 +1426,112 @@ impl<'a> Parser<'a> { } } +/// Try to parse an [Expr::CompoundExpr] like `a.b.c`
