Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-06 Thread via GitHub
kosiew commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1904814032 ## datafusion/core/src/dataframe/mod.rs: ## @@ -2743,6 +2754,143 @@ mod tests { Ok(()) } +// test for https://github.com/apache/datafusion/issue

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-06 Thread via GitHub
kosiew commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1904829349 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -522,7 +527,7 @@ impl GroupedHashAggregateStream { let spill_state = SpillState {

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-06 Thread via GitHub
kevinjqliu commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1904816353 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-06 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1904826148 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [I] supports_filters_pushdown is invoked more than once on a single Custom Data Source [datafusion]

2025-01-06 Thread via GitHub
jonahgao commented on issue #13994: URL: https://github.com/apache/datafusion/issues/13994#issuecomment-2574284445 > if `&mut self` was passed to this fn it would be much easier to control the functionality. Since checking for supportability is more of a read-only operation, I think

Re: [PR] Improve perfomance of `reverse` function [datafusion]

2025-01-06 Thread via GitHub
simonvandel commented on code in PR #14025: URL: https://github.com/apache/datafusion/pull/14025#discussion_r1904839947 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -116,14 +115,23 @@ pub fn reverse(args: &[ArrayRef]) -> Result { } } -fn reverse_impl<'a, T: Off

Re: [I] supports_filters_pushdown is invoked more than once on a single Custom Data Source [datafusion]

2025-01-06 Thread via GitHub
jonahgao commented on issue #13994: URL: https://github.com/apache/datafusion/issues/13994#issuecomment-2574295463 > One last question: If I have a query with [filterA,filterB] and on the initial call I return [Unsupported,Exact] and on a subsequent call you send me [filterA] and I return [

[PR] [comet-parquet-exec] fix: Fix null struct [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove opened a new pull request, #1226: URL: https://github.com/apache/datafusion-comet/pull/1226 ## Which issue does this PR close? N/A ## Rationale for this change Fix bug in reading null structs to fix some test failures ## What changes are i

Re: [PR] chore: Follow-on PR to fully enable onheap memory usage [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove merged PR #1210: URL: https://github.com/apache/datafusion-comet/pull/1210 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-06 Thread via GitHub
comphead commented on PR #14026: URL: https://github.com/apache/datafusion/pull/14026#issuecomment-2574160403 That's a really nice idea, thanks @nuno-faria

[PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-06 Thread via GitHub
jonathanc-n opened a new pull request, #14032: URL: https://github.com/apache/datafusion/pull/14032 ## Which issue does this PR close? Closes #13968 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2025-01-06 Thread via GitHub
viirya commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573663473 If ScanExec will be rarely used and we would like to use ParquetExec most of the time, maybe I can just add an internal cast to ScanExec if the schema is different. Though it m

[PR] Refac: make Nested Func public and implement Default trait [datafusion]

2025-01-06 Thread via GitHub
dharanad opened a new pull request, #14030: URL: https://github.com/apache/datafusion/pull/14030 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Refac: make Nested Func public and implement Default trait [datafusion]

2025-01-06 Thread via GitHub
dharanad commented on PR #14030: URL: https://github.com/apache/datafusion/pull/14030#issuecomment-2573939947 cc @alamb @jayzhan211

[PR] fix: Simplify native scan config [datafusion-comet]

2025-01-06 Thread via GitHub
parthchandra opened a new pull request, #1225: URL: https://github.com/apache/datafusion-comet/pull/1225 ## Which issue does this PR close? Simplifies native scan config To choose a native scan implementation we can now set `spark.comet.scan.impl` Valid values are `native`, `

Re: [PR] fix: Simplify native scan config [datafusion-comet]

2025-01-06 Thread via GitHub
parthchandra commented on PR #1225: URL: https://github.com/apache/datafusion-comet/pull/1225#issuecomment-2573863212 @andygrove @mbutrovich The config defaults to `full_native`. To switch the implementation in tests, change the values in `CometConf`, `CometTestBase`, and `CometPlanS

Re: [PR] Feat: Add support for `array_size` [datafusion-comet]

2025-01-06 Thread via GitHub
dharanad closed pull request #1214: Feat: Add support for `array_size` URL: https://github.com/apache/datafusion-comet/pull/1214

Re: [PR] Feat: Add support for `array_size` [datafusion-comet]

2025-01-06 Thread via GitHub
dharanad commented on PR #1214: URL: https://github.com/apache/datafusion-comet/pull/1214#issuecomment-2573867883 > There is already one PR for array_size support: #1122 I must have overlooked it. Thanks for letting me know. Closing this PR

Re: [PR] [comet-parquet-exec] fix: Simplify native scan config [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove merged PR #1225: URL: https://github.com/apache/datafusion-comet/pull/1225

Re: [I] Define extension API for user-defined invariants. [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #14029: URL: https://github.com/apache/datafusion/issues/14029#issuecomment-2573905410 For example, if we added a function like this to the `ExecutionPlan` trait, as proposed in https://github.com/apache/datafusion/pull/13986#discussion_r1901312798 I think that wou

Re: [I] Define extension API for user-defined invariants. [datafusion]

2025-01-06 Thread via GitHub
wiedld commented on issue #14029: URL: https://github.com/apache/datafusion/issues/14029#issuecomment-2573909488 > For example, if we added a function like this to the `ExecutionPlan` trait, as proposed in [#13986 (comment)](https://github.com/apache/datafusion/pull/13986#discussion_r190131

Re: [PR] Minor: Remove redundant implementation of `StringArrayType` [datafusion]

2025-01-06 Thread via GitHub
alamb commented on PR #14023: URL: https://github.com/apache/datafusion/pull/14023#issuecomment-2573992922 I pushed a commit to deprecate (rather than remove) the trait.

Re: [PR] Minor: make nested functions public and implement Default trait [datafusion]

2025-01-06 Thread via GitHub
alamb merged PR #14030: URL: https://github.com/apache/datafusion/pull/14030

Re: [PR] chore: Follow-on PR to fully enable onheap memory usage [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on PR #1210: URL: https://github.com/apache/datafusion-comet/pull/1210#issuecomment-2573995471 Thanks for the reviews @viirya @kazuyukitanimura @Kontinuation

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (groupby support) [datafusion]

2025-01-06 Thread via GitHub
zhuqi-lucas commented on PR #13996: URL: https://github.com/apache/datafusion/pull/13996#issuecomment-2574217686 Hi @alamb This is the PR support for groupby first.

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-06 Thread via GitHub
kosiew commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1904803932 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -802,6 +807,45 @@ impl RecordBatchStream for GroupedHashAggregateStream { } } +// fix https://

[I] Enhance msrv check to check all crates [datafusion]

2025-01-06 Thread via GitHub
Jefffrey opened a new issue, #14022: URL: https://github.com/apache/datafusion/issues/14022 I wonder if we should be checking more (or all) crates here? https://github.com/apache/datafusion/blob/b8b0c5584f9f3a3aeca730ef1ac23dafc3e76dde/.github/workflows/rust.yml#L594-L641

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-06 Thread via GitHub
jayzhan211 commented on code in PR #14020: URL: https://github.com/apache/datafusion/pull/14020#discussion_r1904073770 ## datafusion/functions/src/unicode/find_in_set.rs: ## @@ -138,31 +138,144 @@ fn find_in_set(args: &[ArrayRef]) -> Result { } } -pub fn find_in_set_gene

Re: [PR] [Minor] refactor: make ArraySort public for broader access [datafusion]

2025-01-06 Thread via GitHub
jayzhan211 commented on PR #14006: URL: https://github.com/apache/datafusion/pull/14006#issuecomment-2572979702 Thanks @dharanad @alamb

Re: [PR] [Minor] refactor: make ArraySort public for broader access [datafusion]

2025-01-06 Thread via GitHub
jayzhan211 merged PR #14006: URL: https://github.com/apache/datafusion/pull/14006

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-06 Thread via GitHub
kosiew commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1904077638 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -802,6 +807,45 @@ impl RecordBatchStream for GroupedHashAggregateStream { } } +// fix https://

Re: [PR] `url` dependancy update [datafusion]

2025-01-06 Thread via GitHub
vadimpiven commented on code in PR #14019: URL: https://github.com/apache/datafusion/pull/14019#discussion_r1904083177 ## Cargo.toml: ## @@ -150,7 +150,7 @@ serde_json = "1" sqlparser = { version = "0.53.0", features = ["visitor"] } tempfile = "3" tokio = { version = "1.36",

Re: [PR] Use workspace rust-version for all workspace crates [datafusion]

2025-01-06 Thread via GitHub
alamb commented on PR #14009: URL: https://github.com/apache/datafusion/pull/14009#issuecomment-2572999127 (BTW welcome back @Jefffrey -- it is great to have you around!)

Re: [I] Assert for invariants in tests and debug builds [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #594: URL: https://github.com/apache/datafusion/issues/594#issuecomment-2572965153 It seems we have re-discovered this idea 10,000 tickets later in - https://github.com/apache/datafusion/issues/13652 FYI @wiedld Let's close this issue and use the ne

Re: [PR] Use workspace rust-version for all workspace crates [datafusion]

2025-01-06 Thread via GitHub
Jefffrey commented on PR #14009: URL: https://github.com/apache/datafusion/pull/14009#issuecomment-2572965242 Thanks @alamb Raised #14022 as well

Re: [I] Assert for invariants in tests and debug builds [datafusion]

2025-01-06 Thread via GitHub
alamb closed issue #594: Assert for invariants in tests and debug builds URL: https://github.com/apache/datafusion/issues/594

Re: [I] Automatically check "invariants" [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #13652: URL: https://github.com/apache/datafusion/issues/13652#issuecomment-2572966102 I just discovered that @houqp basically filed this same ticket 2 years ago: - https://github.com/apache/datafusion/issues/594

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-06 Thread via GitHub
korowa commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1904059022 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -522,7 +527,7 @@ impl GroupedHashAggregateStream { let spill_state = SpillState {

Re: [I] Fix rust-version key in workspace Cargo.toml to inherit from workspace [datafusion]

2025-01-06 Thread via GitHub
Jefffrey closed issue #9214: Fix rust-version key in workspace Cargo.toml to inherit from workspace URL: https://github.com/apache/datafusion/issues/9214

Re: [PR] Use workspace rust-version for all workspace crates [datafusion]

2025-01-06 Thread via GitHub
Jefffrey merged PR #14009: URL: https://github.com/apache/datafusion/pull/14009

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-06 Thread via GitHub
jayzhan-synnada commented on code in PR #14020: URL: https://github.com/apache/datafusion/pull/14020#discussion_r1904073374 ## datafusion/functions/src/unicode/find_in_set.rs: ## @@ -138,31 +138,144 @@ fn find_in_set(args: &[ArrayRef]) -> Result { } } -pub fn find_in_set

[PR] Minor: Remove redundant implementation of `StringArrayType` [datafusion]

2025-01-06 Thread via GitHub
tlm365 opened a new pull request, #14023: URL: https://github.com/apache/datafusion/pull/14023 ## Which issue does this PR close? Closes #. ## Rationale for this change Remove redundant implementation of `StringArrayType` ## What changes are included in thi

Re: [PR] Minor: Remove redundant implementation of `StringArrayType` [datafusion]

2025-01-06 Thread via GitHub
alamb commented on PR #14023: URL: https://github.com/apache/datafusion/pull/14023#issuecomment-2573043773 Thank you @tlm365 ❤️

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573648804 With the new Parquet POC 1 & 2, we will use ParquetExec instead of the current ScanExec, so at least for that case the schema will already be known and we will no longer n

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573676778 We'll still use ScanExec for shuffle reader though. The main reason for the initial batch scan is to determine if strings are dictionary-encoded or not. We then cast all

[PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-06 Thread via GitHub
jeffreyssmith2nd opened a new pull request, #14028: URL: https://github.com/apache/datafusion/pull/14028 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Add support for MySQL's INSERT INTO ... SET syntax [datafusion-sqlparser-rs]

2025-01-06 Thread via GitHub
iffyio merged PR #1641: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1641

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2025-01-06 Thread via GitHub
viirya commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573695955 Okay. Then it seems we can get rid of the first batch fetch in ScanExec and assign the scan schema from Spark. I will give it a try.

Re: [PR] chore: extract agg_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on PR #1224: URL: https://github.com/apache/datafusion-comet/pull/1224#issuecomment-2573697981 @rluvaton could you rebase this one and we can merge this one next?

Re: [I] Support pruning on string columns using LIKE [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #507: URL: https://github.com/apache/datafusion/issues/507#issuecomment-2573715986 > I think we also need follow up tickets for: > > * NOT LIKE > * Case insensitive matching Sounds good -- can you please file them (and the more hints you leave in the

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1904545210 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/NativeBatchDecoderIterator.scala: ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Ap

[I] Define extension API for user-defined invariants. [datafusion]

2025-01-06 Thread via GitHub
wiedld opened a new issue, #14029: URL: https://github.com/apache/datafusion/issues/14029 ### Is your feature request related to a problem or challenge? As part of the work to [automatically check invariants](https://github.com/apache/datafusion/issues/13652) for the logical and exec

Re: [PR] chore: extract json_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-06 Thread via GitHub
codecov-commenter commented on PR #1220: URL: https://github.com/apache/datafusion-comet/pull/1220#issuecomment-2573627436 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1220?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1904559170 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/NativeBatchDecoderIterator.scala: ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache So

Re: [PR] `url` dependancy update [datafusion]

2025-01-06 Thread via GitHub
korowa commented on code in PR #14019: URL: https://github.com/apache/datafusion/pull/14019#discussion_r1904056652 ## Cargo.toml: ## @@ -150,7 +150,7 @@ serde_json = "1" sqlparser = { version = "0.53.0", features = ["visitor"] } tempfile = "3" tokio = { version = "1.36", feat

Re: [PR] Feat: Add support for `array_size` [datafusion-comet]

2025-01-06 Thread via GitHub
viirya commented on PR #1214: URL: https://github.com/apache/datafusion-comet/pull/1214#issuecomment-2573878690 Thank you @dharanad

Re: [I] Inference of ListingTableConfig does not work (anymore) for compressed json file [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #14016: URL: https://github.com/apache/datafusion/issues/14016#issuecomment-2573891190 Thank you @timvw

Re: [I] Define extension API for user-defined invariants. [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #14029: URL: https://github.com/apache/datafusion/issues/14029#issuecomment-2573901927 Thanks @wiedld -- I don't fully understand the usecase > Take the existing invariant infrastructure provided as part of https://github.com/apache/datafusion/issues/13652#iss

Re: [I] [EPIC] Add support for all array expressions [datafusion-comet]

2025-01-06 Thread via GitHub
dharanad commented on issue #1042: URL: https://github.com/apache/datafusion-comet/issues/1042#issuecomment-2573945137 Many array functions in DataFusion currently have limited visibility. I have a pull request that addresses this issue https://github.com/apache/datafusion/pull/14030 We

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1904650489 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1567,17 +1585,41 @@ pub fn write_ipc_compressed( let mut timer = ipc_time.timer();

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1904652478 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -272,18 +272,19 @@ object CometConf extends ShimCometConf { .booleanConf

Re: [PR] Minor: Improve zero partition check when inserting into `MemTable` [datafusion]

2025-01-06 Thread via GitHub
alamb commented on PR #14024: URL: https://github.com/apache/datafusion/pull/14024#issuecomment-2573980366 Thanks @jonahgao and @comphead ❤️

Re: [PR] Minor: Improve zero partition check when inserting into `MemTable` [datafusion]

2025-01-06 Thread via GitHub
alamb merged PR #14024: URL: https://github.com/apache/datafusion/pull/14024

Re: [PR] feat: reads using global ctx [datafusion-python]

2025-01-06 Thread via GitHub
kylebarron commented on code in PR #982: URL: https://github.com/apache/datafusion-python/pull/982#discussion_r1904778623 ## python/datafusion/io.py: ## @@ -0,0 +1,181 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [I] Add support for lz4 compression in shuffle [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove closed issue #1178: Add support for lz4 compression in shuffle URL: https://github.com/apache/datafusion-comet/issues/1178

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove merged PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192

Re: [PR] feat: reads using global ctx [datafusion-python]

2025-01-06 Thread via GitHub
kevinjqliu commented on code in PR #982: URL: https://github.com/apache/datafusion-python/pull/982#discussion_r1904776793 ## python/datafusion/io.py: ## @@ -0,0 +1,181 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-06 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1904789223 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,24 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#issuecomment-2573691492 @viirya @kazuyukitanimura @mbutrovich @comphead Thanks for the reviews so far. I believe I have addressed all feedback now.

Re: [I] Support pruning on string columns using LIKE [datafusion]

2025-01-06 Thread via GitHub
adriangb commented on issue #507: URL: https://github.com/apache/datafusion/issues/507#issuecomment-2573702237 I think we also need follow up tickets for: - NOT LIKE - Case insensitive matching

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1904556240 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1567,17 +1585,41 @@ pub fn write_ipc_compressed( let mut timer = ipc_time.timer(); l

Re: [I] Automatically check "invariants" [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #13652: URL: https://github.com/apache/datafusion/issues/13652#issuecomment-2573635690 I suggest we use this ticket to track the infrastructure for checking invariants (e.g. what @wiedld is doing in https://github.com/apache/datafusion/pull/13986) and then claim suc

Re: [I] sql result discrepency with sqlite, postgres and duckdb [datafusion]

2025-01-06 Thread via GitHub
Omega359 commented on issue #13780: URL: https://github.com/apache/datafusion/issues/13780#issuecomment-2573650403 Addendum: Since the sqlite tests come from sqlite (duh) where REAL is mapped to 8 bytes (Double/f64) I would like to propose that I update the sqlite .slt files and change:

Re: [I] Automatically check "invariants" [datafusion]

2025-01-06 Thread via GitHub
wiedld commented on issue #13652: URL: https://github.com/apache/datafusion/issues/13652#issuecomment-2573659546 > I suggest we use this ticket to track the infrastructure for checking invariants Agreed. Modifying [this list above](https://github.com/apache/datafusion/issues/13652#is

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1904549692 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -1567,17 +1585,41 @@ pub fn write_ipc_compressed( let mut timer = ipc_time.timer();

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1904551983 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/NativeBatchDecoderIterator.scala: ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache So

Re: [PR] feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on code in PR #1192: URL: https://github.com/apache/datafusion-comet/pull/1192#discussion_r1904587549 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/NativeBatchDecoderIterator.scala: ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache So

[PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-06 Thread via GitHub
MohamedAbdeen21 opened a new pull request, #14031: URL: https://github.com/apache/datafusion/pull/14031 ## Which issue does this PR close? Closes #13621. ## Rationale for this change Unparsing unions with more than 2 inputs (produced by the logical optimiz

Re: [PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-06 Thread via GitHub
MohamedAbdeen21 commented on PR #14031: URL: https://github.com/apache/datafusion/pull/14031#issuecomment-2574047801 Looks like there's a circular dep between optimizer and SQL packages. The easiest solution is moving the test somewhere else, not sure where though

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-06 Thread via GitHub
kylebarron commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1904715954 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,24 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_

Re: [PR] feat: reads using global ctx [datafusion-python]

2025-01-06 Thread via GitHub
kylebarron commented on code in PR #982: URL: https://github.com/apache/datafusion-python/pull/982#discussion_r1904717182 ## python/datafusion/io.py: ## @@ -0,0 +1,181 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [I] supports_filters_pushdown is invoked more than once on a single Custom Data Source [datafusion]

2025-01-06 Thread via GitHub
cisaacson commented on issue #13994: URL: https://github.com/apache/datafusion/issues/13994#issuecomment-2574327502 Thanks @jonahgao , this is very helpful. The documentation does not fully reflect this, I will try and update it. The way I have things now I am not dependent on the DataFusio

Re: [I] supports_filters_pushdown is invoked more than once on a single Custom Data Source [datafusion]

2025-01-06 Thread via GitHub
cisaacson commented on issue #13994: URL: https://github.com/apache/datafusion/issues/13994#issuecomment-2573291404 @jonahgao Thanks for explaining this. We can probably work with this but the issue is that since we want some `filters` and not others (in other words some are preferred index

Re: [I] Memory account not adding up in SortExec [datafusion]

2025-01-06 Thread via GitHub
westonpace commented on issue #10073: URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2573298788 > FWIW I'm still seeing the same issue through LanceDB (https://github.com/lancedb/lance/issues/2119#issuecomment-2136414811). This isn't necessarily indicative as Lanc

Re: [PR] Add support for the SQL OVERLAPS predicate [datafusion-sqlparser-rs]

2025-01-06 Thread via GitHub
iffyio merged PR #1638: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-06 Thread via GitHub
iffyio commented on code in PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#discussion_r1904299807 ## src/parser/mod.rs: ## @@ -2353,14 +2355,30 @@ impl<'a> Parser<'a> { }; Ok(DateTimeField::Week(week_day))

Re: [PR] Add support for Snowflake LIST and REMOVE [datafusion-sqlparser-rs]

2025-01-06 Thread via GitHub
iffyio merged PR #1639: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1639

Re: [I] sql odd case of rounding compared to duckdb and postgresql [datafusion]

2025-01-06 Thread via GitHub
Omega359 commented on issue #13781: URL: https://github.com/apache/datafusion/issues/13781#issuecomment-2573381963 I suspect much of this is the same cause as #13780 - nullif typing being incorrect and real mapping to f32 where it is not possible to represent some integers exactly. P
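
The point about `real` mapping to f32 can be checked in isolation. The following is a minimal sketch, not DataFusion code; the helper name `roundtrips_through_f32` is hypothetical. It relies only on the fact that an f32 has a 24-bit significand, so integers above 2^24 are not all exactly representable.

```rust
/// Hypothetical helper (not part of DataFusion): returns true if the
/// integer `n` is unchanged after being converted to f32 and back.
fn roundtrips_through_f32(n: i64) -> bool {
    // `as` performs a rounding conversion to the nearest f32,
    // then a truncating conversion back to i64.
    (n as f32) as i64 == n
}

/// Same check for f64, whose 53-bit significand covers far more integers.
fn roundtrips_through_f64(n: i64) -> bool {
    (n as f64) as i64 == n
}
```

With this helper, 16_777_216 (2^24) round-trips, while 16_777_217 does not because it rounds to the nearest representable f32; the same value is exact in f64.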

Re: [PR] feat: add test to check for `ctx.enable_url_table()` [datafusion-ballista]

2025-01-06 Thread via GitHub
andygrove merged PR #1155: URL: https://github.com/apache/datafusion-ballista/pull/1155

Re: [PR] chore: no need to run python test in rust [datafusion-ballista]

2025-01-06 Thread via GitHub
andygrove merged PR #1154: URL: https://github.com/apache/datafusion-ballista/pull/1154

Re: [I] Memory account not adding up in SortExec [datafusion]

2025-01-06 Thread via GitHub
westonpace commented on issue #10073: URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2573542512 Here's a pure-rust datafusion-only example: https://github.com/westonpace/arrow-datafusion/commit/26ed75c51ad649a274063ad3fa1262b7025a17cf It takes a bit of time the fi

Re: [I] Support pruning on string columns using LIKE [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #507: URL: https://github.com/apache/datafusion/issues/507#issuecomment-2573539614 Filed the following ticket to support `starts_with`: 🎣 - https://github.com/apache/datafusion/issues/14027 -- This is an automated message from the Apache Git Service. To res

[PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-06 Thread via GitHub
nuno-faria opened a new pull request, #14026: URL: https://github.com/apache/datafusion/pull/14026 Ensures selections can be pushed past window functions, similarly to what is already done with aggregations, when possible. Unlike aggregations, however, extra care must be taken when handli

[I] Support pruning on `starts_with` [datafusion]

2025-01-06 Thread via GitHub
alamb opened a new issue, #14027: URL: https://github.com/apache/datafusion/issues/14027 ### Is your feature request related to a problem or challenge? @adriangb implemented `PruningPredicate` support for prefix matching `LIKE` / `NOT LIKE` in - https://github.com/apache/datafusi

Re: [I] Support pruning on `starts_with` [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #14027: URL: https://github.com/apache/datafusion/issues/14027#issuecomment-2573538810 I think this is a good first issue as rewriting a function should be straightforward and doesn't require in-depth knowledge of the rest of the engine -- This is an automated mess
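
For readers unfamiliar with prefix pruning, the idea behind pruning on `starts_with` can be sketched as follows. This is an illustration only and is not DataFusion's `PruningPredicate` implementation; the function name `may_contain_prefix` and its signature are assumptions. It assumes a container (for example a file or row group) exposes min and max string statistics compared in byte order.

```rust
/// Hypothetical sketch: decides whether a container whose string values all
/// lie in `[min, max]` (byte-wise order) could hold a value starting with
/// `prefix`. Returning false means the container can be skipped safely;
/// returning true only means it cannot be ruled out.
fn may_contain_prefix(min: &str, max: &str, prefix: &str) -> bool {
    let n = prefix.len();
    let p = prefix.as_bytes();
    // Truncate both statistics to at most the prefix length in bytes.
    let min_p = &min.as_bytes()[..min.len().min(n)];
    let max_p = &max.as_bytes()[..max.len().min(n)];
    // Truncation preserves lexicographic order (non-strictly), so any value
    // with this prefix must satisfy min_p <= prefix <= max_p.
    min_p <= p && p <= max_p
}
```

The check is deliberately conservative: it compares truncated byte slices rather than trying to compute an exact upper bound for the prefix, which keeps it correct for multi-byte UTF-8 text at the cost of occasionally keeping a container that holds no match.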

Re: [I] Ballista 43.0.0 Release [datafusion-ballista]

2025-01-06 Thread via GitHub
andygrove commented on issue #974: URL: https://github.com/apache/datafusion-ballista/issues/974#issuecomment-2573567119 Sure, let's do it. Can you create a PR against `main` to update version numbers and add the changelog? -- This is an automated message from the Apache Git Service.

[I] fail to parse `set tez.grouping.max-size=1234;` in Hive Dialect [datafusion-sqlparser-rs]

2025-01-06 Thread via GitHub
wugeer opened a new issue, #1643: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1643 According to the Apache Tez code, parameter `tez.grouping.max-size` is supported. https://github.com/apache/tez/blob/1e6c9e3448bb9d934508ee995ad60c23dafa0610/tez-mapreduce/src/main/java/o

Re: [I] Panic in a query with NATURAL JOIN (SQLancer) [datafusion]

2025-01-06 Thread via GitHub
alamb commented on issue #14015: URL: https://github.com/apache/datafusion/issues/14015#issuecomment-2573011842 Thanks @2010YOUY01 and @jonahgao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t
