Re: [I] Stop encoding schema with each batch in shuffle writer [datafusion-comet]

2024-12-20 Thread via GitHub
andygrove closed issue #1186: Stop encoding schema with each batch in shuffle writer URL: https://github.com/apache/datafusion-comet/issues/1186 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[I] Substrait roundtrip fails for Sort with a fetch [datafusion]

2024-12-20 Thread via GitHub
robtandy opened a new issue, #13860: URL: https://github.com/apache/datafusion/issues/13860 ### Describe the bug Substrait round trip fails for a query that produces a logical plan where the Sort node includes a fetch. As an example, ``` Sort: data.b ASC NULLS LAST, fetch

[PR] feat: Implement fast serde for single record batches [datafusion-comet]

2024-12-20 Thread via GitHub
andygrove opened a new pull request, #1190: URL: https://github.com/apache/datafusion-comet/pull/1190 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1189 ## Rationale for this change Arrow IPC is overkill for en

Re: [PR] chore: Migration Guide [datafusion]

2024-12-20 Thread via GitHub
comphead commented on code in PR #13849: URL: https://github.com/apache/datafusion/pull/13849#discussion_r1894138893 ## docs/source/library-user-guide/api-health.md: ## @@ -69,3 +69,32 @@ For example: Deprecated methods will remain in the codebase for a period of 6 major versi

Re: [PR] chore: Migration Guide [datafusion]

2024-12-20 Thread via GitHub
comphead commented on code in PR #13849: URL: https://github.com/apache/datafusion/pull/13849#discussion_r1894140043 ## docs/source/library-user-guide/api-health.md: ## @@ -69,3 +69,32 @@ For example: Deprecated methods will remain in the codebase for a period of 6 major versi

Re: [PR] chore: Migration Guide [datafusion]

2024-12-20 Thread via GitHub
comphead commented on code in PR #13849: URL: https://github.com/apache/datafusion/pull/13849#discussion_r1894141226 ## docs/source/library-user-guide/api-health.md: ## @@ -69,3 +69,32 @@ For example: Deprecated methods will remain in the codebase for a period of 6 major versi

Re: [PR] chore: Migration Guide [datafusion]

2024-12-20 Thread via GitHub
comphead commented on PR #13849: URL: https://github.com/apache/datafusion/pull/13849#issuecomment-2557334458 Thanks @alamb and @findepi the main key point I got from your messages is to avoid method removal from migration doc and from development cycle. But if so, the obsolete methods will

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2024-12-20 Thread via GitHub
westonpace commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2557413683 A recent change was made to Substrait that used a feature that was only stabilized in protoc versions greater than 3.12 (I'm not actually sure the exact version it was staili

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2024-12-20 Thread via GitHub
westonpace commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2557416340 Ah, yes, according to https://github.com/rust-lang/crates-build-env it does appear that `docs.rs` uses Ubuntu 22.04 to build crate docs.

Re: [PR] chore: Migration Guide [datafusion]

2024-12-20 Thread via GitHub
findepi commented on code in PR #13849: URL: https://github.com/apache/datafusion/pull/13849#discussion_r1893873359 ## docs/source/library-user-guide/api-health.md: ## @@ -69,3 +69,32 @@ For example: Deprecated methods will remain in the codebase for a period of 6 major versio

[PR] fix: enable DF's nested_expressions feature by in datafusion-substrait tests to make them pass [datafusion]

2024-12-20 Thread via GitHub
Blizzara opened a new pull request, #13857: URL: https://github.com/apache/datafusion/pull/13857 ## Which issue does this PR close? Closes #13854 ## Rationale for this change https://github.com/apache/datafusion/pull/13594/files#diff-d195b6dfaaed44b9828e145dc50c0

Re: [PR] fix: enable DF's nested_expressions feature by in datafusion-substrait tests to make them pass [datafusion]

2024-12-20 Thread via GitHub
Blizzara commented on PR #13857: URL: https://github.com/apache/datafusion/pull/13857#issuecomment-2556721521 Fyi @alamb

Re: [PR] replace CASE expressions in predicate pruning with boolean algebra [datafusion]

2024-12-20 Thread via GitHub
alamb merged PR #13795: URL: https://github.com/apache/datafusion/pull/13795

Re: [PR] replace CASE expressions in predicate pruning with boolean algebra [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13795: URL: https://github.com/apache/datafusion/pull/13795#issuecomment-2557040547 Thanks again @adriangb and @appletreeisyellow

Re: [PR] feat(function): add `least` function [datafusion]

2024-12-20 Thread via GitHub
rluvaton commented on code in PR #13786: URL: https://github.com/apache/datafusion/pull/13786#discussion_r1893959228 ## datafusion/functions/src/core/least.rs: ## @@ -0,0 +1,283 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat(function): add `least` function [datafusion]

2024-12-20 Thread via GitHub
rluvaton commented on code in PR #13786: URL: https://github.com/apache/datafusion/pull/13786#discussion_r1893959873 ## datafusion/functions/src/core/least.rs: ## @@ -0,0 +1,283 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat(function): add `least` function [datafusion]

2024-12-20 Thread via GitHub
rluvaton commented on code in PR #13786: URL: https://github.com/apache/datafusion/pull/13786#discussion_r1893959616 ## datafusion/functions/src/core/least.rs: ## @@ -0,0 +1,283 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat(function): add `least` function [datafusion]

2024-12-20 Thread via GitHub
rluvaton commented on PR #13786: URL: https://github.com/apache/datafusion/pull/13786#issuecomment-2557051641 So I merged both implementations into one. I don't really like it: the naming I chose is off and I need a better name, but all comments are resolved.

Re: [PR] Minor: Replace `BooleanArray::extend` with `append_n` [datafusion]

2024-12-20 Thread via GitHub
comphead commented on code in PR #13832: URL: https://github.com/apache/datafusion/pull/13832#discussion_r1894200801 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1613,10 +1611,10 @@ impl SortMergeJoinStream { self.output_record_batches

[PR] Minor: Use `resize` instead of `extend` for static values in SMJ logic [datafusion]

2024-12-20 Thread via GitHub
comphead opened a new pull request, #13861: URL: https://github.com/apache/datafusion/pull/13861 ## Which issue does this PR close? Minor improvement when dealing with internal arrays for SMJ potentially removing extra allocation for filtered SortMergeJoin Closes #.

Re: [PR] feat: Implement fast serde for single record batches [datafusion-comet]

2024-12-20 Thread via GitHub
kazuyukitanimura commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1894220969 ## native/core/src/execution/shuffle/batch_serde.rs: ## @@ -0,0 +1,213 @@ +use arrow::ipc::reader::StreamReader; Review Comment: license?

Re: [PR] Minor: improve error message when ARRAY literals can not be planned [datafusion]

2024-12-20 Thread via GitHub
comphead commented on code in PR #13859: URL: https://github.com/apache/datafusion/pull/13859#discussion_r1894198442 ## datafusion/sql/src/expr/value.rs: ## @@ -169,7 +169,7 @@ impl SqlToRel<'_, S> { } } -internal_err!("Expected a simplified resul

Re: [PR] [comet-parquet-exec] Merge upstream/main and resolve conflicts [datafusion-comet]

2024-12-20 Thread via GitHub
andygrove merged PR #1183: URL: https://github.com/apache/datafusion-comet/pull/1183

Re: [I] Substrait roundtrip fails for Sort with a fetch [datafusion]

2024-12-20 Thread via GitHub
robtandy commented on issue #13860: URL: https://github.com/apache/datafusion/issues/13860#issuecomment-2557493259 take

Re: [PR] feat: Make native shuffle compression configurable and respect `spark.shuffle.compress` [datafusion-comet]

2024-12-20 Thread via GitHub
andygrove merged PR #1185: URL: https://github.com/apache/datafusion-comet/pull/1185

Re: [PR] Preserve ordering equivalencies on `with_reorder` [datafusion]

2024-12-20 Thread via GitHub
berkaysynnada commented on PR #13770: URL: https://github.com/apache/datafusion/pull/13770#issuecomment-2556528737 resolved the conflicts, will be merged once CI passes

Re: [I] parquet RowGroup pruning for `Dictionary(Decimal)` type incorrect [datafusion]

2024-12-20 Thread via GitHub
kosiew commented on issue #13821: URL: https://github.com/apache/datafusion/issues/13821#issuecomment-2556552343 Research findings: with parquet_pruning turned off ``` let config = SessionConfig::default() .with_parquet_bloom_filter_pruning(true) .with_

Re: [I] substrait_integration integration tests are failing [datafusion]

2024-12-20 Thread via GitHub
Blizzara commented on issue #13854: URL: https://github.com/apache/datafusion/issues/13854#issuecomment-2556617531 Hmm, the two tests are failing for me locally as well. However they are run in CI, and there they pass: ``` Running tests/substrait_integration.rs (target/debug/deps

Re: [PR] Preserve ordering equivalencies on `with_reorder` [datafusion]

2024-12-20 Thread via GitHub
berkaysynnada merged PR #13770: URL: https://github.com/apache/datafusion/pull/13770

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-20 Thread via GitHub
ozankabak merged PR #13823: URL: https://github.com/apache/datafusion/pull/13823

[PR] Support 1 or 3 arg in generate_series() UDTF [datafusion]

2024-12-20 Thread via GitHub
UBarney opened a new pull request, #13856: URL: https://github.com/apache/datafusion/pull/13856 ## Which issue does this PR close? Closes #13615. ## Rationale for this change ## What changes are included in this PR? + Added some args validation to `Gene

Re: [PR] Support n-ary monotonic functions in ordering equivalence [datafusion]

2024-12-20 Thread via GitHub
berkaysynnada merged PR #13841: URL: https://github.com/apache/datafusion/pull/13841

Re: [I] Support n-ary monotonic functions in ordering equivalence [datafusion]

2024-12-20 Thread via GitHub
berkaysynnada closed issue #13839: Support n-ary monotonic functions in ordering equivalence URL: https://github.com/apache/datafusion/issues/13839

[PR] ParquetSink should be aware of arrow schema encoding (configurable) in the file metadata. [datafusion]

2024-12-20 Thread via GitHub
wiedld opened a new pull request, #13866: URL: https://github.com/apache/datafusion/pull/13866 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/11770 ## Rationale for this change The [ArrowWriter](https://docs.rs/parquet/53.3.0/parque

Re: [PR] ParquetSink should be aware of arrow schema encoding (configurable) in the file metadata. [datafusion]

2024-12-20 Thread via GitHub
wiedld commented on code in PR #13866: URL: https://github.com/apache/datafusion/pull/13866#discussion_r1894526006 ## datafusion/common/src/file_options/parquet_writer.rs: ## @@ -140,6 +162,32 @@ impl TryFrom<&TableParquetOptions> for WriterPropertiesBuilder { } } +///

Re: [PR] ParquetSink should be aware of arrow schema encoding (configurable) in the file metadata. [datafusion]

2024-12-20 Thread via GitHub
wiedld commented on code in PR #13866: URL: https://github.com/apache/datafusion/pull/13866#discussion_r1894526384 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -2335,42 +2347,64 @@ mod tests { async fn parquet_sink_write() -> Result<()> { let par

Re: [PR] ParquetSink should be aware of arrow schema encoding (configurable) in the file metadata. [datafusion]

2024-12-20 Thread via GitHub
wiedld commented on code in PR #13866: URL: https://github.com/apache/datafusion/pull/13866#discussion_r1894526449 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -2335,42 +2347,64 @@ mod tests { async fn parquet_sink_write() -> Result<()> { let par

Re: [PR] ParquetSink should be aware of arrow schema encoding (configurable) in the file metadata. [datafusion]

2024-12-20 Thread via GitHub
wiedld commented on code in PR #13866: URL: https://github.com/apache/datafusion/pull/13866#discussion_r1894525850 ## datafusion/common/src/file_options/parquet_writer.rs: ## @@ -51,38 +58,53 @@ impl ParquetWriterOptions { } } -impl TryFrom<&TableParquetOptions> for Parq

Re: [I] CI: Windows flow takes 1.5h [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #13726: URL: https://github.com/apache/datafusion/issues/13726#issuecomment-2557941757 > > @Alexhuszagh can you build DataFusion with a command like this: > > ```shell > > cargo build -j 1 > > ```

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2557940737 Thanks @westonpace One thing I couldn't understand is how the substrait docs themselves are built, seemingly just fine, on docs.rs (the same runners): https://do

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2557940944 Also, randomly, I learned today that @andygrove *also* is an owner of the substrait crate. Fascinating! https://github.com/user-attachments/assets/ffc7bf3e-7858-4539-b98c-c8

[I] Replace `BufferBuilder` with `Vec` [datafusion]

2024-12-20 Thread via GitHub
jayzhan211 opened a new issue, #13867: URL: https://github.com/apache/datafusion/issues/13867 ### Is your feature request related to a problem or challenge? The functionality we need from BufferBuilder can be handled by Vec. Switching to Vec will simplify things, and maybe faster (?

Re: [PR] DataFusion Python 43.1.0 announcement [datafusion-site]

2024-12-20 Thread via GitHub
andygrove merged PR #43: URL: https://github.com/apache/datafusion-site/pull/43

Re: [PR] Find keywords using perfect hashing [datafusion-sqlparser-rs]

2024-12-20 Thread via GitHub
tobyhede commented on PR #1590: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1590#issuecomment-2557914787 Also an open question regarding maintenance of the phf crate https://github.com/rust-phf/rust-phf/issues/318 It does appear to be relatively stable, most of the open i

Re: [I] Datafusion binary size has been getting bigger [datafusion]

2024-12-20 Thread via GitHub
comphead commented on issue #13816: URL: https://github.com/apache/datafusion/issues/13816#issuecomment-2557916785 Some good experiments are https://github.com/johnthagen/min-sized-rust?tab=readme-ov-file#optimize-libstd-with-xargo with this profile ``` [profile.release]

Re: [I] Test DataFusion 44.0.0 with Sail [datafusion]

2024-12-20 Thread via GitHub
shehabgamin commented on issue #13855: URL: https://github.com/apache/datafusion/issues/13855#issuecomment-2557868273 > > After a brief review of the errors I suspect there may be something up with lists and structs but I would have to see the actual logic being tested to know for certain.

Re: [PR] feat: Implement fast serde for single record batches [datafusion-comet]

2024-12-20 Thread via GitHub
andygrove commented on code in PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#discussion_r1894511912 ## native/core/src/execution/shuffle/batch_serde.rs: ## @@ -0,0 +1,247 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [I] Datafusion binary size has been getting bigger [datafusion]

2024-12-20 Thread via GitHub
comphead commented on issue #13816: URL: https://github.com/apache/datafusion/issues/13816#issuecomment-2557881624 ``` print_functions_docs print_functions_config ``` binaries can be moved out from the main release

[PR] fix: [comet-parquet-exec] Fix timestamp cast errors [datafusion-comet]

2024-12-20 Thread via GitHub
andygrove opened a new pull request, #1191: URL: https://github.com/apache/datafusion-comet/pull/1191 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] fix: [comet-parquet-exec] Fix timestamp cast errors [datafusion-comet]

2024-12-20 Thread via GitHub
andygrove commented on code in PR #1191: URL: https://github.com/apache/datafusion-comet/pull/1191#discussion_r1894276152 ## native/spark-expr/src/cast.rs: ## @@ -138,240 +138,6 @@ pub struct Cast { pub cast_options: SparkCastOptions, } -/// Determine if Comet supports a

Re: [PR] Implement predicate pruning for `like` expressions (prefix matching) [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #12978: URL: https://github.com/apache/datafusion/pull/12978#issuecomment-2557697673 Clearly I failed to review this -- I will do so hopefully later today but may be tomorrow

Re: [PR] feat: support normalized expr in CSE [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13315: URL: https://github.com/apache/datafusion/pull/13315#issuecomment-2557698644 Thanks again @zhuliquan @peter-toth @jayzhan211 -- this release is shaping up to be the best yet!

Re: [PR] Minor: Use `resize` instead of `extend` for static values in SMJ logic [datafusion]

2024-12-20 Thread via GitHub
comphead merged PR #13861: URL: https://github.com/apache/datafusion/pull/13861

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-20 Thread via GitHub
Blizzara commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1894445457 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -262,7 +718,25 @@ async fn except_rels( /// Convert Substrait Plan to DataFusion LogicalPlan pub a

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-20 Thread via GitHub
vbarua commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1894475383 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -262,7 +718,25 @@ async fn except_rels( /// Convert Substrait Plan to DataFusion LogicalPlan pub asy

Re: [PR] feat(substrait): modular substrait consumer [datafusion]

2024-12-20 Thread via GitHub
vbarua commented on code in PR #13803: URL: https://github.com/apache/datafusion/pull/13803#discussion_r1894475748 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -262,7 +718,25 @@ async fn except_rels( /// Convert Substrait Plan to DataFusion LogicalPlan pub asy

Re: [PR] Support Null aware anti join by HashJoin [datafusion]

2024-12-20 Thread via GitHub
github-actions[bot] commented on PR #10584: URL: https://github.com/apache/datafusion/pull/10584#issuecomment-2557948679 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Add snapshot testing to CLI & set up AWS mock [datafusion]

2024-12-20 Thread via GitHub
alamb commented on code in PR #13672: URL: https://github.com/apache/datafusion/pull/13672#discussion_r1894533684 ## datafusion-cli/tests/cli_integration.rs: ## @@ -17,42 +17,223 @@ use std::process::Command; -use assert_cmd::prelude::{CommandCargoExt, OutputAssertExt}; -us

Re: [PR] Minor: improve error message when ARRAY literals can not be planned [datafusion]

2024-12-20 Thread via GitHub
alamb commented on code in PR #13859: URL: https://github.com/apache/datafusion/pull/13859#discussion_r1894534570 ## datafusion/sql/src/expr/value.rs: ## @@ -169,7 +169,7 @@ impl SqlToRel<'_, S> { } } -internal_err!("Expected a simplified result,

Re: [PR] Add configurable normalization for configuration options and preserve case for S3 paths [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13576: URL: https://github.com/apache/datafusion/pull/13576#issuecomment-2557121105 Thanks @blaginin @findepi @comphead @berkaysynnada and @ozankabak for sticking with this. It is a nice improvement.

Re: [I] Proposal: Restructure DataFusion site [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #1821: URL: https://github.com/apache/datafusion/issues/1821#issuecomment-2557105293 I think the current site, while not quite the proposal of @matthewmturner is much better than when this ticket was written https://datafusion.apache.org/ Thus let's cl

Re: [I] Proposal: Restructure DataFusion site [datafusion]

2024-12-20 Thread via GitHub
alamb closed issue #1821: Proposal: Restructure DataFusion site URL: https://github.com/apache/datafusion/issues/1821

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13823: URL: https://github.com/apache/datafusion/pull/13823#issuecomment-2557107279 > One alternative to consider would be making 44 a major release with all the API changes (including this) so that we can proceed with a few "stable" releases after 44. It is also not

Re: [I] Dec 13, 2024: This week(s) in DataFusion [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #13760: URL: https://github.com/apache/datafusion/issues/13760#issuecomment-2557108394 Maybe time to resurrect discussion about stable / long term releases - https://github.com/apache/datafusion/issues/5269 Is anyone else interested in this? We would likely

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-20 Thread via GitHub
jayzhan211 commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r189428 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .ma

Re: [I] [EPIC] A collection of items to improve DataFuson stability (reduce effort required to upgrade) [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #13648: URL: https://github.com/apache/datafusion/issues/13648#issuecomment-2557109771 I found a good previous discussion by @andygrove about potential patch releases: - https://github.com/apache/datafusion/issues/5269

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-20 Thread via GitHub
jayzhan211 commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1894002307 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .ma

Re: [I] [EPIC] A collection of items to improve developer / CI speed [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #13813: URL: https://github.com/apache/datafusion/issues/13813#issuecomment-2557117029 > I've managed to knock some time off of the ci runs on a sidebranch (need to rebase off of main before pushing a PR) but here is what I've managed to get to so far: LO

Re: [I] Support per-option value normalization [datafusion]

2024-12-20 Thread via GitHub
alamb closed issue #11650: Support per-option value normalization URL: https://github.com/apache/datafusion/issues/11650

Re: [PR] Add configurable normalization for configuration options and preserve case for S3 paths [datafusion]

2024-12-20 Thread via GitHub
alamb merged PR #13576: URL: https://github.com/apache/datafusion/pull/13576

Re: [I] 2gb parquet file takes 100s to process, even on second attempt (on main) [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #13785: URL: https://github.com/apache/datafusion/issues/13785#issuecomment-2557119039 > Hmm... it looks like `DataFrameWriteOptions` is missing an order by / sort by like available in SQL. @TheBuilderJR if you are willing to file a ticket for that featur

Re: [I] Test DataFusion 44.0.0 with Sail [datafusion]

2024-12-20 Thread via GitHub
alamb commented on issue #13855: URL: https://github.com/apache/datafusion/issues/13855#issuecomment-2557124801 > After a brief review of the errors I suspect there may be something up with lists and structs but I would have to see the actual logic being tested to know for certain. Many of

Re: [PR] Improve`Signature` and `comparison_coercion` documentation [datafusion]

2024-12-20 Thread via GitHub
alamb merged PR #13840: URL: https://github.com/apache/datafusion/pull/13840

Re: [PR] Improve`Signature` and `comparison_coercion` documentation [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13840: URL: https://github.com/apache/datafusion/pull/13840#issuecomment-2557127722 Thanks again @findepi and @jayzhan211

Re: [PR] Improve`Signature` and `comparison_coercion` documentation [datafusion]

2024-12-20 Thread via GitHub
alamb commented on code in PR #13840: URL: https://github.com/apache/datafusion/pull/13840#discussion_r1894012298 ## datafusion/expr-common/src/signature.rs: ## @@ -123,24 +127,29 @@ pub enum TypeSignature { /// One or more arguments belonging to the [`TypeSignatureClass`],

Re: [PR] feat: support normalized expr in CSE [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13315: URL: https://github.com/apache/datafusion/pull/13315#issuecomment-2557132600 Benchmark results. If anything this branch seems to be slightly faster than main. I am rerunning to check again but I see no reason not to merge. Thanks again @zhuliquan @jayzhan

Re: [PR] feat: support normalized expr in CSE [datafusion]

2024-12-20 Thread via GitHub
alamb merged PR #13315: URL: https://github.com/apache/datafusion/pull/13315

Re: [I] Consolidate Example: simplify_udaf_expression.rs into advanced_udaf.rs [datafusion]

2024-12-20 Thread via GitHub
takaebato commented on issue #13842: URL: https://github.com/apache/datafusion/issues/13842#issuecomment-2557131459 take

Re: [I] Consolidate Example: simplify_udaf_expression.rs into advanced_udaf.rs [datafusion]

2024-12-20 Thread via GitHub
takaebato commented on issue #13842: URL: https://github.com/apache/datafusion/issues/13842#issuecomment-2557129464 I'll work on this!

Re: [PR] Upgrade to sqlparser `0.53.0` [datafusion]

2024-12-20 Thread via GitHub
alamb merged PR #13767: URL: https://github.com/apache/datafusion/pull/13767

Re: [PR] chore(deps): update sqlparser requirement from 0.52.0 to 0.53.0 [datafusion]

2024-12-20 Thread via GitHub
dependabot[bot] closed pull request #13837: chore(deps): update sqlparser requirement from 0.52.0 to 0.53.0 URL: https://github.com/apache/datafusion/pull/13837

Re: [PR] chore(deps): update sqlparser requirement from 0.52.0 to 0.53.0 [datafusion]

2024-12-20 Thread via GitHub
dependabot[bot] commented on PR #13837: URL: https://github.com/apache/datafusion/pull/13837#issuecomment-2557137501 Looks like sqlparser is up-to-date now, so this is no longer needed.

Re: [PR] Upgrade to sqlparser `0.53.0` [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13767: URL: https://github.com/apache/datafusion/pull/13767#issuecomment-2557135488 Thanks for the reviews @comphead (as always). Let's keep this code flowing 🚀

Re: [PR] Implement `SHOW FUNCTIONS` [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13799: URL: https://github.com/apache/datafusion/pull/13799#issuecomment-2557140181 > Maybe we can consider improving the UX of the CLI. That is an excellent idea -- I think it is one of @matthewmturner 's goals with https://github.com/datafusion-contrib/datafus

Re: [I] FFI Execution Plans that spawn threads panic [datafusion]

2024-12-20 Thread via GitHub
timsaucer commented on issue #13851: URL: https://github.com/apache/datafusion/issues/13851#issuecomment-2557143735 There is only one point where we have ffi calls that are async, currently. This happens in the record batch stream. So I think what needs to happen is that within the private

Re: [I] substrait_integration integration tests are failing [datafusion]

2024-12-20 Thread via GitHub
alamb closed issue #13854: substrait_integration integration tests are failing URL: https://github.com/apache/datafusion/issues/13854

Re: [PR] fix: enable DF's nested_expressions feature by in datafusion-substrait tests to make them pass [datafusion]

2024-12-20 Thread via GitHub
alamb merged PR #13857: URL: https://github.com/apache/datafusion/pull/13857

Re: [PR] fix: enable DF's nested_expressions feature by in datafusion-substrait tests to make them pass [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13857: URL: https://github.com/apache/datafusion/pull/13857#issuecomment-2557076322 > Interestingly, the tests still passed in CI but failed when run locally - maybe due to CI running a whole bunch of tests at once and thus compiling DF with different set of features,

[PR] Minor: improve error message when ARRAY literals can not be planned [datafusion]

2024-12-20 Thread via GitHub
alamb opened a new pull request, #13859: URL: https://github.com/apache/datafusion/pull/13859 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13854 ## Rationale for this change There was a problem running substrait tests as they di

Re: [PR] fix: enable DF's nested_expressions feature by in datafusion-substrait tests to make them pass [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13857: URL: https://github.com/apache/datafusion/pull/13857#issuecomment-2557092695 Thank you very much @Blizzara -- great 🕵️ I also filed a follow up ticket to make the error message more helpful - https://github.com/apache/datafusion/pull/13859

Re: [PR] Replace `execution_mode` with `emission_type` and `boundedness` [datafusion]

2024-12-20 Thread via GitHub
alamb commented on PR #13823: URL: https://github.com/apache/datafusion/pull/13823#issuecomment-2557093668 🚀

Re: [I] Test DataFusion 44.0.0 with Sail [datafusion]

2024-12-20 Thread via GitHub
Omega359 commented on issue #13855: URL: https://github.com/apache/datafusion/issues/13855#issuecomment-2557060605 > Smooth Sailing testing commit `5d563d9` from DataFusion `main` branch. > > The following test reports run tests on the PR branch and compares them against the tests fro

Re: [I] CI: Windows flow takes 1.5h [datafusion]

2024-12-20 Thread via GitHub
Alexhuszagh commented on issue #13726: URL: https://github.com/apache/datafusion/issues/13726#issuecomment-2557178542 > @Alexhuszagh can you build DataFusion with a command like this: > > ```shell > cargo build -j 1 > ``` > > (I am wondering if something about doing so man

Re: [PR] Deprecate ScalarUDFImpl::return_type [datafusion]

2024-12-20 Thread via GitHub
findepi commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1893845870 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .map(|

Re: [I] substrait_integration integration tests are failing [datafusion]

2024-12-20 Thread via GitHub
Blizzara commented on issue #13854: URL: https://github.com/apache/datafusion/issues/13854#issuecomment-2556714089 Bisect led to https://github.com/apache/datafusion/commit/11d49b49dd3361e975fb67f66f40715bdaf4650e. Specifically, this change: https://github.com/apache/datafusion/pull/13594

Re: [PR] Improve`Signature` and `comparison_coercion` documentation [datafusion]

2024-12-20 Thread via GitHub
jayzhan211 commented on code in PR #13840: URL: https://github.com/apache/datafusion/pull/13840#discussion_r1893820169 ## datafusion/expr-common/src/signature.rs: ## @@ -123,24 +127,29 @@ pub enum TypeSignature { /// One or more arguments belonging to the [`TypeSignatureCla

[PR] TESTING: new object store release [datafusion]

2024-12-20 Thread via GitHub
alamb opened a new pull request, #13858: URL: https://github.com/apache/datafusion/pull/13858 NOT INTENDED FOR MERGE - Related to https://github.com/apache/arrow-rs/issues/6902 I am testing the object_store release prior to publishing it

Re: [PR] fix: [comet-parquet-exec] Fix timestamp cast errors [datafusion-comet]

2024-12-20 Thread via GitHub
parthchandra commented on code in PR #1191: URL: https://github.com/apache/datafusion-comet/pull/1191#discussion_r1894277600 ## native/spark-expr/src/cast.rs: ## @@ -138,240 +138,6 @@ pub struct Cast { pub cast_options: SparkCastOptions, } -/// Determine if Comet support
