Re: [PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]

2025-02-23 Thread via GitHub
varun-bhardwaj-sde closed pull request #14792: Fixed Migrate Datetime functions to invoke_with_args Issue 14705 URL: https://github.com/apache/datafusion/pull/14792 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] BigQuery: Add support for `BEGIN` [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
alamb commented on code in PR #1718: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1718#discussion_r1966760320 ## src/ast/mod.rs: ## @@ -3058,6 +3058,33 @@ pub enum Statement { begin: bool, transaction: Option, modifier: Option, +

Re: [PR] feat: use edition 2024 [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
alamb commented on PR #1736: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736#issuecomment-2676808035 🤔 interestingly we don't seem to have an MSRV policy in this crate (at least not that I could find in https://github.com/apache/datafusion-sqlparser-rs/blob/main/README.md

Re: [I] Adopt temporalio/snipsync for documentation [datafusion]

2025-02-23 Thread via GitHub
alamb closed issue #10768: Adopt temporalio/snipsync for documentation URL: https://github.com/apache/datafusion/issues/10768

Re: [I] Discuss: Should we implement custom lints? [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #5644: URL: https://github.com/apache/datafusion/issues/5644#issuecomment-2676842151 I think our current lints with clippy have served us well and we have used custom lints where appropriate. Let's open new issues if we have specific ideas for new lints.

Re: [I] Adopt temporalio/snipsync for documentation [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #10768: URL: https://github.com/apache/datafusion/issues/10768#issuecomment-2676843045 I believe @ugoa 's changes in the following PR now test the examples in the documentation automatically so closing this issue - https://github.com/apache/datafusion/pull/14544

Re: [I] Discuss: Should we implement custom lints? [datafusion]

2025-02-23 Thread via GitHub
alamb closed issue #5644: Discuss: Should we implement custom lints? URL: https://github.com/apache/datafusion/issues/5644

Re: [PR] Add `#[recursive]` [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
alamb commented on PR #1522: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1522#issuecomment-2676807084 > I didn't notice the missing error handling either. Your [rust-lang/stacker#116](https://github.com/rust-lang/stacker/pull/116) seems like a nice improvement. But if it do

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
alamb commented on PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#issuecomment-2676808148 🚀

Re: [PR] [WIP] Store spans for Value expressions [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
alamb commented on PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#issuecomment-2676821051 FYI @eliaperantoni -- any chance you are willing to help out with this one?

Re: [I] Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #13748: URL: https://github.com/apache/datafusion/issues/13748#issuecomment-2676834154 > For anyone looking, I think [this](https://github.com/influxdata/arrow-datafusion/pull/55) is the workaround Influx has for this issue. Indeed -- that basically disables

Re: [I] Support User-Defined Sorting [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #14828: URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2676836921 Maybe this is a good use case for "user defined types" - https://github.com/apache/datafusion/issues/12644

Re: [I] Review the need of `make_scalar_function` for `functions` [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #14835: URL: https://github.com/apache/datafusion/issues/14835#issuecomment-2676838578 > Revisit the functions that use `make_scalar_function` and identify those where it is no longer necessary. If there are functions that no longer require it, remove the usage enti

Re: [I] Optimize nested joins [datafusion]

2025-02-23 Thread via GitHub
alamb closed issue #128: Optimize nested joins URL: https://github.com/apache/datafusion/issues/128

Re: [PR] Update Community Events in concepts-readings-events.md [datafusion]

2025-02-23 Thread via GitHub
alamb commented on PR #14629: URL: https://github.com/apache/datafusion/pull/14629#issuecomment-2676847534 > I'd definitely be willing to do so, but I'm going to need someone to help me list what we need to mention and what details they should include. I'm unfortunately not well-versed in t

Re: [I] [Epic] DataFusion Blogs [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #14836: URL: https://github.com/apache/datafusion/issues/14836#issuecomment-2676847386 In general, I also think @andygrove 's pattern of creating a blog post for each comet release is amazing. For example: - https://github.com/apache/datafusion-site/pull/56

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-23 Thread via GitHub
alamb commented on PR #57: URL: https://github.com/apache/datafusion-site/pull/57#issuecomment-2676854405 I plan to merge this tomorrow unless anyone else would like time to comment

Re: [PR] build(deps): bump arrow from 54.1.0 to 54.2.0 [datafusion-python]

2025-02-23 Thread via GitHub
alamb merged PR #1035: URL: https://github.com/apache/datafusion-python/pull/1035

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-23 Thread via GitHub
ozankabak commented on PR #57: URL: https://github.com/apache/datafusion-site/pull/57#issuecomment-2676856751 We will take a look tomorrow and come up with suggestions if we can think of anything

Re: [PR] Simplify `FileSource::create_file_opener`'s signature [datafusion]

2025-02-23 Thread via GitHub
alamb commented on PR #14798: URL: https://github.com/apache/datafusion/pull/14798#issuecomment-2676857066 🚀

Re: [PR] Simplify `FileSource::create_file_opener`'s signature [datafusion]

2025-02-23 Thread via GitHub
alamb merged PR #14798: URL: https://github.com/apache/datafusion/pull/14798

Re: [PR] Simplify `FileSource::create_file_opener`'s signature [datafusion]

2025-02-23 Thread via GitHub
alamb commented on PR #14798: URL: https://github.com/apache/datafusion/pull/14798#issuecomment-2676857121 Thanks again @AdamGS

Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]

2025-02-23 Thread via GitHub
timsaucer commented on PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024#issuecomment-2676857092 Thank you! I can take care of the pypi upload today

Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]

2025-02-23 Thread via GitHub
alamb commented on PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024#issuecomment-2676856559 Thanks @timsaucer ! I did the final upload / release here: The release is available here: https://dist.apache.org/repos/dist/release/datafusion/datafusion-python-45.2.

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2676857655 @xudong963 are we still thinking of trying to get the release ready this upcoming week? I will try and focus my efforts starting tomorrow on ensuring the bugs listed in "B

Re: [PR] feat: pretty explain [datafusion]

2025-02-23 Thread via GitHub
irenjj commented on PR #14677: URL: https://github.com/apache/datafusion/pull/14677#issuecomment-2676863874 PTAL @alamb @xudong963 👀

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-02-23 Thread via GitHub
alamb commented on PR #14207: URL: https://github.com/apache/datafusion/pull/14207#issuecomment-2676862136 I plan to review this one carefully tomorrow

Re: [I] substrait generated by Apache Calcite does not run in DataFusion [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #14831: URL: https://github.com/apache/datafusion/issues/14831#issuecomment-2676863189 @niebayes said they may be interested in this: - https://github.com/apache/datafusion/issues/14373#issuecomment-2667370515

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-23 Thread via GitHub
alamb commented on PR #57: URL: https://github.com/apache/datafusion-site/pull/57#issuecomment-2676866630 I'll wait for your comments before releasing

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-02-23 Thread via GitHub
alamb commented on PR #14547: URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2676860682 Hi @geoffreyclaude @NGA-TRAN -- I will plan to review this PR shortly. Sorry for the delay.

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-23 Thread via GitHub
AdamGS commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2676870028 I'll try and take a stab at it, @alamb do you have a preference as to how many PRs I should break it into? There are no logical changes but I expect a very large number of small

Re: [PR] Remove unused crate dependencies [datafusion]

2025-02-23 Thread via GitHub
xudong963 merged PR #14827: URL: https://github.com/apache/datafusion/pull/14827

[PR] Introduce Async User Defined Functions [datafusion]

2025-02-23 Thread via GitHub
goldmedal opened a new pull request, #14837: URL: https://github.com/apache/datafusion/pull/14837 ## Which issue does this PR close? - Closes #6518. ## Rationale for this change I have been working with @alamb to implement the functionality for the async UDF. - http

Re: [I] Async User Defined Functions (UDF) [datafusion]

2025-02-23 Thread via GitHub
goldmedal commented on issue #6518: URL: https://github.com/apache/datafusion/issues/6518#issuecomment-2676875774 I have created a draft PR for this issue. - https://github.com/apache/datafusion/pull/14837 It still has some remaining work, but feel free to share your opinion.

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2676876878 > I'll try and take a stab at it, [@alamb](https://github.com/alamb) do you have a preference as to how many PRs I should break it into? There are no logical changes but I expect

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-23 Thread via GitHub
xudong963 commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2676878054 > Specifically, if we let `X_i` denote the value of `i`th row of column `X`, the maximum value for the column would be `M = max(X_1, ..., X_N)` with `N` being the number of rows. G

Re: [I] Datafusion binary size has been getting bigger [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #13816: URL: https://github.com/apache/datafusion/issues/13816#issuecomment-2676877830 > btw after changes in 45.0.0 the image size is 49M 🎉 Nice! Do you know what changed? Indeed I checked on my mac after doing `cargo build --release` and the size is

Re: [I] Add a hint about expected extension in error message in register_csv, register_parquet, register_json [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #14144: URL: https://github.com/apache/datafusion/issues/14144#issuecomment-2676880600 > [@cj-zhukov](https://github.com/cj-zhukov) [@alamb](https://github.com/alamb) hi, I've just hit this one. judging file content by an extension doesn't feel fully right to me (as

Re: [I] Nested Fields Access on StructArray field not working [datafusion]

2025-02-23 Thread via GitHub
alamb commented on issue #14768: URL: https://github.com/apache/datafusion/issues/14768#issuecomment-2676887133 ```sql > create or replace table my_table as values ({'a': 'A1', 'myobjects': [{'name': '1', 'value': 'V'}, {'name': '2', 'value': 'V2'}]}); 0 row(s) fetched. Elapsed 0.00

Re: [PR] Improve benchmark docs [datafusion]

2025-02-23 Thread via GitHub
alamb commented on code in PR #14820: URL: https://github.com/apache/datafusion/pull/14820#discussion_r1966797142 ## benchmarks/README.md: ## @@ -243,28 +244,92 @@ The `dfbench` program contains subcommands to run the various benchmarks. When benchmarking, it should always be

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-23 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2676905580 In `Bugs that would be good to fix`, four issues already have PRs, and one does not. I'll focus on reviewing the four in the next two days and plan to update version and chang

Re: [PR] Prepare for `45.0.0` release: Version and Changelog [datafusion]

2025-02-23 Thread via GitHub
xudong963 commented on code in PR #14397: URL: https://github.com/apache/datafusion/pull/14397#discussion_r1966805448 ## dev/changelog/45.0.0.md: ## Review Comment: @alamb What tool do you use to generate the doc?

Re: [PR] Prepare for `45.0.0` release: Version and Changelog [datafusion]

2025-02-23 Thread via GitHub
andygrove commented on code in PR #14397: URL: https://github.com/apache/datafusion/pull/14397#discussion_r1966806198 ## dev/changelog/45.0.0.md: ## Review Comment: The release process is documented at https://github.com/apache/datafusion-comet/blob/main/dev/release/README

Re: [PR] refactor: collect dataframe as stream in `__repr__` [datafusion-python]

2025-02-23 Thread via GitHub
konjac commented on PR #1015: URL: https://github.com/apache/datafusion-python/pull/1015#issuecomment-2676912860 > This looks good. It took me a while to parse through the logic of the `get_batches`. I think it's worth adding some documentation within the file to explain why we are doing t

Re: [PR] Prepare for `45.0.0` release: Version and Changelog [datafusion]

2025-02-23 Thread via GitHub
xudong963 commented on code in PR #14397: URL: https://github.com/apache/datafusion/pull/14397#discussion_r1966807754 ## dev/changelog/45.0.0.md: ## Review Comment: Thank you andy!

[PR] [WIP] [datafusion]

2025-02-23 Thread via GitHub
AdamGS opened a new pull request, #14838: URL: https://github.com/apache/datafusion/pull/14838 ## Which issue does this PR close? This PR doesn't close any specific issue, it's part of the ongoing #1. ## Rationale for this change ## What changes are inc

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-23 Thread via GitHub
AdamGS commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2676927900 Ok I got a first draft that just moves `FileStream` and `FileScanConfig` and everything that comes with them. There are still some small issues to solve, mostly around test funct

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-23 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1966815085 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [I] Feature: support Timestamp with TZ for function `to_unixtime` [datafusion]

2025-02-23 Thread via GitHub
xudong963 closed issue #14659: Feature: support Timestamp with TZ for function `to_unixtime` URL: https://github.com/apache/datafusion/issues/14659

Re: [I] date_part is calculating results incorrectly for intervals [datafusion]

2025-02-23 Thread via GitHub
delamarch3 commented on issue #14817: URL: https://github.com/apache/datafusion/issues/14817#issuecomment-2676930233 I took a look into this and opened an issue in arrow: https://github.com/apache/arrow-rs/issues/7182

Re: [I] Feature: support Timestamp with TZ for function `to_unixtime` [datafusion]

2025-02-23 Thread via GitHub
xudong963 commented on issue #14659: URL: https://github.com/apache/datafusion/issues/14659#issuecomment-2676928561 Thanks, it works!

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-23 Thread via GitHub
ozankabak commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2676934776 Thanks for all the comments and questions. I've incorporated the naming suggestion by @alamb (and updated many comments and variable names accordingly). I also switched to `Generic

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-23 Thread via GitHub
zhuqi-lucas commented on code in PR #14823: URL: https://github.com/apache/datafusion/pull/14823#discussion_r1966820859 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -414,6 +419,66 @@ impl ExternalSorter { Ok(used) } +/// Reconstruct `self.in_mem_batch

[PR] Ignore examples output directory [datafusion]

2025-02-23 Thread via GitHub
AdamGS opened a new pull request, #14840: URL: https://github.com/apache/datafusion/pull/14840 ## Which issue does this PR close? - Closes #14839. ## Rationale for this change Minor DX improvement. ## What changes are included in this PR?

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-23 Thread via GitHub
berkaysynnada commented on code in PR #14813: URL: https://github.com/apache/datafusion/pull/14813#discussion_r1966830680 ## datafusion/physical-plan/src/windows/mod.rs: ## @@ -498,6 +697,15 @@ pub fn get_window_mode( None } +fn all_possible_sort_options(expr: Arc) -> V

[I] DeltaLake integration not working (Python) [datafusion]

2025-02-23 Thread via GitHub
riziles opened a new issue, #14842: URL: https://github.com/apache/datafusion/issues/14842 ### Describe the bug After upgrading to deltalake (Python) 0.25.1, this basic example fails. Was working fine before. ```python from deltalake import DeltaTable, write_deltalake imp

Re: [I] DeltaLake integration not working (Python) [datafusion]

2025-02-23 Thread via GitHub
ion-elgreco commented on issue #14842: URL: https://github.com/apache/datafusion/issues/14842#issuecomment-2677014358 What error are you seeing?

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-23 Thread via GitHub
berkaysynnada commented on code in PR #14813: URL: https://github.com/apache/datafusion/pull/14813#discussion_r1966828820 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -2280,3 +2086,1265 @@ async fn test_not_replaced_with_partial_sort_for_unbounded_input

[PR] fix: use `return_type_from_args` and mark nullable if any of the input is nullable [datafusion]

2025-02-23 Thread via GitHub
rluvaton opened a new pull request, #14841: URL: https://github.com/apache/datafusion/pull/14841 ## Which issue does this PR close? N/A ## Rationale for this change Currently because this function does not override `return_type_from_args`, it will return that the return
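The nullability rule named in this PR title (mark the output nullable if any input is nullable) can be sketched in plain Rust. This is only an illustration of the rule, not the PR's code: `ArgInfo` and `output_nullable` are hypothetical names and are not part of the DataFusion API.

```rust
/// Hypothetical stand-in for per-argument metadata; not a DataFusion type.
struct ArgInfo {
    nullable: bool,
}

/// The output is nullable as soon as at least one input is nullable.
fn output_nullable(args: &[ArgInfo]) -> bool {
    args.iter().any(|arg| arg.nullable)
}

fn main() {
    // One non-nullable input and one nullable input.
    let args = [ArgInfo { nullable: false }, ArgInfo { nullable: true }];
    println!("output nullable: {}", output_nullable(&args));
}
```

With no nullable inputs (or no inputs at all) the sketch reports a non-nullable output, since `any` over an empty iterator is `false`.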

[PR] fix: Reduce number of shuffle spill files [wip] [datafusion-comet]

2025-02-23 Thread via GitHub
andygrove opened a new pull request, #1440: URL: https://github.com/apache/datafusion-comet/pull/1440 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1436 Builds on https://github.com/apache/datafusion-comet/pull/1439 #

Re: [I] DeltaLake integration not working (Python) [datafusion]

2025-02-23 Thread via GitHub
riziles commented on issue #14842: URL: https://github.com/apache/datafusion/issues/14842#issuecomment-2677013787 cross posted here: https://github.com/delta-io/delta-rs/discussions/3221

Re: [PR] fix: Reduce number of shuffle spill files [wip] [datafusion-comet]

2025-02-23 Thread via GitHub
codecov-commenter commented on PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#issuecomment-2677064393 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1440?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-02-23 Thread via GitHub
berkaysynnada commented on code in PR #14523: URL: https://github.com/apache/datafusion/pull/14523#discussion_r1966851842 ## datafusion/common/src/scalar/mod.rs: ## @@ -1583,6 +1583,17 @@ impl ScalarValue { } } +/// Returns negation for a boolean scalar value

Re: [PR] fix: use `return_type_from_args` and mark nullable if any of the input is nullable [datafusion]

2025-02-23 Thread via GitHub
rluvaton commented on code in PR #14841: URL: https://github.com/apache/datafusion/pull/14841#discussion_r1966840711 ## datafusion/functions/src/unicode/strpos.rs: ## @@ -83,6 +83,15 @@ impl ScalarUDFImpl for StrposFunc { utf8_to_int_type(&arg_types[0], "strpos/instr/po

Re: [PR] Prepare for `45.0.0` release: Version and Changelog [datafusion]

2025-02-23 Thread via GitHub
andygrove commented on code in PR #14397: URL: https://github.com/apache/datafusion/pull/14397#discussion_r1966806560 ## dev/changelog/45.0.0.md: ## Review Comment: The release process is documented at https://github.com/apache/datafusion/blob/main/dev/release/README.md

Re: [I] DeltaLake integration not working (Python) [datafusion]

2025-02-23 Thread via GitHub
riziles commented on issue #14842: URL: https://github.com/apache/datafusion/issues/14842#issuecomment-2677027474 Nothing. It just silently crashes on the `register_table_provider` step. Can't debug or anything. Never seen that in Python before.

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-23 Thread via GitHub
berkaysynnada commented on PR #14813: URL: https://github.com/apache/datafusion/pull/14813#issuecomment-2677017783 PTAL @ozankabak, @alamb

[I] Can we do a release? [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
iajoiner opened a new issue, #1740: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1740 I really hope we can have a release soon so that https://github.com/apache/datafusion-sqlparser-rs/pull/1730 can get in

Re: [I] Datafusion binary size has been getting bigger [datafusion]

2025-02-23 Thread via GitHub
comphead commented on issue #13816: URL: https://github.com/apache/datafusion/issues/13816#issuecomment-2677151841 so in the data above (ARM macOS) the biggest parts are - code (compiled instructions): 41MB - consts: 2-3MB @alamb WDYT, should we dig deeper?

Re: [I] Datafusion binary size has been getting bigger [datafusion]

2025-02-23 Thread via GitHub
comphead commented on issue #13816: URL: https://github.com/apache/datafusion/issues/13816#issuecomment-2677180516 I checked the biggest methods are std panic methods, removing unwind can save even more ``` panic = "abort" ``` ``` du -s -h target/release/datafusion-c

Re: [I] Review the need of `make_scalar_function` for `functions` [datafusion]

2025-02-23 Thread via GitHub
jayzhan211 commented on issue #14835: URL: https://github.com/apache/datafusion/issues/14835#issuecomment-2677241293 > I think trying out @findepi 's Simple Functions #12635 would be an excellent candidate for these functions. The ones that use make_scalar_function are all fairly simple an

Re: [PR] feat: pretty explain [datafusion]

2025-02-23 Thread via GitHub
irenjj commented on PR #14677: URL: https://github.com/apache/datafusion/pull/14677#issuecomment-2677245452 > Happy to see this happen. > > Maybe it's better to open a tracking issue to record next todos, such as rich other physical operator's information to display. Then we can make

Re: [PR] Add support column prefix index for MySQL [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
zzzdong commented on code in PR #1732: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1732#discussion_r1966963134 ## src/ast/mod.rs: ## @@ -8591,6 +8591,61 @@ pub enum CopyIntoSnowflakeKind { Location, } +/// Index Field +/// +/// This structure used here [

Re: [PR] fix: use `return_type_from_args` and mark nullable if any of the input is nullable [datafusion]

2025-02-23 Thread via GitHub
jayzhan211 commented on code in PR #14841: URL: https://github.com/apache/datafusion/pull/14841#discussion_r1966963244 ## datafusion/functions/src/unicode/strpos.rs: ## @@ -83,6 +83,15 @@ impl ScalarUDFImpl for StrposFunc { utf8_to_int_type(&arg_types[0], "strpos/instr/

Re: [PR] Implement actual count wildcard in physical layer and fix duplicated schema name error from count wildcard [datafusion]

2025-02-23 Thread via GitHub
jayzhan211 commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2677255180 fix the extended test in main branch

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-23 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1966955590 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] Update hashbrown requirement from 0.14.5 to 0.15.2 [datafusion]

2025-02-23 Thread via GitHub
github-actions[bot] commented on PR #13557: URL: https://github.com/apache/datafusion/pull/13557#issuecomment-2677291102 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix(physical-expr): Remove empty constants check when ordering is satisfied [datafusion]

2025-02-23 Thread via GitHub
rkrishn7 commented on PR #14829: URL: https://github.com/apache/datafusion/pull/14829#issuecomment-2677289153 Thanks @alamb @berkaysynnada! Yeah, not too sure why it was there 🤔. For additional context, this also was occurring for `UNION ALL`, when only one input in the union had an o

Re: [I] substrait generated by Apache Calcite does not run in DataFusion [datafusion]

2025-02-23 Thread via GitHub
niebayes commented on issue #14831: URL: https://github.com/apache/datafusion/issues/14831#issuecomment-2677324032 Query federation is amazing. I will first look at Substrait's consumer-testing repo for DataFusion.

Re: [I] Attach `Diagnostic` to "function x does not exist" error [datafusion]

2025-02-23 Thread via GitHub
onlyjackfrost commented on issue #14430: URL: https://github.com/apache/datafusion/issues/14430#issuecomment-2677326478 Hi @eliaperantoni, just updating my progress. I've implemented the diagnostic like this. @eliaperantoni thanks for implementing the `try_from_sqlparser_span` in the

Re: [I] Datafusion can't seem to cast evolving structs [datafusion]

2025-02-23 Thread via GitHub
TheBuilderJR commented on issue #14757: URL: https://github.com/apache/datafusion/issues/14757#issuecomment-2677466140 @alamb given that the arrow folks don't seem super motivated to fix this in a timely manner, can we do a fix on the datafusion side? Maybe the fix is we can try to do an ar

Re: [PR] Add support for `Dictionary` to AST datatype in unparser [datafusion]

2025-02-23 Thread via GitHub
phillipleblanc commented on PR #14783: URL: https://github.com/apache/datafusion/pull/14783#issuecomment-2677510102 > Thank you @cetra3 > > Ideally we would write a test for this, but since I couldn't find any existing tests for data types, I don't think it is strictly necessary here

Re: [PR] Add support for `Dictionary` to AST datatype in unparser [datafusion]

2025-02-23 Thread via GitHub
phillipleblanc commented on code in PR #14783: URL: https://github.com/apache/datafusion/pull/14783#discussion_r1967078010 ## datafusion/sql/src/unparser/expr.rs: ## @@ -1624,9 +1624,7 @@ impl Unparser<'_> { DataType::Union(_, _) => { not_impl_err!(

Re: [PR] chore: Strip debuginfo symbols for release [datafusion]

2025-02-23 Thread via GitHub
rkrishn7 commented on PR #14843: URL: https://github.com/apache/datafusion/pull/14843#issuecomment-2677527403 🙌🏾 One consideration re: `panic=abort` is that it will cause the program to crash upon some task panicking. Unclear on whether this is more/less desirable than bubbling up t

Re: [PR] Add support for `Dictionary` to AST datatype in unparser [datafusion]

2025-02-23 Thread via GitHub
cetra3 commented on code in PR #14783: URL: https://github.com/apache/datafusion/pull/14783#discussion_r1967088349 ## datafusion/sql/src/unparser/expr.rs: ## @@ -1624,9 +1624,7 @@ impl Unparser<'_> { DataType::Union(_, _) => { not_impl_err!("Unsuppo

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-23 Thread via GitHub
kosiew commented on code in PR #14769: URL: https://github.com/apache/datafusion/pull/14769#discussion_r1967089979 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1926,6 +1930,71 @@ impl DataFrame { plan, }) } + +/// Fill null values in specified c

Re: [PR] chore: Strip debuginfo symbols for release [datafusion]

2025-02-23 Thread via GitHub
rkrishn7 commented on code in PR #14843: URL: https://github.com/apache/datafusion/pull/14843#discussion_r1967096334 ## Cargo.toml: ## @@ -159,19 +159,20 @@ url = "2.5.4" [profile.release] codegen-units = 1 lto = true +debug = false Review Comment: nit: I believe this is

Re: [I] DuplicateQualifiedField With Paritioned Data [datafusion-python]

2025-02-23 Thread via GitHub
kosiew commented on issue #1018: URL: https://github.com/apache/datafusion-python/issues/1018#issuecomment-2677496800 hi @cfis > Not sure if this error is from Arrow or from the way DataFusion It's from Datafusion https://github.com/apache/datafusion/blob/c92982c393c69cbc4f5

[PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-23 Thread via GitHub
Lordworms opened a new pull request, #14844: URL: https://github.com/apache/datafusion/pull/14844 ## Which issue does this PR close? - Closes #13286 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-23 Thread via GitHub
Lordworms commented on PR #14844: URL: https://github.com/apache/datafusion/pull/14844#issuecomment-2677559931 I haven't found a good place to add tests, but here are the results ![image](https://github.com/user-attachments/assets/01497acb-dc7c-4f9c-bb0c-beab65b0e041)

[PR] Update gsoc_application_guidelines.md [datafusion]

2025-02-23 Thread via GitHub
oznur-synnada opened a new pull request, #14845: URL: https://github.com/apache/datafusion/pull/14845 Updated the expired Discord link with a valid one ## Which issue does this PR close? ## Rationale for this change ## What changes are included in this

Re: [PR] Add support for `Dictionary` to AST datatype in unparser [datafusion]

2025-02-23 Thread via GitHub
cetra3 commented on code in PR #14783: URL: https://github.com/apache/datafusion/pull/14783#discussion_r196712 ## datafusion/sql/src/unparser/expr.rs: ## @@ -1586,7 +1586,7 @@ impl Unparser<'_> { not_impl_err!("Unsupported DataType: conversion: {data_type:?

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-23 Thread via GitHub
Lordworms commented on PR #14844: URL: https://github.com/apache/datafusion/pull/14844#issuecomment-2677595271 ![image](https://github.com/user-attachments/assets/23c8dd3e-c3f1-427b-b9b4-aa6c85bf4c38)

Re: [PR] Update gsoc_application_guidelines.md [datafusion]

2025-02-23 Thread via GitHub
oznur-synnada closed pull request #14845: Update gsoc_application_guidelines.md URL: https://github.com/apache/datafusion/pull/14845

Re: [PR] [WIP] Store spans for Value expressions [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
eliaperantoni commented on code in PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#discussion_r1967128017 ## src/ast/value.rs: ## @@ -26,10 +26,25 @@ use bigdecimal::BigDecimal; #[cfg(feature = "serde")] use serde::{Deserialize, Serialize}; -use c

Re: [I] Attach `Diagnostic` to "function x does not exist" error [datafusion]

2025-02-23 Thread via GitHub
eliaperantoni commented on issue #14430: URL: https://github.com/apache/datafusion/issues/14430#issuecomment-2677611141 @onlyjackfrost yes that's perfect! > I'm trying to add test case in the /sql/cases/diagnostic.rs file... will raise PR later. Sweet! Looking forward to it. Gr

[PR] Update website links [datafusion]

2025-02-23 Thread via GitHub
oznur-synnada opened a new pull request, #14846: URL: https://github.com/apache/datafusion/pull/14846 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

Re: [PR] feat: use edition 2024 [datafusion-sqlparser-rs]

2025-02-23 Thread via GitHub
iffyio commented on PR #1736: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736#issuecomment-2677624981 Adding an MSRV sounds good to me!
