Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-20 Thread via GitHub
waynexia commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2673032822 >@waynexia, can you help mentor a student working on improving WASM support? Sure! Hope it's not too late 🙈. I'm willing and happy to draft a proposal if this isn't expir

Re: [PR] test: Register Spark-compatible expressions with a DataFusion context [datafusion-comet]

2025-02-20 Thread via GitHub
codecov-commenter commented on PR #1432: URL: https://github.com/apache/datafusion-comet/pull/1432#issuecomment-2673017366 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1432?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-20 Thread via GitHub
parthchandra commented on PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#issuecomment-2673138542 Do we know why Spark's decision is so bad to start with? Spark has the same logic here: https://github.com/apache/spark/blob/fb17856a22be6968b2ed55ccbd7cf72111920bea/sql/ca

Re: [I] Improve datafusion-cli memory usage and considering reserve memory for the result batches [datafusion]

2025-02-20 Thread via GitHub
zhuqi-lucas closed issue #14751: Improve datafusion-cli memory usage and considering reserve memory for the result batches URL: https://github.com/apache/datafusion/issues/14751 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[PR] feat: implement contextualized ObjectStore [datafusion]

2025-02-20 Thread via GitHub
waynr opened a new pull request, #14805: URL: https://github.com/apache/datafusion/pull/14805 - **chore(temporary): patch objectstore with local path** - **feat: implement ContextualizedObjectStore that passes along session state via GetOptions** - **chore: update temporary object_stor

Re: [PR] chore: Add simple complex type microbenchmark [datafusion-comet]

2025-02-20 Thread via GitHub
codecov-commenter commented on PR #1433: URL: https://github.com/apache/datafusion-comet/pull/1433#issuecomment-2673049985 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1433?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-20 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1964563153 ## datafusion/functions-nested/src/sort.rs: ## @@ -143,6 +169,13 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { return exec_err!("array_sor

Re: [PR] Simple Functions Preview [datafusion]

2025-02-20 Thread via GitHub
jayzhan211 commented on PR #14668: URL: https://github.com/apache/datafusion/pull/14668#issuecomment-2673054919 > I don't have expectations that generic implementation will be faster than 50x[1](https://github.com/apache/datafusion/pull/14668#user-content-fn-1-8db9f2d544d304aebaed47866e77cf6

[PR] Chore/Add additional FFI unit tests [datafusion]

2025-02-20 Thread via GitHub
timsaucer opened a new pull request, #14802: URL: https://github.com/apache/datafusion/pull/14802 ## Which issue does this PR close? None ## Rationale for this change Improve code coverage ## What changes are included in this PR? Adds unit tests, no function

Re: [PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]

2025-02-20 Thread via GitHub
niebayes commented on code in PR #14792: URL: https://github.com/apache/datafusion/pull/14792#discussion_r1964634594 ## datafusion/functions/src/datetime/current_date.rs: ## @@ -81,16 +81,19 @@ impl ScalarUDFImpl for CurrentDateFunc { Ok(Date32) } -fn invoke_

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-20 Thread via GitHub
andygrove commented on PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#issuecomment-2673152457 > Do we know why Spark's decision is so bad to start with? Spark has the same logic here: https://github.com/apache/spark/blob/fb17856a22be6968b2ed55ccbd7cf72111920bea/sql/cat

Re: [I] Make the `serde` feature of `sqlparser` no_std [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iffyio closed issue #1729: Make the `serde` feature of `sqlparser` no_std URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1729

Re: [PR] fix: make `serde` feature no_std [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iffyio merged PR #1730: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1730

Re: [PR] Implement SnowFlake ALTER SESSION [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iffyio merged PR #1712: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1712

Re: [PR] Add support for `ORDER BY ALL` [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iffyio commented on PR #1724: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1724#issuecomment-2673590812 @PokIsemaine could you look into the merge conflicts on the branch when you get the time?

Re: [PR] fix: fix various unit test failures in native_datafusion and native_iceberg_compat readers [datafusion-comet]

2025-02-20 Thread via GitHub
kazuyukitanimura commented on code in PR #1415: URL: https://github.com/apache/datafusion-comet/pull/1415#discussion_r1963031627 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1001,7 +1012,7 @@ abstract class ParquetReadSuite extends CometTestBas

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
LucaCappelletti94 commented on PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#issuecomment-2670746858 At this time I am somewhat stuck as I would not know how to parse the class operator names if not as a keyword - do you have any examples that could be applied

[PR] chore(deps): bump serde_json from 1.0.138 to 1.0.139 [datafusion]

2025-02-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14784: URL: https://github.com/apache/datafusion/pull/14784 Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.138 to 1.0.139. Release notes Sourced from serde_json's releases (https://github.com/serde-rs/json/releases).

[PR] chore(deps): bump testcontainers from 0.23.2 to 0.23.3 [datafusion]

2025-02-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14787: URL: https://github.com/apache/datafusion/pull/14787 Bumps [testcontainers](https://github.com/testcontainers/testcontainers-rs) from 0.23.2 to 0.23.3. Release notes Sourced from https://github.com/testcontainers/testcontainers-rs/

[PR] chore(deps): bump arrow-flight from 54.1.0 to 54.2.0 [datafusion]

2025-02-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14786: URL: https://github.com/apache/datafusion/pull/14786 Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 54.1.0 to 54.2.0. Release notes Sourced from arrow-flight's releases (https://github.com/apache/arrow-rs/releases).

[PR] chore(deps): bump sqllogictest from 0.27.1 to 0.27.2 [datafusion]

2025-02-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14785: URL: https://github.com/apache/datafusion/pull/14785 Bumps [sqllogictest](https://github.com/risinglightdb/sqllogictest-rs) from 0.27.1 to 0.27.2. Release notes Sourced from https://github.com/risinglightdb/sqllogictest-rs/releases

[PR] chore(deps): bump serde from 1.0.217 to 1.0.218 [datafusion]

2025-02-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14788: URL: https://github.com/apache/datafusion/pull/14788 Bumps [serde](https://github.com/serde-rs/serde) from 1.0.217 to 1.0.218. Release notes Sourced from serde's releases (https://github.com/serde-rs/serde/releases). v1.0.218

Re: [I] Enable `used_underscore_binding` clippy lint [datafusion]

2025-02-20 Thread via GitHub
Ramjee194 commented on issue #14649: URL: https://github.com/apache/datafusion/issues/14649#issuecomment-2671014453 can you assign this issue #14649

Re: [PR] 14709 : migrated all the UDFS to invoke_with_args [datafusion]

2025-02-20 Thread via GitHub
niebayes commented on PR #14779: URL: https://github.com/apache/datafusion/pull/14779#issuecomment-2671012993 Hi @sidshehria After reviewing your code multiple times, it seems you're relatively new to Rust and the DataFusion community. I recommend taking some time to familiarize yourself

Re: [I] Unpin the pyarrow builder image to use ubuntu-latest [datafusion]

2025-02-20 Thread via GitHub
Owen-CH-Leung commented on issue #14776: URL: https://github.com/apache/datafusion/issues/14776#issuecomment-2671019446 take

Re: [I] Enable `used_underscore_binding` clippy lint [datafusion]

2025-02-20 Thread via GitHub
ding-young commented on issue #14649: URL: https://github.com/apache/datafusion/issues/14649#issuecomment-2671028126 Hi @Ramjee194, I assigned myself with writing "take" (you can also do this for other issues, too!) and already working on this issue. :)

Re: [I] A 'cache control' header is missing or empty webkit [datafusion]

2025-02-20 Thread via GitHub
Ramjee194 commented on issue #14542: URL: https://github.com/apache/datafusion/issues/14542#issuecomment-2671026346 the emogi images is not should be fix to line .the images has been wraping and frontend not looks like good

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-20 Thread via GitHub
EmilyMatt commented on code in PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#discussion_r1963121789 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -216,6 +216,17 @@ object CometConf extends ShimCometConf { val COMET_EXEC_INITCAP_ENABLED

Re: [PR] Make Expr::alias and alias_qualified smarter by calling unalias [datafusion]

2025-02-20 Thread via GitHub
joroKr21 commented on PR #14749: URL: https://github.com/apache/datafusion/pull/14749#issuecomment-2671067679 Marking this ready for review because I need some feedback. It looks like `plan_to_sql` relies on these nested aliases in a non-trivial way. Is this intended? ``` roundtrip sq
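The idea in the PR title, re-aliasing an already aliased expression without nesting a second alias around the first, can be sketched with a toy expression type. `Expr`, `Column`, `Alias`, `alias`, and `unalias` below are hypothetical stand-ins written for illustration; they are not DataFusion's actual `Expr` definition:

```rust
/// Toy stand-in for a logical expression (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Strip one outer alias, if present; otherwise return the expression unchanged.
    pub fn unalias(self) -> Expr {
        match self {
            Expr::Alias(inner, _) => *inner,
            other => other,
        }
    }

    /// Alias the expression, replacing (rather than wrapping) any existing outer alias.
    pub fn alias(self, name: &str) -> Expr {
        Expr::Alias(Box::new(self.unalias()), name.to_string())
    }
}
```

Because the inner name is discarded, any consumer that relied on the nested form (the comment mentions `plan_to_sql`) would see different output after such a change.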

Re: [PR] chore(deps): bump serde_json from 1.0.138 to 1.0.139 [datafusion]

2025-02-20 Thread via GitHub
alamb merged PR #14784: URL: https://github.com/apache/datafusion/pull/14784

[PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-20 Thread via GitHub
timsaucer opened a new pull request, #1031: URL: https://github.com/apache/datafusion-python/pull/1031 # Which issue does this PR close? Closes #1027 # Rationale for this change User requested. # What changes are included in this PR? Adds a page to the onl

Re: [PR] feat: Improve datafusion-cli memory usage and considering reserve mem… [datafusion]

2025-02-20 Thread via GitHub
zhuqi-lucas commented on PR #14766: URL: https://github.com/apache/datafusion/pull/14766#issuecomment-2671279588 > Thanks @zhuqi-lucas and @2010YOUY01 > > I think it would be nice to remove the unused `stop_after_max_rows` option now, but we could also do it as a follow on PR too if y

Re: [PR] fix: fix various unit test failures in native_datafusion and native_iceberg_compat readers [datafusion-comet]

2025-02-20 Thread via GitHub
parthchandra commented on PR #1415: URL: https://github.com/apache/datafusion-comet/pull/1415#issuecomment-2671283538 @kazuyukitanimura addressed your last comment and also rebased (there were merge conflicts). Test failure count after rebase: ``` native_datafusion: Tests: suc

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-20 Thread via GitHub
EmilyMatt commented on code in PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#discussion_r1963122920 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -216,6 +216,17 @@ object CometConf extends ShimCometConf { val COMET_EXEC_INITCAP_ENABLED

Re: [PR] Extend Visitor trait for Value type [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
tomershaniii commented on code in PR #1725: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1725#discussion_r1963365481 ## src/ast/visitor.rs: ## @@ -889,34 +909,74 @@ mod tests { ), ]; for (sql, expected) in tests { -let a

Re: [I] [EPIC] Substrait: Add producer and consumer for physical plans [datafusion]

2025-02-20 Thread via GitHub
niebayes commented on issue #5173: URL: https://github.com/apache/datafusion/issues/5173#issuecomment-2670953355 @Blizzara Thanks for your reply. I initially choose the physical plan because there're more computation can be distributed to executors in a distributed query engine. Say

Re: [PR] fix: fix various unit test failures in native_datafusion and native_iceberg_compat readers [datafusion-comet]

2025-02-20 Thread via GitHub
parthchandra commented on code in PR #1415: URL: https://github.com/apache/datafusion-comet/pull/1415#discussion_r1963221463 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1001,7 +1012,7 @@ abstract class ParquetReadSuite extends CometTestBase {

Re: [I] Unpin the pyarrow builder image to use ubuntu-latest [datafusion]

2025-02-20 Thread via GitHub
Owen-CH-Leung commented on issue #14776: URL: https://github.com/apache/datafusion/issues/14776#issuecomment-2671141224 May I know if it is strictly necessary to stick with python 3.8 ? Can we just run this CI with python 3.9 ? https://github.com/apache/datafusion/blob/main/.github/w

Re: [PR] Add support for DISTINCT + ORDER BY in ARRAY_AGG [datafusion]

2025-02-20 Thread via GitHub
gabotechs commented on code in PR #14413: URL: https://github.com/apache/datafusion/pull/14413#discussion_r1963100216 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -131,7 +133,32 @@ impl AggregateUDFImpl for ArrayAgg { let data_type = acc_args.exprs[0].data_

[PR] Change to ubuntu-latest to observe the error msg [datafusion]

2025-02-20 Thread via GitHub
Owen-CH-Leung opened a new pull request, #14790: URL: https://github.com/apache/datafusion/pull/14790 (no comment)

Re: [PR] Add support for DISTINCT + ORDER BY in ARRAY_AGG [datafusion]

2025-02-20 Thread via GitHub
gabotechs commented on code in PR #14413: URL: https://github.com/apache/datafusion/pull/14413#discussion_r1963095533 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -131,7 +133,32 @@ impl AggregateUDFImpl for ArrayAgg { let data_type = acc_args.exprs[0].data_

Re: [PR] Fix CI fail for extended test (by freeing up more disk space in CI runner) [datafusion]

2025-02-20 Thread via GitHub
alamb commented on code in PR #14745: URL: https://github.com/apache/datafusion/pull/14745#discussion_r1963434459 ## .github/workflows/extended.yml: ## @@ -39,43 +39,54 @@ jobs: linux-build-lib: name: linux build test runs-on: ubuntu-latest -container: - im

Re: [PR] dependabot: group arrow/parquet minor/patch bumps, remove limit [datafusion]

2025-02-20 Thread via GitHub
alamb merged PR #14730: URL: https://github.com/apache/datafusion/pull/14730

[PR] test: Register Spark-compatible expressions with a DataFusion context [datafusion-comet]

2025-02-20 Thread via GitHub
viczsaurav opened a new pull request, #1432: URL: https://github.com/apache/datafusion-comet/pull/1432 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1365 ## Rationale for this change Showcase way to register Spark-compatible e

[PR] chore: Add simple complex type microbenchmark [datafusion-comet]

2025-02-20 Thread via GitHub
andygrove opened a new pull request, #1433: URL: https://github.com/apache/datafusion-comet/pull/1433 ## Which issue does this PR close? Part of ## Rationale for this change We need to start running benchmarks for complex type support. ## What chan

[PR] Ignore escaped LIKE wildcards in MySQL [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
mvzink opened a new pull request, #1735: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1735 MySQL has a special case for escaped LIKE wildcards appearing in string literals: the escaping is ignored, whereas normally for any other (non-special) character, the backslash would be
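The MySQL rule described here can be illustrated with a small unescaping routine. `unescape_mysql_literal` is a hypothetical helper, not sqlparser's tokenizer, and it handles only a subset of MySQL's escape sequences. Following the MySQL reference manual, it keeps `\%` and `\_` verbatim (backslash included) and drops the backslash before any other unrecognized character:

```rust
/// Unescape the body of a MySQL string literal (hypothetical helper;
/// only a subset of MySQL escape sequences is handled).
pub fn unescape_mysql_literal(body: &str) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // LIKE wildcards: the backslash is kept, so a later LIKE
            // still matches a literal `%` or `_`.
            Some(w @ ('%' | '_')) => {
                out.push('\\');
                out.push(w);
            }
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('0') => out.push('\0'),
            // Any other escaped character (including `\\`, `\'`, `\"`)
            // stands for itself; the backslash is dropped.
            Some(other) => out.push(other),
            // A trailing lone backslash is kept as-is.
            None => out.push('\\'),
        }
    }
    out
}
```

Keeping the backslash for the two wildcards is what lets `'50\%'` remain a pattern for the literal text `50%` when it reaches a `LIKE`.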

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-20 Thread via GitHub
ozankabak commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2673204321 @edmondop, maybe I can offer some clarification here. What we want is a computational framework that gives us how statistical quantities transform under functions defined by expres
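As one simple, hypothetical instance of statistics transforming under an expression, min/max bounds can be propagated through `+` and `*` with standard interval arithmetic. `Interval`, `plus`, and `times` are illustrative names only, not the `StatisticsV2` API proposed in the PR:

```rust
/// Closed interval [lo, hi] of possible values (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Interval {
    pub lo: f64,
    pub hi: f64,
}

impl Interval {
    /// Bounds of `a + b`: the endpoints simply add.
    pub fn plus(self, other: Interval) -> Interval {
        Interval { lo: self.lo + other.lo, hi: self.hi + other.hi }
    }

    /// Bounds of `a * b`: min and max over the four endpoint products,
    /// because negative values can change which corner is extreme.
    pub fn times(self, other: Interval) -> Interval {
        let products = [
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        ];
        Interval {
            lo: products.iter().copied().fold(f64::INFINITY, f64::min),
            hi: products.iter().copied().fold(f64::NEG_INFINITY, f64::max),
        }
    }
}
```

A richer framework would track more than bounds, but the shape is the same: each operator maps the statistics of its inputs to statistics of its output.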

Re: [PR] feat: use edition 2024 [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iajoiner commented on PR #1736: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736#issuecomment-2673750295 @iffyio Yes!

Re: [PR] Allow setting the recursion limit for sql parsing [datafusion]

2025-02-20 Thread via GitHub
adriangb commented on PR #14756: URL: https://github.com/apache/datafusion/pull/14756#issuecomment-2673767266 I'll note that we encountered this when parsing a dynamically generated expression containing a lot of `AND`s and `OR`s. I believe these sorts of expressions are parsed recursively,
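Why a recursion limit helps can be seen in a toy recursive-descent parser: each nested parenthesized sub-expression spends one unit of a depth budget, and the parser returns an error instead of overflowing the stack once the budget runs out. The grammar and the names `Tok`, `Parser`, `ParseError`, `parse`, and `nested` are hypothetical and do not reproduce sqlparser's implementation:

```rust
/// Tokens of a toy boolean grammar (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok { Ident, And, Or, LParen, RParen }

#[derive(Debug, PartialEq)]
pub enum ParseError { RecursionLimitExceeded, Unexpected(usize) }

pub struct Parser {
    toks: Vec<Tok>,
    pos: usize,
    remaining_depth: usize,
}

impl Parser {
    // expr := primary ((AND | OR) primary)*
    fn parse_expr(&mut self) -> Result<(), ParseError> {
        // Spend one unit of the depth budget per (nested) expression.
        if self.remaining_depth == 0 {
            return Err(ParseError::RecursionLimitExceeded);
        }
        self.remaining_depth -= 1;
        let result = self.parse_chain();
        self.remaining_depth += 1;
        result
    }

    fn parse_chain(&mut self) -> Result<(), ParseError> {
        self.parse_primary()?;
        while matches!(self.toks.get(self.pos), Some(Tok::And | Tok::Or)) {
            self.pos += 1;
            self.parse_primary()?;
        }
        Ok(())
    }

    // primary := IDENT | '(' expr ')'
    fn parse_primary(&mut self) -> Result<(), ParseError> {
        match self.toks.get(self.pos).cloned() {
            Some(Tok::Ident) => {
                self.pos += 1;
                Ok(())
            }
            Some(Tok::LParen) => {
                self.pos += 1;
                self.parse_expr()?; // recursion: one level deeper
                if self.toks.get(self.pos) != Some(&Tok::RParen) {
                    return Err(ParseError::Unexpected(self.pos));
                }
                self.pos += 1;
                Ok(())
            }
            _ => Err(ParseError::Unexpected(self.pos)),
        }
    }
}

/// Parse a complete token stream with the given recursion limit.
pub fn parse(toks: Vec<Tok>, recursion_limit: usize) -> Result<(), ParseError> {
    let mut p = Parser { toks, pos: 0, remaining_depth: recursion_limit };
    p.parse_expr()?;
    if p.pos == p.toks.len() { Ok(()) } else { Err(ParseError::Unexpected(p.pos)) }
}

/// Build `a AND (a AND (... a ...))` with `levels` nested parentheses.
pub fn nested(levels: usize) -> Vec<Tok> {
    let mut toks = Vec::new();
    for _ in 0..levels {
        toks.extend([Tok::Ident, Tok::And, Tok::LParen]);
    }
    toks.push(Tok::Ident);
    toks.extend(std::iter::repeat(Tok::RParen).take(levels));
    toks
}
```

In this toy grammar a flat `a OR a OR ...` chain is consumed by a loop, so only nesting spends depth; how sqlparser itself recurses on long generated `AND`/`OR` expressions is not claimed here.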

Re: [I] Implement ALTER SESSION [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iffyio closed issue #1710: Implement ALTER SESSION URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1710

Re: [I] Duplicate Unqualified Field Name [datafusion]

2025-02-20 Thread via GitHub
kosiew commented on issue #14799: URL: https://github.com/apache/datafusion/issues/14799#issuecomment-2673788381 The error "Schema error: Ambiguous reference to unqualified field id" occurs because multiple tables in your query contain a column named id, and you're using USING (id), which r
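The ambiguity described above can be illustrated with a toy name resolver: an unqualified column name resolves only if exactly one table in scope provides it. `resolve_unqualified` and its table representation are hypothetical and do not mirror DataFusion's schema lookup:

```rust
/// Resolve an unqualified column against `(table, columns)` pairs in scope
/// (hypothetical helper). Returns the qualified name or an error message.
pub fn resolve_unqualified(name: &str, tables: &[(&str, &[&str])]) -> Result<String, String> {
    // Collect every table that has a column with this name.
    let matches: Vec<&str> = tables
        .iter()
        .filter(|(_, cols)| cols.contains(&name))
        .map(|(table, _)| *table)
        .collect();
    match matches.as_slice() {
        [] => Err(format!("No field named {name}")),
        [table] => Ok(format!("{table}.{name}")),
        _ => Err(format!("Ambiguous reference to unqualified field {name}")),
    }
}
```

The sketch does not model how a `USING` join coalesces its join columns; it only shows why a lookup that still sees several qualified `id` columns cannot pick one.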

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-02-20 Thread via GitHub
adriangb commented on code in PR #14362: URL: https://github.com/apache/datafusion/pull/14362#discussion_r1964478571 ## datafusion/common/src/dfschema.rs: ## @@ -1056,6 +1079,107 @@ pub fn qualified_name(qualifier: Option<&TableReference>, name: &str) -> String } } +///

[PR] feat: use Edition 2024 [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iajoiner opened a new pull request, #1736: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736 The only warning I got was ```bash warning: relative drop order changing in Rust 2024 --> src/test_utils.rs:305:11 | 304 | let mut iter = v.into_iter();

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-20 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1964584626 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] feat: adjust create and drop trigger for mysql dialect [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iffyio commented on code in PR #1734: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1734#discussion_r1964928671 ## src/parser/mod.rs: ## @@ -5061,20 +5066,19 @@ impl<'a> Parser<'a> { } pub fn parse_trigger_period(&mut self) -> Result { -Ok( -

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-20 Thread via GitHub
ozankabak commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2673605817 @waynexia -- of course not, thank you for getting involved! All -- the decision on our application as an organization will probably be announced soon. I will let you kno

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iffyio commented on PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#issuecomment-2673621460 > At this time I am somewhat stuck as I would not know how to parse the class operator names if not as a keyword - do you have any examples that could be applied to this u

Re: [PR] feat: use edition 2024 [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
iffyio commented on PR #1736: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736#issuecomment-2673610757 @iajoiner could you take a look at the ci failure?

[PR] chore: Update protobuf to 3.25.5 [datafusion-comet]

2025-02-20 Thread via GitHub
kazuyukitanimura opened a new pull request, #1434: URL: https://github.com/apache/datafusion-comet/pull/1434 ## Which issue does this PR close? ## Rationale for this change To fix [CVE-2024-7254](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-7254) ## What chang

Re: [PR] Fix Clippy 1.85 warnings [datafusion]

2025-02-20 Thread via GitHub
jonahgao commented on PR #14800: URL: https://github.com/apache/datafusion/pull/14800#issuecomment-2673222180 It seems we should also update [rust-toolchain.toml](https://github.com/apache/datafusion/blob/main/rust-toolchain.toml) to let CI use the new version.

Re: [I] DuplicateQualifiedField With Paritioned Data [datafusion-python]

2025-02-20 Thread via GitHub
kosiew commented on issue #1018: URL: https://github.com/apache/datafusion-python/issues/1018#issuecomment-2673258796 Managed to reproduce this without S3 ``` import os import tempfile import pyarrow as pa import pyarrow.parquet as pq import datafusion def create

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-20 Thread via GitHub
parthchandra commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1964726051 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -67,4 +78,21 @@ object RewriteJoin extends JoinSelectionHelper { } cas

[PR] chore: Update guava to 33.2.1-jre [datafusion-comet]

2025-02-20 Thread via GitHub
kazuyukitanimura opened a new pull request, #1435: URL: https://github.com/apache/datafusion-comet/pull/1435 ## Which issue does this PR close? ## Rationale for this change To fix [CVE-2023-2976](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-2976) [CVE-2020-8908]

Re: [PR] Fix CI fail for extended test (by freeing up more disk space in CI runner) [datafusion]

2025-02-20 Thread via GitHub
2010YOUY01 merged PR #14745: URL: https://github.com/apache/datafusion/pull/14745

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-20 Thread via GitHub
hayman42 commented on PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#issuecomment-2673430269 @andygrove Thanks for opening this PR! I have one questions though. I also tried to apply the same build side selection logic but found that with multi executors, the Com

Re: [PR] DF 45 blog post [datafusion-site]

2025-02-20 Thread via GitHub
2010YOUY01 commented on code in PR #57: URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1964838748 ## content/blog/2025-02-20-datafusion-45.0.0.md: ## @@ -0,0 +1,300 @@ +--- +layout: post +title: Apache DataFusion 45.0.0 Released +date: 2025-02-20 +categories:

Re: [PR] chore: Update guava to 33.2.1-jre [datafusion-comet]

2025-02-20 Thread via GitHub
codecov-commenter commented on PR #1435: URL: https://github.com/apache/datafusion-comet/pull/1435#issuecomment-2673490619 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1435?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Pass SessionConfig extensions to ObjectStore used during physical plan execution [datafusion]

2025-02-20 Thread via GitHub
waynr opened a new issue, #14804: URL: https://github.com/apache/datafusion/issues/14804 ### Is your feature request related to a problem or challenge? In https://github.com/apache/arrow-rs/issues/7155 I've described a general need for the `ObjectStore` trait to be able to support pas
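The pattern in this issue and in PR #14805 (carrying session context into object-store requests through the request options) can be sketched without depending on the real `object_store` crate. `GetOpts`, `Store`, `MemStore`, and `ContextualizedStore` are hypothetical stand-ins; the actual `ObjectStore` and `GetOptions` API may look different:

```rust
use std::collections::HashMap;

/// Hypothetical request options carrying a free-form context map.
#[derive(Debug, Default, Clone)]
pub struct GetOpts {
    pub context: HashMap<String, String>,
}

/// Hypothetical minimal store trait.
pub trait Store {
    fn get(&self, path: &str, opts: &GetOpts) -> String;
}

/// Inner store that just reports which tenant context it received.
pub struct MemStore;

impl Store for MemStore {
    fn get(&self, path: &str, opts: &GetOpts) -> String {
        let tenant = opts.context.get("tenant").map(String::as_str).unwrap_or("none");
        format!("{path} (tenant={tenant})")
    }
}

/// Wrapper that injects session-level context into every request.
pub struct ContextualizedStore<S: Store> {
    pub inner: S,
    pub session: HashMap<String, String>,
}

impl<S: Store> Store for ContextualizedStore<S> {
    fn get(&self, path: &str, opts: &GetOpts) -> String {
        let mut opts = opts.clone();
        for (k, v) in &self.session {
            // Values already set on the request take precedence over session defaults.
            opts.context.entry(k.clone()).or_insert_with(|| v.clone());
        }
        self.inner.get(path, &opts)
    }
}
```

Wrapping the store keeps the inner implementation unaware of sessions; it only sees options that happen to carry extra context.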

Re: [PR] feat: Improve datafusion-cli memory usage and considering reserve mem… [datafusion]

2025-02-20 Thread via GitHub
2010YOUY01 merged PR #14766: URL: https://github.com/apache/datafusion/pull/14766

Re: [PR] chore: Update protobuf to 3.25.5 [datafusion-comet]

2025-02-20 Thread via GitHub
codecov-commenter commented on PR #1434: URL: https://github.com/apache/datafusion-comet/pull/1434#issuecomment-2673352451 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1434?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-20 Thread via GitHub
parthchandra commented on PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#issuecomment-2673227075 > Spark is building a SortMergeJoin and we are replacing with ShuffledHashJoin. Makes sense

Re: [I] Datafusion can't seem to cast evolving structs [datafusion]

2025-02-20 Thread via GitHub
zhuqi-lucas commented on issue #14757: URL: https://github.com/apache/datafusion/issues/14757#issuecomment-2673384556 Can't find a workaround for this, and i think the Schema::try_merge passed before this error. So when we map_schema at the end, we should still check the cast error w

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-02-20 Thread via GitHub
adriangb commented on code in PR #14362: URL: https://github.com/apache/datafusion/pull/14362#discussion_r1964480019 ## datafusion/common/src/dfschema.rs: ## @@ -1056,6 +1079,107 @@ pub fn qualified_name(qualifier: Option<&TableReference>, name: &str) -> String } } +///

[I] Support `UNNEST` as table function (UDTF) [datafusion]

2025-02-20 Thread via GitHub
waynexia opened a new issue, #14801: URL: https://github.com/apache/datafusion/issues/14801 ### Is your feature request related to a problem or challenge? `UNNEST` is implemented in two ways, `SELECT UNNEST(...)` is handled with other functions https://github.com/apache/datafusion/bl

Re: [PR] Fix CI fail for extended test (by freeing up more disk space in CI runner) [datafusion]

2025-02-20 Thread via GitHub
2010YOUY01 commented on code in PR #14745: URL: https://github.com/apache/datafusion/pull/14745#discussion_r1964763701 ## datafusion/core/tests/memory_limit/memory_limit_validation/sort_mem_validation.rs: ## @@ -67,10 +69,35 @@ fn sort_with_mem_limit_2_cols_2_runner() { spa

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-20 Thread via GitHub
ozankabak commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-267676 Thank you for the review @xudong963. Here are my thoughts on your questions: > 1. As the summary says, `StatisticsV2` will replace the usage of `Precision`, so the min/max/nd

Re: [I] extended_test (with memory limit tracking) are commented out [datafusion]

2025-02-20 Thread via GitHub
2010YOUY01 closed issue #14680: extended_test (with memory limit tracking) are commented out URL: https://github.com/apache/datafusion/issues/14680

[I] Inline disk cleanup script to replace third-party GitHub Action in CI [datafusion]

2025-02-20 Thread via GitHub
2010YOUY01 opened a new issue, #14803: URL: https://github.com/apache/datafusion/issues/14803 ### Is your feature request related to a problem or challenge? Our CI runner is running short in disk space and caused some test to fail, https://github.com/apache/datafusion/pull/14745 is tr

Re: [PR] Fix CI fail for extended test (by freeing up more disk space in CI runner) [datafusion]

2025-02-20 Thread via GitHub
2010YOUY01 commented on code in PR #14745: URL: https://github.com/apache/datafusion/pull/14745#discussion_r1964747688 ## .github/workflows/extended.yml: ## @@ -39,43 +39,54 @@ jobs: linux-build-lib: name: linux build test runs-on: ubuntu-latest -container: -

Re: [I] Datafusion binary size has been getting bigger [datafusion]

2025-02-20 Thread via GitHub
comphead commented on issue #13816: URL: https://github.com/apache/datafusion/issues/13816#issuecomment-2673189679 btw after changes in 45.0.0 the image size is 49M 🎉

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-20 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1964585028 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] Add support for `Dictionary` to AST datatype in unparser [datafusion]

2025-02-20 Thread via GitHub
alamb commented on PR #14783: URL: https://github.com/apache/datafusion/pull/14783#issuecomment-2671337594 Thanks @cetra3 @phillipleblanc do you have any suggestions about where we could test this one?

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-20 Thread via GitHub
alamb commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1963481027 ## datafusion/expr-common/src/signature.rs: ## @@ -466,6 +551,186 @@ fn get_data_types(native_type: &NativeType) -> Vec { } } +/// Represents type coercion

Re: [PR] Examples: boundary analysis example for `AND/OR` conjunctions [datafusion]

2025-02-20 Thread via GitHub
alamb commented on PR #14735: URL: https://github.com/apache/datafusion/pull/14735#issuecomment-2671325046 Thanks @clflushopt -- I will review this one over the next few days hopefully

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-20 Thread via GitHub
timsaucer commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2671308129 I know there are at least two people who have specifically reached out either on here or to me directly about GSoC for `datafusion-python`. I have opened an issue in that repo

Re: [PR] feat: Add ScalarUDF support in FFI crate [datafusion]

2025-02-20 Thread via GitHub
alamb commented on PR #14579: URL: https://github.com/apache/datafusion/pull/14579#issuecomment-2671329643 woohoo!

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-20 Thread via GitHub
benrsatori commented on PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#issuecomment-2671531452 Hi @iffyio I fixed the CI

[PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]

2025-02-20 Thread via GitHub
varun-bhardwaj-sde opened a new pull request, #14792: URL: https://github.com/apache/datafusion/pull/14792 ## Which issue does this PR close? - Closes #14705. ## Rationale for this change ## What changes are included in this PR? ## Are these

Re: [I] Access a Map with a non-string keys [datafusion]

2025-02-20 Thread via GitHub
alamb closed issue #11785: Access a Map with a non-string keys URL: https://github.com/apache/datafusion/issues/11785

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-20 Thread via GitHub
timsaucer commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2671311582 @Spaarsh @sidshehria I think this is a good place to coordinate specific ideas. I'm also seeking out an additional person who would be willing to mentor. We might also c

Re: [PR] Fix build after logical conflict [datafusion]

2025-02-20 Thread via GitHub
findepi merged PR #14791: URL: https://github.com/apache/datafusion/pull/14791

Re: [PR] Add support for DISTINCT + ORDER BY in ARRAY_AGG [datafusion]

2025-02-20 Thread via GitHub
gabotechs commented on code in PR #14413: URL: https://github.com/apache/datafusion/pull/14413#discussion_r1963493195 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -131,7 +133,32 @@ impl AggregateUDFImpl for ArrayAgg { let data_type = acc_args.exprs[0].data_

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-20 Thread via GitHub
sidshehria commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2671606569 @timsaucer can we resolve these issues right now which are given above?

Re: [PR] chore : migrated all the UDFS to invoke_with_args [datafusion]

2025-02-20 Thread via GitHub
alamb commented on PR #14779: URL: https://github.com/apache/datafusion/pull/14779#issuecomment-2671433162 Thanks @sidshehria and @niebayes Marking this one as a draft as we work through the comments

Re: [I] [DISCUSSION] Lowering the barrier to new users (Lessons from 15-799 CMU Optimizer Class) [datafusion]

2025-02-20 Thread via GitHub
lmwnshn commented on issue #14373: URL: https://github.com/apache/datafusion/issues/14373#issuecomment-2671641892 @ozankabak Thanks for the heads up. I am not sure if the timeline matches up with students already starting to pick projects, but perhaps in any future offerings of this course.

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-20 Thread via GitHub
timsaucer commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2671621389 Yes, of course. We're always interested in contribution to our open issues!

Re: [PR] refactor: move `DataSource` to `datafusion-datasource` [datafusion]

2025-02-20 Thread via GitHub
mertak-synnada commented on code in PR #14671: URL: https://github.com/apache/datafusion/pull/14671#discussion_r1963665247 ## datafusion/physical-plan/src/test.rs: ## @@ -17,27 +17,347 @@ //! Utilities for testing datafusion-physical-plan +use std::any::Any; use std::colle

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-20 Thread via GitHub
edmondop commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1963629140 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -405,13 +406,18 @@ impl Interval { // There must be no way to create an interval whose endp

[I] Fallback reason missing for incompatible casts [datafusion-comet]

2025-02-20 Thread via GitHub
andygrove opened a new issue, #1429: URL: https://github.com/apache/datafusion-comet/issues/1429 ### Describe the bug I see a HashAggregate falling back to Spark but the root cause is hidden: ``` HashAggregate [COMET: Unsupported result expressions found in: List((0.2 * cast
