Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-03-17 Thread via GitHub
adriangb commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2729472951 I've thought about it a bit and I think the way to get variant support into DataFusion is: - Add general support for shredding via per-file filter rewrites and projection ex

[I] [EPIC] A collection of tickets for improving sorting larger-than-ram datasets [datafusion]

2025-03-17 Thread via GitHub
alamb opened a new issue, #15271: URL: https://github.com/apache/datafusion/issues/15271 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #14547: URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2729588907 Thanks @geoffreyclaude -- I will try and look at this. I think @goldmedal was also interested in this capability (using tracing via the `otel` crate -- I will try and find time

Re: [I] March 4, 2025: This week(s) in DataFusion [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15005: URL: https://github.com/apache/datafusion/issues/15005#issuecomment-2729594349 Next week: https://github.com/apache/datafusion/issues/15269 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [I] March 4, 2025: This week(s) in DataFusion [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #15005: March 4, 2025: This week(s) in DataFusion URL: https://github.com/apache/datafusion/issues/15005

Re: [PR] Add support for `RAISE` statement [datafusion-sqlparser-rs]

2025-03-17 Thread via GitHub
alamb commented on code in PR #1766: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1766#discussion_r1998847620 ## src/ast/mod.rs: ## @@ -2256,6 +2256,57 @@ impl fmt::Display for ConditionalStatements { } } +/// A `RAISE` statement. +/// +/// Examples: +///

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-03-17 Thread via GitHub
goldmedal commented on PR #14547: URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2729811174 I'll take a look at this PR in a few days.

Re: [PR] Migrate dataframe tests to `insta` [datafusion]

2025-03-17 Thread via GitHub
jsai28 commented on code in PR #15262: URL: https://github.com/apache/datafusion/pull/15262#discussion_r1998958709 ## datafusion/core/tests/dataframe/dataframe_functions.rs: ## @@ -75,34 +75,28 @@ async fn create_test_table() -> Result { } /// Executes an expression on the t

Re: [PR] docs: update documentation for Final GroupBy in accumulator.rs [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15279: URL: https://github.com/apache/datafusion/pull/15279

Re: [I] Weekly Plan (Andrew Lamb) March 10, 2025 [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15274: URL: https://github.com/apache/datafusion/issues/15274#issuecomment-2730373688 Review queue: - [ ] https://github.com/apache/datafusion/pull/15263

Re: [PR] chore: Re-enable GitHub discussions [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove merged PR #1535: URL: https://github.com/apache/datafusion-comet/pull/1535

Re: [I] GitHub discussions have been disabled [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove closed issue #1533: GitHub discussions have been disabled URL: https://github.com/apache/datafusion-comet/issues/1533

[PR] feat: Native support utf8view for binary operator [datafusion]

2025-03-17 Thread via GitHub
zhuqi-lucas opened a new pull request, #15275: URL: https://github.com/apache/datafusion/pull/15275 ## Which issue does this PR close? - Closes sub_task of [#15096](https://github.com/apache/datafusion/issues/15096) ## Rationale for this change feat: Native support utf8v

Re: [I] Implement tree explain for UnionExec [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #15277: Implement tree explain for UnionExec URL: https://github.com/apache/datafusion/issues/15277

Re: [PR] fix: handle duplicate WindowFunction expressions in Substrait consumer [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15211: URL: https://github.com/apache/datafusion/pull/15211#discussion_r1999329931 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1074,6 +1074,9 @@ pub async fn from_project_rel( // leaving only explicit expressions.

Re: [PR] chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) [datafusion-comet]

2025-03-17 Thread via GitHub
kazuyukitanimura commented on PR #1534: URL: https://github.com/apache/datafusion-comet/pull/1534#issuecomment-2730385010 cc @andygrove

Re: [PR] fix: handle duplicate WindowFunction expressions in Substrait consumer [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15211: URL: https://github.com/apache/datafusion/pull/15211#discussion_r1999332505 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1074,6 +1074,9 @@ pub async fn from_project_rel( // leaving only explicit expressions.

Re: [PR] feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries [datafusion]

2025-03-17 Thread via GitHub
comphead commented on PR #14975: URL: https://github.com/apache/datafusion/pull/14975#issuecomment-2730388364 Sorry, it totally slipped my mind

Re: [PR] fix: Unconditonally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-17 Thread via GitHub
rkrishn7 commented on PR #15242: URL: https://github.com/apache/datafusion/pull/15242#issuecomment-2730402547 Thanks @Omega359! @alamb Yes sorry for the delay, I can fix this up later today. To expand on my comment [here](https://github.com/apache/datafusion/pull/15242#issuecom

Re: [PR] Implement tree explain for PlaceholderRowExec [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15270: URL: https://github.com/apache/datafusion/pull/15270

Re: [I] Implement tree explain for `PlaceholderRowExec` [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #15138: Implement tree explain for `PlaceholderRowExec` URL: https://github.com/apache/datafusion/issues/15138

[PR] docs: update documentation for Final GroupBy in accumulator.rs [datafusion]

2025-03-17 Thread via GitHub
qazxcdswe123 opened a new pull request, #15279: URL: https://github.com/apache/datafusion/pull/15279 I think this is what the "passed to the" part is missing but I might be wrong.

Re: [PR] Refactor file schema type coercions [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15268: URL: https://github.com/apache/datafusion/pull/15268#discussion_r1999148261 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -465,45 +465,103 @@ impl FileFormat for ParquetFormat { } } -/// Coerces the file schema if the ta

Re: [I] Add `ctx = SessionContext()` to __init__ [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer commented on issue #1065: URL: https://github.com/apache/datafusion-python/issues/1065#issuecomment-2730138953 Is this necessary? Now that we have the global context for reading files to create dataframes, this is potentially creating conflict of floating around two different def

Re: [PR] Reanimate Code Coverage [datafusion]

2025-03-17 Thread via GitHub
codecov-commenter commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2730139264 ## [Codecov](https://app.codecov.io/gh/apache/datafusion/pull/15256?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+

Re: [PR] Refactor file schema type coercions [datafusion]

2025-03-17 Thread via GitHub
m09526 commented on code in PR #15268: URL: https://github.com/apache/datafusion/pull/15268#discussion_r1998724986 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -465,45 +465,103 @@ impl FileFormat for ParquetFormat { } } -/// Coerces the file schema if the t

Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]

2025-03-17 Thread via GitHub
goldmedal commented on code in PR #15212: URL: https://github.com/apache/datafusion/pull/15212#discussion_r1998987034 ## .github/workflows/rust.yml: ## @@ -246,6 +246,7 @@ jobs: mc cp -r /source/* localminio/data" - name: Run tests (excluding doctests)

Re: [PR] perf: unwrap cast for comparing ints =/!= strings [datafusion]

2025-03-17 Thread via GitHub
alan910127 commented on code in PR #15110: URL: https://github.com/apache/datafusion/pull/15110#discussion_r1997943659 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -290,19 +290,72 @@ impl<'a> TypeCoercionRewriter<'a> { right: Expr, right_schema:

Re: [PR] Add debug logging for default catalog overwrite in SessionState build [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15251: URL: https://github.com/apache/datafusion/pull/15251

[PR] chore: Update links for released version [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove opened a new pull request, #1540: URL: https://github.com/apache/datafusion-comet/pull/1540 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Improve feature flag CI coverage `datafusion` and `datafusion-functions` [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15203: URL: https://github.com/apache/datafusion/pull/15203#issuecomment-2729763285 Thank you for the review @xudong963 -- much appreciated

Re: [PR] chore(deps): Update sqlparser to 0.55.0 [datafusion]

2025-03-17 Thread via GitHub
jonahgao commented on code in PR #15183: URL: https://github.com/apache/datafusion/pull/15183#discussion_r1998943593 ## datafusion/sql/src/expr/function.rs: ## @@ -217,13 +217,13 @@ impl SqlToRel<'_, S> { // it shouldn't have ordering requirement as function argument

Re: [PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15039: URL: https://github.com/apache/datafusion/pull/15039#issuecomment-2730069735 This PR appears to have some build / CI failures that are preventing it from merging, so I am marking it as draft. @LuQQiu can you please resolve the issues so it can be merged?

Re: [PR] fix: Unconditonally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15242: URL: https://github.com/apache/datafusion/pull/15242#issuecomment-2730093478 @rkrishn7 will you have time to apply @Omega359's suggestion?

Re: [PR] Add additional ruff suggestions [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer merged PR #1062: URL: https://github.com/apache/datafusion-python/pull/1062

Re: [PR] [TEST] enable recursive_protection to fix CI stack overflow [datafusion]

2025-03-17 Thread via GitHub
goldmedal closed pull request #15272: [TEST] enable recursive_protection to fix CI stack overflow URL: https://github.com/apache/datafusion/pull/15272

Re: [PR] Support logic optimize rule to pass the case that Utf8view datatype combined with Utf8 datatype [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15239: URL: https://github.com/apache/datafusion/pull/15239#discussion_r1999161222 ## datafusion/common/src/dfschema.rs: ## @@ -563,29 +563,6 @@ impl DFSchema { .all(|(dffield, arrowfield)| dffield.name() == arrowfield.name()) }

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-17 Thread via GitHub
blaginin commented on PR #15255: URL: https://github.com/apache/datafusion/pull/15255#issuecomment-2729413005 > I tried allow_duplicates!(), but it gets a bit tricky with async functions Can you please explain more on this? I tried modifying _async_ `run_and_compare_query` and this wo

Re: [PR] chore: remove deprecated variants of UDF's invoke (invoke, invoke_no_args, invoke_batch) [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15123: URL: https://github.com/apache/datafusion/pull/15123#issuecomment-2729560442 🔨 let's go

Re: [PR] chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove merged PR #1534: URL: https://github.com/apache/datafusion-comet/pull/1534

Re: [I] Publish official Docker images to Docker Hub under Apache account [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on issue #1510: URL: https://github.com/apache/datafusion-comet/issues/1510#issuecomment-2730485723 We can run `docker scout` locally. ``` docker scout cves apache/datafusion-comet:0.5.0-spark3.4.3-scala2.12-java11 ``` There are some CVEs in dependenci

Re: [PR] fix: Refactor CometScanRule and fix bugs [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on code in PR #1483: URL: https://github.com/apache/datafusion-comet/pull/1483#discussion_r1999378679 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -188,69 +185,62 @@ class CometSparkSessionExtensions scanE

Re: [PR] fix: Refactor CometScanRule and fix bugs [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on PR #1483: URL: https://github.com/apache/datafusion-comet/pull/1483#issuecomment-2729501003 @mbutrovich @parthchandra This is now ready for review

Re: [I] External sort failing with modest memory limit when writing parquet files [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15028: URL: https://github.com/apache/datafusion/issues/15028#issuecomment-2729506412 I started organizing tickets related to sorting larger than available RAM datasets here: - https://github.com/apache/datafusion/issues/15271

Re: [I] Implement tree explain for UnionExec [datafusion]

2025-03-17 Thread via GitHub
zebsme commented on issue #15277: URL: https://github.com/apache/datafusion/issues/15277#issuecomment-2730027331 take

[I] Implement tree explain for UnionExec [datafusion]

2025-03-17 Thread via GitHub
zebsme opened a new issue, #15277: URL: https://github.com/apache/datafusion/issues/15277 ### Is your feature request related to a problem or challenge? - Part of #14914 ### Describe the solution you'd like Add tree format to the ExecutionPlan specified in the subject of

Re: [I] [EPIC] A collection of tickets for improving sorting larger-than-ram datasets / spilling sorts [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15271: URL: https://github.com/apache/datafusion/issues/15271#issuecomment-2729488222 I also think that by collecting the related items we may be able to find some more review capacity (as I think this is an important capability for Spark / Comet. FYI @comphead /

Re: [I] March 17, 2025: This week(s) in DataFusion [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15269: URL: https://github.com/apache/datafusion/issues/15269#issuecomment-2729759078 Also, huge thanks to @xudong963 for running the release process - https://github.com/apache/datafusion/issues/14123 - https://github.com/apache/datafusion/issues/15151

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-17 Thread via GitHub
alamb commented on code in PR #59: URL: https://github.com/apache/datafusion-site/pull/59#discussion_r1998872033 ## content/blog/2025-03-11-ordering-analysis.md: ## @@ -291,6 +291,31 @@ Following third and fourth constraints for the simplified table, the succinct va `[time_bin

Re: [I] [DISCUSS] Release DataFusion `46.0.1` Patch or `46.1.0` minor release (March 2025) [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15151: URL: https://github.com/apache/datafusion/issues/15151#issuecomment-2729754194 Thanks again @xudong963!

Re: [I] Apply additional ruff suggestions [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer closed issue #1056: Apply additional ruff suggestions URL: https://github.com/apache/datafusion-python/issues/1056

Re: [PR] Add additional ruff suggestions [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer commented on PR #1062: URL: https://github.com/apache/datafusion-python/pull/1062#issuecomment-2729760767 Thank you for all the work on this!

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15096: URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2729731675 > Performance improvement: Binary operators native support for Utf8View Yes, 100%

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-03-17 Thread via GitHub
zhuqi-lucas commented on issue #15096: URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2729876918 > > Performance improvement: Binary operators native support for Utf8View > > Yes, 100% Submitted the PR for review: https://github.com/apache/datafusion/pull/

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-17 Thread via GitHub
eliaperantoni commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2729888439 What do you think @alamb? I could take on this and make a PR if you approve the solution :)

[PR] 1065/enhancement/add ctx to `__init__.py` [datafusion-python]

2025-03-17 Thread via GitHub
Spaarsh opened a new pull request, #1072: URL: https://github.com/apache/datafusion-python/pull/1072 # Which issue does this PR close? Closes #1065 # Rationale for this change To improve ergonomics of the API by providing a pre-initialized `SessionContext` inst

[PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-17 Thread via GitHub
timsaucer opened a new pull request, #15280: URL: https://github.com/apache/datafusion/pull/15280 ## Which issue does this PR close? None. ## Rationale for this change This PR expands the interfaces available via FFI to include Catalogs and Schemas (catalog schema, not a

[PR] Fix the well known `count-bug` similar cases [datafusion]

2025-03-17 Thread via GitHub
suibianwanwank opened a new pull request, #15281: URL: https://github.com/apache/datafusion/pull/15281 ## Which issue does this PR close? - Closes #15032. ## Rationale for this change ## What changes are included in this PR? ## Are these cha

Re: [PR] Fix the well known `count-bug` similar cases [datafusion]

2025-03-17 Thread via GitHub
suibianwanwank commented on code in PR #15281: URL: https://github.com/apache/datafusion/pull/15281#discussion_r1999272559 ## datafusion/optimizer/src/scalar_subquery_to_join.rs: ## @@ -476,19 +485,19 @@ mod tests { .build()?; let expected = "Projection:

[I] Weekly Plan (Andrew Lamb) March 10, 2025 [datafusion]

2025-03-17 Thread via GitHub
alamb opened a new issue, #15274: URL: https://github.com/apache/datafusion/issues/15274 This is an attempt to organize myself and make what I plan to work on more visible ## Weekly High Level Goals - [ ] Make arrow release: https://github.com/apache/arrow-rs/issues/7107 - [ ] C

[PR] [TEST] enable recursive_protection to fix CI stack overflow [datafusion]

2025-03-17 Thread via GitHub
goldmedal opened a new pull request, #15272: URL: https://github.com/apache/datafusion/pull/15272 ## Which issue does this PR close? - A test for #15212 ## Rationale for this change ## What changes are included in this PR? ## Are these chang

Re: [PR] perf: unwrap cast for comparing ints =/!= strings [datafusion]

2025-03-17 Thread via GitHub
alan910127 commented on PR #15110: URL: https://github.com/apache/datafusion/pull/15110#issuecomment-2727773494 > Does this make sense @alan910127 ? So we'll keep both the coercion and cast unwrapping optimizations in this PR, is that correct? I'm unsure about "support other types," d

Re: [I] [Epic] A collection of FFI related tasks [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15283: URL: https://github.com/apache/datafusion/issues/15283#issuecomment-2730526009 FYI @timsaucer in case you have other items you want to add here

Re: [PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15280: URL: https://github.com/apache/datafusion/pull/15280#issuecomment-2730531904 Thank you @timsaucer

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r1999392980 ## datafusion/expr/src/expr.rs: ## @@ -2607,11 +2793,23 @@ pub(crate) fn schema_name_from_exprs_comma_separated_without_space( schema_name_from_exprs_inner(exp

Re: [PR] fix: Refactor CometScanRule and fix bugs [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on code in PR #1483: URL: https://github.com/apache/datafusion-comet/pull/1483#discussion_r1999438466 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -188,69 +185,62 @@ class CometSparkSessionExtensions scanE

Re: [PR] datafusion-cli: add streaming state struct [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15234: URL: https://github.com/apache/datafusion/pull/15234#issuecomment-2730540542 > I only had time to take a quick glance - but could this functionality be added to datafusion so it could be used by other apps that have CLIs built on datafusion? Seems like a

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-03-17 Thread via GitHub
duongcongtoai commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2730673982 From this [PR](https://github.com/apache/datafusion/pull/6457), there are several types of query mentioned that need support 1. In Subquery contains limit/order by ``

[PR] chore: Enable Spark SQL tests for native_datafusion [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove opened a new pull request, #1543: URL: https://github.com/apache/datafusion-comet/pull/1543 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Implement tree explain for UnionExec [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15278: URL: https://github.com/apache/datafusion/pull/15278

Re: [PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-17 Thread via GitHub
timsaucer commented on PR #15280: URL: https://github.com/apache/datafusion/pull/15280#issuecomment-2730732221 Oh, good point. Added in latest push.

Re: [I] Remove the need for registering an ObjectStore for remote files [datafusion-python]

2025-03-17 Thread via GitHub
kylebarron commented on issue #899: URL: https://github.com/apache/datafusion-python/issues/899#issuecomment-2730730310 I published `pyo3-object_store` 0.1, which works with pyo3 0.23, and `pyo3-object_store` 0.2, which works with pyo3 0.24. But both of these require `object_store` 0.12, s

[PR] build: Use unique name for surefire artifacts [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove opened a new pull request, #1544: URL: https://github.com/apache/datafusion-comet/pull/1544 ## Which issue does this PR close? N/A ## Rationale for this change I have recently seen some build failures due to: ``` Error: Failed to Create

Re: [I] Analysis to support`SortPreservingMerge` --> `ProgressiveEval` [datafusion]

2025-03-17 Thread via GitHub
suremarc commented on issue #15191: URL: https://github.com/apache/datafusion/issues/15191#issuecomment-2730732469 > I think it uses [FileGroupPartitioner](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.FileGroupPartitioner.html) that maintains the same orderin

[I] [Epic] A collection of FFI related tasks [datafusion]

2025-03-17 Thread via GitHub
alamb opened a new issue, #15283: URL: https://github.com/apache/datafusion/issues/15283 ### Is your feature request related to a problem or challenge? We are adding FFI bindings to Datafusion (see https://crates.io/crates/datafusion-ffi) mostly for API stability (e.g. so python wrap

Re: [PR] chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) [datafusion-comet]

2025-03-17 Thread via GitHub
kazuyukitanimura commented on PR #1534: URL: https://github.com/apache/datafusion-comet/pull/1534#issuecomment-2730537142 Thanks @andygrove

Re: [I] Migrate dataframe tests to `insta` [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #15245: Migrate dataframe tests to `insta` URL: https://github.com/apache/datafusion/issues/15245

Re: [I] March 17, 2025: This week(s) in DataFusion [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15269: URL: https://github.com/apache/datafusion/issues/15269#issuecomment-2730516080 Oh, and of course @timsaucer is cranking out FFI bindings like - https://github.com/apache/datafusion/pull/15280

Re: [PR] Migrate dataframe tests to `insta` [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15262: URL: https://github.com/apache/datafusion/pull/15262

Re: [PR] docs: Add changelog for 0.7.0 release [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove merged PR #1527: URL: https://github.com/apache/datafusion-comet/pull/1527

Re: [I] Enable parquet filter pushdown by default [datafusion]

2025-03-17 Thread via GitHub
adriangb commented on issue #3463: URL: https://github.com/apache/datafusion/issues/3463#issuecomment-2730550280 I don't think this needs to block but I'll point out that I have a PR up for a bug from the interaction between `SchemaAdapter` and parquet filter pushdown: https://github.com/ap

Re: [I] Migrate the following tests to `insta` [datafusion]

2025-03-17 Thread via GitHub
blaginin commented on issue #15282: URL: https://github.com/apache/datafusion/issues/15282#issuecomment-2730556871 Thanks, added to the list 🌻

Re: [PR] chore: Enable Spark SQL tests for native_iceberg_compat [datafusion-comet]

2025-03-17 Thread via GitHub
codecov-commenter commented on PR #1541: URL: https://github.com/apache/datafusion-comet/pull/1541#issuecomment-2730586578 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1541?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Spark SQL test failures in native_iceberg_compat mode [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove opened a new issue, #1542: URL: https://github.com/apache/datafusion-comet/issues/1542 ### Describe the bug This issue is to track Spark SQL test failures in native_iceberg_compat mode. - Comet tries to read JSON files with Parquet reader ### Steps to reprod

Re: [PR] fix: remove code duplication in native_datafusion and native_iceberg_compat implementations [datafusion-comet]

2025-03-17 Thread via GitHub
mbutrovich commented on code in PR #1443: URL: https://github.com/apache/datafusion-comet/pull/1443#discussion_r1999472634 ## native/core/src/parquet/mod.rs: ## @@ -46,23 +47,22 @@ use self::util::jni::TypePromotionInfo; use crate::execution::operators::ExecutionError; use cra

Re: [I] Spark SQL test failures in native_iceberg_compat mode [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on issue #1542: URL: https://github.com/apache/datafusion-comet/issues/1542#issuecomment-2730625465 @parthchandra @mbutrovich fyi

Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15263: URL: https://github.com/apache/datafusion/pull/15263#discussion_r1999523980 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -224,6 +224,64 @@ mod tests { ) } +#[tokio::test] +async fn test_pushdown_w

Re: [I] `native_datafusion` scan is only enabled when `spark.comet.exec.enabled` is set [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove closed issue #1536: `native_datafusion` scan is only enabled when `spark.comet.exec.enabled` is set URL: https://github.com/apache/datafusion-comet/issues/1536

[PR] Minor: consistently apply `clippy::clone_on_ref_ptr` in all crates [datafusion]

2025-03-17 Thread via GitHub
alamb opened a new pull request, #15284: URL: https://github.com/apache/datafusion/pull/15284 ## Which issue does this PR close? ## Rationale for this change - Found while reviewing https://github.com/apache/datafusion/pull/15263 from @adriangb Some of the newer Dat

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-17 Thread via GitHub
shruti2522 commented on PR #15255: URL: https://github.com/apache/datafusion/pull/15255#issuecomment-2730590045 > > I tried allow_duplicates!(), but it gets a bit tricky with async functions > > Can you please explain more on this? I tried modifying _async_ `run_and_compare_query` an

Re: [I] [Epic] Add snapshot tests (migrate to `insta` for tests) [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15178: URL: https://github.com/apache/datafusion/issues/15178#issuecomment-2730830528 > [@alamb](https://github.com/alamb) can I ask you to put "good first issue" on tickets in the list if you're happy with them? I don't think I have permission to do that Do

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2730836455 > Add specialized support in arrow-rs for variant binary types (specifically for the metadata columns) I think this will be a fun project for the right type of person. I

Re: [I] Add SQL examples to window functions: `nth_value`, etc [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #13399: URL: https://github.com/apache/datafusion/issues/13399#issuecomment-2730849740 > [@sageraven1](https://github.com/sageraven1) , Are you still working on this? I see that PR was marked as stale and got closed. If you aren't working on this, I would like to pi

Re: [PR] Minor: consistently apply `clippy::clone_on_ref_ptr` in all crates [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15284: URL: https://github.com/apache/datafusion/pull/15284#discussion_r1999606913 ## datafusion/catalog/src/session.rs: ## @@ -145,7 +145,7 @@ impl From<&dyn Session> for TaskContext { state.scalar_functions().clone(), st

Re: [PR] chore: remove deprecated variants of UDF's invoke (invoke, invoke_no_args, invoke_batch) [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15123: URL: https://github.com/apache/datafusion/pull/15123

Re: [I] Deprecate and eventually remove `ScalarUDF::invoke_batch` [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #14652: Deprecate and eventually remove `ScalarUDF::invoke_batch` URL: https://github.com/apache/datafusion/issues/14652

Re: [PR] chore: remove deprecated variants of UDF's invoke (invoke, invoke_no_args, invoke_batch) [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15123: URL: https://github.com/apache/datafusion/pull/15123#issuecomment-2729560622 Thanks again

Re: [PR] Add additional ruff suggestions [datafusion-python]

2025-03-17 Thread via GitHub
Spaarsh commented on PR #1062: URL: https://github.com/apache/datafusion-python/pull/1062#issuecomment-2729797059 @timsaucer there are still several rules to be enabled. But those require significant changes. Take a look at the rule `PLR0913`: ``` docs/source/conf.py:76:5: PLR0913 Too

Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]

2025-03-17 Thread via GitHub
chenkovsky commented on code in PR #15212: URL: https://github.com/apache/datafusion/pull/15212#discussion_r1999031362 ## datafusion/sql/src/unparser/expr.rs: ## @@ -94,6 +94,7 @@ impl Unparser<'_> { Ok(root_expr) } +#[cfg_attr(feature = "recursive_protection

Re: [PR] Improve feature flag CI coverage `datafusion` and `datafusion-functions` [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15203: URL: https://github.com/apache/datafusion/pull/15203
