Re: [I] Attach `Diagnostic` to "incompatible type in unary expression" error [datafusion]

2025-03-17 Thread via GitHub
eliaperantoni commented on issue #14433: URL: https://github.com/apache/datafusion/issues/14433#issuecomment-2728625287 @onlyjackfrost > Not: I didn't see any error that could attach a diagnostic with. I get this in `datafusion-cli` ![Image](https://github.com/user-atta

Re: [I] Emit warning with attached `Diagnostic` when doing `= NULL` [datafusion]

2025-03-17 Thread via GitHub
eliaperantoni commented on issue #14434: URL: https://github.com/apache/datafusion/issues/14434#issuecomment-2728578964 Thank you @changsun20 for the detailed analysis 🙏 1. Warning Scope for = NULL Patterns I would say that, if it's not _too_ complicated, we should go for the

Re: [I] Datafusion binary size has been getting bigger [datafusion]

2025-03-17 Thread via GitHub
logan-keede commented on issue #13816: URL: https://github.com/apache/datafusion/issues/13816#issuecomment-2728637022 > Hey [@logan-keede](https://github.com/logan-keede) please ping me in ASF slack, I'm not using discord now @comphead I pinged you on slack. -- This is an autom

Re: [PR] feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries [datafusion]

2025-03-17 Thread via GitHub
adriangb commented on PR #14975: URL: https://github.com/apache/datafusion/pull/14975#issuecomment-2729201199 This makes total sense to me, I’ve certainly wanted this feature! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] Only unnest source for `EmptyRelation` [datafusion]

2025-03-17 Thread via GitHub
blaginin commented on code in PR #15159: URL: https://github.com/apache/datafusion/pull/15159#discussion_r1998600989 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## Review Comment: Had to remove this _new_ test because now (even in main) it produces `SELECT * FROM (SELECT

Re: [PR] Migrate dataframe tests to `insta` [datafusion]

2025-03-17 Thread via GitHub
blaginin commented on PR #15262: URL: https://github.com/apache/datafusion/pull/15262#issuecomment-2729279806 Thank you! FYI to fix fmt, you can `cargo fmt --all` and commit all files. I also find it useful to use this: https://github.com/apache/datafusion/blob/87eec43856a5d8cefef24

[PR] Implement tree explain for PlaceholderRowExec [datafusion]

2025-03-17 Thread via GitHub
zebsme opened a new pull request, #15270: URL: https://github.com/apache/datafusion/pull/15270 ## Which issue does this PR close? - Closes #15138 - Part of #14914 ## Rationale for this change ## What changes are included in this PR? - Implement Plac

Re: [D] Does DataFusion Support JSON Path Filtering Like `jsonb_path_exists` in PostgreSQL? [datafusion]

2025-03-17 Thread via GitHub
GitHub user alamb added a comment to the discussion: Does DataFusion Support JSON Path Filtering Like `jsonb_path_exists` in PostgreSQL? I also filed https://github.com/apache/datafusion/issues/15267 to track adding some better documentation here GitHub link: https://github.com/apache/dataf

[PR] Refactor file schema type coercions [datafusion]

2025-03-17 Thread via GitHub
xudong963 opened a new pull request, #15268: URL: https://github.com/apache/datafusion/pull/15268 ## Which issue does this PR close? - Closes #. ## Rationale for this change Reduced Schema Traversals: The refactored code traverses the schema fields only o

Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-17 Thread via GitHub
eliaperantoni commented on PR #15209: URL: https://github.com/apache/datafusion/pull/15209#issuecomment-2728582201 > @eliaperantoni could I raise another PR for the others unary expressions and keep this PR for the PLUS unary expression? Yes absolutely 😊 -- This is an automated mes

Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-17 Thread via GitHub
eliaperantoni commented on code in PR #15209: URL: https://github.com/apache/datafusion/pull/15209#discussion_r1998206440 ## datafusion/sql/tests/cases/diagnostic.rs: ## @@ -351,6 +351,36 @@ fn test_in_subquery_multiple_columns() -> Result<(), Box> .collect::>(),

Re: [I] Dynamic pruning filters from TopK state [datafusion]

2025-03-17 Thread via GitHub
adriangb commented on issue #15037: URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2729172917 Does anyone have a handle on how we might implement this? I was thinking we’d need to add a method to exec operators called `apply_filter` but that basically sends down the add

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-03-17 Thread via GitHub
AdamGS commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2729189144 I was out for a couple of weeks on vacation and had some time to think, and what I came up with is to build this layer and maybe parts of IO around some abstracted columnar forma

Re: [PR] Fix wildcard dataframe case [datafusion]

2025-03-17 Thread via GitHub
jayzhan211 commented on PR #15230: URL: https://github.com/apache/datafusion/pull/15230#issuecomment-2729174279 Thanks @goldmedal -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Add additional ruff suggestions [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer commented on PR #1062: URL: https://github.com/apache/datafusion-python/pull/1062#issuecomment-2729223711 Since we know this PR only addresses a portion of the rules we're working on, it seems perfectly reasonable to put it back into the ignore list for now. -- This is an autom

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-03-17 Thread via GitHub
adriangb commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2729232834 This is exciting! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] chore: Upgrade `rand` crate and some other minor crates [datafusion]

2025-03-17 Thread via GitHub
mbrobbel commented on code in PR #14967: URL: https://github.com/apache/datafusion/pull/14967#discussion_r1998296079 ## datafusion/core/tests/parquet/filter_pushdown.rs: ## @@ -65,7 +65,12 @@ fn generate_file(tempdir: &TempDir, props: WriterProperties) -> TestParquetFile t

[I] Question about Statistics Collection(specifically NDV) [datafusion]

2025-03-17 Thread via GitHub
xudong963 opened a new issue, #15265: URL: https://github.com/apache/datafusion/issues/15265 ### Background I've been exploring the statistics collection in DataFusion, particularly for parquet, in the `datafusion/datasource-parquet/src/file_format.rs` file's `infer_stats` method. I noti

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2729452487 I am more convinced than ever that `variant` is the correct long-term way to support this. But it will take non-trivial time as we need to implement it lower in the stack f

Re: [PR] chore(deps): Update sqlparser to 0.55.0 [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15183: URL: https://github.com/apache/datafusion/pull/15183#issuecomment-2729455047 Amazing! Thank you @PokIsemaine -- I'll try and review this sometime this week FYI @jonahgao -- This is an automated message from the Apache Git Service. To respond to the m

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-03-17 Thread via GitHub
adriangb commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2729472951 I've thought about it a bit and I think the way to get variant support into DataFusion is: - Add general support for shredding via per-file filter rewrites and projection ex

[I] [EPIC] A collection of tickets for improving sorting larger-than-ram datasets [datafusion]

2025-03-17 Thread via GitHub
alamb opened a new issue, #15271: URL: https://github.com/apache/datafusion/issues/15271 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #14547: URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2729588907 Thanks @geoffreyclaude -- I will try and look at this. I think @goldmedal was also interested in this capability (using tracing via the `otel` crate -- I will try and find time

Re: [I] March 4, 2025: This week(s) in DataFusion [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15005: URL: https://github.com/apache/datafusion/issues/15005#issuecomment-2729594349 Next week: https://github.com/apache/datafusion/issues/15269 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [I] March 4, 2025: This week(s) in DataFusion [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #15005: March 4, 2025: This week(s) in DataFusion URL: https://github.com/apache/datafusion/issues/15005 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Add support for `RAISE` statement [datafusion-sqlparser-rs]

2025-03-17 Thread via GitHub
alamb commented on code in PR #1766: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1766#discussion_r1998847620 ## src/ast/mod.rs: ## @@ -2256,6 +2256,57 @@ impl fmt::Display for ConditionalStatements { } } +/// A `RAISE` statement. +/// +/// Examples: +///

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-03-17 Thread via GitHub
goldmedal commented on PR #14547: URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2729811174 I'll take a look at this PR in a few days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Migrate dataframe tests to `insta` [datafusion]

2025-03-17 Thread via GitHub
jsai28 commented on code in PR #15262: URL: https://github.com/apache/datafusion/pull/15262#discussion_r1998958709 ## datafusion/core/tests/dataframe/dataframe_functions.rs: ## @@ -75,34 +75,28 @@ async fn create_test_table() -> Result { } /// Executes an expression on the t

Re: [PR] docs: update documentation for Final GroupBy in accumulator.rs [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15279: URL: https://github.com/apache/datafusion/pull/15279 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Weekly Plan (Andrew Lamb) March 10, 2025 [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15274: URL: https://github.com/apache/datafusion/issues/15274#issuecomment-2730373688 Review queue: - [ ] https://github.com/apache/datafusion/pull/15263 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] chore: Re-enable GitHub discussions [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove merged PR #1535: URL: https://github.com/apache/datafusion-comet/pull/1535 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] GitHub discussions have been disabled [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove closed issue #1533: GitHub discussions have been disabled URL: https://github.com/apache/datafusion-comet/issues/1533 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] feat: Native support utf8view for binary operator [datafusion]

2025-03-17 Thread via GitHub
zhuqi-lucas opened a new pull request, #15275: URL: https://github.com/apache/datafusion/pull/15275 ## Which issue does this PR close? - Closes sub_task of [#15096](https://github.com/apache/datafusion/issues/15096) ## Rationale for this change feat: Native support utf8v

Re: [I] Implement tree explain for UnionExec [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #15277: Implement tree explain for UnionExec URL: https://github.com/apache/datafusion/issues/15277 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] fix: handle duplicate WindowFunction expressions in Substrait consumer [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15211: URL: https://github.com/apache/datafusion/pull/15211#discussion_r1999329931 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1074,6 +1074,9 @@ pub async fn from_project_rel( // leaving only explicit expressions.

Re: [PR] chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) [datafusion-comet]

2025-03-17 Thread via GitHub
kazuyukitanimura commented on PR #1534: URL: https://github.com/apache/datafusion-comet/pull/1534#issuecomment-2730385010 cc @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] fix: handle duplicate WindowFunction expressions in Substrait consumer [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15211: URL: https://github.com/apache/datafusion/pull/15211#discussion_r1999332505 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1074,6 +1074,9 @@ pub async fn from_project_rel( // leaving only explicit expressions.

Re: [PR] feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries [datafusion]

2025-03-17 Thread via GitHub
comphead commented on PR #14975: URL: https://github.com/apache/datafusion/pull/14975#issuecomment-2730388364 Sorry, it totally slipped my mind -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] fix: Unconditonally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-17 Thread via GitHub
rkrishn7 commented on PR #15242: URL: https://github.com/apache/datafusion/pull/15242#issuecomment-2730402547 Thanks @Omega359! @alamb Yes sorry for the delay, I can fix this up later today. To expand on my comment [here](https://github.com/apache/datafusion/pull/15242#issuecom

Re: [PR] Implement tree explain for PlaceholderRowExec [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15270: URL: https://github.com/apache/datafusion/pull/15270 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Implement tree explain for `PlaceholderRowExec` [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #15138: Implement tree explain for `PlaceholderRowExec` URL: https://github.com/apache/datafusion/issues/15138 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] docs: update documentation for Final GroupBy in accumulator.rs [datafusion]

2025-03-17 Thread via GitHub
qazxcdswe123 opened a new pull request, #15279: URL: https://github.com/apache/datafusion/pull/15279 I think this is what the "passed to the" part is missing but I might be wrong. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Refactor file schema type coercions [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15268: URL: https://github.com/apache/datafusion/pull/15268#discussion_r1999148261 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -465,45 +465,103 @@ impl FileFormat for ParquetFormat { } } -/// Coerces the file schema if the ta

Re: [I] Add `ctx = SessionContext()` to __init__ [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer commented on issue #1065: URL: https://github.com/apache/datafusion-python/issues/1065#issuecomment-2730138953 Is this necessary? Now that we have the global context for reading files to create dataframes, this is potentially creating conflict of floating around two different def

Re: [PR] Reanimate Code Coverage [datafusion]

2025-03-17 Thread via GitHub
codecov-commenter commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2730139264 ## [Codecov](https://app.codecov.io/gh/apache/datafusion/pull/15256?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+

Re: [PR] Refactor file schema type coercions [datafusion]

2025-03-17 Thread via GitHub
m09526 commented on code in PR #15268: URL: https://github.com/apache/datafusion/pull/15268#discussion_r1998724986 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -465,45 +465,103 @@ impl FileFormat for ParquetFormat { } } -/// Coerces the file schema if the t

Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]

2025-03-17 Thread via GitHub
goldmedal commented on code in PR #15212: URL: https://github.com/apache/datafusion/pull/15212#discussion_r1998987034 ## .github/workflows/rust.yml: ## @@ -246,6 +246,7 @@ jobs: mc cp -r /source/* localminio/data" - name: Run tests (excluding doctests)

Re: [PR] perf: unwrap cast for comparing ints =/!= strings [datafusion]

2025-03-17 Thread via GitHub
alan910127 commented on code in PR #15110: URL: https://github.com/apache/datafusion/pull/15110#discussion_r1997943659 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -290,19 +290,72 @@ impl<'a> TypeCoercionRewriter<'a> { right: Expr, right_schema:

Re: [PR] Add debug logging for default catalog overwrite in SessionState build [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15251: URL: https://github.com/apache/datafusion/pull/15251 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[PR] chore: Update links for released version [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove opened a new pull request, #1540: URL: https://github.com/apache/datafusion-comet/pull/1540 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Improve feature flag CI coverage `datafusion` and `datafusion-functions` [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15203: URL: https://github.com/apache/datafusion/pull/15203#issuecomment-2729763285 Thank you for the review @xudong963 -- much appreciated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] chore(deps): Update sqlparser to 0.55.0 [datafusion]

2025-03-17 Thread via GitHub
jonahgao commented on code in PR #15183: URL: https://github.com/apache/datafusion/pull/15183#discussion_r1998943593 ## datafusion/sql/src/expr/function.rs: ## @@ -217,13 +217,13 @@ impl SqlToRel<'_, S> { // it shouldn't have ordering requirement as function argument

Re: [PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15039: URL: https://github.com/apache/datafusion/pull/15039#issuecomment-2730069735 This PR appears to have some build / CI failures that are preventing it from merging, so marking it as draft. @LuQQiu can you please resolve the issues so it can be merged? -- Th

Re: [PR] fix: Unconditonally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15242: URL: https://github.com/apache/datafusion/pull/15242#issuecomment-2730093478 @rkrishn7 will you have time to apply @Omega359 's suggestion? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Add additional ruff suggestions [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer merged PR #1062: URL: https://github.com/apache/datafusion-python/pull/1062 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [PR] [TEST] enable recursive_protection to fix CI stack overflow [datafusion]

2025-03-17 Thread via GitHub
goldmedal closed pull request #15272: [TEST] enable recursive_protection to fix CI stack overflow URL: https://github.com/apache/datafusion/pull/15272 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Support logic optimize rule to pass the case that Utf8view datatype combined with Utf8 datatype [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15239: URL: https://github.com/apache/datafusion/pull/15239#discussion_r1999161222 ## datafusion/common/src/dfschema.rs: ## @@ -563,29 +563,6 @@ impl DFSchema { .all(|(dffield, arrowfield)| dffield.name() == arrowfield.name()) }

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-17 Thread via GitHub
blaginin commented on PR #15255: URL: https://github.com/apache/datafusion/pull/15255#issuecomment-2729413005 > I tried allow_duplicates!(), but it gets a bit tricky with async functions Can you please explain more on this? I tried modifying _async_ `run_and_compare_query` and this wo

Re: [PR] chore: remove deprecated variants of UDF's invoke (invoke, invoke_no_args, invoke_batch) [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15123: URL: https://github.com/apache/datafusion/pull/15123#issuecomment-2729560442 🔨 let's go -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove merged PR #1534: URL: https://github.com/apache/datafusion-comet/pull/1534 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] Publish official Docker images to Docker Hub under Apache account [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on issue #1510: URL: https://github.com/apache/datafusion-comet/issues/1510#issuecomment-2730485723 We can run `docker scout` locally. ``` docker scout cves apache/datafusion-comet:0.5.0-spark3.4.3-scala2.12-java11 ``` There are some CVEs in dependenci

Re: [PR] fix: Refactor CometScanRule and fix bugs [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on code in PR #1483: URL: https://github.com/apache/datafusion-comet/pull/1483#discussion_r1999378679 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -188,69 +185,62 @@ class CometSparkSessionExtensions scanE

Re: [PR] fix: Refactor CometScanRule and fix bugs [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on PR #1483: URL: https://github.com/apache/datafusion-comet/pull/1483#issuecomment-2729501003 @mbutrovich @parthchandra This is now ready for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [I] External sort failing with modest memory limit when writing parquet files [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15028: URL: https://github.com/apache/datafusion/issues/15028#issuecomment-2729506412 I started organizing tickets related to sorting larger than available RAM datasets here: - https://github.com/apache/datafusion/issues/15271 -- This is an automated message f

Re: [I] Implement tree explain for UnionExec [datafusion]

2025-03-17 Thread via GitHub
zebsme commented on issue #15277: URL: https://github.com/apache/datafusion/issues/15277#issuecomment-2730027331 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[I] Implement tree explain for UnionExec [datafusion]

2025-03-17 Thread via GitHub
zebsme opened a new issue, #15277: URL: https://github.com/apache/datafusion/issues/15277 ### Is your feature request related to a problem or challenge? - Part of #14914 ### Describe the solution you'd like Add tree format to the ExecutionPlan specified in the subject of

Re: [I] [EPIC] A collection of tickets for improving sorting larger-than-ram datasets / spilling sorts [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15271: URL: https://github.com/apache/datafusion/issues/15271#issuecomment-2729488222 I also think that by collecting the related items we may be able to find some more review capacity (as I think this is an important capability for Spark / Comet. FYI @comphead /

Re: [I] March 17, 2025: This week(s) in DataFusion [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15269: URL: https://github.com/apache/datafusion/issues/15269#issuecomment-2729759078 Also, huge thanks to @xudong963 for running the release process - https://github.com/apache/datafusion/issues/14123 - https://github.com/apache/datafusion/issues/15151

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-17 Thread via GitHub
alamb commented on code in PR #59: URL: https://github.com/apache/datafusion-site/pull/59#discussion_r1998872033 ## content/blog/2025-03-11-ordering-analysis.md: ## @@ -291,6 +291,31 @@ Following third and fourth constraints for the simplified table, the succinct va `[time_bin

Re: [I] [DISCUSS] Release DataFusion `46.0.1` Patch or `46.1.0` minor release (March 2025) [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15151: URL: https://github.com/apache/datafusion/issues/15151#issuecomment-2729754194 Thanks again @xudong963 ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [I] Apply additional ruff suggestions [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer closed issue #1056: Apply additional ruff suggestions URL: https://github.com/apache/datafusion-python/issues/1056 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Add additional ruff suggestions [datafusion-python]

2025-03-17 Thread via GitHub
timsaucer commented on PR #1062: URL: https://github.com/apache/datafusion-python/pull/1062#issuecomment-2729760767 Thank you for all the work on this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15096: URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2729731675 > Performance improvement: Binary operators native support for Utf8View Yes, 100% -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-03-17 Thread via GitHub
zhuqi-lucas commented on issue #15096: URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2729876918 > > Performance improvement: Binary operators native support for Utf8View > > Yes, 100% Submitted the PR for review: https://github.com/apache/datafusion/pull/

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-17 Thread via GitHub
eliaperantoni commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2729888439 What do you think @alamb? I could take on this and make a PR if you approve the solution :) -- This is an automated message from the Apache Git Service. To respond to th

[PR] 1065/enhancement/add ctx to `__init__.py` [datafusion-python]

2025-03-17 Thread via GitHub
Spaarsh opened a new pull request, #1072: URL: https://github.com/apache/datafusion-python/pull/1072 # Which issue does this PR close? Closes #1065 # Rationale for this change To improve ergonomics of the API by providing a pre-initialized `SessionContext` inst

[PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-17 Thread via GitHub
timsaucer opened a new pull request, #15280: URL: https://github.com/apache/datafusion/pull/15280 ## Which issue does this PR close? None. ## Rationale for this change This PR expands the interfaces available via FFI to include Catalogs and Schemas (catalog schema, not a

[PR] Fix the well known `count-bug` similar cases [datafusion]

2025-03-17 Thread via GitHub
suibianwanwank opened a new pull request, #15281: URL: https://github.com/apache/datafusion/pull/15281 ## Which issue does this PR close? - Closes #15032. ## Rationale for this change ## What changes are included in this PR? ## Are these cha

Re: [PR] Fix the well known `count-bug` similar cases [datafusion]

2025-03-17 Thread via GitHub
suibianwanwank commented on code in PR #15281: URL: https://github.com/apache/datafusion/pull/15281#discussion_r1999272559 ## datafusion/optimizer/src/scalar_subquery_to_join.rs: ## @@ -476,19 +485,19 @@ mod tests { .build()?; let expected = "Projection:

[I] Weekly Plan (Andrew Lamb) March 10, 2025 [datafusion]

2025-03-17 Thread via GitHub
alamb opened a new issue, #15274: URL: https://github.com/apache/datafusion/issues/15274 This is an attempt to organize myself and make what I plan to work on more visible ## Weekly High Level Goals - [ ] Make arrow release: https://github.com/apache/arrow-rs/issues/7107 - [ ] C

[PR] [TEST] enable recursive_protection to fix CI stack overflow [datafusion]

2025-03-17 Thread via GitHub
goldmedal opened a new pull request, #15272: URL: https://github.com/apache/datafusion/pull/15272 ## Which issue does this PR close? - A test for #15212 ## Rationale for this change ## What changes are included in this PR? ## Are these chang

Re: [PR] perf: unwrap cast for comparing ints =/!= strings [datafusion]

2025-03-17 Thread via GitHub
alan910127 commented on PR #15110: URL: https://github.com/apache/datafusion/pull/15110#issuecomment-2727773494 > Does this make sense @alan910127 ? So we'll keep both the coercion and cast unwrapping optimizations in this PR, is that correct? I'm unsure about "support other types," d

Re: [I] [Epic] A collection of FFI related tasks [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15283: URL: https://github.com/apache/datafusion/issues/15283#issuecomment-2730526009 FYI @timsaucer in case you have other items you want to add here

Re: [PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15280: URL: https://github.com/apache/datafusion/pull/15280#issuecomment-2730531904 Thank you @timsaucer

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-17 Thread via GitHub
alamb commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r1999392980 ## datafusion/expr/src/expr.rs: ## @@ -2607,11 +2793,23 @@ pub(crate) fn schema_name_from_exprs_comma_separated_without_space( schema_name_from_exprs_inner(exp

Re: [PR] fix: Refactor CometScanRule and fix bugs [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove commented on code in PR #1483: URL: https://github.com/apache/datafusion-comet/pull/1483#discussion_r1999438466 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -188,69 +185,62 @@ class CometSparkSessionExtensions scanE

Re: [PR] datafusion-cli: add streaming state struct [datafusion]

2025-03-17 Thread via GitHub
alamb commented on PR #15234: URL: https://github.com/apache/datafusion/pull/15234#issuecomment-2730540542 > I only had time to take a quick glance - but could this functionality be added to datafusion so it could be used by other apps that have CLIs built on datafusion? Seems like a

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-03-17 Thread via GitHub
duongcongtoai commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2730673982 From this [PR](https://github.com/apache/datafusion/pull/6457), there are several types of query mentioned that need support 1. In Subquery contains limit/order by ``

[PR] chore: Enable Spark SQL tests for native_datafusion [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove opened a new pull request, #1543: URL: https://github.com/apache/datafusion-comet/pull/1543 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Implement tree explain for UnionExec [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15278: URL: https://github.com/apache/datafusion/pull/15278

Re: [PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-17 Thread via GitHub
timsaucer commented on PR #15280: URL: https://github.com/apache/datafusion/pull/15280#issuecomment-2730732221 Oh, good point. Added in latest push.

Re: [I] Remove the need for registering an ObjectStore for remote files [datafusion-python]

2025-03-17 Thread via GitHub
kylebarron commented on issue #899: URL: https://github.com/apache/datafusion-python/issues/899#issuecomment-2730730310 I published `pyo3-object_store` 0.1, which works with pyo3 0.23, and `pyo3-object_store` 0.2, which works with pyo3 0.24. But both of these require `object_store` 0.12, s

[PR] build: Use unique name for surefire artifacts [datafusion-comet]

2025-03-17 Thread via GitHub
andygrove opened a new pull request, #1544: URL: https://github.com/apache/datafusion-comet/pull/1544 ## Which issue does this PR close? N/A ## Rationale for this change I have recently seen some build failures due to: ``` Error: Failed to Create

Re: [I] Analysis to support `SortPreservingMerge` --> `ProgressiveEval` [datafusion]

2025-03-17 Thread via GitHub
suremarc commented on issue #15191: URL: https://github.com/apache/datafusion/issues/15191#issuecomment-2730732469 > I think it uses [FileGroupPartitioner](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.FileGroupPartitioner.html) that maintains the same orderin

[I] [Epic] A collection of FFI related tasks [datafusion]

2025-03-17 Thread via GitHub
alamb opened a new issue, #15283: URL: https://github.com/apache/datafusion/issues/15283 ### Is your feature request related to a problem or challenge? We are adding FFI bindings to Datafusion (see https://crates.io/crates/datafusion-ffi) mostly for API stability (e.g. so python wrap

Re: [PR] chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) [datafusion-comet]

2025-03-17 Thread via GitHub
kazuyukitanimura commented on PR #1534: URL: https://github.com/apache/datafusion-comet/pull/1534#issuecomment-2730537142 Thanks @andygrove

Re: [I] Migrate dataframe tests to `insta` [datafusion]

2025-03-17 Thread via GitHub
alamb closed issue #15245: Migrate dataframe tests to `insta` URL: https://github.com/apache/datafusion/issues/15245

Re: [I] March 17, 2025: This week(s) in DataFusion [datafusion]

2025-03-17 Thread via GitHub
alamb commented on issue #15269: URL: https://github.com/apache/datafusion/issues/15269#issuecomment-2730516080 Oh, and of course @timsaucer is cranking out FFI bindings like - https://github.com/apache/datafusion/pull/15280

Re: [PR] Migrate dataframe tests to `insta` [datafusion]

2025-03-17 Thread via GitHub
alamb merged PR #15262: URL: https://github.com/apache/datafusion/pull/15262
