Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
drexler-sky commented on PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-2998998589 @andygrove @parthchandra @comphead Could you please take another look? The CI failure doesn't seem to be related to this PR. -- This is an automated message from the Apach

Re: [PR] fix: parse snowflake fetch clause [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
iffyio merged PR #1894: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1894

Re: [PR] Use `IndexColumn` in all index definitions [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
iffyio merged PR #1900: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1900

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2162979272 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -433,6 +433,117 @@ async fn test_topk_dynamic_filter_pushdown() { ); } +#[tokio

Re: [I] Add support for clickbench data and benchmark with page index [datafusion]

2025-06-23 Thread via GitHub
zhuqi-lucas commented on issue #16427: URL: https://github.com/apache/datafusion/issues/16427#issuecomment-2998767526 Thank you @adriangb for this good point, i agree with you, and why i create this jira because we also can use it to mock more custom data based current clickbench.

[PR] Add PhysicalExpr optimizer and cast unwrapping [datafusion]

2025-06-23 Thread via GitHub
adriangb opened a new pull request, #16530: URL: https://github.com/apache/datafusion/pull/16530 Closes #16004

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-23 Thread via GitHub
adamreeve commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2998703950 I've been experimenting with how this work could be extended to support more ways of configuring encryption beyond having fixed and known AES keys for all files. For example, data

[PR] Fix array_has to return false for empty arrays instead of null [datafusion]

2025-06-23 Thread via GitHub
kosiew opened a new pull request, #16529: URL: https://github.com/apache/datafusion/pull/16529 ## Which issue does this PR close? - Closes #16474 ## Rationale for this change The `array_has` function incorrectly returns `null` for empty arrays, which should logically ret

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-23 Thread via GitHub
adriangb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-2998670275 I opened https://github.com/apache/datafusion/issues/16528 to track further ideas / steps

[PR] Add compression_level support to ParquetWriterOptions and enhance write_parquet to accept full options object [datafusion-python]

2025-06-23 Thread via GitHub
kosiew opened a new pull request, #1169: URL: https://github.com/apache/datafusion-python/pull/1169 ## Which issue does this PR close? - Closes #1162 ## Rationale for this change This change enhances the flexibility of the Parquet writing process by allowing users to spe

Re: [PR] wip: proto to physical plan conversion [datafusion]

2025-06-23 Thread via GitHub
github-actions[bot] closed pull request #14530: wip: proto to physical plan conversion URL: https://github.com/apache/datafusion/pull/14530

Re: [PR] updatted github action by change version tag to sha hashes [datafusion]

2025-06-23 Thread via GitHub
github-actions[bot] closed pull request #15315: updatted github action by change version tag to sha hashes URL: https://github.com/apache/datafusion/pull/15315

Re: [PR] refactor!: consistent null handling in coercible signatures [datafusion]

2025-06-23 Thread via GitHub
github-actions[bot] closed pull request #15404: refactor!: consistent null handling in coercible signatures URL: https://github.com/apache/datafusion/pull/15404

Re: [PR] fix!: incorrect coercion when comparing with string literals [datafusion]

2025-06-23 Thread via GitHub
github-actions[bot] closed pull request #15482: fix!: incorrect coercion when comparing with string literals URL: https://github.com/apache/datafusion/pull/15482

Re: [PR] Add Cloud-Native Performance Monitoring System with GitHub Integration [datafusion]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #15624: URL: https://github.com/apache/datafusion/pull/15624#issuecomment-2998537216 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] PostgreSQL dialect's `WITHIN GROUP` clause gets ignored [datafusion]

2025-06-23 Thread via GitHub
chenkovsky commented on issue #16515: URL: https://github.com/apache/datafusion/issues/16515#issuecomment-2998502848 take

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-06-23 Thread via GitHub
goldmedal commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2998502945 Thanks @alamb @berkaysynnada @kylebarron @ozankabak @Omega359 @paleolimbot for reviewing and suggestions 🚀

Re: [PR] chore: fix CI failures on `ddl.md` [datafusion]

2025-06-23 Thread via GitHub
comphead commented on PR #16526: URL: https://github.com/apache/datafusion/pull/16526#issuecomment-2998481347 Thanks @xudong963 for the quick review

Re: [PR] chore: fix CI failures on `ddl.md` [datafusion]

2025-06-23 Thread via GitHub
comphead merged PR #16526: URL: https://github.com/apache/datafusion/pull/16526

Re: [PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
comphead merged PR #16509: URL: https://github.com/apache/datafusion/pull/16509

Re: [I] Perf: Optimize CursorValues compare performance for StringViewArray [datafusion]

2025-06-23 Thread via GitHub
comphead closed issue #16508: Perf: Optimize CursorValues compare performance for StringViewArray URL: https://github.com/apache/datafusion/issues/16508

Re: [PR] Add more doc for physical filter pushdown [datafusion]

2025-06-23 Thread via GitHub
xudong963 merged PR #16504: URL: https://github.com/apache/datafusion/pull/16504

Re: [PR] Add more doc for physical filter pushdown [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on PR #16504: URL: https://github.com/apache/datafusion/pull/16504#issuecomment-2998476200 Thank you all, let's go!

[PR] chore: fix CI failures on `ddl.md` [datafusion]

2025-06-23 Thread via GitHub
comphead opened a new pull request, #16526: URL: https://github.com/apache/datafusion/pull/16526 ## Which issue does this PR close? CI was broken in #16524 this PR to fix `ddl.md` formatting - Closes #. ## Rationale for this change ## What changes are i

Re: [PR] feat: Finalize support for `RightMark` join + `Mark` join swap [datafusion]

2025-06-23 Thread via GitHub
jonathanc-n commented on PR #16488: URL: https://github.com/apache/datafusion/pull/16488#issuecomment-2998466640 If you have the time, are you able to take a look? Should be a straightforward review, thanks! @comphead @Dandandan

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-06-23 Thread via GitHub
kazuyukitanimura merged PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715

Re: [PR] feat: collect once during display() in jupyter notebooks [datafusion-python]

2025-06-23 Thread via GitHub
kylebarron commented on PR #1167: URL: https://github.com/apache/datafusion-python/pull/1167#issuecomment-2998437499 As [mentioned in a comment on SO](https://stackoverflow.com/questions/15411967/how-can-i-check-if-code-is-executed-in-the-ipython-notebook/24937408#comment81917993_39662359),

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
drexler-sky commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2162749959 ## spark/src/main/scala/org/apache/comet/serde/arrays.scala: ## @@ -171,9 +184,9 @@ object CometArrayMax extends CometExpressionSerde { binding: Boo

Re: [PR] feat: collect once during display() in jupyter notebooks [datafusion-python]

2025-06-23 Thread via GitHub
timsaucer commented on PR #1167: URL: https://github.com/apache/datafusion-python/pull/1167#issuecomment-2998390832 > I don't think this is a reasonable workaround because there are many Jupyter-protocol frontends that do not support displaying HTML output. This means that repr would be br

Re: [PR] `TableProvider` to skip files in the folder which non relevant to selected reader [datafusion]

2025-06-23 Thread via GitHub
comphead commented on code in PR #16487: URL: https://github.com/apache/datafusion/pull/16487#discussion_r2162736134 ## datafusion/core/src/datasource/listing_table_factory.rs: ## @@ -125,6 +125,13 @@ impl TableProviderFactory for ListingTableFactory { // specifical

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
comphead commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2162734270 ## spark/src/main/scala/org/apache/comet/serde/arrays.scala: ## @@ -171,9 +184,9 @@ object CometArrayMax extends CometExpressionSerde { binding: Boolea

Re: [PR] chore: move udf registration to better place [datafusion-comet]

2025-06-23 Thread via GitHub
comphead merged PR #1899: URL: https://github.com/apache/datafusion-comet/pull/1899

Re: [PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-23 Thread via GitHub
codecov-commenter commented on PR #1925: URL: https://github.com/apache/datafusion-comet/pull/1925#issuecomment-2998365766 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1925?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: Implement ToPrettyString [datafusion-comet]

2025-06-23 Thread via GitHub
comphead commented on code in PR #1921: URL: https://github.com/apache/datafusion-comet/pull/1921#discussion_r2162731604 ## native/core/src/execution/planner.rs: ## @@ -746,6 +746,22 @@ impl PhysicalPlanner { let child = self.create_expr(expr.child.as_ref().unwr

Re: [I] Make `datafusion` read parquet folders if non parquet files exists [datafusion]

2025-06-23 Thread via GitHub
comphead commented on issue #16460: URL: https://github.com/apache/datafusion/issues/16460#issuecomment-2998350932 @hendrikmakait sorry I took the liberty to wrap my PR up, please feel free to review

[I] Add support for Spark SQL `explode` expression [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove opened a new issue, #1927: URL: https://github.com/apache/datafusion-comet/issues/1927 ### What is the problem the feature request solves? Add support for `explode`: https://spark.apache.org/docs/latest/api/sql/index.html#explode > explode(expr) - Separates the

[I] Add support for `size` expression [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove opened a new issue, #1926: URL: https://github.com/apache/datafusion-comet/issues/1926 ### What is the problem the feature request solves? Add support for Spark SQL `size` expression: https://spark.apache.org/docs/latest/api/sql/index.html#size From the document

Re: [PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-23 Thread via GitHub
parthchandra commented on PR #1925: URL: https://github.com/apache/datafusion-comet/pull/1925#issuecomment-2998299473 @Kontinuation please review if you can. (This PR is draft because I haven't been able to test it with S3 yet. The unit test passes, though).

[PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-23 Thread via GitHub
parthchandra opened a new pull request, #1925: URL: https://github.com/apache/datafusion-comet/pull/1925 #1817 introduced S3A configuration for the `native_datafusion` reader. This PR does the same for `native_iceberg_compat` ## How are these changes tested? Existing unit t

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
drexler-sky commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2162673921 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -232,24 +232,42 @@ class CometArrayExpressionSuite extends CometTestBase w

Re: [PR] Perform type coercion for corr aggregate function during physical planning [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #15776: URL: https://github.com/apache/datafusion/pull/15776#issuecomment-2997818951 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is read

[PR] Fix signature of `__arrow_c_stream__` [datafusion-python]

2025-06-23 Thread via GitHub
kylebarron opened a new pull request, #1168: URL: https://github.com/apache/datafusion-python/pull/1168 # Which issue does this PR close? Closes https://github.com/apache/datafusion-python/issues/1166. # Rationale for this change # What changes are included in this PR?

Re: [I] [Async UDF] Add high level context / examples for user defined async functions [datafusion]

2025-06-23 Thread via GitHub
alamb closed issue #16521: [Async UDF] Add high level context / examples for user defined async functions URL: https://github.com/apache/datafusion/issues/16521

[I] [Async UDF] Add high level context / examples for user defined async functions [datafusion]

2025-06-23 Thread via GitHub
alamb opened a new issue, #16521: URL: https://github.com/apache/datafusion/issues/16521 It would be nice to add some high level context to this example -- like an introduction saying that most functions are sync, but for some functions can be run as async ... I can help with this po

Re: [PR] feat: collect once during display() in jupyter notebooks [datafusion-python]

2025-06-23 Thread via GitHub
kylebarron commented on PR #1167: URL: https://github.com/apache/datafusion-python/pull/1167#issuecomment-2998107917 > By design in a Jupyter notebook `display()` calls both `__repr__` and `_repr_html_`. Ref https://discourse.jupyter.org/t/find-out-if-my-code-runs-inside-a-notebook-

Re: [PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-23 Thread via GitHub
alamb commented on code in PR #16523: URL: https://github.com/apache/datafusion/pull/16523#discussion_r2162390224 ## datafusion-examples/examples/async_udf.rs: ## @@ -243,9 +252,20 @@ impl AsyncScalarUDFImpl for AsyncEqual { Ok(DataType::Boolean) } +fn invoke

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
parthchandra commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2162015639 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -232,24 +232,42 @@ class CometArrayExpressionSuite extends CometTestBase

Re: [I] Async User Defined Functions (UDF) [datafusion]

2025-06-23 Thread via GitHub
alamb closed issue #6518: Async User Defined Functions (UDF) URL: https://github.com/apache/datafusion/issues/6518

Re: [PR] Add support for Arrow Duration type in Substrait [datafusion]

2025-06-23 Thread via GitHub
gabotechs commented on PR #16503: URL: https://github.com/apache/datafusion/pull/16503#issuecomment-2998046913 Sure! I'll get it done in a couple of days. Thanks for submitting this!

Re: [PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
Dandandan commented on PR #16509: URL: https://github.com/apache/datafusion/pull/16509#issuecomment-2998041649 > 🤖: Benchmark completed > > Details had a look at the "regressions", I think should not be impacted by this change (thus noise).

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA [datafusion]

2025-06-23 Thread via GitHub
GitHub user adriangb added a comment to the discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA I think I'd be able to attend and would love to present on dynamic filter pushdown work GitHub link: https://github.com/apache/datafusion/discussions/16265#discussioncomment-13554120

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-06-23 Thread via GitHub
alamb commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2998026890 > FYI, I added more doc for the related code based on my understanding: [#16504](https://github.com/apache/datafusion/pull/16504) That writeup is 🧑‍🍳 👌 really nice

Re: [PR] Add support for Arrow Duration type in Substrait [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16503: URL: https://github.com/apache/datafusion/pull/16503#issuecomment-2997820357 Thank you @jkosh44 🙏 @gabotechs is there any chance you have time to review this PR?

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-23 Thread via GitHub
alamb commented on code in PR #16497: URL: https://github.com/apache/datafusion/pull/16497#discussion_r2162564838 ## datafusion/physical-plan/src/unnest.rs: ## @@ -299,7 +296,7 @@ impl UnnestStream { continue; };

Re: [PR] Split clickbench query set into one file per query [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16476: URL: https://github.com/apache/datafusion/pull/16476#issuecomment-2998005328 Thanks again @pepijnve

Re: [PR] Split clickbench query set into one file per query [datafusion]

2025-06-23 Thread via GitHub
alamb merged PR #16476: URL: https://github.com/apache/datafusion/pull/16476

Re: [PR] Simplify predicates in `PushDownFilter` optimizer rule [datafusion]

2025-06-23 Thread via GitHub
alamb commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2162482061 ## datafusion/optimizer/src/simplify_expressions/simplify_predicates.rs: ## @@ -0,0 +1,248 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or m

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16497: URL: https://github.com/apache/datafusion/pull/16497#issuecomment-2997997694 > @alamb, sorry, my bad! I didn't notice that I didn't have the pre-commit hooked up and the failing `cargo fmt` that caused; CI is green now. No worries -- thank you @hendrikmak

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-23 Thread via GitHub
hendrikmakait commented on PR #16497: URL: https://github.com/apache/datafusion/pull/16497#issuecomment-2997890026 @alamb, sorry, my bad! I didn't notice that I didn't have the pre-commit hooked up and the failing `cargo fmt` that caused; CI is green now.

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-23 Thread via GitHub
alamb commented on code in PR #16461: URL: https://github.com/apache/datafusion/pull/16461#discussion_r2162492604 ## datafusion/physical-expr/src/schema_rewriter.rs: ## @@ -0,0 +1,318 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-23 Thread via GitHub
adriangb commented on code in PR #16461: URL: https://github.com/apache/datafusion/pull/16461#discussion_r2162548231 ## datafusion/physical-expr/src/schema_rewriter.rs: ## @@ -0,0 +1,318 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] chore: Improve reporting of fallback reasons for CollectLimit [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove merged PR #1694: URL: https://github.com/apache/datafusion-comet/pull/1694

Re: [I] WASM UDFs [datafusion]

2025-06-23 Thread via GitHub
alamb commented on issue #9326: URL: https://github.com/apache/datafusion/issues/9326#issuecomment-2997961684 Here is another blog on the subject: - https://www.splitgraph.com/blog/seafowl-wasm-udfs

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-06-23 Thread via GitHub
Omega359 commented on PR #13527: URL: https://github.com/apache/datafusion/pull/13527#issuecomment-2997941934 I noticed that in the API but I haven't yet had a chance to see how it's implemented. It would be awesome if it's compatible/workable with non-async functions too

Re: [PR] doc: Document DESCRIBE comman in ddl.md [datafusion]

2025-06-23 Thread via GitHub
comphead merged PR #16524: URL: https://github.com/apache/datafusion/pull/16524

Re: [I] doc: Document `DESCRIBE` command in `ddl.md` [datafusion]

2025-06-23 Thread via GitHub
comphead closed issue #16518: doc: Document `DESCRIBE` command in `ddl.md` URL: https://github.com/apache/datafusion/issues/16518

Re: [PR] doc: Document DESCRIBE comman in ddl.md [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16524: URL: https://github.com/apache/datafusion/pull/16524#issuecomment-2997837151 I took the liberty of pushing a commit to fixup the markdown here

Re: [I] Support `DESC ` statement [datafusion]

2025-06-23 Thread via GitHub
comphead closed issue #16311: Support `DESC ` statement URL: https://github.com/apache/datafusion/issues/16311

Re: [PR] Add DESC alias for DESCRIBE command. [datafusion]

2025-06-23 Thread via GitHub
comphead merged PR #16514: URL: https://github.com/apache/datafusion/pull/16514

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-2997873713 I would personally recommend proceeding in parallel with the two approaches, ensuring there are good end to end tests (.slt) -- and then if we find that the projection pushdown / rewri

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-2997870791 > But we’d still need an array-level counterpart to actually materialize those null nested fields in the RecordBatch when we call map_batch. FWIW I think this is one mechanism to

Re: [PR] Minor: Add more links to cooperative / scheduling docs [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16484: URL: https://github.com/apache/datafusion/pull/16484#issuecomment-2997844377 Thanks @xudong963

Re: [PR] Use `IndexColumn` in all index definitions [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
mvzink commented on PR #1900: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1900#issuecomment-2997848751 Having started using this, I noticed that it's actually fairly inconvenient to have the `TableConstraint::ForeignKey` variant have a different type for variants (e.g. if y

Re: [PR] Minor: Add more links to cooperative / scheduling docs [datafusion]

2025-06-23 Thread via GitHub
alamb merged PR #16484: URL: https://github.com/apache/datafusion/pull/16484

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #13527: URL: https://github.com/apache/datafusion/pull/13527#issuecomment-2997826090 Interestingly I noticed that @goldmedal added `ConfigOptions` to the `async UDF` API in https://github.com/apache/datafusion/pull/14837 Maybe we can do something similar with no

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16497: URL: https://github.com/apache/datafusion/pull/16497#issuecomment-2997814919 Marking as draft as the CI is failing and I am trying to make it easier to find the PRs that need active reviews. Thank you @hendrikmakait

[I] [Blog] Async Scalar User Defined Functions [datafusion]

2025-06-23 Thread via GitHub
alamb opened a new issue, #16525: URL: https://github.com/apache/datafusion/issues/16525 ### Is your feature request related to a problem or challenge? @goldmedal added support for Async user defined functions in - https://github.com/apache/datafusion/pull/14837 As @comp

Re: [I] [datafusion-spark] Implement `factorial` function [datafusion]

2025-06-23 Thread via GitHub
alamb closed issue #16124: [datafusion-spark] Implement `factorial` function URL: https://github.com/apache/datafusion/issues/16124

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-23 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-2997787662 > I was not planning on changing it substantially anymore. I was thinking of maybe rereading the text with a fresh pair of eyes and editing a sentence here or there, but that's it. Need

Re: [I] Docker build kube/Dockerfile failed with ### COMPILER BUG DETECTED ### [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove closed issue #1917: Docker build kube/Dockerfile failed with ### COMPILER BUG DETECTED ### URL: https://github.com/apache/datafusion-comet/issues/1917

Re: [PR] [datafusion-spark] Implement `factorical` function [datafusion]

2025-06-23 Thread via GitHub
alamb merged PR #16125: URL: https://github.com/apache/datafusion/pull/16125

Re: [PR] [datafusion-spark] Implement `factorical` function [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16125: URL: https://github.com/apache/datafusion/pull/16125#issuecomment-2997749229 gogogogogogo THanks again @shehabgamin and @tlm365

[I] Update `AsyncScalarUDFImpl` API to match `ScalarUDFImpl` API [datafusion]

2025-06-23 Thread via GitHub
alamb opened a new issue, #16522: URL: https://github.com/apache/datafusion/issues/16522 ### Is your feature request related to a problem or challenge? * https://github.com/apache/datafusion/pull/14837 introduces `AsyncScalarUDFImpl` to run async functions 🥳 🦜 🚀 However, the

Re: [PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16523: URL: https://github.com/apache/datafusion/pull/16523#issuecomment-2997741800 I feel like there may be some more duplication we can remove as part of the PhysicalExpr layer too

Re: [PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-23 Thread via GitHub
alamb commented on code in PR #16523: URL: https://github.com/apache/datafusion/pull/16523#discussion_r2162390807 ## datafusion/expr/src/async_udf.rs: ## @@ -35,34 +35,7 @@ use std::sync::Arc; /// /// The name is chosen to mirror ScalarUDFImpl #[async_trait] -pub trait AsyncS

[PR] doc: Document DESCRIBE command in ddl.md [datafusion]

2025-06-23 Thread via GitHub
krikera opened a new pull request, #16524: URL: https://github.com/apache/datafusion/pull/16524 Add documentation for DESCRIBE and DESC commands with syntax, examples, and output format explanation. Fixes #16518 ## Which issue does this PR close? - Closes #16518.

Re: [I] [EPIC] More Async User Defined Function work [datafusion]

2025-06-23 Thread via GitHub
alamb commented on issue #16520: URL: https://github.com/apache/datafusion/issues/16520#issuecomment-2997682321 @goldmedal are there any other todo items you can think of for async udfs?

[PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-23 Thread via GitHub
alamb opened a new pull request, #16523: URL: https://github.com/apache/datafusion/pull/16523 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16522 ## Rationale for this change Following @berkaysynnada 's suggestion in http

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-06-23 Thread via GitHub
alamb merged PR #14837: URL: https://github.com/apache/datafusion/pull/14837

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-23 Thread via GitHub
huaxingao commented on PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#issuecomment-2997669574 I somehow got some strange errors: ``` [info] ParquetV1QuerySuite: [info] - simple select queries (635 milliseconds) [info] - appending (254 milliseconds) [info]

Re: [PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16509: URL: https://github.com/apache/datafusion/pull/16509#issuecomment-2997163295 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubun

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-06-23 Thread via GitHub
alamb commented on code in PR #14837: URL: https://github.com/apache/datafusion/pull/14837#discussion_r2162340524 ## datafusion/core/src/physical_planner.rs: ## @@ -775,12 +776,44 @@ impl DefaultPhysicalPlanner { let runtime_expr = self.cr

Re: [I] [Async UDF] Add high level context / examples for user defined async functions [datafusion]

2025-06-23 Thread via GitHub
alamb commented on issue #16521: URL: https://github.com/apache/datafusion/issues/16521#issuecomment-2997633524 Actually I see this is now done as part of https://github.com/apache/datafusion/pull/14837

[I] [EPIC] More Async User Defined Function work [datafusion]

2025-06-23 Thread via GitHub
alamb opened a new issue, #16520: URL: https://github.com/apache/datafusion/issues/16520 In https://github.com/apache/datafusion/issues/6518 and https://github.com/apache/datafusion/pull/14837, @goldmedal introduced async user defined functions. This ticket captures follow on work to impro

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2997613389 Thanks @goldmedal -- I will file some follow on tickets and then merge

Re: [PR] chore: Improve reporting of fallback reasons for CollectLimit [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on PR #1694: URL: https://github.com/apache/datafusion-comet/pull/1694#issuecomment-2997605256 Thanks for the reviews @parthchandra and @kazuyukitanimura

[I] [Blog] Proposal: Add categorical-tags to blogs for better navigation [datafusion-site]

2025-06-23 Thread via GitHub
JigaoLuo opened a new issue, #77: URL: https://github.com/apache/datafusion-site/issues/77 Hi datafusion team, First, thank you for consistently publishing [high-quality blogs](https://datafusion.apache.org/blog/)! I appreciate the effort behind them. Feedback & Suggestions:

Re: [I] Add content tags to the blogs [datafusion-site]

2025-06-23 Thread via GitHub
alamb closed issue #76: Add content tags to the blogs URL: https://github.com/apache/datafusion-site/issues/76

Re: [I] Add content tags to the blogs [datafusion-site]

2025-06-23 Thread via GitHub
alamb commented on issue #76: URL: https://github.com/apache/datafusion-site/issues/76#issuecomment-2997595536 I forgot I can move tickets to different repos

Re: [I] [Blog] Proposal: Add categorical-tags to blogs for better navigation [datafusion]

2025-06-23 Thread via GitHub
alamb commented on issue #16407: URL: https://github.com/apache/datafusion/issues/16407#issuecomment-2997594633 I think we need to do two things as two separate PRs 1. Figure out how to get the tags to render with the pelican site generator 2. Add additional relevant tags to existing
