Re: [PR] Chore: Implement BloomFilterMightContain as a ScalarUDFImpl [datafusion-comet]

2025-06-28 Thread via GitHub
codecov-commenter commented on PR #1954: URL: https://github.com/apache/datafusion-comet/pull/1954#issuecomment-3016356204 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1954?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Postgres constraint NOT VALID and VALIDATE CONSTRAINT [datafusion-sqlparser-rs]

2025-06-28 Thread via GitHub
iffyio commented on code in PR #1908: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1908#discussion_r2173612089 ## src/parser/mod.rs: ## @@ -8477,7 +8477,14 @@ impl<'a> Parser<'a> { pub fn parse_alter_table_operation(&mut self) -> Result { let opera

Re: [PR] chore: refactor `BuildProbeJoinMetrics` to use `BaselineMetrics` [datafusion]

2025-06-28 Thread via GitHub
2010YOUY01 commented on PR #16500: URL: https://github.com/apache/datafusion/pull/16500#issuecomment-3016318383 Thank you! This implementation looks correct to me. Since the state transitions in joins are tricky, could you add a test (or ensure there are some existing tests), to double

Re: [D] DataSourceExec metrics explanation [datafusion]

2025-06-28 Thread via GitHub
GitHub user debajyoti-truefoundry added a comment to the discussion: DataSourceExec metrics explanation Seems like this is the case. Ref: https://discord.com/channels/885562378132000778/1388186871594745956/1388748472491970662 GitHub link: https://github.com/apache/datafusion/discussions/165

Re: [D] DataSourceExec metrics explanation [datafusion]

2025-06-28 Thread via GitHub
GitHub user 2010YOUY01 added a comment to the discussion: DataSourceExec metrics explanation My guess: If data source is actually scanning in [0,10], [15,20]. During [10,15] it's not scanning because this datasource operator is not scheduled in the runtime, and its parents are using the CPU t

Re: [D] DataSourceExec metrics explanation [datafusion]

2025-06-28 Thread via GitHub
GitHub user debajyoti-truefoundry added a comment to the discussion: DataSourceExec metrics explanation Thanks for your response. I have a follow-up question. For `time_scanning_total`, > /// Sum of time between when the [`FileStream`] requests data from /// the stream and when a [`Record

Re: [PR] Support explain tree format debug for benchmark debug [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas commented on code in PR #16604: URL: https://github.com/apache/datafusion/pull/16604#discussion_r2173581271 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1615,11 +1617,27 @@ impl DataFrame { /// # } /// ``` pub fn explain(self, verbose: bool, analyze

[PR] fix: support scalar function nested in get_field [datafusion]

2025-06-28 Thread via GitHub
chenkovsky opened a new pull request, #16610: URL: https://github.com/apache/datafusion/pull/16610 ## Which issue does this PR close? - Closes #16607. ## Rationale for this change The unparser for get_field checks the first parameter. Currently it only allows a column.

Re: [I] SQL Unparser Can Not Process Struct Of Struct Access When Generating Sql [datafusion]

2025-06-28 Thread via GitHub
chenkovsky commented on issue #16607: URL: https://github.com/apache/datafusion/issues/16607#issuecomment-3016207543 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] improve rust workflows without cache [datafusion-ballista]

2025-06-28 Thread via GitHub
Huy1Ng opened a new pull request, #1275: URL: https://github.com/apache/datafusion-ballista/pull/1275 # Which issue does this PR close? Part of #1128. # Rationale for this change In the issue # What changes are included in this PR? Improve the rust workflows by fol

Re: [I] It should be disallowed to specify both order_by and within_group. [datafusion]

2025-06-28 Thread via GitHub
alamb closed issue #16596: It should be disallowed to specify both order_by and within_group. URL: https://github.com/apache/datafusion/issues/16596

Re: [PR] fix: disallow specify both order_by and within_group [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16606: URL: https://github.com/apache/datafusion/pull/16606#issuecomment-3016152969 Thank you @watchingthewheelsgo and @findepi

Re: [PR] fix: disallow specify both order_by and within_group [datafusion]

2025-06-28 Thread via GitHub
alamb merged PR #16606: URL: https://github.com/apache/datafusion/pull/16606

[PR] Use compression type in CSV file suffices [datafusion]

2025-06-28 Thread via GitHub
theirix opened a new pull request, #16609: URL: https://github.com/apache/datafusion/pull/16609 ## Which issue does this PR close? - Closes #16260. ## Rationale for this change As mentioned in that issue, it's reasonable to autodetect a file suffix. For example,

[PR] Chore: Implement BloomFilterMightContain as a ScalarUDFImpl [datafusion-comet]

2025-06-28 Thread via GitHub
tglanz opened a new pull request, #1954: URL: https://github.com/apache/datafusion-comet/pull/1954 ## Which issue does this PR close? - #1952 - #1953 ## Rationale for this change Described in #1819 ## What changes are included in this PR? - Move bloom

[I] Chore: Move BloomFilterAgg to spark-expr crate [datafusion-comet]

2025-06-28 Thread via GitHub
tglanz opened a new issue, #1953: URL: https://github.com/apache/datafusion-comet/issues/1953 Supports #1952

[I] Chore: Implement BloomFilterMightContain as ScalarUDFImpl [datafusion-comet]

2025-06-28 Thread via GitHub
tglanz opened a new issue, #1952: URL: https://github.com/apache/datafusion-comet/issues/1952 ### What is the problem the feature request solves? Described in #1819 ### Describe the potential solution _No response_ ### Additional context _No response_ --

Re: [PR] Add support for Arrow Dictionary type in Substrait [datafusion]

2025-06-28 Thread via GitHub
jkosh44 commented on PR #16608: URL: https://github.com/apache/datafusion/pull/16608#issuecomment-3016026935 @gabotechs FYI

Re: [PR] fix: Ignore a test case fails on Miri [datafusion-comet]

2025-06-28 Thread via GitHub
andygrove merged PR #1951: URL: https://github.com/apache/datafusion-comet/pull/1951

Re: [I] java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager [datafusion-comet]

2025-06-28 Thread via GitHub
Iskander14yo commented on issue #864: URL: https://github.com/apache/datafusion-comet/issues/864#issuecomment-3016021200 For anyone wondering (seems like HDFS issue?): Set the _full path_ to the jar (with `hdfs://`) for `spark.jars` param and just _the name of jar_ for both `spark.driver

[PR] Add support for Arrow Time types in Substrait [datafusion]

2025-06-28 Thread via GitHub
jkosh44 opened a new pull request, #16608: URL: https://github.com/apache/datafusion/pull/16608 ## Which issue does this PR close? - Closes #16273. ## Rationale for this change This commit adds support for the Arrow Dictionary type in Substrait plans. ## What chang

Re: [PR] Add support for Arrow Time types in Substrait [datafusion]

2025-06-28 Thread via GitHub
jkosh44 commented on code in PR #16558: URL: https://github.com/apache/datafusion/pull/16558#discussion_r2173512225 ## datafusion/substrait/src/logical_plan/producer/types.rs: ## @@ -360,6 +372,11 @@ mod tests { round_trip_type(DataType::Timestamp(TimeUnit::Nanoseco

Re: [PR] Add support for Arrow Time types in Substrait [datafusion]

2025-06-28 Thread via GitHub
jkosh44 commented on code in PR #16558: URL: https://github.com/apache/datafusion/pull/16558#discussion_r2173511981 ## datafusion/substrait/src/logical_plan/producer/types.rs: ## @@ -360,6 +372,11 @@ mod tests { round_trip_type(DataType::Timestamp(TimeUnit::Nanoseco

Re: [PR] Add support for Arrow Duration type in Substrait [datafusion]

2025-06-28 Thread via GitHub
jkosh44 commented on PR #16503: URL: https://github.com/apache/datafusion/pull/16503#issuecomment-3016010500 @gabotechs This response from Substrait makes me a little nervous about this approach: https://github.com/substrait-io/substrait/issues/822#issuecomment-3008350100 Duration do

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Dictionary(Int32, Utf8) [datafusion]

2025-06-28 Thread via GitHub
jkosh44 commented on issue #16273: URL: https://github.com/apache/datafusion/issues/16273#issuecomment-3016000944 take

Re: [I] Avoid recompute CTEs (common table expressions) / share input plans [datafusion]

2025-06-28 Thread via GitHub
findepi commented on issue #8777: URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3015995295 Maintaining streaming processing is very useful in a number of circumstances: a query with LIMIT, a query with TopN over sorted input data, an interactive query. I don't know

Re: [PR] fix: disallow specify both order_by and within_group [datafusion]

2025-06-28 Thread via GitHub
findepi commented on code in PR #16606: URL: https://github.com/apache/datafusion/pull/16606#discussion_r2173497813 ## datafusion/sql/src/expr/function.rs: ## @@ -227,6 +227,10 @@ impl SqlToRel<'_, S> { OVER is for window functions, whereas WITHIN GROUP is for

[PR] build(deps): bump arrow from 55.1.0 to 55.2.0 [datafusion-python]

2025-06-28 Thread via GitHub
dependabot[bot] opened a new pull request, #1174: URL: https://github.com/apache/datafusion-python/pull/1174 Bumps [arrow](https://github.com/apache/arrow-rs) from 55.1.0 to 55.2.0. Release notes Sourced from arrow's releases (https://github.com/apache/arrow-rs/releases). ar

Re: [PR] Fix `limit` in subqueries [datafusion-sqlparser-rs]

2025-06-28 Thread via GitHub
Dimchikkk commented on PR #1899: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1899#issuecomment-3015983460 > @Dimchikkk I don't have a plan myself -- I think it will be driven by someone who needs features of sqlparser in a new version of DataFusion. Perhaps you can help out

Re: [PR] perf: Optimize `AvgDecimalGroupsAccumulator` [datafusion-comet]

2025-06-28 Thread via GitHub
leung-ming commented on PR #1893: URL: https://github.com/apache/datafusion-comet/pull/1893#issuecomment-3015975515 performance looks improved on my laptop (i7-10710U) before: ``` aggregate/avg_decimal_datafusion time: [1.5631 ms 1.5829 ms 1.6037 ms]

Re: [I] Browser-accessible official DataFusion playground / DataFusion fiddle [datafusion]

2025-06-28 Thread via GitHub
alamb commented on issue #13818: URL: https://github.com/apache/datafusion/issues/13818#issuecomment-3015973967 > Moved, it's already on https://github.com/datafusion-contrib/datafusion-fiddle. I made a small revamp before moving it, so it should look a bit better now. Nice! it is al

Re: [PR] fix: support within_group [datafusion]

2025-06-28 Thread via GitHub
alamb commented on code in PR #16538: URL: https://github.com/apache/datafusion/pull/16538#discussion_r2173481689 ## datafusion/sql/src/expr/function.rs: ## @@ -404,6 +404,11 @@ impl SqlToRel<'_, S> { } (!within_group.is_empty()).then_so

Re: [PR] fix: disallow specify both order_by and within_group [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16606: URL: https://github.com/apache/datafusion/pull/16606#issuecomment-3015971875 Thank you @watchingthewheelsgo 🙏 -- I kicked off the tests!

Re: [PR] Fix join precedence for non-snowflake queries [datafusion-sqlparser-rs]

2025-06-28 Thread via GitHub
iffyio merged PR #1905: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1905

Re: [I] Wrong join precedence parsing for non-Snowflake dialects (nested joins parsed incorrectly) [datafusion-sqlparser-rs]

2025-06-28 Thread via GitHub
iffyio closed issue #1904: Wrong join precedence parsing for non-Snowflake dialects (nested joins parsed incorrectly) URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1904

Re: [PR] Fix join precedence for non-snowflake queries [datafusion-sqlparser-rs]

2025-06-28 Thread via GitHub
iffyio commented on code in PR #1905: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1905#discussion_r2173481353 ## src/dialect/mod.rs: ## @@ -278,6 +278,34 @@ pub trait Dialect: Debug + Any { false } +/// Indicates whether the dialect supports

Re: [I] Avoid recompute CTEs (common table expressions) / share input plans [datafusion]

2025-06-28 Thread via GitHub
suibianwanwank commented on issue #8777: URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3015967757 Hi, I’d like to take a try at this task. My plan is to first support `CTE` with the WITH ... AS MATERIALIZED syntax. After that, we can explore broader optimizations

Re: [PR] fix: Ignore a test case fails on Miri [datafusion-comet]

2025-06-28 Thread via GitHub
leung-ming commented on code in PR #1951: URL: https://github.com/apache/datafusion-comet/pull/1951#discussion_r2173445457 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -2681,7 +2681,7 @@ mod tests { assert_eq!(casted.value(0), 4200); // https://

Re: [PR] Feat: support map_from_arrays [datafusion-comet]

2025-06-28 Thread via GitHub
kazantsev-maksim commented on code in PR #1932: URL: https://github.com/apache/datafusion-comet/pull/1932#discussion_r2173441584 ## spark/src/test/scala/org/apache/comet/CometMapExpressionSuite.scala: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] feat: Supports array_union [datafusion-comet]

2025-06-28 Thread via GitHub
drexler-sky commented on PR #1945: URL: https://github.com/apache/datafusion-comet/pull/1945#issuecomment-3015781373 Thanks @andygrove @parthchandra

Re: [I] Browser-accessible official DataFusion playground / DataFusion fiddle [datafusion]

2025-06-28 Thread via GitHub
gabotechs commented on issue #13818: URL: https://github.com/apache/datafusion/issues/13818#issuecomment-3015777584 Moved, it's already on https://github.com/datafusion-contrib/datafusion-fiddle. I made a small revamp before moving it, so it should look a bit better now. Note: deploy

Re: [PR] feat: Add from_unixtime support [datafusion-comet]

2025-06-28 Thread via GitHub
andygrove commented on code in PR #1943: URL: https://github.com/apache/datafusion-comet/pull/1943#discussion_r2173423255 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1179,6 +1179,29 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Feat: support map_from_arrays [datafusion-comet]

2025-06-28 Thread via GitHub
andygrove commented on code in PR #1932: URL: https://github.com/apache/datafusion-comet/pull/1932#discussion_r2173421986 ## spark/src/test/scala/org/apache/comet/CometMapExpressionSuite.scala: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
Dandandan commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015708840 > Here is a proposed alternative: > > * [Add comments to ClickBench queries about setting binary_as_string  #16605](https://github.com/apache/datafusion/pull/16605) > >

Re: [I] Release Comet 0.9.0 (June/July 2025) [datafusion-comet]

2025-06-28 Thread via GitHub
andygrove commented on issue #1856: URL: https://github.com/apache/datafusion-comet/issues/1856#issuecomment-3015648326 I plan on creating the first release candidate on Monday

Re: [PR] fix: Ignore a test case fails on Miri [datafusion-comet]

2025-06-28 Thread via GitHub
andygrove commented on code in PR #1951: URL: https://github.com/apache/datafusion-comet/pull/1951#discussion_r2173395678 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -2681,7 +2681,7 @@ mod tests { assert_eq!(casted.value(0), 4200); // https://g

[I] SQL Unparser Can Not Process Struct Of Struct Access When Generating Sql [datafusion]

2025-06-28 Thread via GitHub
hmadison opened a new issue, #16607: URL: https://github.com/apache/datafusion/issues/16607 ### Describe the bug When attempting to invoke `plan_to_sql` on a plan which includes accessing a field inside of a nested structure, the invocation fails with the following error: ```

Re: [PR] Implementation for regex_instr [datafusion]

2025-06-28 Thread via GitHub
nirnayroy commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-3015598680 Hi @blaginin , thanks for the help and suggestions for improvement. I have addressed the requested changes. Please have another look. > Tests are failing. If that helps, yo

Re: [PR] Implementation for regex_instr [datafusion]

2025-06-28 Thread via GitHub
nirnayroy commented on code in PR #15928: URL: https://github.com/apache/datafusion/pull/15928#discussion_r2173375931 ## datafusion/functions/src/regex/regexpinstr.rs: ## @@ -0,0 +1,804 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [I] Logical plan creation for Substrait plans with aggregate relations [datafusion]

2025-06-28 Thread via GitHub
chenkovsky commented on issue #16590: URL: https://github.com/apache/datafusion/issues/16590#issuecomment-3015556166 Here's a PR, #16161, that leaves more structure in the logical plan after resolving the grouping expr. As a next step, I will try to solve the Substrait creation problem.

[PR] fix: disallow specify both order_by and within_group [datafusion]

2025-06-28 Thread via GitHub
watchingthewheelsgo opened a new pull request, #16606: URL: https://github.com/apache/datafusion/pull/16606 ## Which issue does this PR close? - Closes #16596. ## Rationale for this change ## What changes are included in this PR? raise e

Re: [PR] Implementation for regex_instr [datafusion]

2025-06-28 Thread via GitHub
nirnayroy commented on code in PR #15928: URL: https://github.com/apache/datafusion/pull/15928#discussion_r2173329256 ## datafusion/functions/benches/regx.rs: ## @@ -127,6 +128,46 @@ fn criterion_benchmark(c: &mut Criterion) { }) }); +c.bench_function("regexp

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-06-28 Thread via GitHub
korowa commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2173258366 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -729,10 +716,26 @@ struct NestedLoopJoinStream { right_side_ordered: bool, /// Current st

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015414174 Here is a proposed alternative: - https://github.com/apache/datafusion/pull/16605 > As mentioned earlier, I wonder though if most of the query performance might be solved by m

Re: [PR] Support explain tree format debug for benchmark debug [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas commented on code in PR #16604: URL: https://github.com/apache/datafusion/pull/16604#discussion_r2173316232 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1615,11 +1617,27 @@ impl DataFrame { /// # } /// ``` pub fn explain(self, verbose: bool, analyze

[PR] Add comments to ClickBench queries about setting binary_as_string [datafusion]

2025-06-28 Thread via GitHub
alamb opened a new pull request, #16605: URL: https://github.com/apache/datafusion/pull/16605 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16591 - Closes https://github.com/apache/datafusion/pull/16599 ## Rationale for this c

Re: [PR] Support explain tree format debug for benchmark debug [datafusion]

2025-06-28 Thread via GitHub
alamb commented on code in PR #16604: URL: https://github.com/apache/datafusion/pull/16604#discussion_r2173310059 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1615,11 +1617,27 @@ impl DataFrame { /// # } /// ``` pub fn explain(self, verbose: bool, analyze: bool

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
Dandandan commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015377418 I agree, we should default to doing the correct thing. binary_as_string is a nice way to fix the benchmark, but we shouldn't do it by default. As mentioned earlier,

Re: [PR] Support explain tree format debug for benchmark debug [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas commented on PR #16604: URL: https://github.com/apache/datafusion/pull/16604#issuecomment-3015305319 Testing result, it looks good: ```rust cargo run --profile release-nonlto --target aarch64-apple-darwin --bin dfbench -- clickbench --queries-path ./benchmarks/querie

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015303000 > > Now it defaults to false, but I am not sure if it will break other things. > > Yeah I think it will break other things -- it isn't correct in general to treat bin

Re: [I] Improve performance of ClickBench Q21 by removing the cast [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas commented on issue #16591: URL: https://github.com/apache/datafusion/issues/16591#issuecomment-3015301156 A side topic for debugging the benchmark: https://github.com/apache/datafusion/issues/16603 I submitted a PR to try to make the benchmark debug mode use tree format for

[PR] Support explain tree format debug for benchmark debug [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas opened a new pull request, #16604: URL: https://github.com/apache/datafusion/pull/16604 ## Which issue does this PR close? - Closes [#16603](https://github.com/apache/datafusion/issues/16603) ## Rationale for this change When debugging the issue: https://github

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015291713 > Now it defaults to false, but I am not sure if it will break other things. Yeah I think it will break other things -- it isn't correct in general to treat binary columns

Re: [PR] [datafusion-spark] Implement spark `luhn_check` function [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16580: URL: https://github.com/apache/datafusion/pull/16580#issuecomment-3015286049 Thank you @tlm365 🙏

Re: [PR] [datafusion-spark] Implement spark `luhn_check` function [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16580: URL: https://github.com/apache/datafusion/pull/16580#issuecomment-3015286001 FYI @andygrove and @shehabgamin

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3015285765 Thanks again @corwinjoy / @adamreeve and everyone else. This is great

Re: [I] Support integration with Parquet modular encryption [datafusion]

2025-06-28 Thread via GitHub
alamb closed issue #15216: Support integration with Parquet modular encryption URL: https://github.com/apache/datafusion/issues/15216

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-28 Thread via GitHub
alamb merged PR #16351: URL: https://github.com/apache/datafusion/pull/16351

Re: [I] Support explain tree format debug for benchmark debug [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas commented on issue #16603: URL: https://github.com/apache/datafusion/issues/16603#issuecomment-3015281832 take

[I] Support explain tree format debug for benchmark debug [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas opened a new issue, #16603: URL: https://github.com/apache/datafusion/issues/16603 ### Is your feature request related to a problem or challenge? Currently, our benchmark debug mode does not use the explain tree format; this ticket will switch it to tree format. ### Describ

Re: [I] Wrong results when `pushdown_filters` is enabled (starting in 48.0.0) [datafusion]

2025-06-28 Thread via GitHub
alamb commented on issue #16588: URL: https://github.com/apache/datafusion/issues/16588#issuecomment-3015280286 Thanks @ianthetechie -- I have marked this issue as a regression in 48

[I] Add a document page for all available metrics [datafusion]

2025-06-28 Thread via GitHub
2010YOUY01 opened a new issue, #16602: URL: https://github.com/apache/datafusion/issues/16602 ### Is your feature request related to a problem or challenge? Original discussion question: https://github.com/apache/datafusion/discussions/16572 There are many fine grained metrics

Re: [D] DataSourceExec metrics explanation [datafusion]

2025-06-28 Thread via GitHub
GitHub user 2010YOUY01 added a comment to the discussion: DataSourceExec metrics explanation Take `time_elapsed_opening`, for example: unfortunately we have to search for this name in the codebase and then follow several indirections to find the comment that explains this metric: https://github.com/

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015250746 Debugging further, it only happens when we use ```rust ./datafusion-cli -c "" ``` but does not happen for an internal datafusion-cli run: ```rust ./da

Re: [PR] fix: Ignore a test case fails on Miri [datafusion-comet]

2025-06-28 Thread via GitHub
codecov-commenter commented on PR #1951: URL: https://github.com/apache/datafusion-comet/pull/1951#issuecomment-3015235423 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1951?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
zhuqi-lucas commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015221855 > Actually I am now torn about this as it will further diverge datafusion-cli and the core library. > > Maybe we can just solve the human error part with comments in the qu

[I] is it possible to make a async UDTF that read rows from other database? [datafusion]

2025-06-28 Thread via GitHub
l1t1 opened a new issue, #16601: URL: https://github.com/apache/datafusion/issues/16601 I read the guide at https://datafusion.apache.org/library-user-guide/functions/adding-udfs.html, and only found **adding-a-scalar-async-udf** and **writing-the-udtf**. ref: https://github.com/apache

Re: [I] suggest compile binaries with dynamic libs [datafusion]

2025-06-28 Thread via GitHub
l1t1 commented on issue #16600: URL: https://github.com/apache/datafusion/issues/16600#issuecomment-3015204515 > I think you can use `cargo build --profile release-nonlto` which will turn off link time optimization and reduce final link time and memory usage > > In terms of taking lot

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015200737 Actually I am now torn about this as it will further diverge datafusion-cli and the core library. Maybe we can just solve the human error part with comments in the queries. I'l

Re: [I] suggest compile binaries with dynamic libs [datafusion]

2025-06-28 Thread via GitHub
alamb commented on issue #16600: URL: https://github.com/apache/datafusion/issues/16600#issuecomment-3015195619 I think you can use `cargo build --profile release-nonlto` which will turn off link time optimization and reduce final link time and memory usage In terms of taking lots of

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015194305 Given how much we/I use datafusion-cli to test benchmark performance (clickbench in particular) I think this is a good change to help

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-28 Thread via GitHub
alamb commented on code in PR #16599: URL: https://github.com/apache/datafusion/pull/16599#discussion_r2173214983 ## datafusion-cli/src/main.rs: ## @@ -171,7 +171,13 @@ async fn main_inner() -> Result<()> { env::set_current_dir(p).unwrap(); }; -let session_co

Re: [PR] Allow usage of table functions in relations [datafusion]

2025-06-28 Thread via GitHub
osipovartem commented on PR #16571: URL: https://github.com/apache/datafusion/pull/16571#issuecomment-3015191081 Let me fix it

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-06-28 Thread via GitHub
leung-ming commented on PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3015182461 > > I did not implement a new dragonbox; I just copied it and added 4 `pub` to expose the decimal interface. > > Could this be done in the original crate? I understand that t

[I] suggest compile binaries with dynamic libs [datafusion]

2025-06-28 Thread via GitHub
l1t1 opened a new issue, #16600: URL: https://github.com/apache/datafusion/issues/16600 ### Is your feature request related to a problem or challenge? When I compile the benchmark binaries with `cargo build --release` and use `mold` as the linker, the following errors occur ``` (s

Re: [PR] Allow usage of table functions in relations [datafusion]

2025-06-28 Thread via GitHub
alamb commented on PR #16571: URL: https://github.com/apache/datafusion/pull/16571#issuecomment-3015178652 ![Screenshot 2025-06-28 at 6 47 54  AM](https://github.com/user-attachments/assets/7f6045ea-9bdc-4bd6-bb36-d95eb19d3cac) I can't merge this PR because it has a conflict that must

Re: [PR] Fix join precedence for non-snowflake queries [datafusion-sqlparser-rs]

2025-06-28 Thread via GitHub
Dimchikkk commented on PR #1905: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1905#issuecomment-3015171544 Thanks for the review, @iffyio - I've addressed your feedback and it’s ready for another round when you are.

Re: [PR] chore: refactor `BuildProbeJoinMetrics` to use `BaselineMetrics` [datafusion]

2025-06-28 Thread via GitHub
Samyak2 commented on PR #16500: URL: https://github.com/apache/datafusion/pull/16500#issuecomment-3015164616 Rebased on latest main

Re: [I] Support `from_unixtime(ts, [fmt])` [datafusion]

2025-06-28 Thread via GitHub
kazuyukitanimura commented on issue #16577: URL: https://github.com/apache/datafusion/issues/16577#issuecomment-3015103616 I just realized we can just do ``` to_char(from_unixtime(expression[, timezone]), format) ``` However, `to_char` has this problem https://github.com/apache/d