[PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-01-21 Thread via GitHub
cj-zhukov opened a new pull request, #14227: URL: https://github.com/apache/datafusion/pull/14227 ## Which issue does this PR close? Closes #14173. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] Move `EnforceSorting` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-21 Thread via GitHub
berkaysynnada commented on issue #14185: URL: https://github.com/apache/datafusion/issues/14185#issuecomment-2606493603 I am taking a final look at https://github.com/apache/datafusion/pull/14219 and https://github.com/apache/datafusion/pull/14190, and I'll merge them if I don't see a problem.

Re: [PR] Update logical-types to main [datafusion]

2025-01-21 Thread via GitHub
tobixdev commented on PR #14202: URL: https://github.com/apache/datafusion/pull/14202#issuecomment-2606480100 > I am now working on a different branch of mine to ensure that the merge commit is properly handled. Here is the diff between said branch and main which only contains the changes w

Re: [PR] test: interval analysis unit tests [datafusion]

2025-01-21 Thread via GitHub
berkaysynnada merged PR #14189: URL: https://github.com/apache/datafusion/pull/14189 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] test: interval analysis unit tests [datafusion]

2025-01-21 Thread via GitHub
berkaysynnada commented on code in PR #14189: URL: https://github.com/apache/datafusion/pull/14189#discussion_r1924806005 ## datafusion/physical-expr/src/analysis.rs: ## @@ -246,3 +246,124 @@ fn calculate_selectivity( acc * cardinality_ratio(&initial.interval, &targ

Re: [PR] Feature Unifying source execution plans [datafusion]

2025-01-21 Thread via GitHub
berkaysynnada commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2606461304 I'll also be reviewing this in detail today.

Re: [PR] fix concat ws simplify [datafusion]

2025-01-21 Thread via GitHub
jonahgao merged PR #14213: URL: https://github.com/apache/datafusion/pull/14213

Re: [I] Bug when `concat_ws` simplified to `concat` [datafusion]

2025-01-21 Thread via GitHub
jonahgao closed issue #14212: Bug when `concat_ws` simplified to `concat` URL: https://github.com/apache/datafusion/issues/14212

Re: [PR] Minor: use the minimum fetch [datafusion]

2025-01-21 Thread via GitHub
xudong963 commented on PR #14221: URL: https://github.com/apache/datafusion/pull/14221#issuecomment-2606355222 > Thanks @xudong963. Can you also add a small unit test? Thanks, I'll do it later

Re: [PR] fix concat ws simplify [datafusion]

2025-01-21 Thread via GitHub
cht42 commented on code in PR #14213: URL: https://github.com/apache/datafusion/pull/14213#discussion_r1924687592 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -465,6 +465,11 @@ SELECT concat_ws('|','a',NULL,NULL) a +query T +SELECT concat_ws('','a','b','c') Re

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
parthchandra commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606277066 Sorry about the merge issues guys. git merge seemed to add a couple of files that had been moved/renamed every time I merged from main (and I am pretty sure I removed them

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.best_effort_filter [datafusion]

2025-01-21 Thread via GitHub
jamxia155 commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1924674153 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_r

Re: [PR] extract expressions to folders based on spark grouping [datafusion-comet]

2025-01-21 Thread via GitHub
rluvaton closed pull request #1206: extract expressions to folders based on spark grouping URL: https://github.com/apache/datafusion-comet/pull/1206

Re: [PR] fix concat ws simplify [datafusion]

2025-01-21 Thread via GitHub
cht42 commented on code in PR #14213: URL: https://github.com/apache/datafusion/pull/14213#discussion_r1924657387 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -465,6 +465,11 @@ SELECT concat_ws('|','a',NULL,NULL) a +query T +SELECT concat_ws('','a','b','c') Re

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.best_effort_filter [datafusion]

2025-01-21 Thread via GitHub
jamxia155 commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1924641365 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_r

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-21 Thread via GitHub
jonahgao commented on code in PR #14223: URL: https://github.com/apache/datafusion/pull/14223#discussion_r1924626247 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -777,29 +777,20 @@ pub fn binary_numeric_coercion( (_, Float32) | (Float32, _) => Some(Float

Re: [PR] Support multiple results comparison in benchmark scripts [datafusion]

2025-01-21 Thread via GitHub
Eason0729 commented on PR #14196: URL: https://github.com/apache/datafusion/pull/14196#issuecomment-2606171896 I spent some time trying to make it (compare.py) more readable, and I ended up with the original code. Let me know if there is a better way to do this. Anyway, I think it read

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.best_effort_filter [datafusion]

2025-01-21 Thread via GitHub
jonahgao commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1924616969 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_re

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.best_effort_filter [datafusion]

2025-01-21 Thread via GitHub
jonahgao commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1924615742 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_re

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.best_effort_filter [datafusion]

2025-01-21 Thread via GitHub
jonahgao commented on PR #14194: URL: https://github.com/apache/datafusion/pull/14194#issuecomment-2606146741 > Hi @jonahgao , I've fixed the previous workflow failure. Would you be able to approve the latest workflow please? Or would you recommend a set of checks I should do offline first?

Re: [PR] test: interval analysis unit tests [datafusion]

2025-01-21 Thread via GitHub
hiltontj commented on code in PR #14189: URL: https://github.com/apache/datafusion/pull/14189#discussion_r1924611242 ## datafusion/physical-expr/src/analysis.rs: ## @@ -246,3 +246,124 @@ fn calculate_selectivity( acc * cardinality_ratio(&initial.interval, &target.in

[I] Variant on `AnalysisContext` to represent empty-set [datafusion]

2025-01-21 Thread via GitHub
hiltontj opened a new issue, #14226: URL: https://github.com/apache/datafusion/issues/14226 ### Is your feature request related to a problem or challenge? The [`AnalysisContext`](https://github.com/apache/datafusion/blob/2aff98e002ce6d48008b8bbe2b38ee644a6d5c20/datafusion/physical-exp

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606146267 > @andygrove @parthchandra Is the change in `docs/source/contributor-guide/benchmarking.md` relevant? This was a merge issue. Will be fixed in https://github.com/apache

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606143968 > This looks like `CometScanExec` gets used regardless of the `COMET_EXEC_ENABLED` and `COMET_NATIVE_SCAN_IMPL` checks The CometScanExec gets replaced later on: ```scala

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606145521 > @andygrove @parthchandra Why are we adding files like `native/spark-expr/src/strings.rs`? There was some refactoring in the past few days in main that resulted in the

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
kazuyukitanimura commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606141084 @andygrove @parthchandra @comphead I think we should revert this change and redo the merge at this point

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606140765 Follow on PR to fix merge issues - https://github.com/apache/datafusion-comet/pull/1320

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
kazuyukitanimura commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606140490 @andygrove @parthchandra ``` // data source V1 case scanExec @ FileSourceScanExec( HadoopFsRelation(_, partitionSchema, _

[PR] chore: Fix merge conflicts from merging comet-parquet-exec into main [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove opened a new pull request, #1320: URL: https://github.com/apache/datafusion-comet/pull/1320 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
kazuyukitanimura commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606127743 @andygrove @parthchandra Why are we adding files like `native/spark-expr/src/strings.rs`?

Re: [PR] fix: Improve testing for array_remove and fallback to Spark for unsupported types [datafusion-comet]

2025-01-21 Thread via GitHub
viirya commented on code in PR #1308: URL: https://github.com/apache/datafusion-comet/pull/1308#discussion_r1924592604 ## fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala: ## @@ -60,13 +64,20 @@ object Main { val conf = new Conf(args.toIndexedSeq) conf.subc

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
kazuyukitanimura commented on PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318#issuecomment-2606119962 @andygrove @parthchandra Is the change in `docs/source/contributor-guide/benchmarking.md` relevant?

Re: [PR] JoinOptimization: Add build side pushdown to probe side [datafusion]

2025-01-21 Thread via GitHub
github-actions[bot] closed pull request #13054: JoinOptimization: Add build side pushdown to probe side URL: https://github.com/apache/datafusion/pull/13054

Re: [PR] fix: Improve testing for array_remove and fallback to Spark for unsupported types [datafusion-comet]

2025-01-21 Thread via GitHub
viirya commented on code in PR #1308: URL: https://github.com/apache/datafusion-comet/pull/1308#discussion_r1924585115 ## fuzz-testing/src/main/scala/org/apache/comet/fuzz/QueryRunner.scala: ## @@ -64,8 +65,12 @@ object QueryRunner { val sparkRows = df.collect()

Re: [PR] fix: Improve testing for array_remove and fallback to Spark for unsupported types [datafusion-comet]

2025-01-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1308: URL: https://github.com/apache/datafusion-comet/pull/1308#discussion_r1924581676 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2660,4 +2660,19 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] Feat: Support arrays_overlap function [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on code in PR #1312: URL: https://github.com/apache/datafusion-comet/pull/1312#discussion_r1924572159 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2300,6 +2300,12 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] Feat: Support arrays_overlap function [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on code in PR #1312: URL: https://github.com/apache/datafusion-comet/pull/1312#discussion_r1924572763 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2568,4 +2568,21 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] build(deps): bump pprof from 0.13.0 to 0.14.0 in /native [datafusion-comet]

2025-01-21 Thread via GitHub
viirya merged PR #1319: URL: https://github.com/apache/datafusion-comet/pull/1319

Re: [I] Automate updating sqllogictest updates [datafusion]

2025-01-21 Thread via GitHub
Omega359 commented on issue #14158: URL: https://github.com/apache/datafusion/issues/14158#issuecomment-2606064028 So I have a bit of an issue with this. I don't know how to implement this without pointing to a branch in my fork of sqllogictest-rs. The code in that branch is not appropriate

Re: [PR] fix concat ws simplify [datafusion]

2025-01-21 Thread via GitHub
Omega359 commented on code in PR #14213: URL: https://github.com/apache/datafusion/pull/14213#discussion_r1924551796 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -465,6 +465,11 @@ SELECT concat_ws('|','a',NULL,NULL) a +query T +SELECT concat_ws('','a','b','c')

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.best_effort_filter [datafusion]

2025-01-21 Thread via GitHub
jamxia155 commented on PR #14194: URL: https://github.com/apache/datafusion/pull/14194#issuecomment-2606046447 Hi @jonahgao , I've fixed the previous workflow failure. Would you be able to approve the latest workflow please? Or would you recommend a set of checks I should do offline first?

Re: [PR] Update logical-types to main [datafusion]

2025-01-21 Thread via GitHub
jayzhan211 commented on PR #14202: URL: https://github.com/apache/datafusion/pull/14202#issuecomment-2606044074 > Thanks! Should I open a new PR so that we can try it with the new branch? What do you mean by "new branch"?

Re: [I] Result mismatch with vanilla spark in hash function with decimal input [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on issue #1294: URL: https://github.com/apache/datafusion-comet/issues/1294#issuecomment-2606013700 I found more differences through fuzz testing. Query is `SELECT a, xxhash64(a)` for `Decimal(36,18)` input. ``` !== Correct Answer - 100 ==

Re: [PR] feat: remove DataFusion pyarrow feat [datafusion-python]

2025-01-21 Thread via GitHub
kylebarron commented on PR #1000: URL: https://github.com/apache/datafusion-python/pull/1000#issuecomment-2605913460 > By removing the `pyarrow` dependency of DataFusion we can update `pyo3` without requiring corresponding updates to the DataFusion core repository. FWIW this is a

Re: [PR] Feat: Support array_join function [datafusion-comet]

2025-01-21 Thread via GitHub
erenavsarogullari commented on PR #1290: URL: https://github.com/apache/datafusion-comet/pull/1290#issuecomment-2605910996 Thanks @andygrove. Sure, I will have a look today.

[PR] build(deps): bump pprof from 0.13.0 to 0.14.0 in /native [datafusion-comet]

2025-01-21 Thread via GitHub
dependabot[bot] opened a new pull request, #1319: URL: https://github.com/apache/datafusion-comet/pull/1319 Bumps [pprof](https://github.com/tikv/pprof-rs) from 0.13.0 to 0.14.0. Changelog: sourced from [pprof's changelog](https://github.com/tikv/pprof-rs/blob/master/CHANGELOG.md).

Re: [PR] Feat: Support array_join function [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on PR #1290: URL: https://github.com/apache/datafusion-comet/pull/1290#issuecomment-2605895841 @erenavsarogullari could you fix the conflicts on this PR?

Re: [PR] extract expressions to folders based on spark grouping [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on PR #1206: URL: https://github.com/apache/datafusion-comet/pull/1206#issuecomment-2605894349 @rluvaton ok to close this PR now?

Re: [PR] Feat: Support array_intersect function [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove merged PR #1271: URL: https://github.com/apache/datafusion-comet/pull/1271

Re: [PR] chore: merge comet-parquet-exec into main [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove closed pull request #1317: chore: merge comet-parquet-exec into main URL: https://github.com/apache/datafusion-comet/pull/1317

Re: [PR] chore: merge comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove merged PR #1318: URL: https://github.com/apache/datafusion-comet/pull/1318

Re: [PR] Move `EnforceDistribution` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-21 Thread via GitHub
logan-keede commented on PR #14190: URL: https://github.com/apache/datafusion/pull/14190#issuecomment-2605878868 cc @alamb @xudong963

Re: [PR] Feature Unifying source execution plans [datafusion]

2025-01-21 Thread via GitHub
alamb commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2605797934 Thank you @mertak-synnada -- I plan to study this PR carefully tomorrow

Re: [PR] Made imdb download (data_imdb) function atomic [datafusion]

2025-01-21 Thread via GitHub
alamb commented on code in PR #14225: URL: https://github.com/apache/datafusion/pull/14225#discussion_r1924394309 ## benchmarks/bench.sh: ## @@ -536,23 +536,52 @@ data_imdb() { done if [ "$convert_needed" = true ]; then -if [ ! -f "${imdb_dir}/imdb.tgz" ]; th

Re: [I] Move `EnforceSorting` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-21 Thread via GitHub
alamb commented on issue #14185: URL: https://github.com/apache/datafusion/issues/14185#issuecomment-2605784105 > Sorry, I could not look into this afterwards and am currently working. I think my PR is ready to merge, so maybe we can merge that one and then rebase from main. Otherwise, I can l

[PR] fix: do not compile `keda.proto` if feature not used. [datafusion-ballista]

2025-01-21 Thread via GitHub
milenkovicm opened a new pull request, #1168: URL: https://github.com/apache/datafusion-ballista/pull/1168 # Which issue does this PR close? Closes None. # Rationale for this change Scheduler compiles `keda.proto` even if it's disabled, compilation requires `protoc` ins

[PR] wip: Another attempt at merging comet-parquet-exec branch into main [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove opened a new pull request, #1318: URL: https://github.com/apache/datafusion-comet/pull/1318 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Support deltalake [datafusion-ballista]

2025-01-21 Thread via GitHub
milenkovicm commented on issue #456: URL: https://github.com/apache/datafusion-ballista/issues/456#issuecomment-2605730109 The latest Ballista can be extended to support Delta tables; an example can be found at https://github.com/milenkovicm/ballista_delta

[PR] chore: merge from main [datafusion-comet]

2025-01-21 Thread via GitHub
parthchandra opened a new pull request, #1317: URL: https://github.com/apache/datafusion-comet/pull/1317 merge from main

Re: [PR] chore: Merge remote-tracking branch 'apache/main' into comet-parquet-exec - 20240121 [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove merged PR #1316: URL: https://github.com/apache/datafusion-comet/pull/1316

Re: [PR] chore: Merge remote-tracking branch 'apache/main' into comet-parquet-exec - 20240121 [datafusion-comet]

2025-01-21 Thread via GitHub
parthchandra commented on PR #1316: URL: https://github.com/apache/datafusion-comet/pull/1316#issuecomment-2605596625 @andygrove @mbutrovich

[PR] chore: Merge remote-tracking branch 'apache/main' into comet-parquet-exec - 20240121 [datafusion-comet]

2025-01-21 Thread via GitHub
parthchandra opened a new pull request, #1316: URL: https://github.com/apache/datafusion-comet/pull/1316 Bring up to date with `main` There were no files changed to resolve the merge conflicts.

Re: [PR] Feat: Support array_intersect function [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on PR #1271: URL: https://github.com/apache/datafusion-comet/pull/1271#issuecomment-2605560537 > > Thanks @erenavsarogullari. It would be great to have help with this. I will try and add some more notes to the issue with suggestions for how we can improve coverage. >

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-21 Thread via GitHub
Spaarsh closed pull request #14129: Making the data_imdb and clickbench_1 functions atomic. URL: https://github.com/apache/datafusion/pull/14129

Re: [PR] Making the data_imdb and clickbench_1 functions atomic. [datafusion]

2025-01-21 Thread via GitHub
Spaarsh commented on PR #14129: URL: https://github.com/apache/datafusion/pull/14129#issuecomment-2605437829 The suggested changes have been implemented and have been committed via another PR #14225. Hence I am closing this PR.

[PR] Made imdb download (data_imdb) function atomic [datafusion]

2025-01-21 Thread via GitHub
Spaarsh opened a new pull request, #14225: URL: https://github.com/apache/datafusion/pull/14225 ## Which issue does this PR close? Closes #14128 ## Rationale for this change Due to non-atomic downloads, the user would need to manually remove files/folders created by

Re: [PR] chore: [comet-parquet-exec] enable native scan by default (again) [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove merged PR #1302: URL: https://github.com/apache/datafusion-comet/pull/1302

Re: [PR] Faster reverse() string function for ASCII-only case [datafusion]

2025-01-21 Thread via GitHub
Dandandan closed pull request #14195: Faster reverse() string function for ASCII-only case URL: https://github.com/apache/datafusion/pull/14195

Re: [PR] Faster reverse() string function for ASCII-only case [datafusion]

2025-01-21 Thread via GitHub
Dandandan commented on code in PR #14195: URL: https://github.com/apache/datafusion/pull/14195#discussion_r1924121585 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -119,12 +119,21 @@ fn reverse_impl<'a, T: OffsetSizeTrait, V: StringArrayType<'a>>( ) -> Result { le

[PR] Faster reverse() string function for ASCII-only case [datafusion]

2025-01-21 Thread via GitHub
UBarney opened a new pull request, #14195: URL: https://github.com/apache/datafusion/pull/14195 ## Which issue does this PR close? Closes #12445. ## Rationale for this change See the issue. ## What changes are included in this PR? + If all charac

[I] Profile memory usage and add guidance to the tuning guide [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove opened a new issue, #1315: URL: https://github.com/apache/datafusion-comet/issues/1315 ### What is the problem the feature request solves? We do not yet have any documentation explaining how much off-heap memory Comet uses compared to Spark. We should profile this and add do

Re: [I] Move `EnforceSorting` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-21 Thread via GitHub
logan-keede commented on issue #14185: URL: https://github.com/apache/datafusion/issues/14185#issuecomment-2605311570 No worries, I did it myself.

[PR] Only support escape literals for Postgres, Redshift and generic dialect [datafusion-sqlparser-rs]

2025-01-21 Thread via GitHub
hansott opened a new pull request, #1674: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1674 Example in MySQL: https://github.com/user-attachments/assets/7c4d652e-d78f-4b8d-88b5-b8d7f010db97

Re: [I] Move `EnforceSorting` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-21 Thread via GitHub
buraksenn commented on issue #14185: URL: https://github.com/apache/datafusion/issues/14185#issuecomment-2605292627 Sorry, I could not look into this afterwards and am currently working. I think my PR is ready to merge, so maybe we can merge that one and then rebase from main. Otherwise, I can

Re: [PR] Added documentation for the Spaceship Operator (<=>) [datafusion]

2025-01-21 Thread via GitHub
comphead merged PR #14214: URL: https://github.com/apache/datafusion/pull/14214

Re: [I] Add documentation for `<=>` operator [datafusion]

2025-01-21 Thread via GitHub
comphead closed issue #14203: Add documentation for `<=>` operator URL: https://github.com/apache/datafusion/issues/14203

Re: [PR] Move `EnforceDistribution` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-21 Thread via GitHub
logan-keede commented on PR #14190: URL: https://github.com/apache/datafusion/pull/14190#issuecomment-2605221257 To move tests to `datafusion/core/tests/physical_optimizer`, I chose to make a few required functions public; I am not sure whether that is detrimental to code quality. Al

Re: [PR] Added documentation for the Spaceship Operator (<=>) [datafusion]

2025-01-21 Thread via GitHub
Spaarsh commented on code in PR #14214: URL: https://github.com/apache/datafusion/pull/14214#discussion_r1924039540 ## docs/source/user-guide/sql/operators.md: ## @@ -207,6 +208,48 @@ Greater Than or Equal To +--+ ``` +(op_spaceship)= + +### `<=>` + +Thre

Re: [PR] Added documentation for the Spaceship Operator (<=>) [datafusion]

2025-01-21 Thread via GitHub
comphead commented on code in PR #14214: URL: https://github.com/apache/datafusion/pull/14214#discussion_r1924035829 ## docs/source/user-guide/sql/operators.md: ## @@ -207,6 +208,48 @@ Greater Than or Equal To +--+ ``` +(op_spaceship)= + +### `<=>` + +Thr

Re: [PR] Added documentation for the Spaceship Operator (<=>) [datafusion]

2025-01-21 Thread via GitHub
comphead commented on code in PR #14214: URL: https://github.com/apache/datafusion/pull/14214#discussion_r1924035483 ## docs/source/user-guide/sql/operators.md: ## @@ -110,6 +110,7 @@ Modulo (remainder) - [<= (less than or equal to)](#op_le) - [> (greater than)](#op_gt) - [>=

Re: [PR] Merge SortMergeJoin filtered batches into larger batches [datafusion]

2025-01-21 Thread via GitHub
comphead commented on PR #14160: URL: https://github.com/apache/datafusion/pull/14160#issuecomment-2605200055 > > > BatchCoalescer > > > > > > Thanks @berkaysynnada for your feedback, if I got you right, you prefer to call the `CoalesceBatchesExec` just AFTER the `SortMergeJoinExe

Re: [PR] Move `EnforceDistribution` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-21 Thread via GitHub
logan-keede commented on code in PR #14190: URL: https://github.com/apache/datafusion/pull/14190#discussion_r1924024388 ## datafusion/physical-optimizer/Cargo.toml: ## @@ -48,6 +49,7 @@ futures = { workspace = true } itertools = { workspace = true } log = { workspace = true }

Re: [I] Support Rust UDF [datafusion-ballista]

2025-01-21 Thread via GitHub
milenkovicm commented on issue #993: URL: https://github.com/apache/datafusion-ballista/issues/993#issuecomment-2605113194 I believe at this point ballista provides extension functionality for users to implement this functionality themselves if needed. Implementation is not trivial, hence

Re: [I] Any plan to support flink [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on issue #1311: URL: https://github.com/apache/datafusion-comet/issues/1311#issuecomment-2605048596 Gluten plans on adding Flink support in the future according to their [latest talk](https://www.youtube.com/watch?v=GWTj3INSzPg). https://github.com/user-attachment

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-21 Thread via GitHub
nuno-faria commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2605015538 > Perhaps we should add a sqllogictest test for #14208. Done.

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-21 Thread via GitHub
jonahgao commented on code in PR #14223: URL: https://github.com/apache/datafusion/pull/14223#discussion_r1923910753 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -777,29 +777,20 @@ pub fn binary_numeric_coercion( (_, Float32) | (Float32, _) => Some(Float

Re: [I] Any plan to support flink [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on issue #1311: URL: https://github.com/apache/datafusion-comet/issues/1311#issuecomment-2604974259 It looks like I was mistaken about Gluten supporting Flink.

Re: [PR] fix concat ws simplify [datafusion]

2025-01-21 Thread via GitHub
jonahgao commented on code in PR #14213: URL: https://github.com/apache/datafusion/pull/14213#discussion_r1923884148 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -465,6 +465,11 @@ SELECT concat_ws('|','a',NULL,NULL) a +query T +SELECT concat_ws('','a','b','c')

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-21 Thread via GitHub
nuno-faria commented on code in PR #14223: URL: https://github.com/apache/datafusion/pull/14223#discussion_r1923882417 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -777,29 +777,20 @@ pub fn binary_numeric_coercion( (_, Float32) | (Float32, _) => Some(Flo

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-21 Thread via GitHub
jonahgao commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2604947472 Perhaps we should add a sqllogictest test for #14208.

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-21 Thread via GitHub
jonahgao commented on code in PR #14223: URL: https://github.com/apache/datafusion/pull/14223#discussion_r1923866675 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -777,29 +777,20 @@ pub fn binary_numeric_coercion( (_, Float32) | (Float32, _) => Some(Float

Re: [PR] fix: Improve testing for array_remove and fallback to Spark for unsupported types [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove commented on code in PR #1308: URL: https://github.com/apache/datafusion-comet/pull/1308#discussion_r1921585675 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -141,4 +146,154 @@ class DataGenerator(r: Random) { Range(0, num).map(_ => generateR

Re: [PR] fix: [comet-parquet-exec] Fix regression in supported types [datafusion-comet]

2025-01-21 Thread via GitHub
andygrove merged PR #1309: URL: https://github.com/apache/datafusion-comet/pull/1309

Re: [I] RFC: Should we remove pyarrow feature from datafusion core [datafusion]

2025-01-21 Thread via GitHub
Omega359 commented on issue #14197: URL: https://github.com/apache/datafusion/issues/14197#issuecomment-2604873558 agreed, remove.

Re: [PR] Faster reverse() string function for ASCII-only case [datafusion]

2025-01-21 Thread via GitHub
UBarney commented on code in PR #14195: URL: https://github.com/apache/datafusion/pull/14195#discussion_r1923793794 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -119,12 +119,21 @@ fn reverse_impl<'a, T: OffsetSizeTrait, V: StringArrayType<'a>>( ) -> Result { let

Re: [PR] Faster reverse() string function for ASCII-only case [datafusion]

2025-01-21 Thread via GitHub
UBarney commented on code in PR #14195: URL: https://github.com/apache/datafusion/pull/14195#discussion_r1923792688 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -119,12 +119,21 @@ fn reverse_impl<'a, T: OffsetSizeTrait, V: StringArrayType<'a>>( ) -> Result { let

Re: [PR] Feature Unifying source execution plans [datafusion]

2025-01-21 Thread via GitHub
mertak-synnada commented on code in PR #14224: URL: https://github.com/apache/datafusion/pull/14224#discussion_r1923766886 ## datafusion/core/src/datasource/data_source.rs: ## @@ -0,0 +1,264 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [PR] chore: fix executor build issue on release [datafusion-ballista]

2025-01-21 Thread via GitHub
andygrove merged PR #1167: URL: https://github.com/apache/datafusion-ballista/pull/1167
