Re: [PR] feat: job id is incremental [datafusion-ballista]

2025-06-01 Thread via GitHub
Dandandan commented on PR #1267: URL: https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926944302 > hey @Dandandan it does not, the change is focused more on aligning with spark, which to my knowledge does not have multi scheduler setup. > > I'm consider using uli

Re: [PR] feat: job id is incremental [datafusion-ballista]

2025-06-01 Thread via GitHub
milenkovicm commented on PR #1267: URL: https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926963201 when going through logs, for example, it makes it easier to reason about. -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] Fix: GROUPING SETS accept values without parenthesis [datafusion-sqlparser-rs]

2025-06-01 Thread via GitHub
iffyio commented on code in PR #1867: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1867#discussion_r2118991085 ## src/parser/mod.rs: ## @@ -10045,7 +10055,7 @@ impl<'a> Parser<'a> { } if self.parse_keywords(&[Keyword::GROUPING, Keyword::

Re: [PR] feat: Improve fetch partition performance, support skip validation arrow ipc files [datafusion-ballista]

2025-06-01 Thread via GitHub
Dandandan commented on PR #1216: URL: https://github.com/apache/datafusion-ballista/pull/1216#issuecomment-2926981538 As far as I can see, we don't have to validate the IPC files: * Ballista has control over writing the output * In a power down scenario where the file is being writ

Re: [PR] Add change to VARCHAR in the upgrade guide [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16216: URL: https://github.com/apache/datafusion/pull/16216#issuecomment-2926983181 > I remember back then in Oracle days, there was VARCHAR and VARCHAR2 data types, just thinking aloud if it can be reused like VARCHAR is UTF8 ArrowType, VARCHAR2 is UTF8View Th

Re: [PR] feat: job id is incremental [datafusion-ballista]

2025-06-01 Thread via GitHub
milenkovicm commented on PR #1267: URL: https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926982414 There is an issue with having the job id tied to a physical directory, which may make a mess when the scheduler is restarted without restarting executors, making it possible to overlap

Re: [PR] feat: job id is incremental [datafusion-ballista]

2025-06-01 Thread via GitHub
Dandandan commented on PR #1267: URL: https://github.com/apache/datafusion-ballista/pull/1267#issuecomment-2926982786 Hm yeah. Not anything against it, but just my thoughts :). I think `ULID` would be preferable over an atomic id.

[PR] WIP: Test DataFusion with experimental IncrementalRecordBatchBuilder [datafusion]

2025-06-01 Thread via GitHub
alamb opened a new pull request, #16222: URL: https://github.com/apache/datafusion/pull/16222 This PR is for testing DataFusion with the code in the following PR - https://github.com/apache/arrow-rs/pull/7513 This is the second of 2 experiments: 1. Is `Does ClickBench` performan

Re: [PR] WIP: Test DataFusion with experimental Parquet Filter Pushdown [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16222: URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927009365 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] WIP: Test DataFusion with experimental Parquet Filter Pushdown [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16222: URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927065804 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_actual_pushdown Benchmark clickbench_extended.json ---

Re: [PR] WIP: Test DataFusion with experimental Parquet Filter Pushdown [datafusion]

2025-06-01 Thread via GitHub
zhuqi-lucas commented on PR #16222: URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927075657 > 🤖: Benchmark completed > > Details > > ``` > Comparing HEAD and alamb_test_actual_pushdown > > Benchmark clickbench_extended.json >

Re: [PR] WIP: Test DataFusion with experimental IncrementalRecordBatchBuilder [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16208: URL: https://github.com/apache/datafusion/pull/16208#issuecomment-2927076431 I ran q24 locally, did see a small slowdown, and did some profiling. As expected, filtering is about 30% of the overall execution time. Of the filtering time, about 1/2 goes to c

[PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
Dandandan opened a new pull request, #16223: URL: https://github.com/apache/datafusion/pull/16223 ## Which issue does this PR close? - Closes #. ## Rationale for this change Recently, I found `interleave_batches` to be faster than the existing code. that takes in

Re: [PR] fix: metadata of join schema [datafusion]

2025-06-01 Thread via GitHub
alamb commented on code in PR #16221: URL: https://github.com/apache/datafusion/pull/16221#discussion_r2119245089 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1626,12 +1626,19 @@ pub fn build_join_schema( join_type, left.fields().len(), ); -l

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16207: URL: https://github.com/apache/datafusion/pull/16207#issuecomment-2927412038 I plan to merge this tomorrow so it can be included in DataFusion 48.0.0

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16195: URL: https://github.com/apache/datafusion/pull/16195#issuecomment-2927412259 @gabotechs / @LiaCastaneda please ping me when you think this PR is ready for a review / merge. Thank you for the help getting it ready.

Re: [PR] fix: metadata of join schema [datafusion]

2025-06-01 Thread via GitHub
chenkovsky commented on code in PR #16221: URL: https://github.com/apache/datafusion/pull/16221#discussion_r2119256983 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1626,12 +1626,19 @@ pub fn build_join_schema( join_type, left.fields().len(), ); -

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
Dandandan closed pull request #16223: Concatenate inside hash repartition URL: https://github.com/apache/datafusion/pull/16223

[PR] Use `interleave` to speed up hash repartitioning [datafusion]

2025-06-01 Thread via GitHub
Dandandan opened a new pull request, #15768: URL: https://github.com/apache/datafusion/pull/15768 ## Which issue does this PR close? Addresses https://github.com/apache/datafusion/issues/7957, https://github.com/apache/datafusion/issues/6822, https://github.com/apache/datafus

Re: [PR] Chore: implement bit_not as ScalarUDFImpl [datafusion-comet]

2025-06-01 Thread via GitHub
kazantsev-maksim commented on code in PR #1825: URL: https://github.com/apache/datafusion-comet/pull/1825#discussion_r2119396171 ## native/proto/src/proto/expr.proto: ## @@ -72,18 +72,17 @@ message Expr { NormalizeNaNAndZero normalize_nan_and_zero = 45; TruncDate trunc

[I] Add support for compound identifiers in tuple parsing [datafusion]

2025-06-01 Thread via GitHub
hozan23 opened a new issue, #16224: URL: https://github.com/apache/datafusion/issues/16224 Following PR #11896, I’m going to open another PR to support compound identifiers when parsing tuples.

[PR] Support compound identifier when parsing tuples [datafusion]

2025-06-01 Thread via GitHub
hozan23 opened a new pull request, #16225: URL: https://github.com/apache/datafusion/pull/16225 ## Which issue does this PR close? - Closes #16224 ## Rationale for this change We would like to support adding table name qualifiers to columns inside tuples. Currently, this

Re: [PR] feat: Translate Hadoop S3A configurations to object_store configurations [datafusion-comet]

2025-06-01 Thread via GitHub
parthchandra commented on code in PR #1817: URL: https://github.com/apache/datafusion-comet/pull/1817#discussion_r2119749301 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadFromS3Suite.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Add DataFrame API Documentation for DataFusion Python [datafusion-python]

2025-06-01 Thread via GitHub
renato2099 commented on PR #1132: URL: https://github.com/apache/datafusion-python/pull/1132#issuecomment-2927926602 As a user, these docs seem great! Looking forward to having them merged!

Re: [I] Snowflake: TIMESTAMP precision regression [datafusion-sqlparser-rs]

2025-06-01 Thread via GitHub
nu commented on issue #1861: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1861#issuecomment-2927948906 I'm looking into this and trying to evaluate whether a new type should be declared inside `src/ast/data_type.rs` or an existing one should be reused.

Re: [I] Support RightMark join for `SortMergeJoin` [datafusion]

2025-06-01 Thread via GitHub
jonathanc-n commented on issue #16226: URL: https://github.com/apache/datafusion/issues/16226#issuecomment-2928034844 take

[I] Support RightMark join for `SortMergeJoin` [datafusion]

2025-06-01 Thread via GitHub
jonathanc-n opened a new issue, #16226: URL: https://github.com/apache/datafusion/issues/16226 ### Is your feature request related to a problem or challenge? Mentioned as a TODO statement here: https://github.com/apache/datafusion/pull/16083. ### Describe the solution you'd lik

Re: [PR] feat: Translate Hadoop S3A configurations to object_store configurations [datafusion-comet]

2025-06-01 Thread via GitHub
parthchandra commented on PR #1817: URL: https://github.com/apache/datafusion-comet/pull/1817#issuecomment-2928171165 > > I have another thought on this. Any number of users have developed custom `AWSCredentialsProvider`s in Java but we would not have corresponding implementations in Rust

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-01 Thread via GitHub
jonathanc-n commented on code in PR #16083: URL: https://github.com/apache/datafusion/pull/16083#discussion_r2119831572 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1126,6 +1153,28 @@ where .collect() } +pub(crate) fn get_mark_indices( +range: &Range, +

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-06-01 Thread via GitHub
adriangb commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2928411444 Sadly I doubt there's a correct answer. It might be the opposite for a local SSD vs object storage.

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-01 Thread via GitHub
jonathanc-n commented on PR #16083: URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2928463577 > Regarding the lack of support for RightMark joins in some join operators, I believe it would be best to return an error in the constructor of those operators if they do not sup

Re: [PR] feat: Translate Hadoop S3A configurations to object_store configurations [datafusion-comet]

2025-06-01 Thread via GitHub
parthchandra commented on PR #1817: URL: https://github.com/apache/datafusion-comet/pull/1817#issuecomment-2928173195 One more thought. Would you be able to write some documentation on configuring/using this?

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-01 Thread via GitHub
jonathanc-n commented on code in PR #16083: URL: https://github.com/apache/datafusion/pull/16083#discussion_r2119682441 ## datafusion/sql/src/unparser/plan.rs: ## @@ -738,21 +739,38 @@ impl Unparser<'_> { let negated = match join.join_type {

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-06-01 Thread via GitHub
etseidl commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2928051534 > One challenge / tradeoff that would be interesting/required is that doing another async load to read more of the metadata will be very bad if that has to actually go to object

Re: [PR] [WIP] Remove `COMET_SHUFFLE_FALLBACK_TO_COLUMNAR` config [datafusion-comet]

2025-06-01 Thread via GitHub
coderfender commented on PR #1736: URL: https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2928666274 Working on this

Re: [PR] [WIP] Remove `COMET_SHUFFLE_FALLBACK_TO_COLUMNAR` config [datafusion-comet]

2025-06-01 Thread via GitHub
coderfender commented on PR #1736: URL: https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2928669883 Working on this

Re: [I] Default to collecting statistics when creating LIstingTables [datafusion]

2025-06-01 Thread via GitHub
brayanjuls commented on issue #16158: URL: https://github.com/apache/datafusion/issues/16158#issuecomment-2928912517 I am sorry for the delay on a resolution to this issue. I have been busy at work and the workload will remain the same at least for the next 3 weeks so I prefer to release th

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-01 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2928856563 > > Only those streams that call poll_next themselves in a loop, and as a consequence may block for an extended period of time, would need to do this. Are there that many of thos

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-01 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2926864588 > Only those streams that call poll_next themselves in a loop, and as a consequence may block for an extended period of time, would need to do this. Are there that many of those?

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-06-01 Thread via GitHub
zhuqi-lucas commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2926814928 Created an arrow-rs issue; we can implement the interface first.

[PR] fix: metadata of join schema [datafusion]

2025-06-01 Thread via GitHub
chenkovsky opened a new pull request, #16221: URL: https://github.com/apache/datafusion/pull/16221 ## Which issue does this PR close? - Closes #15754. ## Rationale for this change Some optimization rules will swap the left and right plans. Then the metadata of optimized p

Re: [PR] feat: `ClusterState` does not cache session contexts [datafusion-ballista]

2025-06-01 Thread via GitHub
milenkovicm merged PR #1226: URL: https://github.com/apache/datafusion-ballista/pull/1226

Re: [I] `JobStateEventStream` does not emit events related to Session [datafusion-ballista]

2025-06-01 Thread via GitHub
milenkovicm closed issue #1220: `JobStateEventStream` does not emit events related to Session URL: https://github.com/apache/datafusion-ballista/issues/1220

Re: [PR] chore(deps): update to datafusion 47.0.0 [datafusion-ballista]

2025-06-01 Thread via GitHub
milenkovicm merged PR #1250: URL: https://github.com/apache/datafusion-ballista/pull/1250

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-06-01 Thread via GitHub
shehabgamin commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2926895014 > [@shehabgamin](https://github.com/shehabgamin) [@alamb](https://github.com/alamb) I created an epic in Comet for implementing our current expressions as `ScalarUDFImpl` ra

Re: [PR] Use `interleave` to speed up hash repartitioning [datafusion]

2025-06-01 Thread via GitHub
Dandandan closed pull request #15768: Use `interleave` to speed up hash repartitioning URL: https://github.com/apache/datafusion/pull/15768

Re: [PR] Chore: implement bit_not as ScalarUDFImpl [datafusion-comet]

2025-06-01 Thread via GitHub
andygrove commented on code in PR #1825: URL: https://github.com/apache/datafusion-comet/pull/1825#discussion_r2119347211 ## native/proto/src/proto/expr.proto: ## @@ -72,18 +72,17 @@ message Expr { NormalizeNaNAndZero normalize_nan_and_zero = 45; TruncDate truncDate =

Re: [PR] Chore: implement bit_not as ScalarUDFImpl [datafusion-comet]

2025-06-01 Thread via GitHub
codecov-commenter commented on PR #1825: URL: https://github.com/apache/datafusion-comet/pull/1825#issuecomment-2927584168 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1825?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
Dandandan commented on PR #16223: URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927495956 FYI @alamb this relates to your quest to remove `CoalesceBatches` (this doesn't yet remove `concat` but it shows the potential for optimization).

Re: [PR] Chore: implement bit_count as ScalarUDFImpl [datafusion-comet]

2025-06-01 Thread via GitHub
codecov-commenter commented on PR #1826: URL: https://github.com/apache/datafusion-comet/pull/1826#issuecomment-2927586200 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1826?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16223: URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927590363 🤖: Benchmark completed Details ``` Comparing HEAD and concat_in_repartition Benchmark clickbench_extended.json

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
Dandandan commented on PR #16223: URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927599722 > 🤖: Benchmark completed > > Details > > ``` > Comparing HEAD and concat_in_repartition > > Benchmark clickbench_extended.json > --

Re: [PR] Add change to VARCHAR in the upgrade guide [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16216: URL: https://github.com/apache/datafusion/pull/16216#issuecomment-2927523703 Thanks @comphead and @zhuqi-lucas

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
Dandandan commented on PR #16223: URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927612968 One commit was missing, but not sure that explains the difference between my result and this one.

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-06-01 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2926779798 > So it might actually be the case that the changed code is a bit slower for this case. In the query there is only little data to copy (so concat batches -> concat sort keys does

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-01 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2926803211 > Isn't that somewhat unavoidable when you're dealing with a cooperative scheduler? There's no way for tokio to preempt. I agree, so I added the most urgent and possible ca

Re: [PR] Add change to VARCHAR in the upgrade guide [datafusion]

2025-06-01 Thread via GitHub
zhuqi-lucas commented on PR #16216: URL: https://github.com/apache/datafusion/pull/16216#issuecomment-2926807164 > lgtm thanks @alamb > > I remember back then in Oracle days, there was VARCHAR and VARCHAR2 data types, just thinking aloud if it can be reused like VARCHAR is UTF8 ArrowT

[PR] Remove use of deprecated dict_ordered in datafusion-proto (#16218) [datafusion]

2025-06-01 Thread via GitHub
cj-zhukov opened a new pull request, #16220: URL: https://github.com/apache/datafusion/pull/16220 ## Which issue does this PR close? - Closes #16218. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
Dandandan closed pull request #16223: Concatenate inside hash repartition URL: https://github.com/apache/datafusion/pull/16223

Re: [PR] WIP: Test DataFusion with experimental Parquet Filter Pushdown [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16222: URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927088025 > The clickbench only has several cases with real regression > 20%, and i believe those cases can be improved by combined with adaptive, i think we are at good state. I agree

[PR] Chore: implement bit_count as ScalarUDFImpl [datafusion-comet]

2025-06-01 Thread via GitHub
kazantsev-maksim opened a new pull request, #1826: URL: https://github.com/apache/datafusion-comet/pull/1826 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1819 ## Rationale for this change See https://github.com/apache/datafu

Re: [PR] WIP: Test DataFusion with experimental Parquet Filter Pushdown [datafusion]

2025-06-01 Thread via GitHub
zhuqi-lucas commented on PR #16222: URL: https://github.com/apache/datafusion/pull/16222#issuecomment-2927122992 > #16208 (comment) https://github.com/apache/arrow-rs/pull/7524#issuecomment-2888412242 Thank you @alamb, from the previous result, it will help Q14, Q24, Q30, and Q31, which

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-01 Thread via GitHub
ctsk commented on PR #16083: URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2927498848 Alrighty! Regarding the lack of support for RightMark joins in some join operators, I believe it would be best to return an error in the constructor of those operators if they do

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-01 Thread via GitHub
ctsk commented on code in PR #16083: URL: https://github.com/apache/datafusion/pull/16083#discussion_r2119313686 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1126,6 +1153,28 @@ where .collect() } +pub(crate) fn get_mark_indices( +range: &Range, +inp

Re: [PR] Add change to VARCHAR in the upgrade guide [datafusion]

2025-06-01 Thread via GitHub
alamb merged PR #16216: URL: https://github.com/apache/datafusion/pull/16216

Re: [PR] Add change to VARCHAR in the upgrade guide [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16216: URL: https://github.com/apache/datafusion/pull/16216#issuecomment-2927523426 Had to merge this so we had at least one commit today 😅

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
alamb commented on PR #16223: URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927525203 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
Dandandan commented on PR #16223: URL: https://github.com/apache/datafusion/pull/16223#issuecomment-2927734943 Let me try some other approach later: buffering inputs for each output partition until it reaches the target batch size (just like `CoalesceBatches`). Perhaps the extra copy for s

Re: [PR] Concatenate inside hash repartition [datafusion]

2025-06-01 Thread via GitHub
Dandandan closed pull request #16223: Concatenate inside hash repartition URL: https://github.com/apache/datafusion/pull/16223

Re: [I] Add FFI CatalogProvider and SchemaProvider [datafusion-python]

2025-06-01 Thread via GitHub
renato2099 commented on issue #1091: URL: https://github.com/apache/datafusion-python/issues/1091#issuecomment-2927812643 Hi @kevinjqliu, @timsaucer, here is an initial PR for this: https://github.com/apache/datafusion-python/pull/1137. Let me know if there is anything else to be d

[PR] Exposing FFI to python [datafusion-python]

2025-06-01 Thread via GitHub
renato2099 opened a new pull request, #1137: URL: https://github.com/apache/datafusion-python/pull/1137 # Which issue does this PR close? Closes #1091 # Rationale for this change Similar to exposing FFI for TableProviders, this PR exposes the capability for exposing Catalo