dependabot[bot] opened a new pull request, #15470:
URL: https://github.com/apache/datafusion/pull/15470
Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.6.0 to
1.6.1.
Commits
See full diff in [compare view](https://github.com/smithy-lang/smithy-rs/commits)
zhuqi-lucas opened a new issue, #15471:
URL: https://github.com/apache/datafusion/issues/15471
### Describe the bug
The average time computation for the clickbench queries should not be inside
the query iterator.
It was mistakenly added inside the iterator.
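
A minimal sketch of the intended shape, assuming a benchmark loop that records one timing per iteration. The names `time_query` and `average` are hypothetical and are not taken from the benchmark code; the point is only that the average is derived once, after the loop.

```rust
use std::time::{Duration, Instant};

/// Average of the recorded timings; zero if nothing was recorded.
fn average(timings: &[Duration]) -> Duration {
    if timings.is_empty() {
        return Duration::ZERO;
    }
    let total: Duration = timings.iter().sum();
    total / timings.len() as u32
}

/// Runs `query` `iterations` times and returns the per-iteration timings
/// together with their average.
fn time_query<F: FnMut()>(iterations: u32, mut query: F) -> (Vec<Duration>, Duration) {
    let mut timings = Vec::with_capacity(iterations as usize);
    for _ in 0..iterations {
        let start = Instant::now();
        query();
        timings.push(start.elapsed());
        // No average is computed here: inside the loop it would only
        // reflect the iterations seen so far.
    }
    // The average depends on every iteration, so it is computed once, here.
    let avg = average(&timings);
    (timings, avg)
}
```

Calling `time_query(5, || run_query())` with some hypothetical `run_query` would yield five timings and a single average.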
### To Reproduce
_N
xudong963 commented on issue #10336:
URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2760506147
Fyi, I'm working on it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the spe
zhuqi-lucas commented on PR #15472:
URL: https://github.com/apache/datafusion/pull/15472#issuecomment-2760512944
cc @xudong963 @2010YOUY01
zhuqi-lucas commented on issue #15471:
URL: https://github.com/apache/datafusion/issues/15471#issuecomment-2760505849
take
xudong963 opened a new pull request, #15473:
URL: https://github.com/apache/datafusion/pull/15473
## Which issue does this PR close?
- Closes
https://github.com/apache/datafusion/issues/10336#issuecomment-2758082825
## Rationale for this change
As @surema
xudong963 commented on code in PR #15473:
URL: https://github.com/apache/datafusion/pull/15473#discussion_r2018151341
##
datafusion/datasource/src/file_scan_config.rs:
##
@@ -575,6 +575,95 @@ impl FileScanConfig {
})
}
+/// Splits file groups into new groups
kosiew opened a new pull request, #1086:
URL: https://github.com/apache/datafusion-python/pull/1086
## Which issue does this PR close?
Partial fix for #1078
## Rationale for this change
This PR adds configurable display settings for `DataFrame` representations
in the Pyt
ctsk commented on PR #15462:
URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2760574284
I think one issue is that the short-circuit logic is not handling cases
where the `rhs` contains NULLs. E.g. `true OR NULL` needs to evaluate to
`NULL`
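
As a reference point for the NULL cases raised here, the sketch below models SQL three-valued logic with `Option<bool>`, where `None` stands in for NULL. Under these standard semantics `true OR NULL` is `true`, whereas `false OR NULL` and `true AND NULL` are `NULL`, so skipping the `rhs` is only safe when the `lhs` alone decides the result. All function names are hypothetical and are not taken from the PR.

```rust
/// SQL three-valued OR; `None` represents NULL.
fn sql_or(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
    match (lhs, rhs) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// SQL three-valued AND; `None` represents NULL.
fn sql_and(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
    match (lhs, rhs) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// OR may skip the rhs only if every lhs value is a non-NULL `true`.
fn or_can_skip_rhs(lhs: &[Option<bool>]) -> bool {
    lhs.iter().all(|v| *v == Some(true))
}

/// AND may skip the rhs only if every lhs value is a non-NULL `false`.
fn and_can_skip_rhs(lhs: &[Option<bool>]) -> bool {
    lhs.iter().all(|v| *v == Some(false))
}
```

A NULL anywhere in the `lhs` disables the shortcut in this sketch, because the result for that row then depends on the `rhs`.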
zhuqi-lucas opened a new pull request, #15472:
URL: https://github.com/apache/datafusion/pull/15472
## Which issue does this PR close?
- Closes [#15471](https://github.com/apache/datafusion/issues/15471)
## Rationale for this change
the average time for clickbench query c
xudong963 merged PR #15470:
URL: https://github.com/apache/datafusion/pull/15470
2010YOUY01 merged PR #15472:
URL: https://github.com/apache/datafusion/pull/15472
2010YOUY01 closed issue #15471: The average time compute for clickbench query
is wrong
URL: https://github.com/apache/datafusion/issues/15471
kosiew opened a new pull request, #1087:
URL: https://github.com/apache/datafusion-python/pull/1087
# Which issue does this PR close?
Partial fix for #1078
# Rationale for this change
> Split up some of the html generation into a set of helper functions.
The render
niebayes commented on issue #15456:
URL: https://github.com/apache/datafusion/issues/15456#issuecomment-2760761409
The line number in the error message is the row index of a certain record
batch, not the line number in the csv file. I have filed an issue to arrow-rs
for making this error me
niebayes commented on issue #15456:
URL: https://github.com/apache/datafusion/issues/15456#issuecomment-2760766577
> why there are two head rows
I didn't find this. You might find the cause.
acking-you opened a new pull request, #15475:
URL: https://github.com/apache/datafusion/pull/15475
## Which issue does this PR close?
- Closes #15465 .
## Rationale for this change
## What changes are included in this PR?
## Are these change
acking-you commented on PR #15462:
URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2761137454
> I think one issue is that the short-circuit logic is not handling cases
where the `rhs` contains NULLs. E.g. `true OR NULL` needs to evaluate to
`NULL`
Thank you very
alamb commented on issue #1077:
URL:
https://github.com/apache/datafusion-python/issues/1077#issuecomment-2742876701
FYI @timsaucer
comphead commented on PR #15476:
URL: https://github.com/apache/datafusion/pull/15476#issuecomment-2762263935
> Note that this does break for users of HashJoinExec that
>
> * Use the CollectLeft mode, with >1 partition on the build side AND
> * Construct their physical plan without
codecov-commenter commented on PR #1577:
URL:
https://github.com/apache/datafusion-comet/pull/1577#issuecomment-2762138134
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1577?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
ctsk commented on PR #15476:
URL: https://github.com/apache/datafusion/pull/15476#issuecomment-2762302514
Before this PR, if someone hand-wired a CollectLeft HashJoin where the left
child has more than one output partition, the HashJoin would automatically add
a CoalescePartitions exec. Thi
friendlymatthew commented on code in PR #15361:
URL: https://github.com/apache/datafusion/pull/15361#discussion_r2019270254
##
datafusion/functions/src/datetime/to_char.rs:
##
@@ -277,7 +282,25 @@ fn _to_char_array(args: &[ColumnarValue]) ->
Result {
let result = forma
parthchandra commented on code in PR #1550:
URL: https://github.com/apache/datafusion-comet/pull/1550#discussion_r2019242229
##
spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala:
##
@@ -490,8 +490,7 @@ object CometScanExec extends DataTypeSupport {
// TO
alamb commented on issue #15037:
URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2762326990
@adriangb and I had a discussion about
https://github.com/apache/datafusion/pull/15301
here are some notes:
## Usecases:
- TopK dynamic filter pushdown
- Prune f
mbutrovich commented on PR #1566:
URL:
https://github.com/apache/datafusion-comet/pull/1566#issuecomment-2762276704
> Looks good, do we still need to wait for arrow-rs based on [#1566
(comment)](https://github.com/apache/datafusion-comet/pull/1566#issuecomment-2748737890)
?
We can m
comphead commented on PR #15467:
URL: https://github.com/apache/datafusion/pull/15467#issuecomment-276152
Thanks @xudong963
alamb merged PR #15352:
URL: https://github.com/apache/datafusion/pull/15352
kazuyukitanimura commented on PR #1566:
URL:
https://github.com/apache/datafusion-comet/pull/1566#issuecomment-2761992952
Looks good, do we still need to wait for arrow-rs based on
https://github.com/apache/datafusion-comet/pull/1566#issuecomment-2748737890 ?
suibianwanwank commented on issue #15406:
URL: https://github.com/apache/datafusion/issues/15406#issuecomment-2762027162
Hi, @2010YOUY01. I have read and tried to understand both
SortMergeJoinStream and GroupedHashAggregateStream (though I still have some
uncertainties). I have some initial
jinwenjie123 commented on issue #458:
URL:
https://github.com/apache/datafusion-comet/issues/458#issuecomment-2762497216
Hi @andygrove
May I ask why we decided not to support RangePartitioning? Will it be
supported in the near future?
Thanks
ion-elgreco commented on issue #15338:
URL: https://github.com/apache/datafusion/issues/15338#issuecomment-2762411530
> - Related discussion: https://github.com/apache/arrow-rs/issues/7176
>
> I think @kosiew may have a PR up that is related
> - https://github.com/apache/datafusion/
Omega359 commented on code in PR #15361:
URL: https://github.com/apache/datafusion/pull/15361#discussion_r2019195514
##
datafusion/functions/src/datetime/to_char.rs:
##
@@ -277,7 +282,25 @@ fn _to_char_array(args: &[ColumnarValue]) ->
Result {
let result = formatter.va
Omega359 commented on code in PR #15361:
URL: https://github.com/apache/datafusion/pull/15361#discussion_r2019198451
##
datafusion/functions/src/datetime/to_char.rs:
##
@@ -277,7 +282,25 @@ fn _to_char_array(args: &[ColumnarValue]) ->
Result {
let result = formatter.va
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2762301099
From discussion with Andrew here are a couple notes:
- The most granular update frequency of the filters is when the TopK itself
updates, so we should switch from polling to pushi
logan-keede commented on code in PR #15459:
URL: https://github.com/apache/datafusion/pull/15459#discussion_r2019291249
##
datafusion/catalog/src/memory/table.rs:
##
@@ -0,0 +1,377 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license
acking-you commented on PR #15475:
URL: https://github.com/apache/datafusion/pull/15475#issuecomment-2761888661
> Looks good to me. Since we're only ordering by this it shouldn't matter
that we order by an integer instead of a proper timestamp, ordering is
equivalent.
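
A small sketch of why the two orderings agree, assuming the column holds non-negative seconds since the Unix epoch. The helper names and the `event_time` values are hypothetical; converting seconds to a timestamp is monotonic, so sorting by either key produces the same row order.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Converts seconds since the Unix epoch into a timestamp.
/// Only non-negative values are handled in this sketch.
fn to_timestamp(secs: u64) -> SystemTime {
    UNIX_EPOCH + Duration::from_secs(secs)
}

/// Row indices ordered by the raw integer column (stable sort).
fn order_by_seconds(event_time: &[u64]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..event_time.len()).collect();
    idx.sort_by_key(|&i| event_time[i]);
    idx
}

/// Row indices ordered by the converted timestamps (stable sort).
fn order_by_timestamp(event_time: &[u64]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..event_time.len()).collect();
    idx.sort_by_key(|&i| to_timestamp(event_time[i]));
    idx
}
```

Both functions return the same index order for any input in range, which is why dropping the conversion from the `ORDER BY` key does not change the result.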
Thank you very
alamb closed issue #15456: [Bug] datafusion-cli may fail to read csv files
URL: https://github.com/apache/datafusion/issues/15456
parthchandra commented on PR #15481:
URL: https://github.com/apache/datafusion/pull/15481#issuecomment-2761890159
There is also some profiling information in
https://datafusion.apache.org/comet/contributor-guide/profiling_native_code.html
I'm presuming this will replace that?
How does
westhide opened a new pull request, #1216:
URL: https://github.com/apache/datafusion-ballista/pull/1216
# Which issue does this PR close?
Closes #1189.
# Rationale for this change
# What changes are included in this PR?
# Are there any user-facing
ctsk commented on PR #15476:
URL: https://github.com/apache/datafusion/pull/15476#issuecomment-2761891586
Note that this does break for users of HashJoinExec that
- Use the CollectLeft mode, with >1 partition on the build side AND
- Construct their physical plan without running EnforceD
Dandandan commented on code in PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#discussion_r2019008151
##
datafusion/physical-optimizer/src/coalesce_batches.rs:
##
@@ -92,3 +92,73 @@ impl PhysicalOptimizerRule for CoalesceBatches {
true
}
}
+
+/// R
Dandandan commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2761889602
That makes a lot of sense!
ivankelly commented on issue #15028:
URL: https://github.com/apache/datafusion/issues/15028#issuecomment-276185
Excellent analysis folks. Parquet row groups size makes a lot of sense since
the rows are large. We can tune that way down since our use case isn't
columnar. How do I get the
alamb commented on issue #15456:
URL: https://github.com/apache/datafusion/issues/15456#issuecomment-2761888196
Turns out this is a bug in the generator --
https://github.com/clflushopt/tpchgen-rs/issues/73#issuecomment-2761885245
adriangb commented on PR #15475:
URL: https://github.com/apache/datafusion/pull/15475#issuecomment-2761917991
> > Looks good to me. Since we're only ordering by this it shouldn't matter
that we order by an integer instead of a proper timestamp, ordering is
equivalent.
>
> Thank you v
comphead commented on PR #15481:
URL: https://github.com/apache/datafusion/pull/15481#issuecomment-2761920912
> There is also some profiling information in
https://datafusion.apache.org/comet/contributor-guide/profiling_native_code.html
I'm presuming this will replace that? How does samply
comphead commented on code in PR #15469:
URL: https://github.com/apache/datafusion/pull/15469#discussion_r2019237566
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -416,21 +409,23 @@ impl ExternalSorter {
Some(self.spill_manager.create_in_progress_file("So
alan910127 commented on code in PR #15482:
URL: https://github.com/apache/datafusion/pull/15482#discussion_r2019310584
##
datafusion/core/tests/expr_api/mod.rs:
##
@@ -330,12 +330,12 @@ async fn test_create_physical_expr_coercion() {
create_expr_test(lit(1i32).eq(col("id"))
alan910127 opened a new pull request, #15482:
URL: https://github.com/apache/datafusion/pull/15482
## Which issue does this PR close?
- Closes #15161.
## Rationale for this change
Currently, DataFusion handles comparisons between numbers and string
litera
alamb commented on issue #1576:
URL:
https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2762075206
> @andygrove @alamb would y'all be able to help here? I saw that the
prebuilt JARs were tested for 3.5.4 and upwards. Are they backwards compatible?
Sorry I don't know
alamb commented on PR #15413:
URL: https://github.com/apache/datafusion/pull/15413#issuecomment-2762447424
Run extended tests
jsai28 opened a new issue, #15483:
URL: https://github.com/apache/datafusion/issues/15483
Would there be any interest in building a data quality framework like [Great
Expectations](https://github.com/great-expectations/great_expectations)
comphead commented on PR #14333:
URL: https://github.com/apache/datafusion/pull/14333#issuecomment-2762462451
depends on https://github.com/apache/datafusion/pull/14967
westhide commented on issue #1189:
URL:
https://github.com/apache/datafusion-ballista/issues/1189#issuecomment-2761307615
References:
[Benchmarks for Arrow IPC
reader](https://github.com/apache/arrow-rs/pull/7091)
berkaysynnada commented on PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2761294822
@irenjj `AggregateFunctionExpr` has `with_new_expressions()` API. As
datafusion hasn't implemented it yet, you didn't have difficulty rewriting the
`human_display` according to
goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2761416899
> You should be able to get the test back by also setting
`datafusion.optimizer.hash_join_single_partition_threshold` to `0` / a low
value.
Thanks. It works. I also added th
LiaCastaneda commented on issue #15477:
URL: https://github.com/apache/datafusion/issues/15477#issuecomment-2761931268
I think @ologlogn will take it
Kontinuation commented on issue #15028:
URL: https://github.com/apache/datafusion/issues/15028#issuecomment-2761940822
> Excellent analysis folks. Parquet row groups size makes a lot of sense
since the rows are large. We can tune that way down since our use case isn't
columnar. How do I get
westhide opened a new pull request, #1217:
URL: https://github.com/apache/datafusion-ballista/pull/1217
# Which issue does this PR close?
Closes N/A.
# Rationale for this change
# What changes are included in this PR?
# Are there any user-facing ch
westhide commented on issue #1189:
URL:
https://github.com/apache/datafusion-ballista/issues/1189#issuecomment-2761293050
/take
irenjj commented on PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2761472370
> @irenjj `AggregateFunctionExpr` has `with_new_expressions()` API. As
datafusion hasn't implemented it yet, you didn't have difficulty rewriting the
`human_display` according to the
mbutrovich commented on code in PR #1511:
URL: https://github.com/apache/datafusion-comet/pull/1511#discussion_r2018762152
##
native/core/src/execution/shuffle/shuffle_writer.rs:
##
@@ -422,27 +432,29 @@ impl ShuffleRepartitioner {
.collect::>>()?;
Dandandan commented on PR #15475:
URL: https://github.com/apache/datafusion/pull/15475#issuecomment-2761610618
> Should we update the same query in the clickbench repo as well?
Yes, and we might rerun the queries as well (as `to_timestamp_seconds` takes
some time itself as well).
blaginin commented on PR #15480:
URL: https://github.com/apache/datafusion/pull/15480#issuecomment-2762056261
Can you merge main into this branch, please, to remove the diff?
kazuyukitanimura commented on code in PR #1555:
URL: https://github.com/apache/datafusion-comet/pull/1555#discussion_r2019103734
##
native/core/src/parquet/mod.rs:
##
@@ -641,6 +640,8 @@ pub unsafe extern "system" fn
Java_org_apache_comet_parquet_Native_initRecordBat
sessi
qstommyshu commented on PR #15480:
URL: https://github.com/apache/datafusion/pull/15480#issuecomment-2762061935
I'll do a last commit to resolve those comments
alamb commented on PR #15352:
URL: https://github.com/apache/datafusion/pull/15352#issuecomment-2762070065
Thanks again @blaginin @mertak-synnada
alamb commented on PR #15463:
URL: https://github.com/apache/datafusion/pull/15463#issuecomment-2762078550
Thanks @comphead and @xudong963
alamb merged PR #15463:
URL: https://github.com/apache/datafusion/pull/15463
alamb commented on PR #15450:
URL: https://github.com/apache/datafusion/pull/15450#issuecomment-2762082875
> > BTW @oznur-synnada I wonder if you have time to update the page with
other recent blog content 🤔
>
> You mean this? #15440
Yes! As well as
https://datafusion.ap
alamb commented on PR #15480:
URL: https://github.com/apache/datafusion/pull/15480#issuecomment-2762080799
🌶️
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2006767851
##
datafusion/core/src/datasource/physical_plan/parquet.rs:
##
@@ -1655,4 +1656,46 @@ mod tests {
assert_eq!(calls.len(), 2);
assert_eq!(calls,
comphead opened a new pull request, #15481:
URL: https://github.com/apache/datafusion/pull/15481
## Which issue does this PR close?
- Closes #.
## Rationale for this change
Introduce cross platform Samply profiler for DataFusion and benchmarks
## W
acking-you commented on PR #15462:
URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2761813612
> I think one issue is that the short-circuit logic is not handling cases
where the `rhs` contains NULLs. E.g. `true OR NULL` needs to evaluate to
`NULL`
After taking a
Dandandan commented on issue #15478:
URL: https://github.com/apache/datafusion/issues/15478#issuecomment-2762997345
> One downside: Increased memory usage.
>
> The hash join build side stores the RecordBatches in a vector before
building the hash table. This vector will grow larger. I
Omega359 commented on PR #15413:
URL: https://github.com/apache/datafusion/pull/15413#issuecomment-2762816764
Looks like it failed?
https://github.com/apache/datafusion/actions/runs/14139465370/job/39618247236
Dandandan commented on issue #15465:
URL: https://github.com/apache/datafusion/issues/15465#issuecomment-2762989314
Another discrepancy I found in the queries is the `"EventDate"::INT::DATE`
casting. Is this something we could remove as well? Maybe would be good to look
at all further that a
unknown-no commented on issue #1215:
URL:
https://github.com/apache/datafusion-ballista/issues/1215#issuecomment-2763011354
Related [WASM UDFs](https://github.com/apache/datafusion/issues/9326)
parthchandra commented on issue #1576:
URL:
https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2762922710
This particular API is not a public API; we use it so that we can verify
the metrics in tests. Maybe we can disable its use in non-test environments?
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2763043386
@alamb I've achieved 2/3 goals:
- I added wrapping of a `DynamicFilterSource` in a `PhysicalExpr` such that
it can dynamically update itself to prune rows using filter pushdown _e
jayzhan211 commented on code in PR #15482:
URL: https://github.com/apache/datafusion/pull/15482#discussion_r2019668440
##
datafusion/sqllogictest/test_files/push_down_filter.slt:
##
@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a],
full_filters=[t.a != Int32(100)]
xudong963 commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2763074215
> For your planning purposes I will be away the week of April 21 -- so
perhaps we can start testing a week earlier (week of April 7 so we have time to
complete / fix issues pr
xudong963 commented on code in PR #15432:
URL: https://github.com/apache/datafusion/pull/15432#discussion_r2019681642
##
datafusion/core/src/datasource/statistics.rs:
##
@@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit(
Ok((result_files, statistics))
}
-fn ad
shehabgamin commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2762517221
Happy to test whenever!
github-actions[bot] commented on PR #14323:
URL: https://github.com/apache/datafusion/pull/14323#issuecomment-2763013966
Thank you for your contribution. Unfortunately, this pull request is stale
because it has been open 60 days with no activity. Please remove the stale
label or comment or
jayzhan211 commented on PR #15457:
URL: https://github.com/apache/datafusion/pull/15457#issuecomment-2763109194
> count(*) actually doesnt depend on any column on input logically
count(*) needs to know the number of rows in the column
alamb commented on code in PR #15462:
URL: https://github.com/apache/datafusion/pull/15462#discussion_r2019123741
##
benchmarks/queries/clickbench/README.md:
##
@@ -120,13 +122,42 @@ LIMIT 10;
```
Results look like
-
+```
+-+-+---+--+
Dandandan merged PR #15475:
URL: https://github.com/apache/datafusion/pull/15475
andygrove commented on issue #458:
URL:
https://github.com/apache/datafusion-comet/issues/458#issuecomment-2762962087
I discussed this feature with @mbutrovich recently and he may have
additional thoughts on this topic.
viirya commented on issue #458:
URL:
https://github.com/apache/datafusion-comet/issues/458#issuecomment-2763002946
The implementation difference between RangePartitioning and other
partitioning schemes such as HashPartitioning is that it involves some sampling
operations that perform wit
parthchandra commented on code in PR #1566:
URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2019634486
##
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala:
##
@@ -1460,6 +1460,59 @@ class ParquetReadV1Suite extends ParquetReadSuite with
ctsk commented on PR #15418:
URL: https://github.com/apache/datafusion/pull/15418#issuecomment-2761412282
> You mean coalesce_partitions_if_needed() call is redundant in datafusion?
I don't think that's the case, but if it is so, why don't we remove that line?
I wanted to keep the PR
alamb commented on issue #15456:
URL: https://github.com/apache/datafusion/issues/15456#issuecomment-2761675489
Nice find @chenkovsky -- so looks like there is some bug in the data
generator after all.
Dandandan commented on code in PR #15473:
URL: https://github.com/apache/datafusion/pull/15473#discussion_r2018823379
##
datafusion/datasource/benches/split_groups_by_statistics.rs:
##
@@ -0,0 +1,178 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
mkgada commented on issue #1576:
URL:
https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2761689234
@jinwenjie123 appreciate your response! I am using one of the pre-built
JARs, I will not be able to switch to Spark 3.4.x since our cluster was
recently upgraded to 3.5.x an
qstommyshu opened a new pull request, #15480:
URL: https://github.com/apache/datafusion/pull/15480
## Which issue does this PR close?
- Closes #15398. Related #15444
## Rationale for this change
## What changes are included in this PR?
Migr
qstommyshu commented on PR #15480:
URL: https://github.com/apache/datafusion/pull/15480#issuecomment-2761715713
Hi @alamb and @blaginin
Part2 of the substrait tests migration is done as well. Please take a look
when you have time :)
The only tests that cannot be changed to `in
l0kr opened a new pull request, #1577:
URL: https://github.com/apache/datafusion-comet/pull/1577
## Which issue does this PR close?
Closes #936.
## Rationale for this change
Previous PR went stale so I wanted to move it forward.
## What changes are included
adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2004247883
##
datafusion/common/src/config.rs:
##
@@ -590,6 +590,9 @@ config_namespace! {
/// during aggregations, if possible
pub enable_topk_aggregation: