l0kr commented on PR #1577:
URL:
https://github.com/apache/datafusion-comet/pull/1577#issuecomment-2761641254
Previous PR: #958
I noticed that collecting dataframe directly from parquet causes an error.
From what I see the problem lies in converting Columnar back to Row. If that's
some
xudong963 commented on code in PR #15473:
URL: https://github.com/apache/datafusion/pull/15473#discussion_r2018831008
##
datafusion/datasource/benches/split_groups_by_statistics.rs:
##
@@ -0,0 +1,178 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
adriangb commented on PR #15475:
URL: https://github.com/apache/datafusion/pull/15475#issuecomment-2761603385
Should we update the same query in the clickbench repo as well?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and
adriangb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2760349060
I created https://github.com/pydantic/datafusion/pull/13/files to discuss
that idea further. It's promising in some ways but also has some issues, I left
TODOs and comments.
--
T
Copilot commented on code in PR #15473:
URL: https://github.com/apache/datafusion/pull/15473#discussion_r2018163460
##
datafusion/datasource/src/file_scan_config.rs:
##
@@ -575,6 +575,95 @@ impl FileScanConfig {
})
}
+/// Splits file groups into new groups ba
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761504336
> Things that serialize a PhysicalExpr across the wire, e.g.
https://github.com/XiangpengHao/liquid-cache does it via
[serialize_physical_expr](https://github.com/apache/datafusion/blo
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761489660
Thank you very much @adriangb -- given the new (warranted) complexity this
feature is likely to add to DataFusion, and the fact if done right it can serve
as the foundation for many a
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2761509767
> I would still keep the methods on ExecutionPlan to do the pushdown instead
of the optimizer rule unless I'm wrong about optimizer rules not being able to
deal with LiquidCacheClientE
blaginin commented on PR #14955:
URL: https://github.com/apache/datafusion/pull/14955#issuecomment-2762094667
hey @Standing-Man https://github.com/apache/datafusion/pull/15352 just got
merged so this pr may be easier to finish fyi 🌻
--
This is an automated message from the Apache Git Serv
andygrove commented on issue #1576:
URL:
https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2762091836
Comet 0.7.0 supported 3.5.0 through 3.5.4 but not 3.5.5 which had breaking
changes to internal apis. I am on vacation this week but it looks like 3.5.5
support was added
alamb commented on issue #15096:
URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2762085363
> Small improvement, i think becasue it's parquet format, mostly we already
load it as the Utf8View for benchmark:
Yes I would expect no change for the clickbench be
adriangb commented on PR #15475:
URL: https://github.com/apache/datafusion/pull/15475#issuecomment-2762100250
@Dandandan do you think we can merge this here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abov
alamb closed issue #15246: Migrate datasource tests to `insta`
URL: https://github.com/apache/datafusion/issues/15246
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubsc
alamb commented on PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2762327907
I also wrote up some notes here:
https://github.com/apache/datafusion/issues/15037#issuecomment-2762326990
--
This is an automated message from the Apache Git Service.
To respond to
Omega359 commented on code in PR #15361:
URL: https://github.com/apache/datafusion/pull/15361#discussion_r2019547635
##
datafusion/functions/src/datetime/to_char.rs:
##
@@ -277,7 +282,25 @@ fn _to_char_array(args: &[ColumnarValue]) ->
Result {
let result = formatter.va
qstommyshu opened a new pull request, #15484:
URL: https://github.com/apache/datafusion/pull/15484
## Which issue does this PR close?
- Closes #15397 .
## Rationale for this change
## What changes are included in this PR?
Migrated tests in `data
Omega359 commented on PR #15413:
URL: https://github.com/apache/datafusion/pull/15413#issuecomment-2762810733
> Run extended tests
I see it did trigger but I somehow was expecting feedback in the comments
--
This is an automated message from the Apache Git Service.
To respond to the
parthchandra commented on code in PR #1575:
URL: https://github.com/apache/datafusion-comet/pull/1575#discussion_r2019608751
##
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala:
##
@@ -1460,6 +1460,25 @@ class ParquetReadV1Suite extends ParquetReadSuite with
alan910127 commented on code in PR #15482:
URL: https://github.com/apache/datafusion/pull/15482#discussion_r2019692309
##
datafusion/sqllogictest/test_files/push_down_filter.slt:
##
@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a],
full_filters=[t.a != Int32(100)]
jayzhan211 opened a new issue, #15485:
URL: https://github.com/apache/datafusion/issues/15485
### Is your feature request related to a problem or challenge?
The QUALIFY clause is used to filter the results of [WINDOW
functions](https://duckdb.org/docs/stable/sql/functions/window_funct
github-actions[bot] closed pull request #14200: Fuse `CASE(a > 0, b / a)`
URL: https://github.com/apache/datafusion/pull/14200
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
T
github-actions[bot] commented on PR #14028:
URL: https://github.com/apache/datafusion/pull/14028#issuecomment-2763014013
Thank you for your contribution. Unfortunately, this pull request is stale
because it has been open 60 days with no activity. Please remove the stale
label or comment or
xudong963 commented on PR #15480:
URL: https://github.com/apache/datafusion/pull/15480#issuecomment-2763036538
cc @blaginin
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Kontinuation commented on code in PR #1573:
URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2019693023
##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,48 @@ trait ShimCometScanExec {
protected def is
chenkovsky commented on issue #15456:
URL: https://github.com/apache/datafusion/issues/15456#issuecomment-2760736404
```
grep -n "p_partkey" part.csv
```
why there are two head rows
```
1:p_partkey,p_name,p_mfgr,p_brand,p_type,p_size,p_container,p_retailprice,p_commen
niebayes opened a new issue, #15474:
URL: https://github.com/apache/datafusion/issues/15474
The `UnionExec` operator has two helper functions for merging statistics
from multiple input operators.
Such a functionality is also needed by my project.
Can we provide a public helper func
berkaysynnada commented on PR #15418:
URL: https://github.com/apache/datafusion/pull/15418#issuecomment-2760833915
> I think coalesce added before `execute`?
>
> However just noticed the execute currently happens in main thread instead
of `future::once`. But it shouldn't be an issue a
Dandandan commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2760774554
> #15339 It looks like the join plan is being changed.
You should be able to get the test back by also setting
`hash_join_single_partition_threshold` to `0` / a low value.
berkaysynnada commented on PR #15418:
URL: https://github.com/apache/datafusion/pull/15418#issuecomment-2760838692
> This should not trigger for physical plans generated by datafusion, since
the EnforceDistribution pass already adds that CoalescePartitionsExec.
You mean `coalesce_part
2010YOUY01 commented on PR #1511:
URL:
https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2760428379
> While reading this I also wonder if we would be able to hook into [DF's
new SpillManager](https://github.com/apache/datafusion/pull/15355). That's a
task for another PR, bu
zhuqi-lucas commented on PR #15447:
URL: https://github.com/apache/datafusion/pull/15447#issuecomment-2760854661
First of all, i create the reproducer benchmark PR for arrow-rs:
https://github.com/apache/arrow-rs/pull/7351
| Benchmark | Utf8 Time (µs)| Ut
zhuqi-lucas commented on PR #15447:
URL: https://github.com/apache/datafusion/pull/15447#issuecomment-2760778885
After thinking more, i think the better way is to improve it in arrow-rs, so
we will benefit more about the utf8view regression cases, created the ticket
for arrow-rs:
htt
berkaysynnada commented on PR #15450:
URL: https://github.com/apache/datafusion/pull/15450#issuecomment-2760926069
> BTW @oznur-synnada I wonder if you have time to update the page with other
recent blog content 🤔
You mean this? https://github.com/apache/datafusion/pull/15440
--
Th
zhuqi-lucas commented on PR #15472:
URL: https://github.com/apache/datafusion/pull/15472#issuecomment-2760531078
cc @2010YOUY01 @xudong963
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the spe
acking-you commented on issue #11212:
URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2761086840
Thank you very much for your reply. These are some updates on this issue.
@alamb:
1. I have added the extended SQL in this PR #15462, you can check the
details there:[SQL
2010YOUY01 opened a new pull request, #15469:
URL: https://github.com/apache/datafusion/pull/15469
## Which issue does this PR close?
related to https://github.com/apache/datafusion/issues/15372
## Rationale for this change
One external sort query will panic d
Copilot commented on code in PR #15469:
URL: https://github.com/apache/datafusion/pull/15469#discussion_r2018081822
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -416,21 +409,23 @@ impl ExternalSorter {
Some(self.spill_manager.create_in_progress_file("Sor
xudong963 commented on code in PR #15473:
URL: https://github.com/apache/datafusion/pull/15473#discussion_r2018207072
##
datafusion/datasource/benches/split_groups_by_statistics.rs:
##
@@ -0,0 +1,178 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
ctsk commented on PR #15418:
URL: https://github.com/apache/datafusion/pull/15418#issuecomment-2760648004
> Modifying the plan post-execute() feels a bit off to me. Does it seem like
a smell to you as well?
This should not trigger for physical plans generated by datafusion, since
the
xudong963 commented on code in PR #15473:
URL: https://github.com/apache/datafusion/pull/15473#discussion_r2018226169
##
datafusion/datasource/benches/split_groups_by_statistics.rs:
##
@@ -0,0 +1,178 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
2010YOUY01 commented on code in PR #15409:
URL: https://github.com/apache/datafusion/pull/15409#discussion_r2018202146
##
datafusion/datasource/src/memory.rs:
##
@@ -440,6 +443,35 @@ impl DataSource for MemorySourceConfig {
}
}
+fn repartitioned(
Review Comm
2010YOUY01 commented on code in PR #15409:
URL: https://github.com/apache/datafusion/pull/15409#discussion_r2018210999
##
datafusion/datasource/src/memory.rs:
##
@@ -902,4 +1130,319 @@ mod tests {
Ok(())
}
+
+fn batch(row_size: usize) -> RecordBatch {
+
Dandandan commented on code in PR #15441:
URL: https://github.com/apache/datafusion/pull/15441#discussion_r2018254293
##
datafusion/optimizer/src/eliminate_sort.rs:
##
@@ -0,0 +1,78 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license
Omega359 commented on issue #15360:
URL: https://github.com/apache/datafusion/issues/15360#issuecomment-2759069227
I can't say I agree with you there but I'm only one voice. I'd rather see
the effort put into other areas tbh.
--
This is an automated message from the Apache Git Service.
To
Dandandan commented on code in PR #15266:
URL: https://github.com/apache/datafusion/pull/15266#discussion_r2007105841
##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -179,6 +292,420 @@ impl AggregateUDFImpl for FirstValue {
}
}
+struct FirstPrimitiveGroupsAccu
comphead merged PR #15467:
URL: https://github.com/apache/datafusion/pull/15467
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@dataf
LiaCastaneda opened a new issue, #15477:
URL: https://github.com/apache/datafusion/issues/15477
### Is your feature request related to a problem or challenge?
👋 Min/Max accumulator for List type is missing on datafusion.
### Describe the solution you'd like
The logic is
ctsk opened a new pull request, #15479:
URL: https://github.com/apache/datafusion/pull/15479
Relates to Issue: #15478
## Rationale for this change
The blocking operators (HJ buid side, Aggregation) are often planned on top
of a RepartitionExec with a CoalesceBatchesExec in-betw
ctsk opened a new pull request, #15476:
URL: https://github.com/apache/datafusion/pull/15476
## Which issue does this PR close?
- Closes #.
## Rationale for this change
## What changes are included in this PR?
## Are these changes tested?
ctsk commented on PR #15462:
URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2762008525
You're absolutely right, I got my logic wrong there. Embarrasing!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and us
acking-you commented on PR #15462:
URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2762014991
> You're absolutely right, I got my logic wrong there. Embarrasing!
It's okay. You've also taught me a lot. When I first started writing this, I
really didn't consider the c
kazuyukitanimura merged PR #1575:
URL: https://github.com/apache/datafusion-comet/pull/1575
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsub
zuston commented on issue #15477:
URL: https://github.com/apache/datafusion/issues/15477#issuecomment-2761897910
If you don’t want to fix, I’m glad to fix this as my first pr. @LiaCastaneda
--
This is an automated message from the Apache Git Service.
To respond to the message, please log
alamb merged PR #15444:
URL: https://github.com/apache/datafusion/pull/15444
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusi
kazuyukitanimura commented on PR #1575:
URL:
https://github.com/apache/datafusion-comet/pull/1575#issuecomment-2762019121
Merged, thanks @wForget
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to
acking-you commented on PR #15462:
URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2762023923
Hello @alamb, the optimization SQL and documentation related to this PR have
been completed, and all tests have passed. We may need to formally verify the
performance, but I'm not
alamb commented on code in PR #15480:
URL: https://github.com/apache/datafusion/pull/15480#discussion_r2019086527
##
datafusion/substrait/tests/cases/emit_kind_tests.rs:
##
@@ -53,15 +54,15 @@ mod tests {
let ctx = add_plan_schemas_to_ctx(SessionContext::new(), &proto_p
101 - 157 of 157 matches
Mail list logo