duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918429534
yep, it should be merged after every point is clear, to reduce review burden
--
This is an automated message from the Apache Git Service.
To respond to the message, please log
berkaysynnada commented on code in PR #16166:
URL: https://github.com/apache/datafusion/pull/16166#discussion_r2113294832
##
datafusion/datasource/src/file_format.rs:
##
@@ -120,7 +121,26 @@ pub trait FileFormatFactory: Sync + Send + GetExt +
fmt::Debug {
&self,
hendrikmakait opened a new pull request, #16207:
URL: https://github.com/apache/datafusion/pull/16207
## Which issue does this PR close?
- Closes #16199.
## What changes are included in this PR?
* Add a test for the size of `Expr`
* Change `Expr::WindowFunction
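A size test of the kind the first bullet mentions can be sketched as below. This is a minimal illustration, not DataFusion's actual test: `BigPayload`, `InlineExpr`, and `BoxedExpr` are hypothetical stand-ins, and boxing a large variant is only one common way to shrink an enum. The truncated description above does not show what the PR actually changes.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large variant payload (not a DataFusion type).
#[allow(dead_code)]
struct BigPayload {
    fields: [u64; 16],
}

// Enum that stores the large payload inline: every value is as big as the
// largest variant, plus the discriminant.
#[allow(dead_code)]
enum InlineExpr {
    Literal(u64),
    Window(BigPayload),
}

// Same enum with the large payload behind a Box: the variant now holds only
// a pointer, so the whole enum shrinks.
#[allow(dead_code)]
enum BoxedExpr {
    Literal(u64),
    Window(Box<BigPayload>),
}

fn main() {
    // A size regression test pins an upper bound so that growth is noticed.
    assert!(size_of::<BoxedExpr>() < size_of::<InlineExpr>());
    assert!(size_of::<BoxedExpr>() <= 16);
    println!(
        "inline: {} bytes, boxed: {} bytes",
        size_of::<InlineExpr>(),
        size_of::<BoxedExpr>()
    );
}
```

The upper bound in the second assertion is the useful part for a regression test: it fails as soon as a new or widened variant makes the enum larger than intended.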
berkaysynnada commented on PR #16166:
URL: https://github.com/apache/datafusion/pull/16166#issuecomment-2918419765
> Thank you for this contribution @berkaysynnada and @mertak-synnada
>
> I am a little confused about the new structure and exactly what problem is
being solved with this
berkaysynnada commented on code in PR #16166:
URL: https://github.com/apache/datafusion/pull/16166#discussion_r2113286434
##
datafusion/common/src/config.rs:
##
@@ -1612,42 +1623,241 @@ impl TableOptions {
};
e.0.set(key, value)
}
+}
-/// Initializes
alamb commented on code in PR #16148:
URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112282534
##
datafusion/datasource/src/test_util.rs:
##
@@ -81,6 +83,8 @@ impl FileSource for MockSource {
fn file_type(&self) -> &str {
"mock"
}
+
+imp
andygrove closed pull request #1804: chore: manual "git bisect" to try and
determine when CI failures started
URL: https://github.com/apache/datafusion-comet/pull/1804
andygrove merged PR #1792:
URL: https://github.com/apache/datafusion-comet/pull/1792
timsaucer commented on issue #1131:
URL:
https://github.com/apache/datafusion-python/issues/1131#issuecomment-2916920418
Ok to close this issue?
xudong963 commented on code in PR #16157:
URL: https://github.com/apache/datafusion/pull/16157#discussion_r2112300128
##
docs/source/user-guide/sql/ddl.md:
##
@@ -91,6 +93,23 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+:::{note}
Review Comment:
>
xudong963 merged PR #16157:
URL: https://github.com/apache/datafusion/pull/16157
pepijnve commented on issue #16193:
URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916952966
🤔 testing on my machine your adapted version of the code still just keeps on
running. ctrl-c does nothing. The only change I've made is to replace
`tokio::test` with `tokio::ma
kazantsev-maksim commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2112325535
##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with
Ada
adriangb commented on code in PR #16139:
URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112282032
##
datafusion/common/src/pruning.rs:
##
@@ -122,3 +126,1002 @@ pub trait PruningStatistics {
values: &HashSet<ScalarValue>,
) -> Option<BooleanArray>;
}
+
+/// Prune files ba
paleolimbot commented on issue #936:
URL:
https://github.com/apache/datafusion-python/issues/936#issuecomment-2916913205
Just two things I was involved in that may be useful here:
- `cudf::from_arrow()`:
https://github.com/rapidsai/cudf/blob/2789fa83d943649b982493d68bbba852f848d82c/c
xudong963 commented on issue #15771:
URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2916890212
@alamb, how about starting testing next week?
rluvaton commented on PR #1793:
URL:
https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916913507
I still think there is a bug here:
For this test (when running on main):
```scala
test("debug datafusion native filter") {
val schema = StructType(
Seq(
xudong963 commented on PR #16139:
URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2916919380
Sorry for the delay, I'll check tomorrow
timsaucer opened a new issue, #1136:
URL: https://github.com/apache/datafusion-python/issues/1136
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
As a user, if I have written a query that takes a long time, I want to be
able t
adriangb commented on PR #16139:
URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2916950224
> Sorry for the delay, I'll check tomorrow (feel free to invite me to review
directly with the button, then I'll notice sooner)
I'm not able to request reviews. I think only committers
alamb commented on issue #15771:
URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2916986710
> [@alamb](https://github.com/alamb), how about starting testing next week?
I think that would be a great idea. Thanks @xudong963
pepijnve commented on issue #16193:
URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2917010803
Just tested on Linux. With `USE_TASK = false` I see this
```
Running query; will time out after 5 seconds
InfiniteStream::poll_next 1 times
InfiniteStream::po
alamb commented on PR #14286:
URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2917020530
BTW the SpawnService is what should be used now:
https://github.com/apache/arrow-rs-object-store/pull/332
Sadly, the docs are broken for the current version of object_store so I
alamb merged PR #16093:
URL: https://github.com/apache/datafusion/pull/16093
kylebarron commented on issue #1136:
URL:
https://github.com/apache/datafusion-python/issues/1136#issuecomment-2917052106
See
https://pyo3.rs/v0.25.0/faq.html#ctrl-c-doesnt-do-anything-while-my-rust-code-is-executing
and
https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html#method.ch
alamb closed issue #16088: RepartitionExec not immediately propagating
`.execute()` calls to children
URL: https://github.com/apache/datafusion/issues/16088
adriangb commented on issue #16188:
URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2917056155
Thank you @kosiew.
Clearly what we have now needs work but I think I'd like to defer cleaning
this up until some other folks try to implement more things with these APIs
andygrove commented on PR #1793:
URL:
https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916340347
Thanks @rluvaton but this PR does not appear to help with the CI issue (the
tests are still failing - see
https://github.com/apache/datafusion-comet/actions/runs/15278587551/j
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2916186751
Another solution is to use CoalescePartitionsExec as a wrapper:
```rust
diff --git a/datafusion/physical-plan/src/coalesce_partitions.rs
b/datafusion/physical-plan/src/
rluvaton commented on code in PR #1793:
URL: https://github.com/apache/datafusion-comet/pull/1793#discussion_r2111820738
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -2406,19 +2406,19 @@ object QueryPlanSerde extends Logging with
CometExprShim {
zhuqi-lucas commented on issue #16193:
URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916217497
Hi @alamb, I believe we can also run the ClickBench benchmark for this PR.
But I am not confident about the result, since it seems we will always add some
overhead to aggrega
rluvaton commented on PR #1793:
URL:
https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916409442
I thought everything that came from JVM is reusing buffers
Copilot commented on code in PR #16203:
URL: https://github.com/apache/datafusion/pull/16203#discussion_r2112167277
##
datafusion/functions-nested/src/extract.rs:
##
@@ -225,6 +225,23 @@ where
return Ok(Arc::new(NullArray::new(array.len())));
}
+if let DataTy
alamb commented on PR #16142:
URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2916703395
I'll plan to merge this tomorrow unless I hear anything different
comphead commented on code in PR #1806:
URL: https://github.com/apache/datafusion-comet/pull/1806#discussion_r2112158414
##
.github/workflows/pr_build_linux_spark4.yml:
##
@@ -50,10 +49,53 @@ jobs:
java_version: [17]
test-target: [java]
spark-version:
adriangb commented on code in PR #16148:
URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112147587
##
datafusion/datasource/src/test_util.rs:
##
@@ -81,6 +83,8 @@ impl FileSource for MockSource {
fn file_type(&self) -> &str {
"mock"
}
+
+
comphead commented on code in PR #16203:
URL: https://github.com/apache/datafusion/pull/16203#discussion_r2112176250
##
datafusion/functions-nested/src/extract.rs:
##
@@ -225,6 +225,23 @@ where
return Ok(Arc::new(NullArray::new(array.len())));
}
+if let DataT
alamb commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2112173381
##
datafusion/execution/src/disk_manager.rs:
##
@@ -91,6 +177,11 @@ pub struct DiskManager {
}
impl DiskManager {
+/// Creates a builder for [DiskManager]
+
andygrove opened a new pull request, #1808:
URL: https://github.com/apache/datafusion-comet/pull/1808
## Which issue does this PR close?
N/A
## Rationale for this change
Avoid adding unnecessary copies. Thanks to @rluvaton for noticing this issue
in https
comphead commented on code in PR #16203:
URL: https://github.com/apache/datafusion/pull/16203#discussion_r211218
##
datafusion/functions-nested/src/extract.rs:
##
@@ -225,6 +225,23 @@ where
return Ok(Arc::new(NullArray::new(array.len())));
}
+if let DataT
zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2916219267
Hi @alamb, I believe we can also run the ClickBench benchmark for this PR.
But I am not confident about the result, since it seems we will always add some
overhead to aggregate. T
onlyjackfrost commented on code in PR #16181:
URL: https://github.com/apache/datafusion/pull/16181#discussion_r2112068788
##
datafusion/datasource/src/source.rs:
##
@@ -58,8 +58,61 @@ use datafusion_physical_plan::filter_pushdown::{
/// Requires `Debug` to assist debugging
///
onlyjackfrost commented on PR #16181:
URL: https://github.com/apache/datafusion/pull/16181#issuecomment-2916577526
@alamb @xudong963
I've updated the diagram based on the feedback. Thank you @xudong963 =D
zhuqi-lucas commented on issue #16193:
URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916774947
@pepijnve It works for me; the changed code is here:
```rust
tokio = { workspace = true, features = ["macros", "signal"]}
```
```rust
use arrow::arra
comphead commented on code in PR #16203:
URL: https://github.com/apache/datafusion/pull/16203#discussion_r2112201561
##
datafusion/functions-nested/src/extract.rs:
##
@@ -225,6 +225,23 @@ where
return Ok(Arc::new(NullArray::new(array.len())));
}
+if let DataT
zhuqi-lucas commented on issue #16193:
URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916779875
Interesting, this seems to give me an example that we can use in datafusion-cli
to support quick cancellation!
alamb commented on PR #16201:
URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2917076809
Thank you @liamzwbao -- this looks good to me. I'll start some benchmarks
on this PR and as long as that looks good this PR looks nice to me
Thanks again
alamb commented on PR #16093:
URL: https://github.com/apache/datafusion/pull/16093#issuecomment-2917050839
Looks all good to me, so let's go! 🚀
alamb commented on PR #16201:
URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2917078781
🤖 `./gh_compare_branch.sh` [Benchmark
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun
alamb commented on PR #16139:
URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2917083133
> I'm not able to request reviews. I think only committers can do that and
I'm not a committer (yet).
I think you will need to do the gitbox thing with your apache account (when
i
duongcongtoai commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2112798304
##
datafusion/expr/src/logical_plan/tree_node.rs:
##
@@ -400,6 +403,8 @@ impl LogicalPlan {
mut f: F,
) -> Result {
match self {
+
duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917657542
> The results are a little inconsistent. __scalar_sq_2."avg(e3.salary)",
__scalar_sq_2.dept_id are not valid fields in the above context. Ideally, all
the fields in e1, e2 and e
duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917689696
So here are my thoughts (this plan is to split the work into smaller PRs)
while avoiding breaking things as much as possible:
1. We introduce 3 optimizers, declared in the order b
logan-keede commented on issue #11201:
URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2917688776
The Comet implementation already has a `PhysicalExpr` for cast. I was
wondering if we could make it DataFusion-compatible (perhaps it already is) and
while making physical exp
mbutrovich commented on code in PR #1809:
URL: https://github.com/apache/datafusion-comet/pull/1809#discussion_r2112818183
##
common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java:
##
@@ -321,8 +321,6 @@ public void init() throws Throwable {
}
long[]
mbutrovich opened a new pull request, #1809:
URL: https://github.com/apache/datafusion-comet/pull/1809
## Which issue does this PR close?
Partially address #1542.
## Rationale for this change
## What changes are included in this PR?
We valid
mbutrovich commented on code in PR #1809:
URL: https://github.com/apache/datafusion-comet/pull/1809#discussion_r2112818597
##
common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java:
##
@@ -613,7 +611,10 @@ public void close() throws IOException {
@SuppressWarn
mbutrovich opened a new issue, #1810:
URL: https://github.com/apache/datafusion-comet/issues/1810
NativeBatchReader calls `checkParquetType` on all of the columns on every
invocation of `loadNextBatch`. I tried moving it up to `init` but some Spark
SQL tests expect the exceptions that this
duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917717164
or the easiest way is to have a large feature branch :thinking:
logan-keede commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917741777
cc @alamb
logan-keede commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917721651
> Beware that this error is thrown after the planning stage has completed,
and it is expected because of the current limitation of subquery decorrelation.
Oh I was under the i
duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917730231
True, I've just realized that. It looks like a feature branch for us to work on
is the way to go, then?
parthchandra merged PR #1785:
URL: https://github.com/apache/datafusion-comet/pull/1785
irenjj commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917853644
It looks like @duongcongtoai addressed the depth issue in #16016. Maybe
this PR can be merged with #16016 to better verify the depth-related problem?
irenjj commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2112941516
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -287,6 +287,105 @@ pub enum LogicalPlan {
Unnest(Unnest),
/// A variadic query (e.g. "Recursive CTEs")
ctsk opened a new issue, #16206:
URL: https://github.com/apache/datafusion/issues/16206
### Describe the bug
An unfortunate pattern in the hash join implementation leads to excessive
Arc-cloning: Assume the build-side carries a string-view column as a payload.
Let N be the number of
parthchandra commented on PR #1785:
URL:
https://github.com/apache/datafusion-comet/pull/1785#issuecomment-2917762700
@andygrove @mbutrovich
andygrove commented on PR #1793:
URL:
https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2917789948
> I still think there is a bug here:
>
> For this test (when running on main):
>
> ```scala
> test("debug datafusion native filter") {
> val schema = Struc
parthchandra commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2112886589
##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with
Adaptiv
codecov-commenter commented on PR #1809:
URL:
https://github.com/apache/datafusion-comet/pull/1809#issuecomment-2917796361
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1809?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
codecov-commenter commented on PR #1765:
URL:
https://github.com/apache/datafusion-comet/pull/1765#issuecomment-2917808380
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1765?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
coderfender opened a new pull request, #1811:
URL: https://github.com/apache/datafusion-comet/pull/1811
## Which issue does this PR close?
Closes #.
## Rationale for this change
## What changes are included in this PR?
## How are these chang
parthchandra commented on PR #1765:
URL:
https://github.com/apache/datafusion-comet/pull/1765#issuecomment-2917912411
@mbutrovich it looks like this is causing CI failures.
parthchandra merged PR #1809:
URL: https://github.com/apache/datafusion-comet/pull/1809
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr.
codecov-commenter commented on PR #1811:
URL:
https://github.com/apache/datafusion-comet/pull/1811#issuecomment-2917941265
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1811?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca