ashdnazg commented on PR #15654:
URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2800542302
I do reproduce it here on ubuntu - when I run the test through the runner it
takes much more time (or hangs entirely) than without.
Just to see what happens, I tried to run th
suremarc commented on PR #15683:
URL: https://github.com/apache/datafusion/pull/15683#issuecomment-2800524158
> That is, users should ensure that the output ordering is correct.
One of the users as of now is `ListingTable`, which I don't believe makes
any such guarantees, so we would
2010YOUY01 commented on code in PR #15594:
URL: https://github.com/apache/datafusion/pull/15594#discussion_r2041433804
##
datafusion/core/src/execution/context/mod.rs:
##
@@ -1036,13 +1040,73 @@ impl SessionContext {
variable, value, ..
} = stmt;
-
2010YOUY01 commented on code in PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#discussion_r2041421543
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -202,27 +204,99 @@ impl TopK {
})
.collect::<Result<Vec<_>>>()?;
+// selected indices
+
GitHub user camuel added a comment to the discussion: DISCUSSION: Anyone around
for the Databricks Data & AI Summit in San Francisco June 9–12?
I am local and willing to help organize, let me know how to be useful, also
attending Databricks Data & AI Summit
GitHub link:
https://github.com/ap
rluvaton commented on PR #15466:
URL: https://github.com/apache/datafusion/pull/15466#issuecomment-2800169545
Happy to improve performance. I got more in my chamber
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
DerGut commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041251862
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -759,12 +761,51 @@ impl ExternalSorter {
if self.runtime.disk_manager.tmp_files_enabled() {
Dandandan commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2041413480
##
datafusion/physical-plan/src/sorts/sort_filters.rs:
##
@@ -0,0 +1,297 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributo
2010YOUY01 commented on PR #15654:
URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2800461046
> Extended test takes longer time and couldn't finish in 6hr after this
change
>
>
https://github.com/apache/datafusion/actions/runs/14419458859/job/40440288212
I fo
Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2800459456
> I'll take a look tomorrow! Why do we have to use only the first column? Is
it just to break up the change into smaller units? We had multi-column support
working in the now close
ashdnazg commented on PR #15654:
URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2800458013
@jayzhan211 :hankey: :frowning_face:
On it.
kosiew commented on PR #15648:
URL: https://github.com/apache/datafusion/pull/15648#issuecomment-2800444256
Closing this.
@acking-you improves this significantly in #15694
kosiew closed pull request #15648: Optimize BinaryExpr Evaluation with
Short-Circuiting for AND/OR Operators
URL: https://github.com/apache/datafusion/pull/15648
kosiew commented on code in PR #15694:
URL: https://github.com/apache/datafusion/pull/15694#discussion_r2041391176
##
datafusion/physical-expr/src/expressions/binary.rs:
##
@@ -811,58 +822,164 @@ impl BinaryExpr {
}
}
+enum ShortCircuitStrategy<'a> {
+None,
+Retu
kosiew commented on code in PR #15694:
URL: https://github.com/apache/datafusion/pull/15694#discussion_r2041328132
##
datafusion/physical-expr/src/expressions/binary.rs:
##
@@ -811,58 +822,164 @@ impl BinaryExpr {
}
}
+enum ShortCircuitStrategy<'a> {
+None,
+Retu
2010YOUY01 commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041385875
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -765,6 +765,25 @@ impl ExternalSorter {
Ok(())
}
+
+/// Reserves memory to be able to a
2010YOUY01 commented on PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#issuecomment-2800424077
Thank you, it looks good to me.
Let's make the CI pass, I think we can change the assertion type for
simplicity here, and do a separate PR for this utility
`DataFusionError:
2010YOUY01 commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041381039
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -759,12 +761,51 @@ impl ExternalSorter {
if self.runtime.disk_manager.tmp_files_enabled() {
2010YOUY01 commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2800408050
> > > Also, to have a fully working larger than memory sort, you need to
spill in
> > >
https://github.com/apache/datafusion/blob/362fcdfc7b9e00cb6126a0cbc41c9abb2637c563/dataf
2010YOUY01 commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041372025
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -431,12 +422,16 @@ impl ExternalSorter {
let batches_to_spill = std::mem::take(globally_sorted_bat
2010YOUY01 commented on code in PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#discussion_r2041368520
##
datafusion/common/src/config.rs:
##
@@ -337,6 +337,13 @@ config_namespace! {
/// batches and merged.
pub sort_in_place_threshold_bytes: usi
adriangb commented on PR #15057:
URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2800379042
> Another question is, isn't the filter created based on table schema? And
then the batch is read as file schema and casted to table schema and is
evaluated by filter.
Yes th
jayzhan211 commented on PR #15057:
URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2800373423
> PhysicalExpr::with_schema
This method is too general and it is unclear what we need to do with the
provided schema for each PhysicalExpr, it is not a good idea.
> I
tespent commented on issue #1103:
URL:
https://github.com/apache/datafusion-python/issues/1103#issuecomment-2800371392
> I am concerned about the table providers, though. I think any
implementation will need to get the table provider to provide record batches
efficiently.
A small co
DerGut commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041234887
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -765,6 +765,25 @@ impl ExternalSorter {
Ok(())
}
+
+/// Reserves memory to be able to accom
GitHub user camuel edited a comment on the discussion: DISCUSSION: Anyone
around for the Databricks Data & AI Summit in San Francisco June 9–12?
I am local and willing to help organize, let me know how to be useful, also
attending Databricks Data & AI Summit.
GitHub link:
https://github.com
xudong963 commented on PR #15661:
URL: https://github.com/apache/datafusion/pull/15661#issuecomment-2800239553
> do you have time to look at that one?
Sure, sorry for the late reply, I had a headache this weekend, so off my
computer.
GitHub user camuel edited a comment on the discussion: DISCUSSION: Anyone
around for the Databricks Data & AI Summit in San Francisco June 9–12?
I believe most, if not all, DataFusion meetups in San Francisco have been
kindly hosted by Jeff Huber at the Chroma offices. This time might be simil
GitHub user camuel added a comment to the discussion: DISCUSSION: Anyone around
for the Databricks Data & AI Summit in San Francisco June 9–12?
I believe most, if not all, DataFusion meetups in San Francisco have been
kindly hosted by Jeff Huber at the Chroma offices. This time might be simila
xudong963 commented on PR #15539:
URL: https://github.com/apache/datafusion/pull/15539#issuecomment-2800254350
@berkaysynnada Thanks for your review, i'll continue it this week.
jayzhan211 commented on PR #15654:
URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2800243988
Extended test takes longer time and couldn't finish in 6hr after this change
https://github.com/apache/datafusion/actions/runs/14419458859/job/40440288212
xudong963 commented on issue #15689:
URL: https://github.com/apache/datafusion/issues/15689#issuecomment-2800240401
@alamb Thank you for writing the issue in detail
@friendlymatthew Thank you for taking it
DerGut commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041254566
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -759,12 +761,51 @@ impl ExternalSorter {
if self.runtime.disk_manager.tmp_files_enabled() {
DerGut commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041251481
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -1552,6 +1593,62 @@ mod tests {
Ok(())
}
+#[tokio::test]
+async fn test_batch_reservati
rluvaton commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041239380
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -759,12 +761,51 @@ impl ExternalSorter {
if self.runtime.disk_manager.tmp_files_enabled() {
adriangb commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2800153336
I'll take a look tomorrow! Why do we have to use only the first column? Is
it just to break up the change into smaller units? We had multi-column support
working in the now closed P
rluvaton commented on code in PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041222937
##
datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs:
##
@@ -753,3 +765,226 @@ async fn test_single_mode_aggregate_with_spill() ->
Result<()> {
Ok(())
}
DerGut commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041222385
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -765,6 +765,25 @@ impl ExternalSorter {
Ok(())
}
+
+/// Reserves memory to be able to accom
comphead commented on PR #15696:
URL: https://github.com/apache/datafusion/pull/15696#issuecomment-2800125353
Yeah, that was actually my question: if the warnings are not returned to
the end user, who is supposed to react to them?
DerGut commented on code in PR #15692:
URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041181467
##
datafusion/physical-plan/src/sorts/sort.rs:
##
@@ -529,6 +523,12 @@ impl ExternalSorter {
/// Sorts the in-memory batches and merges them into a single sort
kumarlokesh commented on issue #14452:
URL: https://github.com/apache/datafusion/issues/14452#issuecomment-2800107028
take
rluvaton opened a new pull request, #15700:
URL: https://github.com/apache/datafusion/pull/15700
## Which issue does this PR close?
- Closes #14692.
## Rationale for this change
We need a merge sort that does not fail with out-of-memory errors
## What changes are included in th
rluvaton commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2800097814
> > Also, to have a fully working larger than memory sort, you need to spill
in
> >
https://github.com/apache/datafusion/blob/362fcdfc7b9e00cb6126a0cbc41c9abb2637c563/datafusion/
rluvaton commented on code in PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#discussion_r2041196922
##
datafusion/common/src/config.rs:
##
@@ -337,6 +337,13 @@ config_namespace! {
/// batches and merged.
pub sort_in_place_threshold_bytes: usize
NickCrews commented on issue #1106:
URL:
https://github.com/apache/datafusion-python/issues/1106#issuecomment-2800092336
I don't super understand #1103, but I think that is maybe the inverse: that
issue is about allowing devs to provide datafusion with ways to access their
custom databases/
NickCrews closed issue #1106: Question: is there a way to get the current
catalog or database?
URL: https://github.com/apache/datafusion-python/issues/1106
acking-you commented on PR #15694:
URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2800087698
The relevant bug fixes have been completed, and corresponding performance
tests have been conducted. The results show that pre-selection has achieved
significant gains! @Dandandan
comphead commented on code in PR #1643:
URL: https://github.com/apache/datafusion-comet/pull/1643#discussion_r2041190729
##
native/core/src/execution/shuffle/map.rs:
##
@@ -2832,13 +2833,13 @@ pub fn append_map_elements(
}
#[allow(clippy::field_reassign_with_default)]
-pub f
codecov-commenter commented on PR #1643:
URL:
https://github.com/apache/datafusion-comet/pull/1643#issuecomment-2800074535
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1643?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2800082794
@adriangb FYI CI is passing, it's ready for review.
I had to make some changes to the filter that is applied to respect
lexicographic ordering (which made Q7 lose the speedup), b
changsun20 commented on PR #15696:
URL: https://github.com/apache/datafusion/pull/15696#issuecomment-2800081010
> Thanks @changsun20 wondering if it's possible to test those warnings in
integration slt test files?
Thank you for the thoughtful question, @comphead. I appreciate your focu
logan-keede commented on PR #15680:
URL: https://github.com/apache/datafusion/pull/15680#issuecomment-2800055667
cc @eliaperantoni
Dandandan opened a new issue, #15699:
URL: https://github.com/apache/datafusion/issues/15699
### Is your feature request related to a problem or challenge?
TopK can be optimized by filtering on the max value before converting the
arrays to row-format (which is slow).
##
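The early-filtering idea described in this issue can be illustrated with a minimal sketch. It is not DataFusion's implementation: it assumes plain `i64` values instead of Arrow arrays, an ascending sort, and a hypothetical function name `top_k_smallest`. The point it shows is that once the heap holds `k` values, its largest element acts as a threshold, so each incoming batch can be filtered cheaply before any more expensive per-row work (such as row-format conversion) is done.

```rust
use std::collections::BinaryHeap;

/// Hypothetical helper (not a DataFusion API): returns the `k` smallest
/// values across `batches`, in ascending order.
pub fn top_k_smallest(batches: &[Vec<i64>], k: usize) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    // Max-heap of the k best (smallest) values seen so far.
    // When the heap is full, its top element is the current threshold.
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k);

    for batch in batches {
        // Snapshot the threshold once per batch; None means "heap not full yet".
        let threshold = if heap.len() == k {
            heap.peek().copied()
        } else {
            None
        };

        // Cheap pre-filter: drop values that cannot enter the top k.
        // In a real engine this is where the costly conversion would be skipped.
        let candidates = batch.iter().copied().filter(|v| match threshold {
            Some(t) => *v < t,
            None => true,
        });

        for v in candidates {
            if heap.len() < k {
                heap.push(v);
            } else if let Some(mut top) = heap.peek_mut() {
                // The threshold may have tightened within this batch, so re-check.
                if v < *top {
                    *top = v; // heap order is restored when `top` is dropped
                }
            }
        }
    }
    heap.into_sorted_vec()
}
```

Calling `top_k_smallest(&[vec![5, 1, 9], vec![3, 7, 2], vec![8, 0]], 3)` keeps only `0` from the last batch after the pre-filter, because the threshold at that point is `3`.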
comphead opened a new issue, #1645:
URL: https://github.com/apache/datafusion-comet/issues/1645
Map arm branches take up too much space in this file; proposing to move Map
arm branches into a separate file.
Ideally, investigate how those branches can be rewritten as macros
_
akurmustafa commented on issue #15665:
URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2800060351
Hi @lalaorya I didn't reproduce the plans you generated locally, so my
thoughts might be wrong or misleading. However, here are my thoughts regarding
your problem:
> What
comphead commented on code in PR #1643:
URL: https://github.com/apache/datafusion-comet/pull/1643#discussion_r2041175431
##
native/core/src/execution/shuffle/row.rs:
##
@@ -904,7 +904,7 @@ pub(crate) fn append_field(
append_map_element!(StringBuilder, Decima
comphead opened a new issue, #1644:
URL: https://github.com/apache/datafusion-comet/issues/1644
### Describe the bug
Sometimes the build fails with
```
warning: spurious network error (3 tries remaining): [7] Could not connect
to server (Failed to connect to index.crates.io port 4
comphead opened a new pull request, #1643:
URL: https://github.com/apache/datafusion-comet/pull/1643
## Which issue does this PR close?
Closes #1633 .
## Rationale for this change
MapBuilder by default uses nullable columns to represent Map entries.
Overriding th
rluvaton commented on code in PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#discussion_r2041170126
##
datafusion/common/src/config.rs:
##
@@ -337,6 +337,13 @@ config_namespace! {
/// batches and merged.
pub sort_in_place_threshold_bytes: usize
comphead merged PR #15695:
URL: https://github.com/apache/datafusion/pull/15695
comphead commented on PR #15695:
URL: https://github.com/apache/datafusion/pull/15695#issuecomment-2800045143
Thanks @jayzhan211 and @berkaysynnada
comphead closed issue #12376: first_value and last_value should have identical
signatures
URL: https://github.com/apache/datafusion/issues/12376
Dandandan opened a new issue, #15698:
URL: https://github.com/apache/datafusion/issues/15698
### Is your feature request related to a problem or challenge?
In the PR https://github.com/apache/datafusion/pull/15697 we added support
for filtering input values early on to speed up TopK e
adriangb commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2800033871
@Dandandan will be happy to review once CI is passing
timsaucer commented on issue #1106:
URL:
https://github.com/apache/datafusion-python/issues/1106#issuecomment-2800014421
Probably a duplicate for
https://github.com/apache/datafusion-python/issues/1103
@NickCrews please let me know if that issue doesn't answer your needs
Omega359 commented on PR #13527:
URL: https://github.com/apache/datafusion/pull/13527#issuecomment-2800015976
I've spent some time looking at using ExecutionProps for this and while I
think it'll work it's still a lot of churn. That churn is largely because of
two reasons:
1. We woul
milenkovicm opened a new issue, #1238:
URL: https://github.com/apache/datafusion-ballista/issues/1238
At the moment we have three different task distribution strategies
- binding
- round robin
- consistent hashing
I believe we should open scheduler interface exposing p
NickCrews opened a new issue, #1106:
URL: https://github.com/apache/datafusion-python/issues/1106
Hi! I'm working on the datafusion backend for ibis. Specifically, I'm
working on PR https://github.com/ibis-project/ibis/pull/2. Most of the
backends for ibis, such as postgres, duckdb, sql
adriangb commented on PR #15057:
URL: https://github.com/apache/datafusion/pull/15057#issuecomment-282196
I would like to resume this work.
Some thoughts: should the rewrite happen via a new trait as I'm currently
doing, or should we add a method `PhysicalExpr::with_schema`?
If
milenkovicm closed issue #579: Improve the way to pass through configurations
to datafusion
URL: https://github.com/apache/datafusion-ballista/issues/579
milenkovicm closed issue #578: starts_with function is serialised as UDF
URL: https://github.com/apache/datafusion-ballista/issues/578
milenkovicm commented on issue #578:
URL:
https://github.com/apache/datafusion-ballista/issues/578#issuecomment-2799989526
I believe this is not the case anymore, so I will close it. Please re-open if
it is still an issue
milenkovicm commented on issue #1227:
URL:
https://github.com/apache/datafusion-ballista/issues/1227#issuecomment-2799989081
issues which may be relevant to `flight-sql`:
- #941
- #839
milenkovicm closed issue #633: add create_dataframe method to BallistaContest
URL: https://github.com/apache/datafusion-ballista/issues/633
milenkovicm commented on issue #633:
URL:
https://github.com/apache/datafusion-ballista/issues/633#issuecomment-2799988635
Ballista uses `SessionContext` from datafusion, so most methods exposed by
`SessionContext` should be supported. Closing this as outdated. Please re-open
if still needed
milenkovicm commented on issue #886:
URL:
https://github.com/apache/datafusion-ballista/issues/886#issuecomment-2799986505
will close this issue, ballista does not provide deployment scripts anymore
milenkovicm closed issue #886: Deployment on AWS
URL: https://github.com/apache/datafusion-ballista/issues/886
milenkovicm commented on PR #1230:
URL:
https://github.com/apache/datafusion-ballista/pull/1230#issuecomment-2799983961
@mmooyyii & @joaoferrao would you mind having a look at this PR?
Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799975208
> Nice! We can even wire it up with the filter pushdown so that if an
operator under us "absorbs" the filter (eg it got pushed down to the scan) we
skip doing this internally.
adriangb commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799971429
Nice! We can even wire it up with the filter pushdown so that if an operator
under us "absorbs" the filter (eg it got pushed down to the scan) we skip doing
this internally.
Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799968795
> If I understand correctly, the idea is to basically do the same thing we're
going to do for the dynamic filters but essentially do the filtering inside of
top K to avoid some extra
GitHub user alamb added a comment to the discussion: San Francisco DataFusion
Meetup scheduled for 9/25
We are organizing another one here:
https://github.com/apache/datafusion/discussions/15657
GitHub link:
https://github.com/apache/datafusion/discussions/11972#discussioncomment-12819584
GitHub user alamb added a comment to the discussion: DISCUSSION: Anyone around
for the Databricks Data & AI Summit in San Francisco June 9–12?
Timing Thread:
* maybe we could shoot for Monday June 9 as that will be before the Data and AI
summit events get going full steam
GitHub link:
https:
GitHub user alamb added a comment to the discussion: DISCUSSION: Anyone around
for the Databricks Data & AI Summit in San Francisco June 9–12?
I will also be traveling to attend and would love to help make it happen!
cc @mwlyed @ameyc @emgeee @mwylde maybe you would be around and interested i
timsaucer commented on issue #1103:
URL:
https://github.com/apache/datafusion-python/issues/1103#issuecomment-2799942428
This is *very* good feedback. I think the catalog provider and schema
provider will be relatively easy to do to provide both pure python and rust-ffi
versions. I am conc
Dandandan commented on code in PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#discussion_r2041118426
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -202,24 +204,93 @@ impl TopK {
})
.collect::<Result<Vec<_>>>()?;
+// selected indices
+
Dandandan commented on code in PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#discussion_r2041118282
##
datafusion/physical-plan/src/topk/mod.rs:
##
@@ -202,24 +204,93 @@ impl TopK {
})
.collect::<Result<Vec<_>>>()?;
+// selected indices
+
Dandandan opened a new pull request, #15697:
URL: https://github.com/apache/datafusion/pull/15697
## Which issue does this PR close?
- Closes #.
## Rationale for this change
This optimizes our TopK by filtering early based on the threshold values,
avoidin
yongda-fan commented on issue #993:
URL:
https://github.com/apache/datafusion-ballista/issues/993#issuecomment-2799909943
Ya i agree, with example
https://github.com/apache/datafusion-ballista/blob/main/examples/examples/custom-executor.rs,
one can easily inject rust UDFs.
yongda-fan closed issue #993: Support Rust UDF
URL: https://github.com/apache/datafusion-ballista/issues/993
alamb commented on PR #15610:
URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2799903999
I plan to re-review this tomorrow
alamb commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2799902720
Thanks @jayzhan211 for the approval and for the discussion. I'll plan to
merge https://github.com/apache/datafusion/pull/15466 tomorrow then unless we
want to discuss it further.
Dandandan closed pull request #15690: Specialize join matching when values in
map are unique
URL: https://github.com/apache/datafusion/pull/15690
tespent commented on issue #1103:
URL:
https://github.com/apache/datafusion-python/issues/1103#issuecomment-2799841561
@timsaucer This is wonderful! However, I think FFI CatalogProvider is not
enough for my needs, since I'm looking for *pure python-written*
CatalogProvider and SchemaProvid