Congratulations! Excellent work!
On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu wrote:
> Absolutely thrilled to see the project going open-source! Huge congrats to
> Chao and the entire team on this milestone!
>
> Yufei
>
>
> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote:
>
>> Hi all,
>>
>> We are
Absolutely thrilled to see the project going open-source! Huge congrats to
Chao and the entire team on this milestone!
Yufei
On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote:
> Hi all,
>
> We are very happy to announce that Project Comet, a plugin to
> accelerate Spark query execution via lever
Sure, thanks for the clarification. I gather what you are alluding to is -- in
a distributed environment, when one does operations that involve shuffling
or repartitioning of data, the order in which this data is processed across
partitions is not guaranteed. So when repartitioning a dataframe, the dat
Apologies if it wasn't clear, I meant the difficulty of debugging,
not floating point precision :)
On Wed, Feb 14, 2024 at 2:03 AM Mich Talebzadeh wrote:
> Hi Jack,
>
> "most SQL engines suffer from the same issue..."
>
> Sure. This behavior is not a bug, but rather a consequence o
This looks really cool :) Out of interest, what are the differences in
approach between this and Gluten?
On Tue, Feb 13, 2024 at 12:42 PM Chao Sun wrote:
> Hi all,
>
> We are very happy to announce that Project Comet, a plugin to
> accelerate Spark query execution via leveraging DataFusion a
Hi all,
We are very happy to announce that Project Comet, a plugin to
accelerate Spark query execution via leveraging DataFusion and Arrow,
has now been open sourced under the Apache Arrow umbrella. Please
check the project repo
https://github.com/apache/arrow-datafusion-comet for more details if
Thank you for the update, Jungtaek.
Dongjoon.
On Tue, Feb 13, 2024 at 7:29 AM Jungtaek Lim wrote:
> Hi,
>
> Just a heads-up since I didn't give an update for a week after the last
> update from the discussion thread.
>
> I've been following the automated release process and encountered several
>
This would be helpful for a few use cases. For context, my team works in
the security space, and customers access data through a wrapper around Spark
SQL connected to the Hive metastore.
1. When snapshot (non-partitioned) tables are queried, it’s not clear when
the underlying snapshot was last updated. hav
Hi,
Just a heads-up since I didn't give an update for a week after the last
update from the discussion thread.
I've been following the automated release process and encountered several
issues. I will probably file JIRA tickets and follow up with PRs.
The issues I have found so far are 1) Python library version
Hi Jack,
"most SQL engines suffer from the same issue..."
Sure. This behavior is not a bug, but rather a consequence of the
limitations of floating-point precision. The numbers involved in the
example (see SPIP [SPARK-47024] Sum of floats/doubles may be incorrect
depending on partitioning
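
The floating-point point made in this message can be illustrated with a minimal sketch. It is not taken from the thread or from the SPIP: the values and the helper names (`naive_sum`, `partitioned_sum`) are hypothetical, and plain Python lists stand in for Spark partitions. It only shows that summing the same doubles with a different grouping can give a different result.

```python
import math

def naive_sum(values):
    """Left-to-right double-precision sum with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total

def partitioned_sum(partitions):
    """Sum each partition on its own, then add the partial sums."""
    return naive_sum([naive_sum(p) for p in partitions])

# Illustrative values only: the same four numbers, grouped two ways.
one_partition = partitioned_sum([[1e16, 1.0, -1e16, 1.0]])
two_partitions = partitioned_sum([[1e16, -1e16], [1.0, 1.0]])

# math.fsum tracks the rounding error and returns the correctly rounded sum.
exact = math.fsum([1e16, 1.0, -1e16, 1.0])

print(one_partition, two_partitions, exact)
```

In the single-partition case the first 1.0 is lost when it is added to 1e16, so the two groupings disagree even though the inputs are identical.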