There's another PR open to expose this more publicly on the Python side
(https://github.com/apache/spark/pull/27331).
To sum up, I would like to make sure we know the following:
- Is this expression only for partition or do we plan to expose this to
replace other existing expressions as some kind of p
Here are my thoughts on your list; it would be good to get input from the
people who worked on these issues. Obviously we can weigh the importance of
these against getting 2.4.5 out, which has a bunch of other correctness fixes
that you mention as well. I think you have already pinged on most of the JIRAs
to get feedback.
SP
Hi, All.
BTW, based on the feedback so far,
I updated all open `correctness` and `dataloss` issues as follows.
1. Raised the issue priority to `Blocker`.
2. Set the target version to `3.0.0`.
It's time to give those issues more visibility so that we can close or
resolve them.
Th
Hi, Tom.
Then, along with the following, do you think we need to hold the 2.4.5
release, too?
> If it's really a correctness issue we should hold 3.0 for it.
Recently,
(1) 2.4.4 delivered 9 correctness patches.
(2) 2.4.5 RC1 aimed to deliver the following 9 correctness patches, too.
+1 for proposal.
Tom
On Tuesday, January 21, 2020, 04:37:04 PM CST, Sean Owen
wrote:
See https://github.com/apache/spark/pull/27307 for some context. We've
had to add, in at least one place, some settings to resolve artifacts
from a mirror besides Maven Central to work around some build
I agree, I think we just need to go through all of them and assess each one
individually. If it's really a correctness issue we should hold 3.0 for it.
On the 2.4 release, I didn't see an explanation on
https://issues.apache.org/jira/browse/SPARK-26154 of why it can't be backported,
I think in the
From: Dongjoon Hyun
Date: Wednesday, January 22, 2020 at 1:57 AM
To: Wenchen Fan
Cc: dev
Subject: Re: Correctness and data loss issues
Thank you for checking, Wenchen! Sure, we need to do that.
Another question is "What can we do for the 2.4.5 release?"
Some of the fixes cannot be back
Hello,
I'm looking into starting work on this ticket
https://issues.apache.org/jira/browse/SPARK-27249
which involves adding an API for transforming Datasets. In the comments
I asked whether or not this ticket is still necessary.
Could someone please review and advise?