It seems safer to skip the Arrow 0.10.0 upgrade for Spark 2.4 and leave it to Spark 3.0, so that we have more time to test. Any objections?

On Fri, Aug 10, 2018 at 11:53 PM shane knapp <skn...@berkeley.edu> wrote:

> quick update from my end:
>
> SPARK-24433 (SparkR/k8s) depends on SPARK-25087 (move builds to ubuntu)
>
> SPARK-23874 (arrow -> 0.10.0) now depends on SPARK-25079 (python 3.5
> upgrade)
>
> both SPARK-25087 and SPARK-25079 are in progress and i'm very very
> hesitant to do these upgrades before the code freeze/branch cut. i've done
> a TON of testing, but even as of yesterday afternoon, i'm still uncovering
> bugs and things that need fixing both on the infrastructure side and spark
> itself.
>
> h/t sean owen for helping out on SPARK-24950
>
> On Wed, Aug 8, 2018 at 10:51 AM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> I'm inclined to agree. Just saying that it is not a regression doesn't
>> really cut it when it is a now known data correctness issue. We need
>> something a lot more than nothing before releasing 2.4.0. At a barest
>> minimum, that has to be much more complete and publicly highlighted
>> documentation of the issue so that users are less likely to stumble into
>> this unaware; but really we need to fix at least the most common cases of
>> this bug. Backports to maintenance branches are also probably in order.
>>
>> On Wed, Aug 8, 2018 at 7:06 AM Imran Rashid <iras...@cloudera.com.invalid>
>> wrote:
>>
>>> On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>
>>>> SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>:
>>>> Shuffle+Repartition on an RDD could lead to incorrect answers
>>>> It turns out to be a very complicated issue, there is no consensus
>>>> about what is the right fix yet. Likely to miss it in Spark 2.4 because
>>>> it's a long-standing issue, not a regression.
>>>>
>>>
>>> This is a really serious data loss bug. Yes it's very complex, but we
>>> absolutely have to fix this, I really think it should be in 2.4.
>>> Has work on it stopped?
>>>
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu