Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
heung wrote: Re shading - same argument I’ve made earlier today in a PR... (Context- in many cases Spark has light or indirect dependencies but bringing them into the process breaks users cod

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
e shading - same argument I’ve made earlier today in a PR... (Context- in many cases Spark has light or indirect dependencies but bringing them into the process breaks users' code easily)

Re: Spark 2.4.2

2019-04-19 Thread Driesprong, Fokko
(Context- in many cases Spark has light or indirect dependencies but bringing them into the process breaks users' code easily) *From:* Michael Heuer *Sent:* Thursday, Ap

Re: Spark 2.4.2

2019-04-19 Thread Arun Mahadevan
bringing them into the process breaks users' code easily) *From:* Michael Heuer *Sent:* Thursday, April 18, 2019 6:41 AM *To:* Reynold Xin *Cc:* Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen

Re: Spark 2.4.2

2019-04-18 Thread Wenchen Fan
pendencies but bringing them into the process breaks users' code easily) *From:* Michael Heuer *Sent:* Thursday, April 18, 2019 6:41 AM *To:* Reynold Xin *Cc:* Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen

Re: Spark 2.4.2

2019-04-18 Thread Felix Cheung
Xin Cc: Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen Fan; Xiao Li Subject: Re: Spark 2.4.2 +100 On Apr 18, 2019, at 1:48 AM, Reynold Xin <r...@databricks.com> wrote: We should have shaded all Spark’s dependencies :( On Wed, Apr 17, 2019 at 11:47 PM Sea

Re: Spark 2.4.2

2019-04-18 Thread Michael Heuer
+100 On Apr 18, 2019, at 1:48 AM, Reynold Xin wrote: We should have shaded all Spark’s dependencies :( On Wed, Apr 17, 2019 at 11:47 PM Sean Owen wrote: For users that would inherit Jackson and use it directly, or whose dependencies do. Spark itself (w

Re: Spark 2.4.2

2019-04-17 Thread Reynold Xin
We should have shaded all Spark’s dependencies :( On Wed, Apr 17, 2019 at 11:47 PM Sean Owen wrote: For users that would inherit Jackson and use it directly, or whose dependencies do. Spark itself (with modifications) should be OK with the change. It's risky and normally wouldn't backpor

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
For users that would inherit Jackson and use it directly, or whose dependencies do. Spark itself (with modifications) should be OK with the change. It's risky and normally wouldn't backport, except that I've heard a few times about concerns about CVEs affecting Databind, so wondering who else out t

Re: Spark 2.4.2

2019-04-17 Thread Reynold Xin
For Jackson - are you worrying about JSON parsing for users or internal Spark functionality breaking? On Wed, Apr 17, 2019 at 6:02 PM Sean Owen wrote: There's only one other item on my radar, which is considering updating Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
There's only one other item on my radar, which is considering updating Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up a few times now that there are a number of CVEs open for 2.6.7. Cons: not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson behavior non-triv

Re: Spark 2.4.2

2019-04-17 Thread Wenchen Fan
I volunteer to be the release manager for 2.4.2, as I was also going to propose 2.4.2 because of the reverting of SPARK-25250. Are there any other ongoing bug fixes we want to include in 2.4.2? If not, I'd like to start the release process today (CST). Thanks, Wenchen On Thu, Apr 18, 2019 at 3:44 AM

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
I think the 'only backport bug fixes to branches' principle remains sound. But what's a bug fix? Something that changes behavior to match what is explicitly supposed to happen, or implicitly supposed to happen -- implied by what other similar things do, by reasonable user expectations, or simply ho

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
Thanks Ryan. To me the "test" for putting things in a maintenance release is really a trade-off between benefit and risk (along with some caveats, like user-facing surface should not grow). The benefits here are fairly large (now it is possible to plug in partition-aware data sources) and the risk

Re: Spark 2.4.2

2019-04-16 Thread Ryan Blue
Spark has a lot of strange behaviors already that we don't fix in patch releases. And bugs aren't usually fixed with a configuration flag to turn on the fix. That said, I don't have a problem with this commit making it into a patch release. This is a small change and looks safe enough to me. I was

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
I would argue that it's confusing enough to a user for options from DataFrameWriter to be silently dropped when instantiating the data source to consider this a bug. They asked for partitioning to occur, and we are doing nothing (not even telling them we can't). I was certainly surprised by this b

Re: Spark 2.4.2

2019-04-16 Thread Ryan Blue
Is this a bug fix? It looks like a new feature to me. On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust wrote: Hello All, I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 I was wondering if it might make sense to