All: here is the backport of changes to update to 2.9.8 from master back to 2.4.
https://github.com/apache/spark/pull/24418

master has been on 2.9.8 for a while, so the concern isnt' Spark so
much. It's that user apps would face the same change of behavior if
they used Jackson in a similar way. I'm moderately in favor of
updating just as it's come up several times, but it's debatable.

On Fri, Apr 19, 2019 at 11:01 AM Alessandro Bellina
<abell...@verizonmedia.com> wrote:
>
> We ran unit tests locally with the patch in 
> https://github.com/apache/spark/pull/21596 on top of our 2.4 branch, so I 
> think that branch-2.4 should pass that part.
>
> As far dependencies, things get murkier. I went through the dependencies 
> where we have exclude clauses for databind. Here is the summary of those 
> versions that were different than 2.9.6 (i.e. kafka-0.10 is using 2.9.6, so 
> not listed below):
>
> hadoop is using jackson version 2.2.3
> org.json4s moved to jackson.version 2.9.8, perhaps we should move to 2.9.8 
> instead? (though it should be compatible with 2.9.6)
> calcite 1.2.0-incubating uses jackson version 2.1.1, but it has moved to 
> 2.9.6 in more recent versions.
> arrow 0.10.0 uses jackson version 2.7.9, but moved to 2.9.8 in version 
> apache-arrow-0.13.0.
> io.fabric8 3.0+ uses jackson version 2.7.4.
>
> The risk to staying with the older version is in user code (or other 
> libraries) not doing proper input data validation. The risk of moving up is 
> we can introduce instability as there are a lot of moving parts. If we can 
> think of a set of tests that can be done here, either manually or automated, 
> I think that would be worthwhile to contribute.
>
> On Thu, Apr 18, 2019 at 9:53 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>> I've cut RC1. If people think we must upgrade Jackson in 2.4, I can cut RC2 
>> shortly.
>>
>> Thanks,
>> Wenchen
>>
>> On Fri, Apr 19, 2019 at 3:32 AM Felix Cheung <felixcheun...@hotmail.com> 
>> wrote:
>>>
>>> Re shading - same argument I’ve made earlier today in a PR...
>>>
>>> (Context- in many cases Spark has light or indirect dependencies but 
>>> bringing them into the process breaks users code easily)
>>>
>>>
>>> ________________________________
>>> From: Michael Heuer <heue...@gmail.com>
>>> Sent: Thursday, April 18, 2019 6:41 AM
>>> To: Reynold Xin
>>> Cc: Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen Fan; 
>>> Xiao Li
>>> Subject: Re: Spark 2.4.2
>>>
>>> +100
>>>
>>>
>>> On Apr 18, 2019, at 1:48 AM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>> We should have shaded all Spark’s dependencies :(
>>>
>>> On Wed, Apr 17, 2019 at 11:47 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>> For users that would inherit Jackson and use it directly, or whose
>>>> dependencies do. Spark itself (with modifications) should be OK with
>>>> the change.
>>>> It's risky and normally wouldn't backport, except that I've heard a
>>>> few times about concerns about CVEs affecting Databind, so wondering
>>>> who else out there might have an opinion. I'm not pushing for it
>>>> necessarily.
>>>>
>>>> On Wed, Apr 17, 2019 at 6:18 PM Reynold Xin <r...@databricks.com> wrote:
>>>> >
>>>> > For Jackson - are you worrying about JSON parsing for users or internal 
>>>> > Spark functionality breaking?
>>>> >
>>>> > On Wed, Apr 17, 2019 at 6:02 PM Sean Owen <sro...@gmail.com> wrote:
>>>> >>
>>>> >> There's only one other item on my radar, which is considering updating
>>>> >> Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up
>>>> >> a few times now that there are a number of CVEs open for 2.6.7. Cons:
>>>> >> not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson
>>>> >> behavior non-trivially. That said back-porting the update PR to 2.4
>>>> >> worked out OK locally. Any strong opinions on this one?
>>>> >>
>>>> >> On Wed, Apr 17, 2019 at 7:49 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>>>> >> >
>>>> >> > I volunteer to be the release manager for 2.4.2, as I was also going 
>>>> >> > to propose 2.4.2 because of the reverting of SPARK-25250. Is there 
>>>> >> > any other ongoing bug fixes we want to include in 2.4.2? If no I'd 
>>>> >> > like to start the release process today (CST).
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Wenchen
>>>> >> >
>>>> >> > On Thu, Apr 18, 2019 at 3:44 AM Sean Owen <sro...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> I think the 'only backport bug fixes to branches' principle remains 
>>>> >> >> sound. But what's a bug fix? Something that changes behavior to 
>>>> >> >> match what is explicitly supposed to happen, or implicitly supposed 
>>>> >> >> to happen -- implied by what other similar things do, by reasonable 
>>>> >> >> user expectations, or simply how it worked previously.
>>>> >> >>
>>>> >> >> Is this a bug fix? I guess the criteria that matches is that 
>>>> >> >> behavior doesn't match reasonable user expectations? I don't know 
>>>> >> >> enough to have a strong opinion. I also don't think there is 
>>>> >> >> currently an objection to backporting it, whatever it's called.
>>>> >> >>
>>>> >> >>
>>>> >> >> Is the question whether this needs a new release? There's no harm in 
>>>> >> >> another point release, other than needing a volunteer release 
>>>> >> >> manager. One could say, wait a bit longer to see what more info 
>>>> >> >> comes in about 2.4.1. But given that 2.4.1 took like 2 months, it's 
>>>> >> >> reasonable to move towards a release cycle again. I don't see 
>>>> >> >> objection to that either (?)
>>>> >> >>
>>>> >> >>
>>>> >> >> The meta question remains: is a 'bug fix' definition even agreed, 
>>>> >> >> and being consistently applied? There aren't correct answers, only 
>>>> >> >> best guesses from each person's own experience, judgment and 
>>>> >> >> priorities. These can differ even when applied in good faith.
>>>> >> >>
>>>> >> >> Sometimes the variance of opinion comes because people have 
>>>> >> >> different info that needs to be surfaced. Here, maybe it's best to 
>>>> >> >> share what about that offline conversation was convincing, for 
>>>> >> >> example.
>>>> >> >>
>>>> >> >> I'd say it's also important to separate what one would prefer from 
>>>> >> >> what one can't live with(out). Assuming one trusts the intent and 
>>>> >> >> experience of the handful of others with an opinion, I'd defer to 
>>>> >> >> someone who wants X and will own it, even if I'm moderately against 
>>>> >> >> it. Otherwise we'd get little done.
>>>> >> >>
>>>> >> >> In that light, it seems like both of the PRs at issue here are not 
>>>> >> >> _wrong_ to backport. This is a good pair that highlights why, when 
>>>> >> >> there isn't a clear reason to do / not do something (e.g. obvious 
>>>> >> >> errors, breaking public APIs) we give benefit-of-the-doubt in order 
>>>> >> >> to get it later.
>>>> >> >>
>>>> >> >>
>>>> >> >> On Wed, Apr 17, 2019 at 12:09 PM Ryan Blue 
>>>> >> >> <rb...@netflix.com.invalid> wrote:
>>>> >> >>>
>>>> >> >>> Sorry, I should be more clear about what I'm trying to say here.
>>>> >> >>>
>>>> >> >>> In the past, Xiao has taken the opposite stance. A good example is 
>>>> >> >>> PR #21060 that was a very similar situation: behavior didn't match 
>>>> >> >>> what was expected and there was low risk. There was a long argument 
>>>> >> >>> and the patch didn't make it into 2.3 (to my knowledge).
>>>> >> >>>
>>>> >> >>> What we call these low-risk behavior fixes doesn't matter. I called 
>>>> >> >>> it a bug on #21060 but I'm applying Xiao's previous definition here 
>>>> >> >>> to make a point. Whatever term we use, we clearly have times when 
>>>> >> >>> we want to allow a patch because it is low risk and helps someone. 
>>>> >> >>> Let's just be clear that that's perfectly fine.
>>>> >> >>>
>>>> >> >>> On Wed, Apr 17, 2019 at 9:34 AM Ryan Blue <rb...@netflix.com> wrote:
>>>> >> >>>>
>>>> >> >>>> How is this a bug fix?
>>>> >> >>>>
>>>> >> >>>> On Wed, Apr 17, 2019 at 9:30 AM Xiao Li <lix...@databricks.com> 
>>>> >> >>>> wrote:
>>>> >> >>>>>
>>>> >> >>>>> Michael and I had an offline discussion about this PR 
>>>> >> >>>>> https://github.com/apache/spark/pull/24365. He convinced me that 
>>>> >> >>>>> this is a bug fix. The code changes of this bug fix are very tiny 
>>>> >> >>>>> and the risk is very low. To avoid any behavior change in the 
>>>> >> >>>>> patch releases, this PR also added a legacy flag whose default 
>>>> >> >>>>> value is off.
>>>> >> >>>>>
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>> >>
>>>
>>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Reply via email to