Please ignore the YouTube link in my last email; I'm not sure how it got added. Apologies, I'm not sure how to delete it.
On Sat, Jun 16, 2018 at 11:58 AM, vaquar khan <vaquar.k...@gmail.com> wrote:

+1

https://www.youtube.com/watch?v=-ik7aJ5U6kg

Regards,
Vaquar khan

On Fri, Jun 15, 2018 at 4:55 PM, Reynold Xin <r...@databricks.com> wrote:

Yes. At this rate I think it's better to do 2.4 next, followed by 3.0.

On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan <mri...@gmail.com> wrote:

I agree, I don't see a pressing need for a major version bump either.

Regards,
Mridul

On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra <m...@clearstorydata.com> wrote:

Changing major version numbers is not about new features or a vague notion that it is time to do something that will be seen as a significant release. It is about breaking stable public APIs.

I still remain unconvinced that the next version can't be 2.4.0.

On Fri, Jun 15, 2018 at 1:34 AM Andy <andyye...@gmail.com> wrote:

Dear all:

It has been two months since this topic was proposed. Any progress? 2018 is already about half over.

I agree that the new version should include some exciting new features. How about this one:

6. Integrate an ML/DL framework as a core component and feature (such as Angel / BigDL / ……).

3.0 is a very important version for a good open source project. It would be better to shed the historical burden and focus on new areas. Spark is already widely used around the world as a successful big data framework, and it can be better than that.

Andy

On Thu, Apr 5, 2018 at 7:20 AM Reynold Xin <r...@databricks.com> wrote:

There was a discussion thread on scala-contributors about Apache Spark not yet supporting Scala 2.12, and that got me to think perhaps it is about time for Spark to work towards the 3.0 release. By the time it comes out, it will be more than 2 years since Spark 2.0.

For contributors less familiar with Spark's history, I want to give more context on Spark releases:

1. Timeline: Spark 1.0 was released May 2014. Spark 2.0 was July 2016. If we were to maintain the ~2-year cadence, it is time to work on Spark 3.0 in 2018.

2. Spark's versioning policy promises that Spark does not break stable APIs in feature releases (e.g. 2.1, 2.2). API-breaking changes are sometimes a necessary evil, and can be done in major releases (e.g. 1.6 to 2.0, 2.x to 3.0).

3. That said, a major version isn't necessarily the playground for disruptive API changes that make it painful for users to update. The main purpose of a major release is an opportunity to fix things that are broken in the current API and remove certain deprecated APIs.

4. Spark as a project has a culture of evolving architecture and developing major new features incrementally, so major releases are not the only time for exciting new features. For example, the bulk of the work in the move towards the DataFrame API was done in Spark 1.3, and Continuous Processing was introduced in Spark 2.3. Both were feature releases rather than major releases.
You can find more background in the thread discussing Spark 2.0: http://apache-spark-developers-list.1001551.n3.nabble.com/A-proposal-for-Spark-2-0-td15122.html

The primary motivating factor IMO for a major version bump is to support Scala 2.12, which requires minor API-breaking changes to Spark's APIs. Similar to Spark 2.0, I think there are also opportunities for other changes that we know have been biting us for a long time but can't be changed in feature releases (to be clear, I'm actually not sure they are all good ideas, but I'm writing them down as candidates for consideration):

1. Support Scala 2.12.

2. Remove interfaces, configs, and modules (e.g. Bagel) deprecated in Spark 2.x.

3. Shade all dependencies.

4. Change the reserved keywords in Spark SQL to be more ANSI-SQL compliant, to prevent users from shooting themselves in the foot, e.g. "SELECT 2 SECOND" -- is "SECOND" an interval unit or an alias? To make it less painful for users to upgrade here, I'd suggest creating a flag for backward compatibility mode.

5. Similar to 4, make our type coercion rules in DataFrame/SQL more standard compliant, and have a flag for backward compatibility.

6. Miscellaneous other small changes documented in JIRA already (e.g. "JavaPairRDD flatMapValues requires function returning Iterable, not Iterator", "Prevent column name duplication in temporary view").

Now the reality of a major version bump is that the world often thinks in terms of what exciting features are coming. I do think there are a number of major changes happening already that can be part of the 3.0 release, if they make it in:

1. Scala 2.12 support (listing it twice)
2. Continuous Processing non-experimental
3. Kubernetes support non-experimental
4. A more fleshed-out version of data source API v2 (I don't think it is realistic to stabilize that in one release)
5. Hadoop 3.0 support
6. ...

Similar to the 2.0 discussion, this thread should focus on the framework and whether it'd make sense to create Spark 3.0 as the next release, rather than the individual feature requests. Those are important but are best done in their own separate threads.

--
Regards,
Vaquar Khan
+1-224-436-0783
Greater Chicago
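To make item 4 in Reynold's list above concrete, here is a minimal sketch of the keyword ambiguity, with one extra query touching the type-coercion point in item 5. It assumes a local SparkSession; the object and app names are made up for the example, and the exact parse results and interval-literal syntax may differ across Spark versions, so treat it as illustrative rather than authoritative.

    import org.apache.spark.sql.SparkSession

    object ReservedKeywordSketch {
      def main(args: Array[String]): Unit = {
        // Local session purely for experimentation; the app name is arbitrary.
        val spark = SparkSession.builder()
          .appName("reserved-keyword-sketch")
          .master("local[*]")
          .getOrCreate()

        // With SECOND treated as an ordinary identifier, this selects the
        // integer literal 2 under a column *aliased* SECOND, not a
        // two-second interval.
        spark.sql("SELECT 2 SECOND").printSchema()

        // The interval reading has to be spelled out explicitly
        // (syntax may vary slightly by version):
        spark.sql("SELECT INTERVAL 2 SECONDS").printSchema()

        // Item 5: implicit type coercion lets a string/int comparison
        // succeed; stricter ANSI-style rules could handle this differently.
        spark.sql("SELECT '2' = 2 AS coerced").show()

        spark.stop()
      }
    }

A backward-compatibility flag of the kind Reynold suggests would presumably let the old alias-style parse survive an upgrade; no such flag is shown here since none is named in the thread.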