Thank you for the context, Jean. I appreciate it.
On Thu, Mar 24, 2016 at 12:40 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Al,
>
> Spark 2.0 doesn't mean Spark 1.x will stop. Clearly, new features will go
> into Spark 2.0, but maintenance releases can still be performed on the 1.x
> branch.
>
> Regards
> JB
>
> On 03/24/2016 05:38 PM, Al Pivonka wrote:
>
>> As an end user (developer) and cluster admin, I have to agree with Koert.
>>
>> To me the real question is timing. The current version is 1.6.1; how many
>> more releases will there be before 2.0, and what is the time frame?
>>
>> If you give people six to twelve months to plan, and make sure they know
>> (post it all over the web site), most can plan ahead.
>>
>> Just my two pennies.
>>
>> On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> (PS: CDH5 runs fine with Java 8, but I understand your more general
>>> point.)
>>>
>>> This is a familiar context indeed, but in that context, would a group
>>> not wanting to update to Java 8 want to manually put Spark 2.0 into
>>> the mix? That is, if this is a context where the cluster is
>>> purposefully some stable mix of components, would you be updating
>>> just one?
>>>
>>> You make a good point about Scala being more a library than an
>>> infrastructure component, so it can be updated on a per-app basis.
>>> While it's harder to handle different Scala versions from the
>>> framework side, it's less hard on the deployment side.
>>>
>>> On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> I think the arguments are convincing, but they also make me wonder
>>>> whether I live in some kind of alternate universe... We deploy on
>>>> customers' clusters, where the OS, Python version, Java version, and
>>>> Hadoop distro are not chosen by us. Think CentOS 6, CDH5 or HDP 2.3,
>>>> Java 7, and Python 2.6. We simply have access to a single proxy
>>>> machine and launch through YARN. Asking them to upgrade Java is
>>>> pretty much out of the question, or a 6+ month ordeal. Of the ten
>>>> client clusters I can think of off the top of my head, all of them
>>>> are on Java 7 and none are on Java 8. So by doing this you would make
>>>> Spark 2 basically unusable for us (unless most of them plan to
>>>> upgrade to Java 8 in the near term; I will ask around and report
>>>> back).
>>>>
>>>> On a side note, it's particularly interesting to me that Spark 2
>>>> chose to continue support for Scala 2.10, because even in our very
>>>> constricted client environments the Scala version is something we
>>>> can easily upgrade: we just deploy a custom build of Spark for the
>>>> relevant Scala version and Hadoop distro. And because Scala is not a
>>>> dependency of any Hadoop distro (so it is not on the classpath, which
>>>> I am very happy about), we can use whatever Scala version we like. I
>>>> also found the upgrade path from Scala 2.10 to 2.11 to be very easy,
>>>> so I have a hard time understanding why anyone would stay on Scala
>>>> 2.10. And finally, with Scala 2.12 around the corner, you really
>>>> don't want to be supporting three versions. So clearly I am missing
>>>> something here.
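(For anyone who hasn't done the cross-build Koert describes: per the Spark
1.6 build docs it is roughly the two commands below; the -Phadoop-* profile
is whatever matches your target distro, so treat this as a sketch rather
than a recipe.)

    # Switch the source tree to Scala 2.11, then build against your
    # Hadoop version (commands as documented for Spark 1.6).
    ./dev/change-scala-version.sh 2.11
    mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package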
>>>> On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> +1 to support Java 8 (and future) *only* in Spark 2.0, and end
>>>>> support of Java 7. It makes sense.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 03/24/2016 08:27 AM, Reynold Xin wrote:
>>>>>
>>>>>> About a year ago we decided to drop Java 6 support in Spark 1.5. I
>>>>>> am wondering if we should also drop Java 7 support in Spark 2.0
>>>>>> (i.e. Spark 2.0 would require Java 8 to run).
>>>>>>
>>>>>> Oracle ended public updates for JDK 7 a year ago (Apr 2015) and
>>>>>> removed public downloads for JDK 7 in July 2015. In the past I've
>>>>>> actually been against dropping Java 7, but today I ran into an
>>>>>> issue with the new Dataset API not working well with Java 8
>>>>>> lambdas, and that changed my opinion on this.
>>>>>>
>>>>>> I've been thinking more about this issue today and also talked
>>>>>> with a lot of people offline to gather feedback, and I actually
>>>>>> think the pros outweigh the cons, for the following reasons (in
>>>>>> rough order of importance):
>>>>>>
>>>>>> 1. It is complicated to test how well Spark APIs work with Java
>>>>>> lambdas if we support Java 7. Jenkins machines need to have both
>>>>>> Java 7 and Java 8 installed, and we must run one set of test
>>>>>> suites under Java 7 and then the lambda tests under Java 8. This
>>>>>> complicates build environments/scripts and makes them less robust.
>>>>>> Without good testing infrastructure, I have no confidence in
>>>>>> building good APIs for Java 8.
>>>>>>
>>>>>> 2. Dataset/DataFrame performance will be between 1x and 10x slower
>>>>>> in Java 7. The primary APIs we want users to use in Spark 2.x are
>>>>>> Dataset/DataFrame, and this impacts pretty much everything from
>>>>>> machine learning to structured streaming. We have made great
>>>>>> progress in their performance through extensive use of code
>>>>>> generation. (In many dimensions Spark 2.0 with DataFrames/Datasets
>>>>>> looks more like a compiler than a MapReduce or query engine.)
>>>>>> These optimizations don't work well in Java 7 due to broken code
>>>>>> cache flushing. This problem has been fixed by Oracle in Java 8.
>>>>>> In addition, Java 8 comes with better support for Unsafe and SIMD.
>>>>>>
>>>>>> 3. Scala 2.12 will come out soon, and we will want to add support
>>>>>> for it. Scala 2.12 only works on Java 8. If we also support Java
>>>>>> 7, we'd have a fairly complicated compatibility matrix and testing
>>>>>> infrastructure.
>>>>>>
>>>>>> 4. There are libraries I've looked into in the past that support
>>>>>> only Java 8. This is more common in high-performance libraries
>>>>>> such as Aeron (a messaging library). Having to support Java 7
>>>>>> means we are not able to use these. It is not that big of a deal
>>>>>> right now, but it will become increasingly difficult as we
>>>>>> optimize performance.
>>>>>>
>>>>>> The downside of not supporting Java 7 is also obvious. Some
>>>>>> organizations are stuck with Java 7, and they wouldn't be able to
>>>>>> use Spark 2.0 without upgrading Java.
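(To make the lambda point concrete, below is a rough sketch of the Java 7
vs Java 8 difference, written against the 1.6-era experimental Java Dataset
API rather than taken from this thread. As I understand it, the explicit
MapFunction cast is what steers overload resolution away from the
Scala-function variant of map, which sounds like the kind of friction
Reynold ran into.)

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SQLContext;

    public class LambdaSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("lambda-sketch").setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sql = new SQLContext(jsc.sc());

        // A small Dataset of strings (1.6-era experimental API).
        Dataset<String> words =
            sql.createDataset(Arrays.asList("spark", "java"), Encoders.STRING());

        // Java 7 style: an anonymous inner class implementing MapFunction.
        Dataset<Integer> lengths7 = words.map(
            new MapFunction<String, Integer>() {
              @Override
              public Integer call(String s) {
                return s.length();
              }
            }, Encoders.INT());

        // Java 8 style: a lambda. The cast picks the Java MapFunction
        // overload of map() over the Scala-function overload.
        Dataset<Integer> lengths8 = words.map(
            (MapFunction<String, Integer>) s -> s.length(), Encoders.INT());

        System.out.println(lengths7.collectAsList());
        System.out.println(lengths8.collectAsList());
        jsc.stop();
      }
    }

The lambda version is obviously terser, but a build that contains both
styles has to compile and test under both JDKs, which is exactly the
testing-matrix problem point 1 describes.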
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

--
Those who say it can't be done, are usually interrupted by those doing it.