Thank you for the context, Jean. I appreciate it.
On Thu, Mar 24, 2016 at 12:40 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Al,
>
> Spark 2.0 doesn't mean Spark 1.x will stop. Clearly, new features will go
> into Spark 2.0, but maintenance releases can still be performed on the 1.x
> branch.
>
> Regards
> JB
>
> On 03/24/2016 05:38 PM, Al Pivonka wrote:
>
>> As an end user (developer) and cluster admin, I have to agree with Koert.
>>
>> To me the real question is timing. The current version is 1.6.1; how many
>> more releases will there be before 2.0, and what is the time frame?
>>
>> If you give people six to twelve months to plan, and make sure they know
>> (post it all over the web site), most can plan ahead.
>>
>> Just my two pennies.
>>
>> On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> (PS: CDH5 runs fine with Java 8, but I understand your more general
>>> point.)
>>>
>>> This is a familiar context indeed, but in that context, would a group
>>> not wanting to update to Java 8 want to manually put Spark 2.0 into
>>> the mix? That is, if this is a context where the cluster is
>>> purposefully some stable mix of components, would you be updating
>>> just one?
>>>
>>> You make a good point about Scala being more a library than an
>>> infrastructure component, so it can be updated on a per-app basis.
>>> While it's harder to handle different Scala versions from the
>>> framework side, it's less hard on the deployment side.
>>>
>>> On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> I think the arguments are convincing, but they also make me wonder
>>>> whether I live in some kind of alternate universe... We deploy on
>>>> customers' clusters, where the OS, Python version, Java version, and
>>>> Hadoop distro are not chosen by us. Think CentOS 6, CDH5 or HDP 2.3,
>>>> Java 7, and Python 2.6. We simply have access to a single proxy
>>>> machine and launch through YARN. Asking them to upgrade Java is
>>>> pretty much out of the question, or a 6+ month ordeal. Of the ten
>>>> client clusters I can think of off the top of my head, all of them
>>>> are on Java 7 and none are on Java 8. So by doing this you would make
>>>> Spark 2 basically unusable for us (unless most of them plan to
>>>> upgrade to Java 8 in the near term; I will ask around and report
>>>> back).
>>>>
>>>> On a side note, it's particularly interesting to me that Spark 2
>>>> chose to continue support for Scala 2.10, because even in our very
>>>> constricted client environments the Scala version is something we
>>>> can easily upgrade: we just deploy a custom build of Spark for the
>>>> relevant Scala version and Hadoop distro. And because Scala is not a
>>>> dependency of any Hadoop distro (so it is not on the classpath, which
>>>> I am very happy about), we can use whatever Scala version we like. I
>>>> also found the upgrade path from Scala 2.10 to 2.11 to be very easy,
>>>> so I have a hard time understanding why anyone would stay on Scala
>>>> 2.10. And finally, with Scala 2.12 around the corner, you really
>>>> don't want to be supporting three versions. So clearly I am missing
>>>> something here.
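(For anyone who hasn't done the cross-build Koert describes: per the Spark
1.6 build docs it is roughly the two commands below; the -Phadoop-* profile
is whatever matches your target distro, so treat this as a sketch rather
than a recipe.)

    # Switch the source tree to Scala 2.11, then build against your
    # Hadoop version (commands as documented for Spark 1.6).
    ./dev/change-scala-version.sh 2.11
    mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package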
>>>> On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> +1 to support Java 8 (and future) *only* in Spark 2.0, and end
>>>>> support of Java 7. It makes sense.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 03/24/2016 08:27 AM, Reynold Xin wrote:
>>>>>
>>>>>> About a year ago we decided to drop Java 6 support in Spark 1.5. I
>>>>>> am wondering if we should also drop Java 7 support in Spark 2.0
>>>>>> (i.e. Spark 2.0 would require Java 8 to run).
>>>>>>
>>>>>> Oracle ended public updates for JDK 7 a year ago (Apr 2015) and
>>>>>> removed public downloads for JDK 7 in July 2015. In the past I've
>>>>>> actually been against dropping Java 7, but today I ran into an
>>>>>> issue with the new Dataset API not working well with Java 8
>>>>>> lambdas, and that changed my opinion on this.
>>>>>>
>>>>>> I've been thinking more about this issue today and also talked
>>>>>> with a lot of people offline to gather feedback, and I actually
>>>>>> think the pros outweigh the cons, for the following reasons (in
>>>>>> rough order of importance):
>>>>>>
>>>>>> 1. It is complicated to test how well Spark APIs work with Java
>>>>>> lambdas if we support Java 7. Jenkins machines need to have both
>>>>>> Java 7 and Java 8 installed, and we must run one set of test
>>>>>> suites under Java 7 and then the lambda tests under Java 8. This
>>>>>> complicates build environments/scripts and makes them less robust.
>>>>>> Without good testing infrastructure, I have no confidence in
>>>>>> building good APIs for Java 8.
>>>>>>
>>>>>> 2. Dataset/DataFrame performance will be between 1x and 10x slower
>>>>>> in Java 7. The primary APIs we want users to use in Spark 2.x are
>>>>>> Dataset/DataFrame, and this impacts pretty much everything from
>>>>>> machine learning to structured streaming. We have made great
>>>>>> progress in their performance through extensive use of code
>>>>>> generation. (In many dimensions Spark 2.0 with DataFrames/Datasets
>>>>>> looks more like a compiler than a MapReduce or query engine.)
>>>>>> These optimizations don't work well in Java 7 due to broken code
>>>>>> cache flushing. This problem has been fixed by Oracle in Java 8.
>>>>>> In addition, Java 8 comes with better support for Unsafe and SIMD.
>>>>>>
>>>>>> 3. Scala 2.12 will come out soon, and we will want to add support
>>>>>> for it. Scala 2.12 only works on Java 8. If we also support Java
>>>>>> 7, we'd have a fairly complicated compatibility matrix and testing
>>>>>> infrastructure.
>>>>>>
>>>>>> 4. There are libraries I've looked into in the past that support
>>>>>> only Java 8. This is more common in high-performance libraries
>>>>>> such as Aeron (a messaging library). Having to support Java 7
>>>>>> means we are not able to use these. It is not that big of a deal
>>>>>> right now, but it will become increasingly difficult as we
>>>>>> optimize performance.
>>>>>>
>>>>>> The downside of not supporting Java 7 is also obvious. Some
>>>>>> organizations are stuck with Java 7, and they wouldn't be able to
>>>>>> use Spark 2.0 without upgrading Java.
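(To make the lambda point concrete, below is a rough sketch of the Java 7
vs Java 8 difference, written against the 1.6-era experimental Java Dataset
API rather than taken from this thread. As I understand it, the explicit
MapFunction cast is what steers overload resolution away from the
Scala-function variant of map, which sounds like the kind of friction
Reynold ran into.)

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SQLContext;

    public class LambdaSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("lambda-sketch").setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sql = new SQLContext(jsc.sc());

        // A small Dataset of strings (1.6-era experimental API).
        Dataset<String> words =
            sql.createDataset(Arrays.asList("spark", "java"), Encoders.STRING());

        // Java 7 style: an anonymous inner class implementing MapFunction.
        Dataset<Integer> lengths7 = words.map(
            new MapFunction<String, Integer>() {
              @Override
              public Integer call(String s) {
                return s.length();
              }
            }, Encoders.INT());

        // Java 8 style: a lambda. The cast picks the Java MapFunction
        // overload of map() over the Scala-function overload.
        Dataset<Integer> lengths8 = words.map(
            (MapFunction<String, Integer>) s -> s.length(), Encoders.INT());

        System.out.println(lengths7.collectAsList());
        System.out.println(lengths8.collectAsList());
        jsc.stop();
      }
    }

The lambda version is obviously terser, but a build that contains both
styles has to compile and test under both JDKs, which is exactly the
testing-matrix problem point 1 describes.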
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

--
Those who say it can't be done, are usually interrupted by those doing it.