Re: [ANNOUNCE] Announcing Apache Spark 2.4.1
Thanks Andrew for reporting this. I just submitted the fix:
https://github.com/apache/spark/pull/24304

On Fri, Apr 5, 2019 at 3:21 PM Andrew Melo wrote:
> Hello,
>
> I'm not sure if this is the proper place to report it, but the 2.4.1
> version of the config docs apparently didn't render correctly into HTML
> (scroll down to "Compression and Serialization"):
>
> https://spark.apache.org/docs/2.4.1/configuration.html#available-properties
>
> By comparison, the 2.4.0 version of the docs renders correctly.
>
> Cheers,
> Andrew
>
> On Fri, Apr 5, 2019 at 7:59 AM DB Tsai wrote:
> >
> > +user list
> >
> > We are happy to announce the availability of Spark 2.4.1!
> >
> > Apache Spark 2.4.1 is a maintenance release, based on the branch-2.4
> > maintenance branch of Spark. We strongly recommend all 2.4.0 users
> > upgrade to this stable release.
> >
> > In Apache Spark 2.4.1, Scala 2.12 support is GA and no longer
> > experimental. We will drop Scala 2.11 support in Spark 3.0, so please
> > provide us feedback.
> >
> > To download Spark 2.4.1, head over to the download page:
> > http://spark.apache.org/downloads.html
> >
> > To view the release notes:
> > https://spark.apache.org/releases/spark-release-2-4-1.html
> >
> > One more thing: to add a little color to this release, it's the
> > largest RC ever (RC9)! We tried to incorporate many critical fixes at
> > the last minute, and we hope you all enjoy it.
> >
> > We would like to acknowledge all community members for contributing to
> > this release. This release would not have been possible without you.

--
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
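(A note on verifying a docs fix like this locally: the rendered HTML can be rebuilt from the docs/ directory of the Spark source tree. This is a sketch based on the instructions in docs/README.md; it assumes Jekyll and the required Ruby gems are installed, and SKIP_API=1 simply skips the slow Scaladoc/Javadoc generation.)

    # rebuild only the Markdown-based site, skipping API doc generation
    cd docs
    SKIP_API=1 jekyll build
    # the rendered pages land in docs/_site, e.g. _site/configuration.html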
Re: [ANNOUNCE] Announcing Apache Spark 2.4.1
On Fri, Apr 5, 2019 at 9:41 AM Jungtaek Lim wrote:
>
> Thanks Andrew for reporting this. I just submitted the fix:
> https://github.com/apache/spark/pull/24304

Thanks!
Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf
Hi there! I'm trying to run Spark unit tests with the following profiles:

And 'core' module fails with the following test failing with NoClassDefFoundError:

In the meantime building a distribution works fine when running:

Also, there are no problems with running tests using Hadoop 2.7 profile.
Does this issue look familiar? Any help appreciated!

--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf
Really sorry for the formatting. Here's the original message:

Hi there! I'm trying to run Spark unit tests with the following profiles:

    ./build/mvn test -Pmesos "-Phadoop-3.1" -Pnetlib-lgpl -Psparkr -Phive -Phive-thriftserver

And 'core' module fails with the following test failing with NoClassDefFoundError:

    HadoopDelegationTokenManagerSuite:
    - Correctly load default credential providers
    - disable hive credential provider
    - using deprecated configurations
    - verify no credentials are obtained
    *** RUN ABORTED ***
      java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf
      at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:250)
      at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:173)
      at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
      at org.apache.spark.deploy.security.HiveDelegationTokenProvider$$anonfun$obtainDelegationTokens$2.apply$mcV$sp(HiveDelegationTokenProvider.scala:114)
      at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
      at org.apache.spark.deploy.security.HiveDelegationTokenProvider.obtainDelegationTokens(HiveDelegationTokenProvider.scala:113)
      at org.apache.spark.deploy.security.HadoopDelegationTokenManagerSuite$$anonfun$5.apply(HadoopDelegationTokenManagerSuite.scala:98)
      at org.apache.spark.deploy.security.HadoopDelegationTokenManagerSuite$$anonfun$5.apply(HadoopDelegationTokenManagerSuite.scala:90)
      at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
      at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)

In the meantime building a distribution works fine when running:

    ./dev/make-distribution.sh --tgz -Pmesos "-Phadoop-3.1" -Pnetlib-lgpl -Psparkr -Phive -Phive-thriftserver -DskipTests

Also, there are no problems with running tests using Hadoop 2.7 profile.
Does this issue look familiar? Any help appreciated!

On Fri, Apr 5, 2019 at 10:53 AM akirillov wrote:
>
> Hi there! I'm trying to run Spark unit tests with the following profiles:
> And 'core' module fails with the following test failing with NoClassDefFoundError:
> In the meantime building a distribution works fine when running:
> Also, there are no problems with running tests using Hadoop 2.7 profile.
> Does this issue look familiar? Any help appreciated!
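(To reproduce only the aborted suite rather than the whole 'core' module, something along these lines should work. This is a sketch: the suite name is taken from the trace above, and -Dtest=none / -DwildcardSuites are the knobs Spark's developer docs describe for running individual ScalaTest suites through Maven; adjust the profiles as needed.)

    # run just the delegation token suite in core, with the same profiles
    ./build/mvn test -pl core -Pmesos "-Phadoop-3.1" -Pnetlib-lgpl -Psparkr -Phive -Phive-thriftserver \
        -Dtest=none -DwildcardSuites=org.apache.spark.deploy.security.HadoopDelegationTokenManagerSuite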
Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf
The hadoop-3 profile doesn't really work yet, not even on master. That's
being worked on still.

On Fri, Apr 5, 2019 at 10:53 AM akirillov wrote:
>
> Hi there! I'm trying to run Spark unit tests with the following profiles:
> And 'core' module fails with the following test failing with NoClassDefFoundError:
> In the meantime building a distribution works fine when running:
> Also, there are no problems with running tests using Hadoop 2.7 profile.
> Does this issue look familiar? Any help appreciated!

--
Marcelo
Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf
Marcelo, Sean, thanks for the clarification. So in order to support Hadoop 3+,
the preferred way would be to use Hadoop-free builds and provide Hadoop
dependencies in the classpath, is that correct?

On Fri, Apr 5, 2019 at 10:57 AM Marcelo Vanzin wrote:
>
> The hadoop-3 profile doesn't really work yet, not even on master.
> That's being worked on still.
>
> On Fri, Apr 5, 2019 at 10:53 AM akirillov wrote:
> >
> > Hi there! I'm trying to run Spark unit tests with the following profiles:
> > And 'core' module fails with the following test failing with NoClassDefFoundError:
> > In the meantime building a distribution works fine when running:
> > Also, there are no problems with running tests using Hadoop 2.7 profile.
> > Does this issue look familiar? Any help appreciated!
Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf
Yes, you can try it, though I doubt that will 100% work. Have a look at
the "hadoop 3" JIRAs and PRs still in progress on master.

On Fri, Apr 5, 2019 at 1:14 PM Anton Kirillov wrote:
>
> Marcelo, Sean, thanks for the clarification. So in order to support Hadoop 3+
> the preferred way would be to use Hadoop-free builds and provide Hadoop
> dependencies in the classpath, is that correct?
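(For reference, the "Hadoop free" route being discussed usually looks roughly like this. This is a sketch based on the "Using Spark's Hadoop Free Build" documentation; the Hadoop 3 install path is purely illustrative, and as noted in this thread Hadoop 3 is not yet a supported combination.)

    # build a distribution that does not bundle Hadoop jars
    ./dev/make-distribution.sh --tgz -Phadoop-provided -Pmesos -Phive -Phive-thriftserver -DskipTests

    # then point Spark at an existing Hadoop installation, e.g. in conf/spark-env.sh
    export SPARK_DIST_CLASSPATH=$(/path/to/hadoop-3.x/bin/hadoop classpath)

Whether the resulting classpath actually works against Hadoop 3 is exactly the open question in this thread.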
Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf
You can always try. But Hadoop 3 is not yet supported by Spark.

On Fri, Apr 5, 2019 at 11:13 AM Anton Kirillov wrote:
>
> Marcelo, Sean, thanks for the clarification. So in order to support Hadoop 3+
> the preferred way would be to use Hadoop-free builds and provide Hadoop
> dependencies in the classpath, is that correct?
>
> On Fri, Apr 5, 2019 at 10:57 AM Marcelo Vanzin wrote:
> >
> > The hadoop-3 profile doesn't really work yet, not even on master.
> > That's being worked on still.

--
Marcelo
Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf
Hadoop 3 isn't supported yet, not quite even in master. I think the profile
there exists for testing at the moment. Others may know a way to make it
work, but I don't think it would out of the box.

On Fri, Apr 5, 2019 at 12:53 PM akirillov wrote:
>
> Hi there! I'm trying to run Spark unit tests with the following profiles:
> And 'core' module fails with the following test failing with NoClassDefFoundError:
> In the meantime building a distribution works fine when running:
> Also, there are no problems with running tests using Hadoop 2.7 profile.
> Does this issue look familiar? Any help appreciated!
Re: [DISCUSS] Spark Columnar Processing
I just filed SPARK-27396 as the SPIP for this proposal. Please use that JIRA
for further discussions.

Thanks for all of the feedback,

Bobby

On Wed, Apr 3, 2019 at 7:15 PM Bobby Evans wrote:
> I am still working on the SPIP and should get it up in the next few days.
> I have the basic text more or less ready, but I want to get a high-level
> API concept ready too, just to have something more concrete. I have not
> really done much with contributing new features to Spark, so I am not sure
> where a design document really fits in here, because neither
> http://spark.apache.org/improvement-proposals.html nor
> http://spark.apache.org/contributing.html mentions a design document
> anywhere. I am happy to put one up, but I was hoping the API concept would
> cover most of that.
>
> Thanks,
>
> Bobby
>
> On Tue, Apr 2, 2019 at 9:16 PM Renjie Liu wrote:
>>
>> Hi, Bobby:
>> Do you have a design doc? I'm also interested in this topic and want to
>> help contribute.
>>
>> On Tue, Apr 2, 2019 at 10:00 PM Bobby Evans wrote:
>>>
>>> Thanks to everyone for the feedback.
>>>
>>> Overall the feedback has been really positive for exposing columnar
>>> processing as an option to users. I'll write up a SPIP on the proposed
>>> changes to support columnar processing (not necessarily implement it)
>>> and then ping the list again for more feedback and discussion.
>>>
>>> Thanks again,
>>>
>>> Bobby
>>>
>>> On Mon, Apr 1, 2019 at 5:09 PM Reynold Xin wrote:
>>>>
>>>> I just realized I didn't make my stance here very clear ... here's
>>>> another try:
>>>>
>>>> I think it's a no-brainer to have a good columnar UDF interface. This
>>>> would facilitate a lot of high-performance applications, e.g. GPU-based
>>>> accelerations for machine learning algorithms.
>>>>
>>>> On rewriting the entire internals of Spark SQL to leverage columnar
>>>> processing, I don't see enough evidence to suggest that's a good idea
>>>> yet.
>>>>
>>>> On Wed, Mar 27, 2019 at 8:10 AM, Bobby Evans wrote:
>>>>> Kazuaki Ishizaki,
>>>>>
>>>>> Yes, ColumnarBatchScan does provide a framework for doing code
>>>>> generation for the processing of columnar data. I have to admit that
>>>>> I don't have a deep understanding of the code generation piece, so if
>>>>> I get something wrong please correct me. From what I have seen, only
>>>>> input formats currently inherit from ColumnarBatchScan, and from the
>>>>> comments in the trait:
>>>>>
>>>>>   /**
>>>>>    * Generate [[ColumnVector]] expressions for our parent to consume
>>>>>    * as rows. This is called once per [[ColumnarBatch]].
>>>>>    */
>>>>>   https://github.com/apache/spark/blob/956b52b1670985a67e49b938ac1499ae65c79f6e/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala#L42-L43
>>>>>
>>>>> it appears that ColumnarBatchScan is really only intended to pull the
>>>>> data out of the batch, and not to process that data in a columnar
>>>>> fashion -- the loading stage that you mentioned.
>>>>>
>>>>> The SIMDization or GPUization capability depends on a compiler that
>>>>> translates native code from the code generated by the whole-stage
>>>>> codegen. To be able to support vectorized processing, Hive stayed
>>>>> with pure Java and let the JVM detect and do the SIMDization of the
>>>>> code. To make that happen, they created loops that go through each
>>>>> element in a column and removed all conditionals from the body of the
>>>>> loops. To the best of my knowledge that would still require a
>>>>> separate code path, like the one I am proposing, to make the
>>>>> different processing phases generate code that the JVM can compile
>>>>> down to SIMD instructions. The generated code is full of null checks
>>>>> for each element, which would prevent the operations we want.
>>>>> Also, the intermediate results are often stored in UnsafeRow
>>>>> instances. This is really fast for row-based processing, but the
>>>>> complexity of how they work would, I believe, prevent the JVM from
>>>>> being able to vectorize the processing. If you have a better way to
>>>>> take Java code and vectorize it, we should put it into OpenJDK
>>>>> instead of Spark so everyone can benefit from it.
>>>>>
>>>>> Trying to compile directly from generated Java code to something a
>>>>> GPU can process is something we are tackling, but we decided to go a
>>>>> different route from what you proposed. From talking with several
>>>>> compiler experts here at NVIDIA, my understanding is that IBM, in
>>>>> partnership with NVIDIA, attempted in the past to extend the JVM to
>>>>> run at least partially on GPUs, but it was really difficult to get
>>>>> right, especially with how Java does memory management and memory
>>>>> layout.
>>>>>
>>>>> To avoid that complexity we decided to split the JITing up into two
>>>>> separate pieces. I didn't mention any of this before because this
>>>>> discussion was intended to just be around the memory layout support,
>>>>> and not GPU processing. The fi