Re: Possible bug in ClientBase.scala?
Are you setting -Pyarn-alpha? ./sbt/sbt -Pyarn-alpha, followed by "projects", shows it as a module. You should only build yarn-stable *or* yarn-alpha at any given time. I don't remember the modules changing in a while. 'yarn-alpha' is for YARN before it stabilized, circa early Hadoop 2.0.x. 'yarn-stable' is for beta and stable YARN, circa late Hadoop 2.0.x and onwards. 'yarn' is code common to both, so should compile with yarn-alpha. What's the compile error, and are you setting yarn.version? the default is to use hadoop.version, but that defaults to 1.0.4 and there is no such YARN. Unless I missed it, I only see compile errors in yarn-stable, and you are trying to compile vs YARN alpha versions no? On Thu, Jul 17, 2014 at 5:39 AM, Chester Chen wrote: > Looking further, the yarn and yarn-stable are both for the stable version > of Yarn, that explains the compilation errors when using 2.0.5-alpha > version of hadoop. > > the module yarn-alpha ( although is still on SparkBuild.scala), is no > longer there in sbt console. > > >> projects > > [info] In file:/Users/chester/projects/spark/ > > [info]assembly > > [info]bagel > > [info]catalyst > > [info]core > > [info]examples > > [info]graphx > > [info]hive > > [info]mllib > > [info]oldDeps > > [info]repl > > [info]spark > > [info]sql > > [info]streaming > > [info]streaming-flume > > [info]streaming-kafka > > [info]streaming-mqtt > > [info]streaming-twitter > > [info]streaming-zeromq > > [info]tools > > [info]yarn > > [info] * yarn-stable > > > On Wed, Jul 16, 2014 at 5:41 PM, Chester Chen wrote: > >> Hmm >> looks like a Build script issue: >> >> I run the command with : >> >> sbt/sbt clean *yarn/*test:compile >> >> but errors came from >> >> [error] 40 errors found >> >> [error] (*yarn-stable*/compile:compile) Compilation failed >> >> >> Chester >> >> >> On Wed, Jul 16, 2014 at 5:18 PM, Chester Chen >> wrote: >> >>> Hi, Sandy >>> >>> We do have some issue with this. The difference is in Yarn-Alpha and >>> Yarn Stable ( I noticed that in the latest build, the module name has >>> changed, >>> yarn-alpha --> yarn >>> yarn --> yarn-stable >>> ) >>> >>> For example: MRJobConfig.class >>> the field: >>> "DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH" >>> >>> >>> In Yarn-Alpha : the field returns java.lang.String[] >>> >>> java.lang.String[] DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH; >>> >>> while in Yarn-Stable, it returns a String >>> >>> java.lang.String DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH; >>> >>> So in ClientBaseSuite.scala >>> >>> The following code: >>> >>> val knownDefMRAppCP: Seq[String] = >>> getFieldValue[*String*, Seq[String]](classOf[MRJobConfig], >>> >>> "DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH", >>> Seq[String]())(a => >>> *a.split(",")*) >>> >>> >>> works for yarn-stable, but doesn't work for yarn-alpha. >>> >>> This is the only failure for the SNAPSHOT I downloaded 2 weeks ago. I >>> believe this can be refactored to yarn-alpha module and make different >>> tests according different API signatures. >>> >>> I just update the master branch and build doesn't even compile for >>> Yarn-Alpha (yarn) model. Yarn-Stable compile with no error and test passed. >>> >>> >>> Does the Spark Jenkins job run against yarn-alpha ? >>> >>> >>> >>> >>> >>> Here is output from yarn-alpha compilation: >>> >>> I got the 40 compilation errors. >>> >>> sbt/sbt clean yarn/test:compile >>> >>> Using /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home as >>> default JAVA_HOME. >>> >>> Note, this will be overridden by -java-home if it is set. 
>>> >>> [info] Loading project definition from >>> /Users/chester/projects/spark/project/project >>> >>> [info] Loading project definition from >>> /Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project >>> >>> [warn] Multiple resolvers having different access mechanism configured >>> with same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate >>> project resolvers (`resolvers`) or rename publishing resolver (`publishTo`). >>> >>> [info] Loading project definition from >>> /Users/chester/projects/spark/project >>> >>> NOTE: SPARK_HADOOP_VERSION is deprecated, please use >>> -Dhadoop.version=2.0.5-alpha >>> >>> NOTE: SPARK_YARN is deprecated, please use -Pyarn flag. >>> >>> [info] Set current project to spark-parent (in build >>> file:/Users/chester/projects/spark/) >>> >>> [success] Total time: 0 s, completed Jul 16, 2014 5:13:06 PM >>> >>> [info] Updating {file:/Users/chester/projects/spark/}core... >>> >>> [info] Resolving org.fusesource.jansi#jansi;1.4 ... >>> >>> [info] Done updating. >>> >>> [info] Updating {file:/Users/chester/projects/spark/}yarn... >>> >>> [info] Updating {file:/Users/chester/projects/spark/}yarn-stable... >>> >>> [info] Resolving org.fusesource.jansi#jansi;1.4 ... >>> >>> [info] Done updating. >>> >>> [info] Reso
Re: Possible bug in ClientBase.scala?
To add, we've made some effort to yarn-alpha to work with the 2.0.x line, but this was a time when YARN went through wild API changes. The only line that the yarn-alpha profile is guaranteed to work against is the 0.23 line. On Thu, Jul 17, 2014 at 12:40 AM, Sean Owen wrote: > Are you setting -Pyarn-alpha? ./sbt/sbt -Pyarn-alpha, followed by > "projects", shows it as a module. You should only build yarn-stable > *or* yarn-alpha at any given time. > > I don't remember the modules changing in a while. 'yarn-alpha' is for > YARN before it stabilized, circa early Hadoop 2.0.x. 'yarn-stable' is > for beta and stable YARN, circa late Hadoop 2.0.x and onwards. 'yarn' > is code common to both, so should compile with yarn-alpha. > > What's the compile error, and are you setting yarn.version? the > default is to use hadoop.version, but that defaults to 1.0.4 and there > is no such YARN. > > Unless I missed it, I only see compile errors in yarn-stable, and you > are trying to compile vs YARN alpha versions no? > > On Thu, Jul 17, 2014 at 5:39 AM, Chester Chen > wrote: > > Looking further, the yarn and yarn-stable are both for the stable version > > of Yarn, that explains the compilation errors when using 2.0.5-alpha > > version of hadoop. > > > > the module yarn-alpha ( although is still on SparkBuild.scala), is no > > longer there in sbt console. > > > > > >> projects > > > > [info] In file:/Users/chester/projects/spark/ > > > > [info]assembly > > > > [info]bagel > > > > [info]catalyst > > > > [info]core > > > > [info]examples > > > > [info]graphx > > > > [info]hive > > > > [info]mllib > > > > [info]oldDeps > > > > [info]repl > > > > [info]spark > > > > [info]sql > > > > [info]streaming > > > > [info]streaming-flume > > > > [info]streaming-kafka > > > > [info]streaming-mqtt > > > > [info]streaming-twitter > > > > [info]streaming-zeromq > > > > [info]tools > > > > [info]yarn > > > > [info] * yarn-stable > > > > > > On Wed, Jul 16, 2014 at 5:41 PM, Chester Chen > wrote: > > > >> Hmm > >> looks like a Build script issue: > >> > >> I run the command with : > >> > >> sbt/sbt clean *yarn/*test:compile > >> > >> but errors came from > >> > >> [error] 40 errors found > >> > >> [error] (*yarn-stable*/compile:compile) Compilation failed > >> > >> > >> Chester > >> > >> > >> On Wed, Jul 16, 2014 at 5:18 PM, Chester Chen > >> wrote: > >> > >>> Hi, Sandy > >>> > >>> We do have some issue with this. The difference is in Yarn-Alpha > and > >>> Yarn Stable ( I noticed that in the latest build, the module name has > >>> changed, > >>> yarn-alpha --> yarn > >>> yarn --> yarn-stable > >>> ) > >>> > >>> For example: MRJobConfig.class > >>> the field: > >>> "DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH" > >>> > >>> > >>> In Yarn-Alpha : the field returns java.lang.String[] > >>> > >>> java.lang.String[] DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH; > >>> > >>> while in Yarn-Stable, it returns a String > >>> > >>> java.lang.String DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH; > >>> > >>> So in ClientBaseSuite.scala > >>> > >>> The following code: > >>> > >>> val knownDefMRAppCP: Seq[String] = > >>> getFieldValue[*String*, Seq[String]](classOf[MRJobConfig], > >>> > >>> "DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH", > >>> Seq[String]())(a => > >>> *a.split(",")*) > >>> > >>> > >>> works for yarn-stable, but doesn't work for yarn-alpha. > >>> > >>> This is the only failure for the SNAPSHOT I downloaded 2 weeks ago. I > >>> believe this can be refactored to yarn-alpha module and make different > >>> tests according different API signatures. 
> >>> > >>> I just update the master branch and build doesn't even compile for > >>> Yarn-Alpha (yarn) model. Yarn-Stable compile with no error and test > passed. > >>> > >>> > >>> Does the Spark Jenkins job run against yarn-alpha ? > >>> > >>> > >>> > >>> > >>> > >>> Here is output from yarn-alpha compilation: > >>> > >>> I got the 40 compilation errors. > >>> > >>> sbt/sbt clean yarn/test:compile > >>> > >>> Using /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home > as > >>> default JAVA_HOME. > >>> > >>> Note, this will be overridden by -java-home if it is set. > >>> > >>> [info] Loading project definition from > >>> /Users/chester/projects/spark/project/project > >>> > >>> [info] Loading project definition from > >>> > /Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project > >>> > >>> [warn] Multiple resolvers having different access mechanism configured > >>> with same name 'sbt-plugin-releases'. To avoid conflict, Remove > duplicate > >>> project resolvers (`resolvers`) or rename publishing resolver > (`publishTo`). > >>> > >>> [info] Loading project definition from > >>> /Users/chester/projects/spark/project > >>> > >>> NOTE: SPARK_HADOOP_VERSION is deprecated, please use > >>> -Dhadoop.version=2.0.5-alpha > >>> > >>> NOT
[VOTE] Release Apache Spark 0.9.2 (RC1)
Please vote on releasing the following candidate as Apache Spark version 0.9.2! The tag to be voted on is v0.9.2-rc1 (commit 4322c0ba): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4322c0ba7f411cf9a2483895091440011742246b The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~meng/spark-0.9.2-rc1/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/meng.asc The staging repository for this release can be found at: https://repository.apache.org/service/local/repositories/orgapachespark-1023/content/ The documentation corresponding to this release can be found at: http://people.apache.org/~meng/spark-0.9.2-rc1-docs/ Please vote on releasing this package as Apache Spark 0.9.2! The vote is open until Sunday, July 20, at 11:10 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 0.9.2 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ === About this release === This release fixes a few high-priority bugs in 0.9.1 and has a variety of smaller fixes. The full list is here: http://s.apache.org/d0t. Some of the more visible patches are: SPARK-2156 and SPARK-1112: Issues with jobs hanging due to akka frame size SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys SPARK-1676: HDFS FileSystems continually pile up in the FS cache SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo SPARK-1870: Secondary jars are not added to executor classpath for YARN This is the second maintenance release on the 0.9 line. We plan to make additional maintenance releases as new fixes come in. Best, Xiangrui
Re: Possible bug in ClientBase.scala?
@Sean and @Sandy, thanks for the reply.

I used to be able to see the yarn-alpha and yarn directories, which corresponded to the modules. I guess due to the recent SparkBuild.scala changes I did not see yarn-alpha (by default), and I thought yarn-alpha had been renamed to "yarn" and that "yarn-stable" was the old yarn. So I compiled "yarn" against hadoop.version = 2.0.5-alpha. My mistake.

I tried

    export SPARK_HADOOP_VERSION=2.0.5-alpha
    sbt/sbt -Pyarn-alpha yarn-alpha/test

and the compilation errors are all gone. "sbt/sbt -Pyarn-alpha projects" does show the yarn-alpha project; I did not realize it is enabled dynamically based on the yarn flag. Thanks, Sean, for pointing that out.

To Sandy's point, I am not trying to use an alpha version of YARN. I am experimenting with some changes in the YARN client and refactoring code, and I just want to make sure I am passing tests for both yarn-alpha and yarn-stable.

The yarn-alpha tests are actually failing due to the YARN API changes in the MRJobConfig class. As I mentioned in an earlier email, the field DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH is a String in yarn-stable but a String array in the yarn-alpha API. So this code in ClientBaseSuite.scala:

    val knownDefMRAppCP: Seq[String] =
      getFieldValue[String, Seq[String]](classOf[MRJobConfig],
                                         "DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH",
                                         Seq[String]())(a => a.split(","))

will fail for yarn-alpha.

    sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha yarn-alpha/test
    ...
    4/07/17 07:07:16 INFO ClientBase: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    [info] - default Yarn application classpath *** FAILED ***
    [info] java.lang.ClassCastException: [Ljava.lang.String; cannot be cast to java.lang.String
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$Fixtures$$anonfun$12.apply(ClientBaseSuite.scala:152)
    [info] at scala.Option.map(Option.scala:145)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.getFieldValue(ClientBaseSuite.scala:180)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$Fixtures$.(ClientBaseSuite.scala:152)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.Fixtures$lzycompute(ClientBaseSuite.scala:141)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.Fixtures(ClientBaseSuite.scala:141)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$1.apply$mcV$sp(ClientBaseSuite.scala:47)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$1.apply(ClientBaseSuite.scala:47)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$1.apply(ClientBaseSuite.scala:47)
    [info] at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
    [info] ...
    [info] - default MR application classpath *** FAILED ***
    [info] java.lang.ClassCastException: [Ljava.lang.String; cannot be cast to java.lang.String
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$Fixtures$$anonfun$12.apply(ClientBaseSuite.scala:152)
    [info] at scala.Option.map(Option.scala:145)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.getFieldValue(ClientBaseSuite.scala:180)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$Fixtures$.(ClientBaseSuite.scala:152)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.Fixtures$lzycompute(ClientBaseSuite.scala:141)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.Fixtures(ClientBaseSuite.scala:141)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$2.apply$mcV$sp(ClientBaseSuite.scala:51)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$2.apply(ClientBaseSuite.scala:51)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$2.apply(ClientBaseSuite.scala:51)
    [info] at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
    [info] ...
    [info] - resultant classpath for an application that defines a classpath for YARN *** FAILED ***
    [info] java.lang.ClassCastException: [Ljava.lang.String; cannot be cast to java.lang.String
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$Fixtures$$anonfun$12.apply(ClientBaseSuite.scala:152)
    [info] at scala.Option.map(Option.scala:145)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.getFieldValue(ClientBaseSuite.scala:180)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$Fixtures$.(ClientBaseSuite.scala:152)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.Fixtures$lzycompute(ClientBaseSuite.scala:141)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite.Fixtures(ClientBaseSuite.scala:141)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$3.apply$mcV$sp(ClientBaseSuite.scala:55)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$3.apply(ClientBaseSuite.scala:55)
    [info] at org.apache.spark.deploy.yarn.ClientBaseSuite$$anonfun$3.apply(ClientBaseSuite.scala:55)
    [info] at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
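As a minimal illustration of the failure above (a paraphrase in spark-shell style, not the actual ClientBaseSuite code): the constant is looked up reflectively, Field.get returns an Object, and forcing that Object to String is exactly what blows up once the underlying value is a String[] under yarn-alpha.

    // Paraphrased sketch of the failing pattern; not the real test helper.
    import org.apache.hadoop.mapreduce.MRJobConfig

    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
    val raw: AnyRef = field.get(null)   // a String on stable YARN, an Array[String] on yarn-alpha
    val cp: Seq[String] = raw.asInstanceOf[String].split(",").toSeq
    // yarn-alpha: java.lang.ClassCastException: [Ljava.lang.String; cannot be cast to java.lang.String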
Compile error when compiling for cloudera
I'm trying to compile the latest code, with the hadoop-version set for 2.0.0-mr1-cdh4.6.0. I'm getting the following error, which I don't get when I don't set the hadoop version:

    [error] /data/hdfs/1/home/nkronenfeld/git/spark-ndk/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala:156: overloaded method constructor NioServerSocketChannelFactory with alternatives:
    [error]   (x$1: java.util.concurrent.Executor,x$2: java.util.concurrent.Executor,x$3: Int)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory
    [error]   (x$1: java.util.concurrent.Executor,x$2: java.util.concurrent.Executor)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory
    [error]  cannot be applied to ()
    [error]     val channelFactory = new NioServerSocketChannelFactory
    [error]^
    [error] one error found

I don't know flume from a hole in the wall - does anyone know what I can do to fix this?

Thanks,
-Nathan

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: nkronenf...@oculusinfo.com
Re: Compile error when compiling for cloudera
This looks like a Netty version problem actually. Are you bringing in something that might be changing the version of Netty used by Spark? It depends a lot on how you are building things. Good to specify exactly how you're building here. On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld wrote: > I'm trying to compile the latest code, with the hadoop-version set for > 2.0.0-mr1-cdh4.6.0. > > I'm getting the following error, which I don't get when I don't set the > hadoop version: > > [error] > /data/hdfs/1/home/nkronenfeld/git/spark-ndk/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala:156: > overloaded method constructor NioServerSocketChannelFactory with > alternatives: > [error] (x$1: java.util.concurrent.Executor,x$2: > java.util.concurrent.Executor,x$3: > Int)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory > [error] (x$1: java.util.concurrent.Executor,x$2: > java.util.concurrent.Executor)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory > [error] cannot be applied to () > [error] val channelFactory = new NioServerSocketChannelFactory > [error]^ > [error] one error found > > > I don't know flume from a hole in the wall - does anyone know what I can do > to fix this? > > > Thanks, > -Nathan > > > -- > Nathan Kronenfeld > Senior Visualization Developer > Oculus Info Inc > 2 Berkeley Street, Suite 600, > Toronto, Ontario M5A 4J5 > Phone: +1-416-203-3003 x 238 > Email: nkronenf...@oculusinfo.com
Re: Possible bug in ClientBase.scala?
Looks like a real problem. I see it too. I think the same workaround found in ClientBase.scala needs to be used here. There, the fact that this field can be a String or String[] is handled explicitly. In fact I think you can just call to ClientBase for this? PR it, I say. On Thu, Jul 17, 2014 at 3:24 PM, Chester Chen wrote: > val knownDefMRAppCP: Seq[String] = > getFieldValue[String, Seq[String]](classOf[MRJobConfig], > > "DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH", > Seq[String]())(a => a.split(",")) > > will fail for yarn-alpha. > > sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha yarn-alpha/test >
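For concreteness, here is a sketch of the kind of explicit String-vs-String[] handling Sean is referring to; the method name is illustrative, not the actual ClientBase API.

    import org.apache.hadoop.mapreduce.MRJobConfig

    // Works against both YARN flavors: the constant is a comma-separated String on
    // stable YARN and already a String[] on yarn-alpha.
    def defaultMRApplicationClasspath(): Seq[String] = {
      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
      field.get(null) match {
        case s: String          => s.split(",").toSeq
        case arr: Array[String] => arr.toSeq
        case other              => sys.error("Unexpected classpath field type: " + other.getClass)
      }
    }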
Re: Compile error when compiling for cloudera
My full build command is: ./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly I've changed one line in RDD.scala, nothing else. On Thu, Jul 17, 2014 at 10:56 AM, Sean Owen wrote: > This looks like a Jetty version problem actually. Are you bringing in > something that might be changing the version of Jetty used by Spark? > It depends a lot on how you are building things. > > Good to specify exactly how your'e building here. > > On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld > wrote: > > I'm trying to compile the latest code, with the hadoop-version set for > > 2.0.0-mr1-cdh4.6.0. > > > > I'm getting the following error, which I don't get when I don't set the > > hadoop version: > > > > [error] > > > /data/hdfs/1/home/nkronenfeld/git/spark-ndk/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala:156: > > overloaded method constructor NioServerSocketChannelFactory with > > alternatives: > > [error] (x$1: java.util.concurrent.Executor,x$2: > > java.util.concurrent.Executor,x$3: > > Int)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory > > > [error] (x$1: java.util.concurrent.Executor,x$2: > > > java.util.concurrent.Executor)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory > > [error] cannot be applied to () > > [error] val channelFactory = new NioServerSocketChannelFactory > > [error]^ > > [error] one error found > > > > > > I don't know flume from a hole in the wall - does anyone know what I can > do > > to fix this? > > > > > > Thanks, > > -Nathan > > > > > > -- > > Nathan Kronenfeld > > Senior Visualization Developer > > Oculus Info Inc > > 2 Berkeley Street, Suite 600, > > Toronto, Ontario M5A 4J5 > > Phone: +1-416-203-3003 x 238 > > Email: nkronenf...@oculusinfo.com > -- Nathan Kronenfeld Senior Visualization Developer Oculus Info Inc 2 Berkeley Street, Suite 600, Toronto, Ontario M5A 4J5 Phone: +1-416-203-3003 x 238 Email: nkronenf...@oculusinfo.com
Re: Compile error when compiling for cloudera
er, that line being in toDebugString, where it really shouldn't affect anything (no signature changes or the like) On Thu, Jul 17, 2014 at 10:58 AM, Nathan Kronenfeld < nkronenf...@oculusinfo.com> wrote: > My full build command is: > ./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly > > > I've changed one line in RDD.scala, nothing else. > > > > On Thu, Jul 17, 2014 at 10:56 AM, Sean Owen wrote: > >> This looks like a Jetty version problem actually. Are you bringing in >> something that might be changing the version of Jetty used by Spark? >> It depends a lot on how you are building things. >> >> Good to specify exactly how your'e building here. >> >> On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld >> wrote: >> > I'm trying to compile the latest code, with the hadoop-version set for >> > 2.0.0-mr1-cdh4.6.0. >> > >> > I'm getting the following error, which I don't get when I don't set the >> > hadoop version: >> > >> > [error] >> > >> /data/hdfs/1/home/nkronenfeld/git/spark-ndk/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala:156: >> > overloaded method constructor NioServerSocketChannelFactory with >> > alternatives: >> > [error] (x$1: java.util.concurrent.Executor,x$2: >> > java.util.concurrent.Executor,x$3: >> > Int)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory >> >> > [error] (x$1: java.util.concurrent.Executor,x$2: >> > >> java.util.concurrent.Executor)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory >> > [error] cannot be applied to () >> > [error] val channelFactory = new NioServerSocketChannelFactory >> > [error]^ >> > [error] one error found >> > >> > >> > I don't know flume from a hole in the wall - does anyone know what I >> can do >> > to fix this? >> > >> > >> > Thanks, >> > -Nathan >> > >> > >> > -- >> > Nathan Kronenfeld >> > Senior Visualization Developer >> > Oculus Info Inc >> > 2 Berkeley Street, Suite 600, >> > Toronto, Ontario M5A 4J5 >> > Phone: +1-416-203-3003 x 238 >> > Email: nkronenf...@oculusinfo.com >> > > > > -- > Nathan Kronenfeld > Senior Visualization Developer > Oculus Info Inc > 2 Berkeley Street, Suite 600, > Toronto, Ontario M5A 4J5 > Phone: +1-416-203-3003 x 238 > Email: nkronenf...@oculusinfo.com > -- Nathan Kronenfeld Senior Visualization Developer Oculus Info Inc 2 Berkeley Street, Suite 600, Toronto, Ontario M5A 4J5 Phone: +1-416-203-3003 x 238 Email: nkronenf...@oculusinfo.com
Re: Does RDD checkpointing store the entire state in HDFS?
Thank you, TD ! Fang, Yan yanfang...@gmail.com +1 (206) 849-4108 On Wed, Jul 16, 2014 at 6:53 PM, Tathagata Das wrote: > After every checkpointing interval, the latest state RDD is stored to HDFS > in its entirety. Along with that, the series of DStream transformations > that was setup with the streaming context is also stored into HDFS (the > whole DAG of DStream objects is serialized and saved). > > TD > > > On Wed, Jul 16, 2014 at 5:38 PM, Yan Fang wrote: > > > Hi guys, > > > > am wondering how the RDD checkpointing > > < > https://spark.apache.org/docs/latest/streaming-programming-guide.html#RDD > > Checkpointing> works in Spark Streaming. When I use updateStateByKey, > does > > the Spark store the entire state (at one time point) into the HDFS or > only > > put the transformation into the HDFS? Thank you. > > > > Best, > > > > Fang, Yan > > yanfang...@gmail.com > > +1 (206) 849-4108 > > >
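A minimal sketch of the setup TD describes above, i.e. a stateful stream with checkpointing enabled; the checkpoint directory, input source, and update function are placeholders, not part of the original thread.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))
    // At each checkpoint interval the latest state RDD (in its entirety) and the
    // serialized DStream DAG are written under this directory.
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")   // placeholder path

    val updateCounts = (values: Seq[Int], state: Option[Int]) =>
      Some(state.getOrElse(0) + values.sum)

    val words  = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))  // placeholder source
    val counts = words.map((_, 1)).updateStateByKey[Int](updateCounts)
    counts.print()

    ssc.start()
    ssc.awaitTermination()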
Re: Compile error when compiling for cloudera
CC tmalaska since he touched the line in question. This is a fun one. So, here's the line of code added last week: val channelFactory = new NioServerSocketChannelFactory (Executors.newCachedThreadPool(), Executors.newCachedThreadPool()); Scala parses this as two statements, one invoking a no-arg constructor and one making a tuple for fun. Put it on one line and it's fine. It works with newer Netty since there is a no-arg constructor. It fails with older Netty, which is what you get with older Hadoop. The fix is obvious. I'm away and if nobody beats me to a PR in the meantime, I'll propose one as an addendum to the recent JIRA. Sean * On Thu, Jul 17, 2014 at 3:58 PM, Nathan Kronenfeld wrote: > My full build command is: > ./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly > > > I've changed one line in RDD.scala, nothing else. > > > > On Thu, Jul 17, 2014 at 10:56 AM, Sean Owen wrote: > >> This looks like a Jetty version problem actually. Are you bringing in >> something that might be changing the version of Jetty used by Spark? >> It depends a lot on how you are building things. >> >> Good to specify exactly how your'e building here. >> >> On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld >> wrote: >> > I'm trying to compile the latest code, with the hadoop-version set for >> > 2.0.0-mr1-cdh4.6.0. >> > >> > I'm getting the following error, which I don't get when I don't set the >> > hadoop version: >> > >> > [error] >> > >> /data/hdfs/1/home/nkronenfeld/git/spark-ndk/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala:156: >> > overloaded method constructor NioServerSocketChannelFactory with >> > alternatives: >> > [error] (x$1: java.util.concurrent.Executor,x$2: >> > java.util.concurrent.Executor,x$3: >> > Int)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory >> >> > [error] (x$1: java.util.concurrent.Executor,x$2: >> > >> java.util.concurrent.Executor)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory >> > [error] cannot be applied to () >> > [error] val channelFactory = new NioServerSocketChannelFactory >> > [error]^ >> > [error] one error found >> > >> > >> > I don't know flume from a hole in the wall - does anyone know what I can >> do >> > to fix this? >> > >> > >> > Thanks, >> > -Nathan >> > >> > >> > -- >> > Nathan Kronenfeld >> > Senior Visualization Developer >> > Oculus Info Inc >> > 2 Berkeley Street, Suite 600, >> > Toronto, Ontario M5A 4J5 >> > Phone: +1-416-203-3003 x 238 >> > Email: nkronenf...@oculusinfo.com >> > > > > -- > Nathan Kronenfeld > Senior Visualization Developer > Oculus Info Inc > 2 Berkeley Street, Suite 600, > Toronto, Ontario M5A 4J5 > Phone: +1-416-203-3003 x 238 > Email: nkronenf...@oculusinfo.com
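To make the parsing point concrete, here is the shape of the problem as a standalone sketch (illustrative only; the real line lives in FlumeInputDStream.scala):

    import java.util.concurrent.Executors
    import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory

    object ParseDemo {
      // Split across two lines, semicolon inference makes this TWO statements: a
      // no-arg constructor call, then a discarded tuple of the two thread pools.
      // That only compiles against newer Netty, which has a no-arg constructor.
      val broken = new NioServerSocketChannelFactory
      (Executors.newCachedThreadPool(), Executors.newCachedThreadPool())

      // With the argument list opened on the same line, both pools reach the
      // two-arg constructor, which exists in old and new Netty alike.
      val fixed = new NioServerSocketChannelFactory(
        Executors.newCachedThreadPool(),
        Executors.newCachedThreadPool())
    }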
Re: Compile error when compiling for cloudera
Don't make this change yet. I have a 1642 that needs to get through around the same code. I can make this change after 1642 is through. On Thu, Jul 17, 2014 at 12:25 PM, Sean Owen wrote: > CC tmalaska since he touched the line in question. This is a fun one. > So, here's the line of code added last week: > > val channelFactory = new NioServerSocketChannelFactory > (Executors.newCachedThreadPool(), Executors.newCachedThreadPool()); > > Scala parses this as two statements, one invoking a no-arg constructor > and one making a tuple for fun. Put it on one line and it's fine. > > It works with newer Netty since there is a no-arg constructor. It > fails with older Netty, which is what you get with older Hadoop. > > The fix is obvious. I'm away and if nobody beats me to a PR in the > meantime, I'll propose one as an addendum to the recent JIRA. > > Sean > > * > > On Thu, Jul 17, 2014 at 3:58 PM, Nathan Kronenfeld > wrote: > > My full build command is: > > ./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly > > > > > > I've changed one line in RDD.scala, nothing else. > > > > > > > > On Thu, Jul 17, 2014 at 10:56 AM, Sean Owen wrote: > > > >> This looks like a Jetty version problem actually. Are you bringing in > >> something that might be changing the version of Jetty used by Spark? > >> It depends a lot on how you are building things. > >> > >> Good to specify exactly how your'e building here. > >> > >> On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld > >> wrote: > >> > I'm trying to compile the latest code, with the hadoop-version set for > >> > 2.0.0-mr1-cdh4.6.0. > >> > > >> > I'm getting the following error, which I don't get when I don't set > the > >> > hadoop version: > >> > > >> > [error] > >> > > >> > /data/hdfs/1/home/nkronenfeld/git/spark-ndk/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala:156: > >> > overloaded method constructor NioServerSocketChannelFactory with > >> > alternatives: > >> > [error] (x$1: java.util.concurrent.Executor,x$2: > >> > java.util.concurrent.Executor,x$3: > >> > Int)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory > >> > >> > [error] (x$1: java.util.concurrent.Executor,x$2: > >> > > >> > java.util.concurrent.Executor)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory > >> > [error] cannot be applied to () > >> > [error] val channelFactory = new NioServerSocketChannelFactory > >> > [error]^ > >> > [error] one error found > >> > > >> > > >> > I don't know flume from a hole in the wall - does anyone know what I > can > >> do > >> > to fix this? > >> > > >> > > >> > Thanks, > >> > -Nathan > >> > > >> > > >> > -- > >> > Nathan Kronenfeld > >> > Senior Visualization Developer > >> > Oculus Info Inc > >> > 2 Berkeley Street, Suite 600, > >> > Toronto, Ontario M5A 4J5 > >> > Phone: +1-416-203-3003 x 238 > >> > Email: nkronenf...@oculusinfo.com > >> > > > > > > > > -- > > Nathan Kronenfeld > > Senior Visualization Developer > > Oculus Info Inc > > 2 Berkeley Street, Suite 600, > > Toronto, Ontario M5A 4J5 > > Phone: +1-416-203-3003 x 238 > > Email: nkronenf...@oculusinfo.com >
Re: Possible bug in ClientBase.scala?
OK I will create PR. thanks On Thu, Jul 17, 2014 at 7:58 AM, Sean Owen wrote: > Looks like a real problem. I see it too. I think the same workaround > found in ClientBase.scala needs to be used here. There, the fact that > this field can be a String or String[] is handled explicitly. In fact > I think you can just call to ClientBase for this? PR it, I say. > > On Thu, Jul 17, 2014 at 3:24 PM, Chester Chen > wrote: > > val knownDefMRAppCP: Seq[String] = > > getFieldValue[String, Seq[String]](classOf[MRJobConfig], > > > > "DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH", > > Seq[String]())(a => > a.split(",")) > > > > will fail for yarn-alpha. > > > > sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha yarn-alpha/test > > >
Re: Compile error when compiling for cloudera
Should be an easy rebase for your PR, so I went ahead just to get this fixed up: https://github.com/apache/spark/pull/1466 On Thu, Jul 17, 2014 at 5:32 PM, Ted Malaska wrote: > Don't make this change yet. I have a 1642 that needs to get through around > the same code. > > I can make this change after 1642 is through. > > > On Thu, Jul 17, 2014 at 12:25 PM, Sean Owen wrote: >> >> CC tmalaska since he touched the line in question. This is a fun one. >> So, here's the line of code added last week: >> >> val channelFactory = new NioServerSocketChannelFactory >> (Executors.newCachedThreadPool(), Executors.newCachedThreadPool()); >> >> Scala parses this as two statements, one invoking a no-arg constructor >> and one making a tuple for fun. Put it on one line and it's fine. >> >> It works with newer Netty since there is a no-arg constructor. It >> fails with older Netty, which is what you get with older Hadoop. >> >> The fix is obvious. I'm away and if nobody beats me to a PR in the >> meantime, I'll propose one as an addendum to the recent JIRA. >> >> Sean >> >> * >> >> On Thu, Jul 17, 2014 at 3:58 PM, Nathan Kronenfeld >> wrote: >> > My full build command is: >> > ./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly >> > >> > >> > I've changed one line in RDD.scala, nothing else. >> > >> > >> > >> > On Thu, Jul 17, 2014 at 10:56 AM, Sean Owen wrote: >> > >> >> This looks like a Jetty version problem actually. Are you bringing in >> >> something that might be changing the version of Jetty used by Spark? >> >> It depends a lot on how you are building things. >> >> >> >> Good to specify exactly how your'e building here. >> >> >> >> On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld >> >> wrote: >> >> > I'm trying to compile the latest code, with the hadoop-version set >> >> > for >> >> > 2.0.0-mr1-cdh4.6.0. >> >> > >> >> > I'm getting the following error, which I don't get when I don't set >> >> > the >> >> > hadoop version: >> >> > >> >> > [error] >> >> > >> >> >> >> /data/hdfs/1/home/nkronenfeld/git/spark-ndk/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala:156: >> >> > overloaded method constructor NioServerSocketChannelFactory with >> >> > alternatives: >> >> > [error] (x$1: java.util.concurrent.Executor,x$2: >> >> > java.util.concurrent.Executor,x$3: >> >> > Int)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory >> >> >> >> > [error] (x$1: java.util.concurrent.Executor,x$2: >> >> > >> >> >> >> java.util.concurrent.Executor)org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory >> >> > [error] cannot be applied to () >> >> > [error] val channelFactory = new NioServerSocketChannelFactory >> >> > [error]^ >> >> > [error] one error found >> >> > >> >> > >> >> > I don't know flume from a hole in the wall - does anyone know what I >> >> > can >> >> do >> >> > to fix this? >> >> > >> >> > >> >> > Thanks, >> >> > -Nathan >> >> > >> >> > >> >> > -- >> >> > Nathan Kronenfeld >> >> > Senior Visualization Developer >> >> > Oculus Info Inc >> >> > 2 Berkeley Street, Suite 600, >> >> > Toronto, Ontario M5A 4J5 >> >> > Phone: +1-416-203-3003 x 238 >> >> > Email: nkronenf...@oculusinfo.com >> >> >> > >> > >> > >> > -- >> > Nathan Kronenfeld >> > Senior Visualization Developer >> > Oculus Info Inc >> > 2 Berkeley Street, Suite 600, >> > Toronto, Ontario M5A 4J5 >> > Phone: +1-416-203-3003 x 238 >> > Email: nkronenf...@oculusinfo.com > >
Re: small (yet major) change going in: broadcasting RDD to reduce task size
On Thu, Jul 17, 2014 at 1:23 AM, Stephen Haberman < stephen.haber...@gmail.com> wrote: > I'd be ecstatic if more major changes were this well/succinctly > explained > Ditto on that. The summary of user impact was very nice. It would be good to repeat that on the user list or release notes when this change goes out. Nick
Re: [VOTE] Release Apache Spark 0.9.2 (RC1)
I start the voting with a +1. Ran tests on the release candidates and some basic operations in spark-shell and pyspark (local and standalone). -Xiangrui On Thu, Jul 17, 2014 at 3:16 AM, Xiangrui Meng wrote: > Please vote on releasing the following candidate as Apache Spark version > 0.9.2! > > The tag to be voted on is v0.9.2-rc1 (commit 4322c0ba): > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4322c0ba7f411cf9a2483895091440011742246b > > The release files, including signatures, digests, etc. can be found at: > http://people.apache.org/~meng/spark-0.9.2-rc1/ > > Release artifacts are signed with the following key: > https://people.apache.org/keys/committer/meng.asc > > The staging repository for this release can be found at: > https://repository.apache.org/service/local/repositories/orgapachespark-1023/content/ > > The documentation corresponding to this release can be found at: > http://people.apache.org/~meng/spark-0.9.2-rc1-docs/ > > Please vote on releasing this package as Apache Spark 0.9.2! > > The vote is open until Sunday, July 20, at 11:10 UTC and passes if > a majority of at least 3 +1 PMC votes are cast. > > [ ] +1 Release this package as Apache Spark 0.9.2 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see > http://spark.apache.org/ > > === About this release === > This release fixes a few high-priority bugs in 0.9.1 and has a variety > of smaller fixes. The full list is here: http://s.apache.org/d0t. Some > of the more visible patches are: > > SPARK-2156 and SPARK-1112: Issues with jobs hanging due to akka frame size > SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys > SPARK-1676: HDFS FileSystems continually pile up in the FS cache > SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo > SPARK-1870: Secondary jars are not added to executor classpath for YARN > > This is the second maintenance release on the 0.9 line. We plan to make > additional maintenance releases as new fixes come in. > > Best, > Xiangrui
InputSplit and RecordReader control on HadoopRDD
Hello,

I am currently trying to extend some custom InputSplit and RecordReader classes to provide to SparkContext's hadoopRDD() function. My question is the following: does the value returned by InputSplit.getLength() and/or RecordReader.getProgress() affect the execution of a map() function in the Spark runtime?

I am asking because I have used these two custom classes on Hadoop and they do not cause any problems. However, in Spark, I see that new InputSplit objects are generated during runtime. To be more precise: in the beginning, I see in my log file that an InputSplit object is generated and the RecordReader object associated with it is fetching records. At some point, the job that is handling the previous InputSplit stops, and a new one is spawned with a new InputSplit. I do not understand why this is happening. Any help?

Thank you,
Nick

P.S. 1: I am sorry for posting my question on the developer mailing list, but I could not find anything similar in the users' list. Also, I really need to understand the runtime of Spark, and I believe that on the developers' list my question will be read by contributors to Spark.

P.S. 2: I can provide more technical details if they are needed.
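For context, this is roughly how a custom input format is handed to hadoopRDD (a sketch in spark-shell style; the input path is a placeholder and the stock TextInputFormat stands in for the custom classes). Spark creates one partition (and one task) per InputSplit returned by the format's getSplits(), and each task reads its split through the corresponding RecordReader.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("hadoopRDD-demo").setMaster("local[2]"))

    val jobConf = new JobConf()
    FileInputFormat.setInputPaths(jobConf, "hdfs:///data/input")   // placeholder path

    // Swap in the custom InputFormat, key class, and value class here.
    val rdd = sc.hadoopRDD(jobConf, classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text], 4)

    rdd.map(_._2.toString).take(5).foreach(println)
    sc.stop()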
Current way to include hive in a build
Having looked at trunk make-distribution.sh, the --with-hive and --with-yarn options are now deprecated. Here is the way I have built it.

Added to pom.xml (a cdh5 profile with the following values):

    cdh5
    false
    2.3.0-cdh5.0.0
    2.3.0-cdh5.0.0
    0.96.1.1-cdh5.0.0
    3.4.5-cdh5.0.0

Build command:

    mvn -Pyarn -Pcdh5 -Phive -Dhadoop.version=2.3.0-cdh5.0.1 -Dyarn.version=2.3.0-cdh5.0.0 -DskipTests clean package

Result:

    [INFO]
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Spark Project Parent POM .. SUCCESS [3.165s]
    [INFO] Spark Project Core SUCCESS [2:39.504s]
    [INFO] Spark Project Bagel ... SUCCESS [7.596s]
    [INFO] Spark Project GraphX .. SUCCESS [22.027s]
    [INFO] Spark Project ML Library .. SUCCESS [36.284s]
    [INFO] Spark Project Streaming ... SUCCESS [24.309s]
    [INFO] Spark Project Tools ... SUCCESS [3.147s]
    [INFO] Spark Project Catalyst SUCCESS [20.148s]
    [INFO] Spark Project SQL . SUCCESS [18.560s]
    [INFO] Spark Project Hive FAILURE [33.962s]

    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.4:copy-dependencies (copy-dependencies) on project spark-hive_2.10: Execution copy-dependencies of goal org.apache.maven.plugins:maven-dependency-plugin:2.4:copy-dependencies failed: Plugin org.apache.maven.plugins:maven-dependency-plugin:2.4 or one of its dependencies could not be resolved: Could not find artifact commons-logging:commons-logging:jar:1.0.4 -> [Help 1]

Is anyone presently building with -Phive, and does anyone have a suggestion for this?
Re: Contributing to MLlib: Proposal for Clustering Algorithms
Hi all, Cool discussion! I agree that a more standardized API for clustering, and easy access to underlying routines, would be useful (we've also been discussing this when trying to develop streaming clustering algorithms, similar to https://github.com/apache/spark/pull/1361) For divisive, hierarchical clustering I implemented something awhile back, here's a gist. https://gist.github.com/freeman-lab/5947e7c53b368fe90371 It does bisecting k-means clustering (with k=2), with a recursive class for keeping track of the tree. I also found this much better than agglomerative methods (for the reasons Hector points out). This needs to be cleaned up, and can surely be optimized (esp. by replacing the core KMeans step with existing MLLib code), but I can say I was running it successfully on quite large data sets. RJ, depending on where you are in your progress, I'd be happy to help work on this piece and / or have you use this as a jumping off point, if useful. -- Jeremy -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-Proposal-for-Clustering-Algorithms-tp7212p7398.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
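For readers who want the gist of the approach without following the link: below is a rough, unoptimized sketch (spark-shell style) of bisecting k-means on top of MLlib's existing KMeans with k=2 at each split. The tree representation and stopping rule here are simplifications and are not Jeremy's implementation.

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // One node of the binary cluster tree; leaves hold the final clusters.
    case class ClusterNode(points: RDD[Vector], children: Option[(ClusterNode, ClusterNode)])

    // Recursively split a cluster in two with k=2 KMeans until it is small enough
    // or the depth limit is reached.
    def bisect(points: RDD[Vector], depth: Int, maxDepth: Int, minSize: Long): ClusterNode = {
      if (depth >= maxDepth || points.count() <= minSize) {
        ClusterNode(points, None)
      } else {
        val model   = KMeans.train(points, 2, 20)   // k = 2, 20 iterations
        val labeled = points.map(p => (model.predict(p), p)).cache()
        val left    = labeled.filter(_._1 == 0).map(_._2)
        val right   = labeled.filter(_._1 == 1).map(_._2)
        ClusterNode(points, Some((bisect(left,  depth + 1, maxDepth, minSize),
                                  bisect(right, depth + 1, maxDepth, minSize))))
      }
    }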
Re: [VOTE] Release Apache Spark 0.9.2 (RC1)
+1 Tested on Mac, verified CHANGES.txt is good, verified several of the bug fixes. Matei On Jul 17, 2014, at 11:12 AM, Xiangrui Meng wrote: > I start the voting with a +1. > > Ran tests on the release candidates and some basic operations in > spark-shell and pyspark (local and standalone). > > -Xiangrui > > On Thu, Jul 17, 2014 at 3:16 AM, Xiangrui Meng wrote: >> Please vote on releasing the following candidate as Apache Spark version >> 0.9.2! >> >> The tag to be voted on is v0.9.2-rc1 (commit 4322c0ba): >> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4322c0ba7f411cf9a2483895091440011742246b >> >> The release files, including signatures, digests, etc. can be found at: >> http://people.apache.org/~meng/spark-0.9.2-rc1/ >> >> Release artifacts are signed with the following key: >> https://people.apache.org/keys/committer/meng.asc >> >> The staging repository for this release can be found at: >> https://repository.apache.org/service/local/repositories/orgapachespark-1023/content/ >> >> The documentation corresponding to this release can be found at: >> http://people.apache.org/~meng/spark-0.9.2-rc1-docs/ >> >> Please vote on releasing this package as Apache Spark 0.9.2! >> >> The vote is open until Sunday, July 20, at 11:10 UTC and passes if >> a majority of at least 3 +1 PMC votes are cast. >> >> [ ] +1 Release this package as Apache Spark 0.9.2 >> [ ] -1 Do not release this package because ... >> >> To learn more about Apache Spark, please see >> http://spark.apache.org/ >> >> === About this release === >> This release fixes a few high-priority bugs in 0.9.1 and has a variety >> of smaller fixes. The full list is here: http://s.apache.org/d0t. Some >> of the more visible patches are: >> >> SPARK-2156 and SPARK-1112: Issues with jobs hanging due to akka frame size >> SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys >> SPARK-1676: HDFS FileSystems continually pile up in the FS cache >> SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo >> SPARK-1870: Secondary jars are not added to executor classpath for YARN >> >> This is the second maintenance release on the 0.9 line. We plan to make >> additional maintenance releases as new fixes come in. >> >> Best, >> Xiangrui
Re: [VOTE] Release Apache Spark 0.9.2 (RC1)
+1 Tested with my Ubuntu Linux. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Thu, Jul 17, 2014 at 6:36 PM, Matei Zaharia wrote: > +1 > > Tested on Mac, verified CHANGES.txt is good, verified several of the bug > fixes. > > Matei > > On Jul 17, 2014, at 11:12 AM, Xiangrui Meng wrote: > >> I start the voting with a +1. >> >> Ran tests on the release candidates and some basic operations in >> spark-shell and pyspark (local and standalone). >> >> -Xiangrui >> >> On Thu, Jul 17, 2014 at 3:16 AM, Xiangrui Meng wrote: >>> Please vote on releasing the following candidate as Apache Spark version >>> 0.9.2! >>> >>> The tag to be voted on is v0.9.2-rc1 (commit 4322c0ba): >>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4322c0ba7f411cf9a2483895091440011742246b >>> >>> The release files, including signatures, digests, etc. can be found at: >>> http://people.apache.org/~meng/spark-0.9.2-rc1/ >>> >>> Release artifacts are signed with the following key: >>> https://people.apache.org/keys/committer/meng.asc >>> >>> The staging repository for this release can be found at: >>> https://repository.apache.org/service/local/repositories/orgapachespark-1023/content/ >>> >>> The documentation corresponding to this release can be found at: >>> http://people.apache.org/~meng/spark-0.9.2-rc1-docs/ >>> >>> Please vote on releasing this package as Apache Spark 0.9.2! >>> >>> The vote is open until Sunday, July 20, at 11:10 UTC and passes if >>> a majority of at least 3 +1 PMC votes are cast. >>> >>> [ ] +1 Release this package as Apache Spark 0.9.2 >>> [ ] -1 Do not release this package because ... >>> >>> To learn more about Apache Spark, please see >>> http://spark.apache.org/ >>> >>> === About this release === >>> This release fixes a few high-priority bugs in 0.9.1 and has a variety >>> of smaller fixes. The full list is here: http://s.apache.org/d0t. Some >>> of the more visible patches are: >>> >>> SPARK-2156 and SPARK-1112: Issues with jobs hanging due to akka frame size >>> SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys >>> SPARK-1676: HDFS FileSystems continually pile up in the FS cache >>> SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo >>> SPARK-1870: Secondary jars are not added to executor classpath for YARN >>> >>> This is the second maintenance release on the 0.9 line. We plan to make >>> additional maintenance releases as new fixes come in. >>> >>> Best, >>> Xiangrui >
preferred Hive/Hadoop environment for generating golden test outputs
Hi all, What's the preferred environment for generating golden test outputs for new Hive tests? In particular: * what Hadoop version and Hive version should I be using, * are there particular distributions people have run successfully, and * are there any system properties or environment variables (beyond HADOOP_HOME, HIVE_HOME, and HIVE_DEV_HOME) I need to set before running the suite? I ask because I'm getting some errors while trying to add new tests and would like to eliminate any possible problems caused by differences between what my environment offers and what Spark expects. (I'm currently running with the Fedora packages for Hadoop 2.2.0 and a locally-built Hive 0.12.0.) Since I'll only be using this for generating test outputs, something as simple to set up as possible would be great. (Once I get something working, I'll be happy to write it up and contribute it as developer docs.) thanks, wb
Re: [VOTE] Release Apache Spark 0.9.2 (RC1)
+1 On Thursday, July 17, 2014, Matei Zaharia wrote: > +1 > > Tested on Mac, verified CHANGES.txt is good, verified several of the bug > fixes. > > Matei > > On Jul 17, 2014, at 11:12 AM, Xiangrui Meng > wrote: > > > I start the voting with a +1. > > > > Ran tests on the release candidates and some basic operations in > > spark-shell and pyspark (local and standalone). > > > > -Xiangrui > > > > On Thu, Jul 17, 2014 at 3:16 AM, Xiangrui Meng > wrote: > >> Please vote on releasing the following candidate as Apache Spark > version 0.9.2! > >> > >> The tag to be voted on is v0.9.2-rc1 (commit 4322c0ba): > >> > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4322c0ba7f411cf9a2483895091440011742246b > >> > >> The release files, including signatures, digests, etc. can be found at: > >> http://people.apache.org/~meng/spark-0.9.2-rc1/ > >> > >> Release artifacts are signed with the following key: > >> https://people.apache.org/keys/committer/meng.asc > >> > >> The staging repository for this release can be found at: > >> > https://repository.apache.org/service/local/repositories/orgapachespark-1023/content/ > >> > >> The documentation corresponding to this release can be found at: > >> http://people.apache.org/~meng/spark-0.9.2-rc1-docs/ > >> > >> Please vote on releasing this package as Apache Spark 0.9.2! > >> > >> The vote is open until Sunday, July 20, at 11:10 UTC and passes if > >> a majority of at least 3 +1 PMC votes are cast. > >> > >> [ ] +1 Release this package as Apache Spark 0.9.2 > >> [ ] -1 Do not release this package because ... > >> > >> To learn more about Apache Spark, please see > >> http://spark.apache.org/ > >> > >> === About this release === > >> This release fixes a few high-priority bugs in 0.9.1 and has a variety > >> of smaller fixes. The full list is here: http://s.apache.org/d0t. Some > >> of the more visible patches are: > >> > >> SPARK-2156 and SPARK-1112: Issues with jobs hanging due to akka frame > size > >> SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys > >> SPARK-1676: HDFS FileSystems continually pile up in the FS cache > >> SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo > >> SPARK-1870: Secondary jars are not added to executor classpath for YARN > >> > >> This is the second maintenance release on the 0.9 line. We plan to make > >> additional maintenance releases as new fixes come in. > >> > >> Best, > >> Xiangrui > >
Re: preferred Hive/Hadoop environment for generating golden test outputs
Hi Will, These three environment variables are needed [1]. I have had success with Hive 0.12 and Hadoop 1.0.4. For Hive, getting the source distribution seems to be required. Docs contribution will be much appreciated! [1] https://github.com/apache/spark/tree/master/sql#other-dependencies-for-developers Zongheng On Thu, Jul 17, 2014 at 7:51 PM, Will Benton wrote: > Hi all, > > What's the preferred environment for generating golden test outputs for new > Hive tests? In particular: > > * what Hadoop version and Hive version should I be using, > * are there particular distributions people have run successfully, and > * are there any system properties or environment variables (beyond > HADOOP_HOME, HIVE_HOME, and HIVE_DEV_HOME) I need to set before running the > suite? > > I ask because I'm getting some errors while trying to add new tests and would > like to eliminate any possible problems caused by differences between what my > environment offers and what Spark expects. (I'm currently running with the > Fedora packages for Hadoop 2.2.0 and a locally-built Hive 0.12.0.) Since > I'll only be using this for generating test outputs, something as simple to > set up as possible would be great. > > (Once I get something working, I'll be happy to write it up and contribute it > as developer docs.) > > > thanks, > wb
Re: Current way to include hive in a build
Hey Stephen,

The only change to the build was that we now ask users to run with -Phive and -Pyarn instead of --with-hive and --with-yarn (which internally just set -Phive and -Pyarn). I don't think this should affect the dependency graph.

Just to test this, what happens if you run *without* the CDH profile and build with hadoop version 2.3.0? Does that work?

- Patrick

On Thu, Jul 17, 2014 at 4:00 PM, Stephen Boesch wrote: > Having looked at trunk make-distribution.sh the --with-hive and --with-yarn > are now deprecated. > > Here is the way I have built it: > > Added to pom.xml: > > > cdh5 > > false > > > 2.3.0-cdh5.0.0 > 2.3.0-cdh5.0.0 > 0.96.1.1-cdh5.0.0 > 3.4.5-cdh5.0.0 > > > > *mvn -Pyarn -Pcdh5 -Phive -Dhadoop.version=2.3.0-cdh5.0.1 > -Dyarn.version=2.3.0-cdh5.0.0 -DskipTests clean package* > > > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Spark Project Parent POM .. SUCCESS [3.165s] > [INFO] Spark Project Core SUCCESS > [2:39.504s] > [INFO] Spark Project Bagel ... SUCCESS [7.596s] > [INFO] Spark Project GraphX .. SUCCESS [22.027s] > [INFO] Spark Project ML Library .. SUCCESS [36.284s] > [INFO] Spark Project Streaming ... SUCCESS [24.309s] > [INFO] Spark Project Tools ... SUCCESS [3.147s] > [INFO] Spark Project Catalyst SUCCESS [20.148s] > [INFO] Spark Project SQL . SUCCESS [18.560s] > *[INFO] Spark Project Hive FAILURE > [33.962s]* > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-dependency-plugin:2.4:copy-dependencies > (copy-dependencies) on project spark-hive_2.10: Execution copy-dependencies > of goal > org.apache.maven.plugins:maven-dependency-plugin:2.4:copy-dependencies > failed: Plugin org.apache.maven.plugins:maven-dependency-plugin:2.4 or one > of its dependencies could not be resolved: Could not find artifact > commons-logging:commons-logging:jar:1.0.4 -> [Help 1] > > Anyone who is presently building with -Phive and has a suggestion for this?
Re: [VOTE] Release Apache Spark 0.9.2 (RC1)
UPDATE: The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1023/ The previous repo contains exactly the same content but mutable. Thanks Patrick for pointing it out! -Xiangrui On Thu, Jul 17, 2014 at 7:52 PM, Reynold Xin wrote: > +1 > > On Thursday, July 17, 2014, Matei Zaharia wrote: > >> +1 >> >> Tested on Mac, verified CHANGES.txt is good, verified several of the bug >> fixes. >> >> Matei >> >> On Jul 17, 2014, at 11:12 AM, Xiangrui Meng > > wrote: >> >> > I start the voting with a +1. >> > >> > Ran tests on the release candidates and some basic operations in >> > spark-shell and pyspark (local and standalone). >> > >> > -Xiangrui >> > >> > On Thu, Jul 17, 2014 at 3:16 AM, Xiangrui Meng > > wrote: >> >> Please vote on releasing the following candidate as Apache Spark >> version 0.9.2! >> >> >> >> The tag to be voted on is v0.9.2-rc1 (commit 4322c0ba): >> >> >> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4322c0ba7f411cf9a2483895091440011742246b >> >> >> >> The release files, including signatures, digests, etc. can be found at: >> >> http://people.apache.org/~meng/spark-0.9.2-rc1/ >> >> >> >> Release artifacts are signed with the following key: >> >> https://people.apache.org/keys/committer/meng.asc >> >> >> >> The staging repository for this release can be found at: >> >> >> https://repository.apache.org/service/local/repositories/orgapachespark-1023/content/ >> >> >> >> The documentation corresponding to this release can be found at: >> >> http://people.apache.org/~meng/spark-0.9.2-rc1-docs/ >> >> >> >> Please vote on releasing this package as Apache Spark 0.9.2! >> >> >> >> The vote is open until Sunday, July 20, at 11:10 UTC and passes if >> >> a majority of at least 3 +1 PMC votes are cast. >> >> >> >> [ ] +1 Release this package as Apache Spark 0.9.2 >> >> [ ] -1 Do not release this package because ... >> >> >> >> To learn more about Apache Spark, please see >> >> http://spark.apache.org/ >> >> >> >> === About this release === >> >> This release fixes a few high-priority bugs in 0.9.1 and has a variety >> >> of smaller fixes. The full list is here: http://s.apache.org/d0t. Some >> >> of the more visible patches are: >> >> >> >> SPARK-2156 and SPARK-1112: Issues with jobs hanging due to akka frame >> size >> >> SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys >> >> SPARK-1676: HDFS FileSystems continually pile up in the FS cache >> >> SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo >> >> SPARK-1870: Secondary jars are not added to executor classpath for YARN >> >> >> >> This is the second maintenance release on the 0.9 line. We plan to make >> >> additional maintenance releases as new fixes come in. >> >> >> >> Best, >> >> Xiangrui >> >>