Hi,

Thanks, everybody, for helping me solve my problem :) As Zhu said, I had to
replace mapPartitionsWithSplit with mapPartitionsWithIndex in my code.
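In case it helps anyone who hits the same compile error, here is a rough,
self-contained sketch of that change. The SparkConf/SparkContext setup, the
input path and the variable names are only placeholders for illustration; in
the real project the RDD of lines comes from elsewhere, and in spark-shell you
would use the existing sc instead of creating a new one.

----------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup so the sketch can run on its own (e.g. inside a small main()).
val conf = new SparkConf().setAppName("partition-sizes").setMaster("local[*]")
val sc   = new SparkContext(conf)
val data = sc.textFile("data/input.txt")   // placeholder input path

// Old Spark 1.x code (mapPartitionsWithSplit was removed in Spark 2.0.0):
//   val sizes = data.mapPartitionsWithSplit { case (i, lines) => Iterator(i -> lines.length) }
//
// Spark 2.x equivalent: mapPartitionsWithIndex passes the partition index and
// an Iterator over that partition's elements, so each partition becomes one
// (partitionIndex, lineCount) pair.
val sizes = data.mapPartitionsWithIndex { case (i, lines) => Iterator(i -> lines.length) }

sizes.collect().foreach { case (i, n) => println(s"partition $i has $n lines") }
----------------------------------------------------------------------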
Thanks,
Have a nice day,
Anahita

On Wed, Mar 29, 2017 at 2:51 AM, Shixiong (Ryan) Zhu <shixi...@databricks.com> wrote:

> mapPartitionsWithSplit was removed in Spark 2.0.0. You can use
> mapPartitionsWithIndex instead.
>
> On Tue, Mar 28, 2017 at 3:52 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>
>> Thanks. I tried this one as well. Unfortunately I still get the same error.
>>
>> On Wednesday, March 29, 2017, Marco Mistroni <mmistr...@gmail.com> wrote:
>>
>>> 1.7.5
>>>
>>> On 28 Mar 2017 10:10 pm, "Anahita Talebi" <anahita.t.am...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for your answer.
>>>> What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
>>>> I think the problem might come from this part.
>>>>
>>>> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>> I have a project whose build.sbt is closest to yours, where I am using
>>>>> Spark 2.1, Scala 2.11 and scalatest (I upgraded to 3.0.0), and it works
>>>>> fine in my project, though I don't have any of the following libraries
>>>>> that you mention:
>>>>> - breeze
>>>>> - netlib.all
>>>>> - scopt
>>>>>
>>>>> hth
>>>>>
>>>>> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for your answer.
>>>>>>
>>>>>> I first changed the Scala version to 2.11.8 and kept the old Spark
>>>>>> version, 1.5.2. Then I changed the scalatest version to "3.0.1". With
>>>>>> this configuration, I could compile the code and generate the .jar file.
>>>>>>
>>>>>> When I changed the Spark version to 2.1.0, I got the same error as
>>>>>> before, so I imagine the problem is somehow related to the Spark version.
>>>>>>
>>>>>> Cheers,
>>>>>> Anahita
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> import AssemblyKeys._
>>>>>>
>>>>>> assemblySettings
>>>>>>
>>>>>> name := "proxcocoa"
>>>>>>
>>>>>> version := "0.1"
>>>>>>
>>>>>> organization := "edu.berkeley.cs.amplab"
>>>>>>
>>>>>> scalaVersion := "2.11.8"
>>>>>>
>>>>>> parallelExecution in Test := false
>>>>>>
>>>>>> {
>>>>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>>>   libraryDependencies ++= Seq(
>>>>>>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>>>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>>>     "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>>>>>>     "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>     "org.apache.spark" %% "spark-mllib" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>     "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>     "org.apache.commons" % "commons-compress" % "1.7",
>>>>>>     "commons-io" % "commons-io" % "2.4",
>>>>>>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>>>>>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>>>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>>>>   )
>>>>>> }
>>>>>>
>>>>>> {
>>>>>>   val defaultHadoopVersion = "1.0.4"
>>>>>>   val hadoopVersion = scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>>>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>>>> }
>>>>>>
>>>>>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>>>>>
>>>>>> resolvers ++= Seq(
>>>>>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>>>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>>>>   "Spray" at "http://repo.spray.cc"
>>>>>> )
>>>>>>
>>>>>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>>>   {
>>>>>>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>>>>>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>>>>>>     case "application.conf" => MergeStrategy.concat
>>>>>>     case "reference.conf" => MergeStrategy.concat
>>>>>>     case "log4j.properties" => MergeStrategy.discard
>>>>>>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>>>>>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>>>>>     case _ => MergeStrategy.first
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> test in assembly := {}
>>>>>> ----------------------------------------------------------------------
>>>>>>
>>>>>> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> That looks to me like there's something dodgy with your Scala installation.
>>>>>>> Though Spark 2.0 is built on Scala 2.11, it still supports 2.10... I
>>>>>>> suggest you change one thing at a time in your sbt: first the Spark
>>>>>>> version, run it and see if it works, then amend the Scala version.
>>>>>>>
>>>>>>> hth
>>>>>>> marco
>>>>>>>
>>>>>>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Thank you all for your informative answers.
>>>>>>>>
>>>>>>>> I actually changed the Scala version to 2.11.8 and the Spark version
>>>>>>>> to 2.1.0 in the build.sbt.
>>>>>>>>
>>>>>>>> Except for these two (the Scala and Spark versions), I kept the same
>>>>>>>> values for everything else in the build.sbt file:
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>> import AssemblyKeys._
>>>>>>>>
>>>>>>>> assemblySettings
>>>>>>>>
>>>>>>>> name := "proxcocoa"
>>>>>>>>
>>>>>>>> version := "0.1"
>>>>>>>>
>>>>>>>> scalaVersion := "2.11.8"
>>>>>>>>
>>>>>>>> parallelExecution in Test := false
>>>>>>>>
>>>>>>>> {
>>>>>>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>>>>>   libraryDependencies ++= Seq(
>>>>>>>>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>>>>>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>>>>>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>>>>>>>     "org.apache.spark" % "spark-core_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>>>     "org.apache.spark" % "spark-mllib_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>>>     "org.apache.spark" % "spark-sql_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>>>     "org.apache.commons" % "commons-compress" % "1.7",
>>>>>>>>     "commons-io" % "commons-io" % "2.4",
>>>>>>>>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>>>>>>>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>>>>>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>>>>>>   )
>>>>>>>> }
>>>>>>>>
>>>>>>>> {
>>>>>>>>   val defaultHadoopVersion = "1.0.4"
>>>>>>>>   val hadoopVersion = scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>>>>>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>>>>>> }
>>>>>>>>
>>>>>>>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"
>>>>>>>>
>>>>>>>> resolvers ++= Seq(
>>>>>>>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>>>>>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>>>>>>   "Spray" at "http://repo.spray.cc"
>>>>>>>> )
>>>>>>>>
>>>>>>>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>>>>>   {
>>>>>>>>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>>>>>>>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>>>>>>>>     case "application.conf" => MergeStrategy.concat
>>>>>>>>     case "reference.conf" => MergeStrategy.concat
>>>>>>>>     case "log4j.properties" => MergeStrategy.discard
>>>>>>>>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>>>>>>>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>>>>>>>     case _ => MergeStrategy.first
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> test in assembly := {}
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>
>>>>>>>> When I compile the code, I get the following error:
>>>>>>>>
>>>>>>>> [info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.11/classes...
>>>>>>>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a member of org.apache.spark.rdd.RDD[String]
>>>>>>>> [error]     val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
>>>>>>>> [error]                      ^
>>>>>>>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:41: value length is not a member of Any
>>>>>>>> [error]       Iterator(i -> lines.length)
>>>>>>>> [error]                           ^
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>> So the error is in the code itself. Does it mean that for these
>>>>>>>> different versions of Spark and Scala, I need to change the main code?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anahita
>>>>>>>>
>>>>>>>> On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč <dinko.sr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Adding to the advice given by others ... Spark 2.1.0 works with Scala
>>>>>>>>> 2.11, so set:
>>>>>>>>>
>>>>>>>>>   scalaVersion := "2.11.8"
>>>>>>>>>
>>>>>>>>> When you see something like:
>>>>>>>>>
>>>>>>>>>   "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>>>>>>>>>
>>>>>>>>> that means that the library `spark-core` is compiled against Scala 2.10,
>>>>>>>>> so you would have to change that to 2.11:
>>>>>>>>>
>>>>>>>>>   "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>>>>>>>>>
>>>>>>>>> Better yet, let SBT worry about libraries built against particular
>>>>>>>>> Scala versions:
>>>>>>>>>
>>>>>>>>>   "org.apache.spark" %% "spark-core" % "2.1.0"
>>>>>>>>>
>>>>>>>>> The `%%` will instruct SBT to choose the library appropriate for the
>>>>>>>>> version of Scala that is set in `scalaVersion`.
>>>>>>>>>
>>>>>>>>> It may be worth mentioning that the `%%` thing works only with Scala
>>>>>>>>> libraries, as they are compiled against a certain Scala version. Java
>>>>>>>>> libraries are unaffected (they have nothing to do with Scala), e.g. for
>>>>>>>>> `slf4j` one only uses a single `%`:
>>>>>>>>>
>>>>>>>>>   "org.slf4j" % "slf4j-api" % "1.7.2"
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Dinko
>>>>>>>>>
>>>>>>>>> On 27 March 2017 at 23:30, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>> > Check these versions:
>>>>>>>>> >
>>>>>>>>> > function create_build_sbt_file {
>>>>>>>>> >         BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
>>>>>>>>> >         [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
>>>>>>>>> >         cat >> $BUILD_SBT_FILE << !
>>>>>>>>> > lazy val root = (project in file(".")).
>>>>>>>>> >   settings(
>>>>>>>>> >     name := "${APPLICATION}",
>>>>>>>>> >     version := "1.0",
>>>>>>>>> >     scalaVersion := "2.11.8",
>>>>>>>>> >     mainClass in Compile := Some("myPackage.${APPLICATION}")
>>>>>>>>> >   )
>>>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
>>>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
>>>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" % "provided"
>>>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
>>>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.6.1" % "provided"
>>>>>>>>> > libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
>>>>>>>>> > libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.6.0-HBase-1.0"
>>>>>>>>> > libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
>>>>>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
>>>>>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
>>>>>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
>>>>>>>>> > // META-INF discarding
>>>>>>>>> > mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>>>>>> >   {
>>>>>>>>> >     case PathList("META-INF", xs @ _*) => MergeStrategy.discard
>>>>>>>>> >     case x => MergeStrategy.first
>>>>>>>>> >   }
>>>>>>>>> > }
>>>>>>>>> > !
>>>>>>>>> > }
>>>>>>>>> >
>>>>>>>>> > HTH
>>>>>>>>> >
>>>>>>>>> > Dr Mich Talebzadeh
>>>>>>>>> >
>>>>>>>>> > LinkedIn
>>>>>>>>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>> >
>>>>>>>>> > http://talebzadehmich.wordpress.com
>>>>>>>>> >
>>>>>>>>> > Disclaimer: Use it at your own risk. Any and all responsibility for any
>>>>>>>>> > loss, damage or destruction of data or any other property which may arise
>>>>>>>>> > from relying on this email's technical content is explicitly disclaimed.
>>>>>>>>> > The author will in no case be liable for any monetary damages arising
>>>>>>>>> > from such loss, damage or destruction.
>>>>>>>>> >
>>>>>>>>> > On 27 March 2017 at 21:45, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Usually you define the dependencies on the Spark libraries as "provided".
>>>>>>>>> >> You also seem to mix different Spark versions, which should be avoided.
>>>>>>>>> >> The Hadoop library seems to be outdated and should also only be provided.
>>>>>>>>> >>
>>>>>>>>> >> The other dependencies you could assemble in a fat jar.
>>>>>>>>> >>
>>>>>>>>> >> On 27 Mar 2017, at 21:25, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi friends,
>>>>>>>>> >>
>>>>>>>>> >> I have code which is written in Scala. It currently runs with Scala
>>>>>>>>> >> 2.10.4 and Spark 1.5.2.
>>>>>>>>> >>
>>>>>>>>> >> I would like to upgrade the code to the most recent version of Spark,
>>>>>>>>> >> meaning 2.1.0.
>>>>>>>>> >>
>>>>>>>>> >> Here is the build.sbt:
>>>>>>>>> >>
>>>>>>>>> >> import AssemblyKeys._
>>>>>>>>> >>
>>>>>>>>> >> assemblySettings
>>>>>>>>> >>
>>>>>>>>> >> name := "proxcocoa"
>>>>>>>>> >>
>>>>>>>>> >> version := "0.1"
>>>>>>>>> >>
>>>>>>>>> >> scalaVersion := "2.10.4"
>>>>>>>>> >>
>>>>>>>>> >> parallelExecution in Test := false
>>>>>>>>> >>
>>>>>>>>> >> {
>>>>>>>>> >>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>>>>>> >>   libraryDependencies ++= Seq(
>>>>>>>>> >>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>>>>>> >>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>>>>>> >>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>>>>>>>> >>     "org.apache.spark" % "spark-core_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>>>>>> >>     "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>>>>>> >>     "org.apache.spark" % "spark-sql_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>>>>>> >>     "org.apache.commons" % "commons-compress" % "1.7",
>>>>>>>>> >>     "commons-io" % "commons-io" % "2.4",
>>>>>>>>> >>     "org.scalanlp" % "breeze_2.10" % "0.11.2",
>>>>>>>>> >>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>>>>>> >>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>>>>>>> >>   )
>>>>>>>>> >> }
>>>>>>>>> >>
>>>>>>>>> >> {
>>>>>>>>> >>   val defaultHadoopVersion = "1.0.4"
>>>>>>>>> >>   val hadoopVersion = scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>>>>>>> >>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>>>>>>> >> }
>>>>>>>>> >>
>>>>>>>>> >> libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"
>>>>>>>>> >>
>>>>>>>>> >> resolvers ++= Seq(
>>>>>>>>> >>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>>>>>>> >>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>>>>>>> >>   "Spray" at "http://repo.spray.cc"
>>>>>>>>> >> )
>>>>>>>>> >>
>>>>>>>>> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>>>>>> >>   {
>>>>>>>>> >>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>>>>>>>> >>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>>>>>>>>> >>     case "application.conf" => MergeStrategy.concat
>>>>>>>>> >>     case "reference.conf" => MergeStrategy.concat
>>>>>>>>> >>     case "log4j.properties" => MergeStrategy.discard
>>>>>>>>> >>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>>>>>>>> >>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>>>>>>>> >>     case _ => MergeStrategy.first
>>>>>>>>> >>   }
>>>>>>>>> >> }
>>>>>>>>> >>
>>>>>>>>> >> test in assembly := {}
>>>>>>>>> >>
>>>>>>>>> >> -----------------------------------------------------------
>>>>>>>>> >> I downloaded Spark 2.1.0 and changed the Spark version and scalaVersion
>>>>>>>>> >> in the build.sbt, but unfortunately I failed to run the code.
>>>>>>>>> >>
>>>>>>>>> >> Does anybody know how I can upgrade the code to the most recent Spark
>>>>>>>>> >> version by changing the build.sbt file? Or do you have any other
>>>>>>>>> >> suggestion?
>>>>>>>>> >>
>>>>>>>>> >> Thanks a lot,
>>>>>>>>> >> Anahita