Hi,

Thanks for your answer. What version of "org.slf4j" % "slf4j-api" do you have
in your sbt file? I think the problem might come from that dependency.
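For what it's worth, a minimal sketch (untested, versions assumed) of how the
slf4j artifacts could be kept in line with what Spark 2.1.0 pulls in
transitively; 1.7.16 is my guess at that version, so please check it against
the Spark 2.1.0 pom before using it:

----------------------------------------------------------------------
// Hypothetical build.sbt fragment: align the explicitly declared slf4j
// artifacts with the version Spark 2.1.0 brings in transitively (assumed to
// be 1.7.16 here) so that two different slf4j versions do not end up on the
// classpath at the same time.
val slf4jVersion = "1.7.16"   // assumption; verify against Spark's own pom

libraryDependencies ++= Seq(
  "org.slf4j" % "slf4j-api"     % slf4jVersion,
  "org.slf4j" % "slf4j-log4j12" % slf4jVersion
)
----------------------------------------------------------------------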
On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Hello
> Uhm, I have a project whose build.sbt is closest to yours, where I am using
> Spark 2.1, Scala 2.11 and ScalaTest (I upgraded to 3.0.0), and it works fine
> in my projects, though I don't have any of the following libraries that you
> mention:
> - breeze
> - netlib (all)
> - scopt
>
> hth
>
> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>>
>> I first changed the Scala version to 2.11.8 and kept the Spark version
>> 1.5.2 (the old version). Then I changed the ScalaTest version to "3.0.1".
>> With this configuration, I could run the code, compile it and generate
>> the .jar file.
>>
>> When I changed the Spark version to 2.1.0, I got the same error as before.
>> So I imagine the problem is somehow related to the Spark version.
>>
>> Cheers,
>> Anahita
>>
>> ----------------------------------------------------------------------
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> organization := "edu.berkeley.cs.amplab"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>     "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>>     "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll(excludeHadoop),
>>     "org.apache.spark" %% "spark-mllib" % "2.1.0" excludeAll(excludeHadoop),
>>     "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
>>     "org.apache.commons" % "commons-compress" % "1.7",
>>     "commons-io" % "commons-io" % "2.4",
>>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>     "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>   "Spray" at "http://repo.spray.cc"
>> )
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>>     case "application.conf" => MergeStrategy.concat
>>     case "reference.conf" => MergeStrategy.concat
>>     case "log4j.properties" => MergeStrategy.discard
>>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>     case _ => MergeStrategy.first
>>   }
>> }
>>
>> test in assembly := {}
>> ----------------------------------------------------------------------
>>
>> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>
>>> Hello
>>> That looks to me like there's something dodgy with your Scala installation.
>>> Though Spark 2.0 is built on Scala 2.11, it still supports 2.10... I suggest
>>> you change one thing at a time in your sbt. First the Spark version: run it
>>> and see if it works. Then amend the Scala version.
>>>
>>> hth
>>> marco
>>>
>>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Thank you all for your informative answers.
>>>> I actually changed the Scala version to 2.11.8 and the Spark version to
>>>> 2.1.0 in the build.sbt.
>>>>
>>>> Except for these two (the Scala and Spark versions), I kept the same
>>>> values for the rest of the build.sbt file.
>>>> ----------------------------------------------------------------------
>>>> import AssemblyKeys._
>>>>
>>>> assemblySettings
>>>>
>>>> name := "proxcocoa"
>>>>
>>>> version := "0.1"
>>>>
>>>> scalaVersion := "2.11.8"
>>>>
>>>> parallelExecution in Test := false
>>>>
>>>> {
>>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>   libraryDependencies ++= Seq(
>>>>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>>>     "org.apache.spark" % "spark-core_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>     "org.apache.spark" % "spark-mllib_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>     "org.apache.spark" % "spark-sql_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>     "org.apache.commons" % "commons-compress" % "1.7",
>>>>     "commons-io" % "commons-io" % "2.4",
>>>>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>>>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>>   )
>>>> }
>>>>
>>>> {
>>>>   val defaultHadoopVersion = "1.0.4"
>>>>   val hadoopVersion =
>>>>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>> }
>>>>
>>>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"
>>>>
>>>> resolvers ++= Seq(
>>>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>>   "Spray" at "http://repo.spray.cc"
>>>> )
>>>>
>>>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>   {
>>>>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>>>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>>>>     case "application.conf" => MergeStrategy.concat
>>>>     case "reference.conf" => MergeStrategy.concat
>>>>     case "log4j.properties" => MergeStrategy.discard
>>>>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>>>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>>>     case _ => MergeStrategy.first
>>>>   }
>>>> }
>>>>
>>>> test in assembly := {}
>>>> ----------------------------------------------------------------------
>>>>
>>>> When I compile the code, I get the following error:
>>>>
>>>> [info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.11/classes...
>>>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a member of org.apache.spark.rdd.RDD[String]
>>>> [error] val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
>>>> [error] ^
>>>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:41: value length is not a member of Any
>>>> [error] Iterator(i -> lines.length)
>>>> [error] ^
>>>> ----------------------------------------------------------------------
>>>> The error is in the code itself. Does it mean that for the different
>>>> versions of Spark and Scala I need to change the main code?
>>>>
>>>> Thanks,
>>>> Anahita
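On this specific compile error: mapPartitionsWithSplit had long been deprecated
and is gone from the RDD API in Spark 2.x; mapPartitionsWithIndex takes its
place. Below is a minimal sketch of the kind of change OptUtils.scala seems to
need. `data` is the RDD[String] from the error output; the helper name and
everything else here is my assumption, not taken from the actual project:

----------------------------------------------------------------------
import org.apache.spark.rdd.RDD

// Spark 1.x: data.mapPartitionsWithSplit { case (i, lines) => Iterator(i -> lines.length) }
// Spark 2.x: the same computation with mapPartitionsWithIndex.
def partitionSizes(data: RDD[String]): RDD[(Int, Int)] =
  data.mapPartitionsWithIndex { case (i, lines) =>
    // i is the partition index, lines the Iterator[String] of that partition
    Iterator(i -> lines.length)
  }
----------------------------------------------------------------------

The second message ("value length is not a member of Any") should likely go
away with the same change; it looks like type inference falling over once the
first call fails to resolve.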
>>>> On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč <dinko.sr...@gmail.com> wrote:
>>>>
>>>>> Adding to the advice given by others ... Spark 2.1.0 works with Scala
>>>>> 2.11, so set:
>>>>>
>>>>>   scalaVersion := "2.11.8"
>>>>>
>>>>> When you see something like:
>>>>>
>>>>>   "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>>>>>
>>>>> that means that the library `spark-core` is compiled against Scala 2.10,
>>>>> so you would have to change that to 2.11:
>>>>>
>>>>>   "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>>>>>
>>>>> Better yet, let SBT worry about libraries built against particular
>>>>> Scala versions:
>>>>>
>>>>>   "org.apache.spark" %% "spark-core" % "2.1.0"
>>>>>
>>>>> The `%%` will instruct SBT to choose the library appropriate for the
>>>>> Scala version that is set in `scalaVersion`.
>>>>>
>>>>> It may be worth mentioning that the `%%` thing works only with Scala
>>>>> libraries, as they are compiled against a certain Scala version. Java
>>>>> libraries are unaffected (they have nothing to do with Scala), e.g. for
>>>>> `slf4j` one only uses a single `%`:
>>>>>
>>>>>   "org.slf4j" % "slf4j-api" % "1.7.2"
>>>>>
>>>>> Cheers,
>>>>> Dinko
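To make that concrete, here is a sketch of how the dependency block from the
build.sbt above could be written so that sbt appends the Scala suffix itself.
The versions are copied from this thread; the selection of libraries is only an
illustration, not the full list:

----------------------------------------------------------------------
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Scala libraries: %% adds the _2.11 suffix based on scalaVersion
  "org.apache.spark" %% "spark-core"  % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0",
  "org.apache.spark" %% "spark-sql"   % "2.1.0",
  "org.scalanlp"     %% "breeze"      % "0.11.2",
  // Java libraries: a single %, no Scala suffix involved
  "org.slf4j"  % "slf4j-api"  % "1.7.2",
  "commons-io" % "commons-io" % "2.4"
)
----------------------------------------------------------------------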
>>>>> On 27 March 2017 at 23:30, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>> > Check these versions:
>>>>> >
>>>>> > function create_build_sbt_file {
>>>>> >         BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
>>>>> >         [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
>>>>> >         cat >> $BUILD_SBT_FILE << !
>>>>> > lazy val root = (project in file(".")).
>>>>> >   settings(
>>>>> >     name := "${APPLICATION}",
>>>>> >     version := "1.0",
>>>>> >     scalaVersion := "2.11.8",
>>>>> >     mainClass in Compile := Some("myPackage.${APPLICATION}")
>>>>> >   )
>>>>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
>>>>> > libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
>>>>> > libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" % "provided"
>>>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
>>>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.6.1" % "provided"
>>>>> > libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
>>>>> > libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.6.0-HBase-1.0"
>>>>> > libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
>>>>> > // META-INF discarding
>>>>> > mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>> >   {
>>>>> >     case PathList("META-INF", xs @ _*) => MergeStrategy.discard
>>>>> >     case x => MergeStrategy.first
>>>>> >   }
>>>>> > }
>>>>> > !
>>>>> > }
>>>>> >
>>>>> > HTH
>>>>> >
>>>>> > Dr Mich Talebzadeh
>>>>> >
>>>>> > LinkedIn
>>>>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> >
>>>>> > http://talebzadehmich.wordpress.com
>>>>> >
>>>>> > Disclaimer: Use it at your own risk. Any and all responsibility for any
>>>>> > loss, damage or destruction of data or any other property which may arise
>>>>> > from relying on this email's technical content is explicitly disclaimed.
>>>>> > The author will in no case be liable for any monetary damages arising
>>>>> > from such loss, damage or destruction.
>>>>> >
>>>>> > On 27 March 2017 at 21:45, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>> >>
>>>>> >> Usually you define the dependencies on the Spark libraries as provided.
>>>>> >> You also seem to mix different Spark versions, which should be avoided.
>>>>> >> The Hadoop library seems to be outdated and should also only be provided.
>>>>> >>
>>>>> >> The other dependencies you could assemble in a fat jar.
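A sketch of what that advice would look like in this build.sbt: the Spark
artifacts (and hadoop-client) are marked as provided, so spark-submit supplies
them at runtime and they stay out of the assembled fat jar. The 2.7.3 Hadoop
version is an assumption on my side, not something from the original project:

----------------------------------------------------------------------
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"       % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib"     % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided",
  // assumed Hadoop line, much newer than the 1.0.4 default used above
  "org.apache.hadoop" % "hadoop-client"   % "2.7.3" % "provided"
)
----------------------------------------------------------------------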
>>>>> >> On 27 Mar 2017, at 21:25, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>>>>> >>
>>>>> >> Hi friends,
>>>>> >>
>>>>> >> I have a code written in Scala. Scala version 2.10.4 and Spark version
>>>>> >> 1.5.2 are used to run it.
>>>>> >>
>>>>> >> I would like to upgrade the code to the most recent version of Spark,
>>>>> >> namely 2.1.0.
>>>>> >>
>>>>> >> Here is the build.sbt:
>>>>> >>
>>>>> >> import AssemblyKeys._
>>>>> >>
>>>>> >> assemblySettings
>>>>> >>
>>>>> >> name := "proxcocoa"
>>>>> >>
>>>>> >> version := "0.1"
>>>>> >>
>>>>> >> scalaVersion := "2.10.4"
>>>>> >>
>>>>> >> parallelExecution in Test := false
>>>>> >>
>>>>> >> {
>>>>> >>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>> >>   libraryDependencies ++= Seq(
>>>>> >>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>> >>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>> >>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>>>> >>     "org.apache.spark" % "spark-core_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>> >>     "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>> >>     "org.apache.spark" % "spark-sql_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>> >>     "org.apache.commons" % "commons-compress" % "1.7",
>>>>> >>     "commons-io" % "commons-io" % "2.4",
>>>>> >>     "org.scalanlp" % "breeze_2.10" % "0.11.2",
>>>>> >>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>> >>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>>> >>   )
>>>>> >> }
>>>>> >>
>>>>> >> {
>>>>> >>   val defaultHadoopVersion = "1.0.4"
>>>>> >>   val hadoopVersion =
>>>>> >>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>>> >>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>>> >> }
>>>>> >>
>>>>> >> libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"
>>>>> >>
>>>>> >> resolvers ++= Seq(
>>>>> >>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>>> >>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>>> >>   "Spray" at "http://repo.spray.cc"
>>>>> >> )
>>>>> >>
>>>>> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>> >>   {
>>>>> >>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>>>> >>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>>>>> >>     case "application.conf" => MergeStrategy.concat
>>>>> >>     case "reference.conf" => MergeStrategy.concat
>>>>> >>     case "log4j.properties" => MergeStrategy.discard
>>>>> >>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>>>> >>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>>>> >>     case _ => MergeStrategy.first
>>>>> >>   }
>>>>> >> }
>>>>> >>
>>>>> >> test in assembly := {}
>>>>> >>
>>>>> >> ----------------------------------------------------------------------
>>>>> >> I downloaded Spark 2.1.0 and changed the Spark version and scalaVersion
>>>>> >> in the build.sbt, but unfortunately I failed to run the code.
>>>>> >>
>>>>> >> Does anybody know how I can upgrade the code to the most recent Spark
>>>>> >> version by changing the build.sbt file?
>>>>> >>
>>>>> >> Or do you have any other suggestion?
>>>>> >>
>>>>> >> Thanks a lot,
>>>>> >> Anahita
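One last note on the build.sbt files quoted in this thread: `import
AssemblyKeys._`, `assemblySettings` and the `mergeStrategy in assembly <<=`
form come from very old sbt-assembly releases, and the `<<=` operator itself is
deprecated in current sbt. With a recent sbt-assembly (0.14.x) the same merge
block is usually written as below; this is a sketch against that plugin
version, not something tested on this project:

----------------------------------------------------------------------
// project/plugins.sbt (plugin version assumed):
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt: no AssemblyKeys import or assemblySettings needed any more.
assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*)          => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".html"  => MergeStrategy.first
  case "application.conf"                             => MergeStrategy.concat
  case "reference.conf"                               => MergeStrategy.concat
  case "log4j.properties"                             => MergeStrategy.discard
  case m if m.toLowerCase.endsWith("manifest.mf")     => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  case _                                              => MergeStrategy.first
}

test in assembly := {}
----------------------------------------------------------------------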