Hello,

Thank you all for your informative answers.

I changed the Scala version to 2.11.8 and the Spark version to 2.1.0 in the build.sbt.
Apart from these two settings (the Scala and Spark versions), I kept all the other values in the build.sbt file the same.

---------------------------------------------------------------------------

import AssemblyKeys._

assemblySettings

name := "proxcocoa"

version := "0.1"

scalaVersion := "2.11.8"

parallelExecution in Test := false

{
  val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
  libraryDependencies ++= Seq(
    "org.slf4j" % "slf4j-api" % "1.7.2",
    "org.slf4j" % "slf4j-log4j12" % "1.7.2",
    "org.scalatest" %% "scalatest" % "1.9.1" % "test",
    "org.apache.spark" % "spark-core_2.11" % "2.1.0" excludeAll(excludeHadoop),
    "org.apache.spark" % "spark-mllib_2.11" % "2.1.0" excludeAll(excludeHadoop),
    "org.apache.spark" % "spark-sql_2.11" % "2.1.0" excludeAll(excludeHadoop),
    "org.apache.commons" % "commons-compress" % "1.7",
    "commons-io" % "commons-io" % "2.4",
    "org.scalanlp" % "breeze_2.11" % "0.11.2",
    "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
    "com.github.scopt" %% "scopt" % "3.3.0"
  )
}

{
  val defaultHadoopVersion = "1.0.4"
  val hadoopVersion = scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
  libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
}

libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"

resolvers ++= Seq(
  "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
  "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
  "Spray" at "http://repo.spray.cc"
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*)           => MergeStrategy.first
    case PathList(ps @ _*) if ps.last endsWith ".html"   => MergeStrategy.first
    case "application.conf"                              => MergeStrategy.concat
    case "reference.conf"                                => MergeStrategy.concat
    case "log4j.properties"                              => MergeStrategy.discard
    case m if m.toLowerCase.endsWith("manifest.mf")      => MergeStrategy.discard
    case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => MergeStrategy.discard
    case _ => MergeStrategy.first
  }
}

test in assembly := {}

----------------------------------------------------------------

When I compile the code, I get the following error:

[info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.11/classes...
[error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a member of org.apache.spark.rdd.RDD[String]
[error]     val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
[error]                      ^
[error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:41: value length is not a member of Any
[error]       Iterator(i -> lines.length)
[error]                           ^

----------------------------------------------------------------

So the error comes from the source code itself. Does it mean that for a different version of Spark and Scala I also need to change the main code?

Thanks,
Anahita
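A note on the first error above: mapPartitionsWithSplit was deprecated early on in favour of mapPartitionsWithIndex and is no longer part of the RDD API in Spark 2.x, which is why the 2.1.0 build cannot find it. A minimal sketch of the likely fix, assuming those lines in OptUtils.scala only need a per-partition line count (the helper name here is hypothetical, only mirroring the failing lines):

  import org.apache.spark.rdd.RDD

  // Hypothetical helper mirroring the failing lines in OptUtils.scala:
  // count the lines in each partition, keyed by the partition index.
  def partitionSizes(data: RDD[String]): Array[(Int, Int)] =
    data.mapPartitionsWithIndex { case (i, lines) =>
      Iterator(i -> lines.length)  // mapPartitionsWithIndex replaces mapPartitionsWithSplit
    }.collect()

Once `lines` is typed as Iterator[String] again, the follow-up error "value length is not a member of Any" should disappear as well.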
On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč <dinko.sr...@gmail.com> wrote:
> Adding to the advice given by others ... Spark 2.1.0 works with Scala 2.11,
> so set:
>
>   scalaVersion := "2.11.8"
>
> When you see something like:
>
>   "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>
> that means that the library `spark-core` is compiled against Scala 2.10,
> so you would have to change that to 2.11:
>
>   "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>
> Better yet, let SBT worry about libraries built against particular
> Scala versions:
>
>   "org.apache.spark" %% "spark-core" % "2.1.0"
>
> The `%%` will instruct SBT to choose the library appropriate for the
> version of Scala that is set in `scalaVersion`.
>
> It may be worth mentioning that the `%%` trick works only with Scala
> libraries, as they are compiled against a certain Scala version. Java
> libraries are unaffected (they have nothing to do with Scala), e.g. for
> `slf4j` one only uses a single `%`:
>
>   "org.slf4j" % "slf4j-api" % "1.7.2"
>
> Cheers,
> Dinko
>
> On 27 March 2017 at 23:30, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> > check these versions
> >
> > function create_build_sbt_file {
> >   BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
> >   [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
> >   cat >> $BUILD_SBT_FILE << !
> > lazy val root = (project in file(".")).
> >   settings(
> >     name := "${APPLICATION}",
> >     version := "1.0",
> >     scalaVersion := "2.11.8",
> >     mainClass in Compile := Some("myPackage.${APPLICATION}")
> >   )
> > libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
> > libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
> > libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" % "provided"
> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
> > libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.6.1" % "provided"
> > libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
> > libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.6.0-HBase-1.0"
> > libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
> > libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
> > libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
> > libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
> > // META-INF discarding
> > mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
> >   {
> >     case PathList("META-INF", xs @ _*) => MergeStrategy.discard
> >     case x => MergeStrategy.first
> >   }
> > }
> > !
> > }
> >
> > HTH
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com
> >
> > Disclaimer: Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> > from such loss, damage or destruction.
> >
> > On 27 March 2017 at 21:45, Jörn Franke <jornfra...@gmail.com> wrote:
> >>
> >> Usually you define the dependencies on the Spark libraries as provided.
> >> You also seem to mix different Spark versions, which should be avoided.
> >> The Hadoop library seems to be outdated and should also only be provided.
> >>
> >> The other dependencies you could assemble in a fat jar.
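To illustrate the "provided" scope that Jörn and Mich mention, a sketch of the Spark dependency section (versions are the 2.1.0 ones discussed in this thread; the Hadoop version is only a placeholder and should match the cluster):

  // Spark and Hadoop artifacts marked "provided": sbt-assembly leaves them out
  // of the fat jar, because the cluster supplies them at runtime.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"      % "2.1.0" % "provided",
    "org.apache.spark" %% "spark-mllib"     % "2.1.0" % "provided",
    "org.apache.spark" %% "spark-sql"       % "2.1.0" % "provided",
    "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided",
    "org.apache.hadoop" % "hadoop-client"   % "2.7.3" % "provided"  // placeholder version
  )

With this, the assembly jar contains only the application code and the remaining (non-provided) libraries, and spark-submit adds the Spark classes on the cluster.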
> >>
> >> On 27 Mar 2017, at 21:25, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
> >>
> >> Hi friends,
> >>
> >> I have a code which is written in Scala. Scala version 2.10.4 and
> >> Spark version 1.5.2 are used to run the code.
> >>
> >> I would like to upgrade the code to the most recent version of Spark,
> >> meaning 2.1.0.
> >>
> >> Here is the build.sbt:
> >>
> >> import AssemblyKeys._
> >>
> >> assemblySettings
> >>
> >> name := "proxcocoa"
> >>
> >> version := "0.1"
> >>
> >> scalaVersion := "2.10.4"
> >>
> >> parallelExecution in Test := false
> >>
> >> {
> >>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
> >>   libraryDependencies ++= Seq(
> >>     "org.slf4j" % "slf4j-api" % "1.7.2",
> >>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
> >>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
> >>     "org.apache.spark" % "spark-core_2.10" % "1.5.2" excludeAll(excludeHadoop),
> >>     "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" excludeAll(excludeHadoop),
> >>     "org.apache.spark" % "spark-sql_2.10" % "1.5.2" excludeAll(excludeHadoop),
> >>     "org.apache.commons" % "commons-compress" % "1.7",
> >>     "commons-io" % "commons-io" % "2.4",
> >>     "org.scalanlp" % "breeze_2.10" % "0.11.2",
> >>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
> >>     "com.github.scopt" %% "scopt" % "3.3.0"
> >>   )
> >> }
> >>
> >> {
> >>   val defaultHadoopVersion = "1.0.4"
> >>   val hadoopVersion = scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
> >>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
> >> }
> >>
> >> libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"
> >>
> >> resolvers ++= Seq(
> >>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
> >>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
> >>   "Spray" at "http://repo.spray.cc"
> >> )
> >>
> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
> >>   {
> >>     case PathList("javax", "servlet", xs @ _*)          => MergeStrategy.first
> >>     case PathList(ps @ _*) if ps.last endsWith ".html"  => MergeStrategy.first
> >>     case "application.conf"                             => MergeStrategy.concat
> >>     case "reference.conf"                               => MergeStrategy.concat
> >>     case "log4j.properties"                             => MergeStrategy.discard
> >>     case m if m.toLowerCase.endsWith("manifest.mf")     => MergeStrategy.discard
> >>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
> >>     case _ => MergeStrategy.first
> >>   }
> >> }
> >>
> >> test in assembly := {}
> >>
> >> -----------------------------------------------------------
> >> I downloaded Spark 2.1.0 and changed the Spark version and scalaVersion in
> >> the build.sbt, but unfortunately I failed to run the code.
> >>
> >> Does anybody know how I can upgrade the code to the most recent Spark
> >> version by changing the build.sbt file?
> >>
> >> Or do you have any other suggestion?
> >>
> >> Thanks a lot,
> >> Anahita
> >>
> >
>
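Following Dinko's `%%` suggestion, a sketch of how the Scala-versioned artifacts from this build.sbt could be written after the upgrade, so the `_2.10`/`_2.11` suffix no longer has to be edited by hand (versions are the 2.11.8 / 2.1.0 ones used earlier in the thread):

  scalaVersion := "2.11.8"

  val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")

  // %% lets sbt append the Scala binary suffix (_2.11) automatically,
  // so only the library versions need to change on the next upgrade.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"      % "2.1.0" excludeAll(excludeHadoop),
    "org.apache.spark" %% "spark-mllib"     % "2.1.0" excludeAll(excludeHadoop),
    "org.apache.spark" %% "spark-sql"       % "2.1.0" excludeAll(excludeHadoop),
    "org.apache.spark" %% "spark-streaming" % "2.1.0",
    "org.scalanlp"     %% "breeze"          % "0.11.2",
    "com.github.scopt" %% "scopt"           % "3.3.0"
  )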