Hello, everybody! Maybe it's not the cause of your problem, but I noticed this line in your comments: *java version "1.8.0_51"*
It's strongly advised to use Java 1.8.0_66 or later; I use Java 1.8.0_101 myself.

On Tue, Sep 20, 2016 at 1:09 AM, janardhan shetty <janardhan...@gmail.com> wrote:

> Yes Sujit, I have tried that option as well.
> Also tried sbt assembly but am hitting the issue below:
>
> http://stackoverflow.com/questions/35197120/java-outofmemory-error-on-sbt-assembly
>
> Just wondering if there is any clean approach to include the StanfordCoreNLP
> classes in Spark ML?
>
>
> On Mon, Sep 19, 2016 at 1:41 PM, Sujit Pal <sujitatgt...@gmail.com> wrote:
>
>> Hi Janardhan,
>>
>> You need the classifier "models" attribute on the second entry for
>> stanford-corenlp to indicate that you want the models JAR, as shown below.
>> Right now you are importing two instances of the stanford-corenlp JAR.
>>
>> libraryDependencies ++= {
>>   val sparkVersion = "2.0.0"
>>   Seq(
>>     "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
>>     "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
>>     "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
>>     "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>     "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
>>     "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>   )
>> }
>>
>> -sujit
>>
>>
>> On Sun, Sep 18, 2016 at 5:12 PM, janardhan shetty <janardhan...@gmail.com> wrote:
>>
>>> Hi Sujit,
>>>
>>> Tried that option but same error:
>>>
>>> java version "1.8.0_51"
>>>
>>>
>>> libraryDependencies ++= {
>>>   val sparkVersion = "2.0.0"
>>>   Seq(
>>>     "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
>>>     "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
>>>     "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
>>>     "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
>>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>>     "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>>     "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>>   )
>>> }
>>>
>>> Error:
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/pipeline/StanfordCoreNLP
>>>     at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:37)
>>>     at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:33)
>>>     at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
>>>     at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:87)
>>>     at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1060)
>>>     at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
>>>     at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
>>>     at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>>     at scala.collection.immutable.List.foreach(List.scala:381)
>>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>>
>>>
>>>
>>> On Sun, Sep 18, 2016 at 2:21 PM, Sujit Pal <sujitatgt...@gmail.com> wrote:
>>>
>>>> Hi Janardhan,
>>>>
>>>> Maybe try removing the string "test" from this line in your build.sbt?
>>>> IIRC, this restricts the models JAR to the test classpath.
>>>>
>>>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
>>>>
>>>> -sujit
>>>>
>>>>
>>>> On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty <janardhan...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to use lemmatization as a transformer and added the below to
>>>>> the build.sbt:
>>>>>
>>>>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>>>> "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>>>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
>>>>> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>>>>
>>>>>
>>>>> Error:
>>>>> *Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> edu/stanford/nlp/pipeline/StanfordCoreNLP*
>>>>>
>>>>> I have tried other versions of this Spark package.
>>>>>
>>>>> Any help is appreciated.
>>>>>
>>>>
>>>>
>>>
>>
>
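
For what it's worth, here is a minimal sketch of a lemmatization function written against the CoreNLP 3.6.0 API. This is not the poster's actual Lemmatizer class from the stack trace (its code isn't shown in the thread), just an illustration of the kind of code that needs both the stanford-corenlp JAR and its "models" JAR on the runtime classpath; the object and method names are made up for the example.

import java.util.Properties
import scala.collection.JavaConverters._

import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}

object Lemmas {
  // Build the pipeline once per JVM: it is expensive to construct, and this is
  // also the point where a missing "models" JAR surfaces at runtime.
  lazy val pipeline: StanfordCoreNLP = {
    val props = new Properties()
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma")
    new StanfordCoreNLP(props)
  }

  // Annotate the text and collect the lemma of every token.
  def lemmatize(text: String): Seq[String] = {
    val doc = new Annotation(text)
    pipeline.annotate(doc)
    doc.get(classOf[CoreAnnotations.TokensAnnotation]).asScala
      .map(_.get(classOf[CoreAnnotations.LemmaAnnotation]))
  }
}

Whatever build.sbt ends up looking like, keep in mind that NoClassDefFoundError is a runtime problem: since only the Spark artifacts are marked "provided", the stanford-corenlp JAR and its models JAR still have to reach the driver and executors, either inside the assembly JAR produced by sbt assembly or passed explicitly to spark-submit via --jars (stanford-corenlp-3.6.0.jar and stanford-corenlp-3.6.0-models.jar).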
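
As for the OutOfMemory error during sbt assembly from the linked Stack Overflow question, the usual fix is simply to give the sbt JVM more heap, for example (assuming a 4 GB ceiling is enough) by putting a line like -J-Xmx4G in an .sbtopts file at the project root, or by exporting an equivalent SBT_OPTS before running sbt assembly.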