Also sometimes hitting this error when spark-shell is used:

Caused by: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:770)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:298)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:263)
  at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:97)
  at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:77)
  at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:59)
  at edu.stanford.nlp.pipeline.AnnotatorFactories$4.create(AnnotatorFactories.java:290)
  ... 114 more
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
  at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:485)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:765)
On Sun, Sep 18, 2016 at 12:27 PM, janardhan shetty <[email protected]> wrote:
> Using: spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11
>
> On Sun, Sep 18, 2016 at 12:26 PM, janardhan shetty <[email protected]> wrote:
>
>> Hi Jacek,
>>
>> Thanks for your response. This is the code I am trying to execute:
>>
>> import org.apache.spark.sql.functions._
>> import com.databricks.spark.corenlp.functions._
>>
>> val inputd = Seq(
>>   (1, "<xml>Stanford University is located in California. </xml>")
>> ).toDF("id", "text")
>>
>> val output = inputd.select(cleanxml(col("text"))).withColumnRenamed("UDF(text)", "text")
>>
>> val out = output.select(lemma(col("text"))).withColumnRenamed("UDF(text)", "text")
>>
>> output.show() works.
>>
>> The error happens when I execute out.show()
>>
>> On Sun, Sep 18, 2016 at 11:58 AM, Jacek Laskowski <[email protected]> wrote:
>>
>>> Hi Janardhan,
>>>
>>> Can you share the code that you execute? What's the command? Mind
>>> sharing the complete project on github?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>> On Sun, Sep 18, 2016 at 8:01 PM, janardhan shetty <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > I am trying to use lemmatization as a transformer and added the below to the build.sbt:
>>> >
>>> >   "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>> >   "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>> >   "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
>>> >   "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>> >
>>> > Error:
>>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>> > edu/stanford/nlp/pipeline/StanfordCoreNLP
>>> >
>>> > I have tried other versions of this spark package.
>>> >
>>> > Any help is appreciated.
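[Editor's note] Both errors in this thread point at the runtime classpath rather than the code: the NoClassDefFoundError for edu/stanford/nlp/pipeline/StanfordCoreNLP suggests the CoreNLP jar itself is missing at run time, and the "Unable to open .../english-left3words-distsim.tagger" IOException is what CoreNLP raises when the separate models jar is absent. One plausible cause in the build.sbt quoted above is that the models classifier artifact is scoped to "test", so it never reaches the main runtime classpath. A sketch of a build.sbt fragment with that scope removed (an assumption, not a confirmed fix from the thread):

```scala
// build.sbt (sketch): keep CoreNLP *and* its models artifact on the
// compile/runtime classpath. The original thread scoped the models jar
// to "test", which would leave it unavailable when the app actually runs.
libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  // no "% test" here: the models jar must be visible at run time
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
  "com.google.protobuf" % "protobuf-java" % "2.6.1",
  "org.scalatest" %% "scalatest" % "2.2.6" % "test"
)
```

For the spark-shell case, `--packages` takes plain groupId:artifactId:version coordinates and, as far as I know, cannot request a classifier artifact such as the models jar; a common workaround is to download stanford-corenlp-3.6.0-models.jar manually and pass it explicitly via `--jars` alongside the spark-corenlp package.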
