Hi users,
I hope this is a simple one and you can help me 😊
I am having trouble adding a dependency to Zeppelin Notebook (0.7.0) on AWS EMR
(emr-5.4.0). I noticed that the %dep interpreter is not available on AWS EMR, so
I can’t use that option.

I followed these instructions to add the dependency:
https://zeppelin.apache.org/docs/latest/manual/dependencymanagement.html

I want to add the Databricks spark-xml package for importing XML files into
DataFrames: https://github.com/databricks/spark-xml

These are the coordinates (groupId:artifactId:version):
com.databricks:spark-xml_2.11:0.4.1
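The _2.11 suffix in the artifactId has to match the Scala version Spark was built with, so one way to double-check the coordinates is to split them and compare that suffix with the running Scala version. A minimal sketch in Scala; Coordinates and its methods are hypothetical helpers, not part of Zeppelin or Spark:

```scala
// Hypothetical helper: parses groupId:artifactId:version and checks that the
// artifact's Scala suffix (e.g. "_2.11") matches a given Scala version.
object Coordinates {
  final case class Gav(groupId: String, artifactId: String, version: String)

  // Returns None unless there are exactly three non-empty parts.
  def parse(gav: String): Option[Gav] = gav.trim.split(":") match {
    case Array(g, a, v) if g.nonEmpty && a.nonEmpty && v.nonEmpty => Some(Gav(g, a, v))
    case _ => None
  }

  // "spark-xml_2.11" -> Some("2.11"); None if there is no "_" suffix.
  def scalaSuffix(artifactId: String): Option[String] = {
    val i = artifactId.lastIndexOf('_')
    if (i >= 0 && i < artifactId.length - 1) Some(artifactId.substring(i + 1)) else None
  }

  // Scala binary version of a full version string: "2.11.8" -> "2.11".
  def binaryVersion(full: String): String = full.split("\\.").take(2).mkString(".")

  // True when the artifact suffix equals the binary version of scalaVersion.
  def matchesScala(gav: Gav, scalaVersion: String): Boolean =
    scalaSuffix(gav.artifactId).contains(binaryVersion(scalaVersion))
}
```

In spark-shell, Coordinates.parse("com.databricks:spark-xml_2.11:0.4.1").map(Coordinates.matchesScala(_, scala.util.Properties.versionNumberString)) should return Some(true) when the shell runs on Scala 2.11.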

In Zeppelin, when I edit the spark interpreter, I:
* enter com.databricks:spark-xml_2.11:0.4.1 in the artifact field
* click Save
* click OK on the dialog “Do you want to update this interpreter and restart
with new settings – cancel | OK”

Clicking OK does nothing; the dialog stays on screen.

I assume this writes the dependency to the spark group in interpreter.json; is
that correct? I tried changing the write permissions on that file, but it didn’t help.
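To rule permissions in or out, it may help to check both the file and the directory that contains it, from the same OS account the Zeppelin server runs as. A small sketch using java.nio.file; WriteCheck is a hypothetical name, and the /etc/zeppelin/conf path in the usage comment is an assumption about where EMR keeps interpreter.json:

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical diagnostic: reports whether a file exists and whether the file
// and its parent directory are writable by the user running this code.
object WriteCheck {
  final case class Status(exists: Boolean, fileWritable: Boolean, dirWritable: Boolean)

  def check(file: Path): Status = {
    val dir = file.toAbsolutePath.getParent // null only for a filesystem root
    Status(
      exists = Files.exists(file),
      fileWritable = Files.isWritable(file),
      dirWritable = dir != null && Files.isWritable(dir)
    )
  }
}

// Usage (path is an assumption; adjust to where your cluster keeps Zeppelin's conf):
// println(WriteCheck.check(Paths.get("/etc/zeppelin/conf/interpreter.json")))
```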

I confirmed that this artifact is correct for my Spark/Scala version by running
spark-shell, and since that works I assume I don’t need to add any additional Maven repo.
Or do I need a new repo?
Maybe I need to put the jar in my local repo? interpreter.json says my local
repo is /var/lib/zeppelin/.m2/repository, but this directory does not exist.
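If the jar did have to be placed in the local repository by hand, Maven's standard layout derives the directory from the coordinates. A sketch that computes that path; LocalRepo is a hypothetical helper, and it only reproduces the standard layout, it does not show whether Zeppelin actually reads jars from there:

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper. Standard Maven layout:
// <repo>/<groupId, dots become directories>/<artifactId>/<version>/<artifactId>-<version>.jar
object LocalRepo {
  def jarPath(repo: Path, groupId: String, artifactId: String, version: String): Path = {
    // "com.databricks" -> <repo>/com/databricks
    val groupDir = groupId.split("\\.").foldLeft(repo)((p, part) => p.resolve(part))
    groupDir.resolve(artifactId).resolve(version).resolve(s"$artifactId-$version.jar")
  }
}

// Usage:
// LocalRepo.jarPath(Paths.get("/var/lib/zeppelin/.m2/repository"),
//   "com.databricks", "spark-xml_2.11", "0.4.1")
```

For the coordinates above this gives /var/lib/zeppelin/.m2/repository/com/databricks/spark-xml_2.11/0.4.1/spark-xml_2.11-0.4.1.jar.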


I can use this package from spark shell successfully:

$ spark-shell --packages com.databricks:spark-xml_2.11:0.4.1
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
                    .format("com.databricks.spark.xml")
…



David Howell
Data Engineering
zipMoney

+61 477 150 379


