I have my dev environment on my Mac. I have a dev Spark server on a freshly 
installed physical Ubuntu box.

I had some connection issues, but it is now all fine.

In my code, running on the Mac, I have:

        1       SparkConf conf = new SparkConf().setAppName("myapp").setMaster("spark://10.0.100.120:7077");
        2       JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
        3       javaSparkContext.setLogLevel("WARN");
        4       SQLContext sqlContext = new SQLContext(javaSparkContext);
        5
        6       // Restaurant Data
        7       df = sqlContext.read().option("dateFormat", "yyyy-mm-dd").json(source.getLocalStorage());


1) Clarification question: this code runs on my Mac and connects to the server, 
but line 7 assumes the file is on my Mac, not on the server, right?
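
To make question 1 concrete, here is a minimal sketch that uses only java.net.URI (no Spark) to print the scheme a location string carries. I am assuming source.getLocalStorage() returns a plain path or URI string; the class name, the helper, and the hdfs:// sample are hypothetical, and only the Mac path comes from my own log. As far as I understand, a location with no scheme is resolved against the default filesystem, which is why I suspect the local disk is used.

```java
import java.net.URI;

public class LocationScheme {

    // Returns the URI scheme of a location string, or "(none)" when it is a bare path.
    static String schemeOf(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null ? "(none)" : scheme;
    }

    public static void main(String[] args) {
        String[] samples = {
            // Bare path, as I believe my code passes it (assumption)
            "/Users/jgp/Documents/Data/restaurants-data.json",
            // Form that appears in my log output
            "file:/Users/jgp/Documents/Data/restaurants-data.json",
            // Hypothetical shared location, for comparison only
            "hdfs://10.0.100.120:9000/data/restaurants-data.json"
        };
        for (String s : samples) {
            System.out.println(schemeOf(s) + "  <-  " + s);
        }
    }
}
```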

2) On line 7, I get an exception:

16-07-10 22:20:04:143 DEBUG  - address: jgp-MacBook-Air.local/10.0.100.100 isLoopbackAddress: false, with host 10.0.100.100 jgp-MacBook-Air.local
16-07-10 22:20:04:240 INFO org.apache.spark.sql.execution.datasources.json.JSONRelation - Listing file:/Users/jgp/Documents/Data/restaurants-data.json on driver
16-07-10 22:20:04:288 DEBUG org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
        at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:225)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:250)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:447)
        at org.apache.spark.sql.execution.datasources.json.JSONRelation.org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd(JSONRelation.scala:98)
        at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4$$anonfun$apply$1.apply(JSONRelation.scala:115)
        at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4$$anonfun$apply$1.apply(JSONRelation.scala:115)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4.apply(JSONRelation.scala:115)
        at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4.apply(JSONRelation.scala:109)
        at scala.Option.getOrElse(Option.scala:120)

Do I have to install Hadoop on the server? I am guessing so from this line:
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
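
Since the stack trace is printed by the JVM on my Mac, one thing I can check locally is whether either of the two settings named in the message is visible to that JVM. This is only a rough sketch: the class and the resolve helper are hypothetical, and the precedence (system property first, then environment variable) and the treatment of empty values are my assumptions, not something I have verified against the Hadoop source.

```java
import java.util.Optional;

public class HadoopHomeCheck {

    // Returns the first non-empty value, trying the JVM property before the
    // environment variable (assumed order, see note above).
    static Optional<String> resolve(String sysProp, String envVar) {
        if (sysProp != null && !sysProp.isEmpty()) {
            return Optional.of(sysProp);
        }
        if (envVar != null && !envVar.isEmpty()) {
            return Optional.of(envVar);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Both names are taken from the exception message in the log.
        Optional<String> home = resolve(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));

        System.out.println(home
                .map(h -> "Hadoop home visible to this JVM: " + h)
                .orElse("Neither hadoop.home.dir nor HADOOP_HOME is set in this JVM"));
    }
}
```

Running it from the same IDE or shell that launches the Spark driver should show the environment the driver actually sees.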

TIA,

jg
