Hi Ryan,
Since the driver runs on your laptop, I guess you need to specify a full URL
in order to access a remote file.
For example, when I use Spark over HDFS I specify the file as
hdfs://blablabla, which contains the URL where the namenode
can answer. I believe that something similar must be done here.
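A minimal sketch of what I mean, assuming the data were exposed through a filesystem that both the driver and the executors can reach (HDFS here). The host name namenode.example.com, the port 8020 and the helper name qualify are made up for illustration; they are not from your setup.

```scala
import java.net.URI

object QualifiedPaths {
  // Hypothetical NameNode address (an assumption); replace with the real host and port.
  val defaultFs = new URI("hdfs://namenode.example.com:8020")

  /** Prefix a scheme-less absolute path with the scheme and authority of `fs`. */
  def qualify(path: String, fs: URI = defaultFs): String = {
    val uri = new URI(path)
    if (uri.getScheme != null) path // already qualified, leave it untouched
    else new URI(fs.getScheme, fs.getAuthority, path, null, null).toString
  }

  def main(args: Array[String]): Unit = {
    val remote = qualify("/opt/data/transactions.parquet")
    println(remote)
    // With a SparkSession named `spark` in scope, the qualified path could then be used as:
    //   val df = spark.sql(s"SELECT * FROM parquet.`$remote`")
    //   val df2 = spark.read.parquet(remote)
  }
}
```

A path without a scheme is resolved against the default filesystem of whichever process looks it up, which on your laptop is the local disk. Spelling out the scheme and host removes that ambiguity.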
All the best,
Apostolos
On 25/11/20 16:51, Ryan Victory wrote:
Hello!
I have been tearing my hair out trying to solve this problem. Here is
my setup:
1. I have Spark running on a server in standalone mode with data on
the filesystem of the server itself (/opt/data/).
2. I have an instance of a Hive Metastore server running (backed by
MariaDB) on the same server
3. I have a laptop where I am developing my spark jobs (Scala)
I have configured Spark to use the metastore and set the warehouse
directory to be in /opt/data/warehouse/. What I am trying to
accomplish are a couple of things:
1. I am trying to submit Spark jobs (via JARs) using spark-submit, but
have the driver run on my local machine (my laptop). I want the jobs
to use the data ON THE SERVER and not try to reference it from my
local machine. If I do something like this:
val df = spark.sql("SELECT * FROM parquet.`/opt/data/transactions.parquet`")
I get an error that the path doesn't exist (because it's trying to
find it on my laptop). If I run the same thing in a spark-shell on the
spark server itself, there isn't an issue because the driver has
access to the data. If I submit the job with --deploy-mode cluster then
it works too because the driver is on the cluster. I don't want this,
I want to get the results on my laptop.
How can I force Spark to read the data from the cluster's filesystem
and not the driver's?
2. I have set up a Hive Metastore and created a table (in the spark
shell on the spark server itself). The data in the warehouse is in the
local filesystem. When I create a spark application JAR and try to run
it from my laptop, I get the same problem as #1, namely that it tries
to find the warehouse directory on my laptop itself.
Am I crazy? Perhaps this isn't a supported way to use Spark? Any help
or insights are much appreciated!
-Ryan Victory
--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: [email protected]
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol