Thanks Apostolos. I'm trying to avoid standing up HDFS just for this use case (single node).
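
For concreteness, this is roughly the shape of the setup in question. A minimal sketch only; the host names and ports are placeholders, not my actual config:

    import org.apache.spark.sql.SparkSession

    // Driver runs on the laptop; the master and metastore live on the server.
    // Host names and ports below are placeholders.
    val spark = SparkSession.builder()
      .appName("TransactionsJob")
      .master("spark://spark-server:7077")                          // standalone master
      .config("hive.metastore.uris", "thrift://spark-server:9083")  // metastore service
      .config("spark.sql.warehouse.dir", "/opt/data/warehouse")     // local FS on the server
      .enableHiveSupport()
      .getOrCreate()

    // With the driver on the laptop, this schemeless path is resolved against
    // the driver's local filesystem, hence the "path does not exist" error.
    val df = spark.sql("SELECT * FROM parquet.`/opt/data/transactions.parquet`")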
-Ryan

On Wed, Nov 25, 2020 at 8:56 AM Apostolos N. Papadopoulos <papad...@csd.auth.gr> wrote:

> Hi Ryan,
>
> Since the driver is on your laptop, I guess you need to specify a URL in
> order to access a remote file.
>
> For example, when I am using Spark over HDFS, I specify the file like
> hdfs://blablabla, where the URL contains the host the namenode answers
> on. I believe something similar must be done here.
>
> All the best,
>
> Apostolos
>
>
> On 25/11/20 16:51, Ryan Victory wrote:
> > Hello!
> >
> > I have been tearing my hair out trying to solve this problem. Here is
> > my setup:
> >
> > 1. I have Spark running on a server in standalone mode, with the data
> >    on the filesystem of the server itself (/opt/data/).
> > 2. I have an instance of a Hive Metastore server (backed by MariaDB)
> >    running on the same server.
> > 3. I have a laptop where I am developing my Spark jobs (in Scala).
> >
> > I have configured Spark to use the metastore and set the warehouse
> > directory to /opt/data/warehouse/. I am trying to accomplish a couple
> > of things:
> >
> > 1. I am trying to submit Spark jobs (via JARs) using spark-submit, but
> >    have the driver run on my local machine (my laptop). I want the jobs
> >    to use the data ON THE SERVER and not try to reference it from my
> >    local machine. If I do something like this:
> >
> >      val df = spark.sql("SELECT * FROM parquet.`/opt/data/transactions.parquet`")
> >
> >    I get an error that the path doesn't exist, because Spark is trying
> >    to find it on my laptop. If I run the same thing in a spark-shell on
> >    the Spark server itself, there isn't an issue, because the driver has
> >    access to the data. Submitting the job with deploy-mode=cluster works
> >    too, because the driver is then on the cluster, but I don't want
> >    that; I want to get the results on my laptop.
> >
> >    How can I force Spark to read the data from the cluster's filesystem
> >    and not the driver's?
> >
> > 2. I have set up a Hive Metastore and created a table (in the
> >    spark-shell on the Spark server itself). The data in the warehouse is
> >    on the server's local filesystem. When I build a Spark application
> >    JAR and try to run it from my laptop, I hit the same problem as in
> >    #1: it tries to find the warehouse directory on my laptop.
> >
> > Am I crazy? Perhaps this isn't a supported way to use Spark? Any help
> > or insights are much appreciated!
> >
> > -Ryan Victory
>
> --
> Apostolos N. Papadopoulos, Associate Professor
> Department of Informatics
> Aristotle University of Thessaloniki
> Thessaloniki, GREECE
> tel: ++0030312310991918
> email: papad...@csd.auth.gr
> twitter: @papadopoulos_ap
> web: http://datalab.csd.auth.gr/~apostol
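
For the archives: the fully-qualified URL pattern Apostolos is describing would look something like the sketch below (the namenode host and port are made up). The explicit scheme and authority tell Spark which filesystem to resolve the path against, instead of falling back to the driver's default local filesystem:

    // Hypothetical namenode host/port; the point is the explicit scheme + authority.
    val df = spark.read.parquet("hdfs://namenode-host:8020/opt/data/transactions.parquet")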