well, it seems to work if i set spark.sql.warehouse.dir to /tmp/spark-warehouse in spark-defaults, and it creates the directory on hdfs.
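for reference, this is roughly the line i mean in spark-defaults.conf (i have not double-checked whether spelling out the scheme as hdfs:///tmp/spark-warehouse is needed, or if the bare path below is enough given hdfs is the default filesystem):

    spark.sql.warehouse.dir    /tmp/spark-warehouse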
however, can this directory safely be shared between multiple users running jobs? if not, then i need to set it per user (instead of a single setting in spark-defaults), which means i need to change the jobs, and that makes upgrading a production cluster running many jobs more difficult. or can i create a setting in spark-defaults that includes a reference to the user? something like /tmp/{user}/spark-warehouse?

On Thu, Oct 6, 2016 at 6:04 AM, Sean Owen <so...@cloudera.com> wrote:

> Yeah I see the same thing. You can fix this by setting
> spark.sql.warehouse.dir of course as a workaround. I restarted a
> conversation about it at
> https://github.com/apache/spark/pull/13868#pullrequestreview-3081020
>
> I think the question is whether spark-warehouse is always supposed to be a
> local dir, or could be an HDFS dir? a change is needed either way, just
> want to clarify what it is.
>
> On Thu, Oct 6, 2016 at 5:18 AM Koert Kuipers <ko...@tresata.com> wrote:
>
>> i just replaced our spark 2.0.0 install on the yarn cluster with spark
>> 2.0.1 and copied over the configs.
>>
>> to give it a quick test i started spark-shell and created a dataset. i
>> get this:
>>
>> 16/10/05 23:55:13 WARN spark.SparkContext: Use an existing SparkContext,
>> some configuration may not take effect.
>> Spark context Web UI available at http://***:4040
>> Spark context available as 'sc' (master = yarn, app id =
>> application_1471212701720_1580).
>> Spark session available as 'spark'.
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
>>       /_/
>>
>> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>>
>> scala> import spark.implicits._
>> import spark.implicits._
>>
>> scala> val x = List(1,2,3).toDS
>> org.apache.spark.SparkException: Unable to create database default as
>> failed to create its directory hdfs://dev/home/koert/spark-warehouse
>>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:114)
>>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:108)
>>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
>>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
>>   at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
>>   at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
>>   at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
>>   at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
>>   at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
>>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
>>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
>>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
>>   at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
>>   at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:423)
>>   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:380)
>>   at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:171)
>>   ... 50 elided
>>
>> this did not happen in spark 2.0.0.
>>
>> the location it is trying to access makes little sense, since it is going
>> to hdfs but then it is looking for my local home directory (/home/koert
>> exists locally but not on hdfs).
>>
>> i suspect the issue is SPARK-15899, but i am not sure. in the pull request
>> for that, WAREHOUSE_PATH got changed:
>>
>>    val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir")
>>      .doc("The default location for managed databases and tables.")
>>      .stringConf
>> -    .createWithDefault("file:${system:user.dir}/spark-warehouse")
>> +    .createWithDefault("${system:user.dir}/spark-warehouse")
>>
>> notice how the file: prefix got removed from the default value, causing
>> spark to look on hdfs now, since that is my default filesystem on the
>> cluster. but system:user.dir still resolves to my local home directory.
>> combining the two gives a path that doesn't exist.
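in case it is useful, this is a rough, untested sketch of what i mean by setting the warehouse dir per user in each job instead of in spark-defaults. the System.getProperty("user.name") lookup and the /tmp/<user>/spark-warehouse layout are just my guesses at a per-user path, not anything spark substitutes for you:

    import org.apache.spark.sql.SparkSession

    // build the session with an explicit warehouse dir so the
    // ${system:user.dir}/spark-warehouse default never gets used
    val user = System.getProperty("user.name")
    val spark = SparkSession.builder()
      .appName("warehouse-dir-workaround") // hypothetical app name
      .config("spark.sql.warehouse.dir", s"hdfs:///tmp/$user/spark-warehouse")
      .getOrCreate()

the downside is exactly what i mentioned above: every job has to be changed, which is what i was hoping to avoid with a single spark-defaults entry.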