Yeah, I see the same thing. As a workaround you can of course set spark.sql.warehouse.dir explicitly. I restarted a conversation about it at https://github.com/apache/spark/pull/13868#pullrequestreview-3081020
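For example, something like this (just a sketch; the local path below is only a placeholder):

  spark-shell --conf spark.sql.warehouse.dir=file:///tmp/spark-warehouse

or, when building the session yourself:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")
    .getOrCreate()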
I think the question is whether spark-warehouse is always supposed to be a local dir, or whether it could be an HDFS dir? A change is needed either way; I just want to clarify what it is.

On Thu, Oct 6, 2016 at 5:18 AM Koert Kuipers <ko...@tresata.com> wrote:
> i just replaced our spark 2.0.0 install on the yarn cluster with spark 2.0.1
> and copied over the configs.
>
> to give it a quick test i started spark-shell and created a dataset. i get
> this:
>
> 16/10/05 23:55:13 WARN spark.SparkContext: Use an existing SparkContext,
> some configuration may not take effect.
> Spark context Web UI available at http://***:4040
> Spark context available as 'sc' (master = yarn, app id =
> application_1471212701720_1580).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> import spark.implicits._
> import spark.implicits._
>
> scala> val x = List(1,2,3).toDS
> org.apache.spark.SparkException: Unable to create database default as
> failed to create its directory hdfs://dev/home/koert/spark-warehouse
>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:114)
>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:108)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
>   at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
>   at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
>   at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
>   at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
>   at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
>   at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
>   at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:423)
>   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:380)
>   at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:171)
>   ... 50 elided
>
> this did not happen in spark 2.0.0.
>
> the location it is trying to access makes little sense, since it is going
> to hdfs but then it is looking for my local home directory (/home/koert
> exists locally but not on hdfs).
>
> i suspect the issue is SPARK-15899, but i am not sure. in the pullreq for
> that, WAREHOUSE_PATH got changed:
>
>   val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir")
>     .doc("The default location for managed databases and tables.")
>     .stringConf
> -   .createWithDefault("file:${system:user.dir}/spark-warehouse")
> +   .createWithDefault("${system:user.dir}/spark-warehouse")
>
> notice how the file: got removed from the url, causing spark to look on
> hdfs now since it is my default filesystem on the cluster. but
> system:user.dir is still a local home directory. when combining the two you
> get something that doesn't exist.
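For context on why the unqualified default ends up on HDFS: a path with no scheme gets qualified against fs.defaultFS. A rough sketch with the standard Hadoop FileSystem API (the cluster and home dir below just mirror the example in the error above, and this is only illustrative, not Spark's exact code path):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  val conf = new Configuration()   // picks up fs.defaultFS, e.g. hdfs://dev
  val fs = FileSystem.get(conf)

  // ${system:user.dir} expands to the local working dir, e.g. /home/koert,
  // and with no scheme it is qualified against the default filesystem:
  fs.makeQualified(new Path(System.getProperty("user.dir") + "/spark-warehouse"))
  // => hdfs://dev/home/koert/spark-warehouse  (a dir that does not exist on hdfs)

  // the 2.0.0 default kept the file: prefix, so it stayed on the local filesystem:
  fs.makeQualified(new Path("file:" + System.getProperty("user.dir") + "/spark-warehouse"))
  // => file:/home/koert/spark-warehouse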