Yeah, I see the same thing. As a workaround, you can fix this by setting
spark.sql.warehouse.dir explicitly. I restarted a conversation about it at
https://github.com/apache/spark/pull/13868#pullrequestreview-3081020
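
For reference, a minimal sketch of that workaround (the hdfs:// path is only an
illustration; point it at whatever location makes sense on your cluster):

    spark-shell --conf spark.sql.warehouse.dir=hdfs://dev/user/koert/spark-warehouse

or, when you build the session yourself in an application:

    import org.apache.spark.sql.SparkSession

    // Set the warehouse location before the first SparkSession is created;
    // otherwise the ${system:user.dir}/spark-warehouse default is used.
    val spark = SparkSession.builder()
      .appName("warehouse-dir-workaround")
      .config("spark.sql.warehouse.dir", "hdfs://dev/user/koert/spark-warehouse")
      .getOrCreate()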

I think the question is whether spark-warehouse is always supposed to be a
local dir, or whether it can be an HDFS dir. A change is needed either way; I
just want to clarify which behavior is intended.

On Thu, Oct 6, 2016 at 5:18 AM Koert Kuipers <ko...@tresata.com> wrote:

> I just replaced our Spark 2.0.0 install on the YARN cluster with Spark 2.0.1
> and copied over the configs.
>
> To give it a quick test, I started spark-shell and created a Dataset. I get
> this:
>
> 16/10/05 23:55:13 WARN spark.SparkContext: Use an existing SparkContext,
> some configuration may not take effect.
> Spark context Web UI available at http://***:4040
> Spark context available as 'sc' (master = yarn, app id =
> application_1471212701720_1580).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_75)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> import spark.implicits._
> import spark.implicits._
>
> scala> val x = List(1,2,3).toDS
> org.apache.spark.SparkException: Unable to create database default as
> failed to create its directory hdfs://dev/home/koert/spark-warehouse
>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:114)
>   at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:108)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
>   at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
>   at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
>   at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
>   at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
>   at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
>   at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
>   at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:423)
>   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:380)
>   at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:171)
>   ... 50 elided
>
> This did not happen in Spark 2.0.0.
> The location it is trying to access makes little sense: it goes to HDFS, but
> then looks for my local home directory (/home/koert exists locally but not on
> HDFS).
>
> I suspect the issue is SPARK-15899, but I am not sure. In the pull request for
> that issue, WAREHOUSE_PATH was changed:
>    val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir")
>      .doc("The default location for managed databases and tables.")
>      .stringConf
>  -    .createWithDefault("file:${system:user.dir}/spark-warehouse")
>  +    .createWithDefault("${system:user.dir}/spark-warehouse")
>
> Notice how the file: scheme got removed from the URI, causing Spark to look on
> HDFS now, since that is the default filesystem on the cluster. But
> system:user.dir still resolves to a local home directory. Combining the two
> gives a path that doesn't exist (a short sketch of that path resolution
> follows below).
>
>
>
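
To make the failure above concrete, here is a minimal sketch (using the Hadoop
FileSystem API directly, not Spark's actual code path) of how a scheme-less
warehouse path gets qualified against the cluster's default filesystem; it
assumes the spark value from the spark-shell session quoted above:

    import org.apache.hadoop.fs.Path

    // In spark-shell on the cluster, the Hadoop configuration already carries
    // fs.defaultFS = hdfs://dev.
    val hadoopConf = spark.sparkContext.hadoopConfiguration

    // ${system:user.dir}/spark-warehouse expands to a local directory such as
    // /home/koert/spark-warehouse; without a file: scheme it gets qualified
    // against the default filesystem rather than the local one.
    val warehouse = new Path(sys.props("user.dir") + "/spark-warehouse")
    val fs = warehouse.getFileSystem(hadoopConf)

    // Prints hdfs://dev/home/koert/spark-warehouse, an HDFS URI built from a
    // local directory, which is the nonexistent location that createDatabase
    // then fails to create.
    println(fs.makeQualified(warehouse))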
