If you start a SparkSession without calling enableHiveSupport(), it won't use the Hive catalog; it falls back to the in-memory catalog.
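
For what it's worth, here is a rough PySpark sketch of what I mean. The helper names (session_options, build_session) are made up for illustration and are not part of Spark; the only Spark pieces are SparkSession.builder, enableHiveSupport(), and the spark.sql.catalogImplementation setting, which I am assuming takes the values "hive" and "in-memory". Note that the interactive shells may turn Hive support on for you when the Hive classes are on the classpath, so treat this as something to try from a standalone script rather than a guarantee.

```python
from typing import Dict

# Static SQL setting that selects the catalog backend.
# Assumption: "in-memory" keeps Spark away from the Hive metastore.
CATALOG_KEY = "spark.sql.catalogImplementation"


def session_options(use_hive: bool = False) -> Dict[str, str]:
    """Return the builder options that pick the catalog implementation."""
    return {CATALOG_KEY: "hive" if use_hive else "in-memory"}


def build_session(app_name: str = "no-hive-example", use_hive: bool = False):
    """Create a local SparkSession; Hive support is enabled only on request."""
    # Imported lazily so session_options() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[1]").appName(app_name)
    for key, value in session_options(use_hive).items():
        builder = builder.config(key, value)
    if use_hive:
        # Only this call switches the session to the Hive catalog.
        builder = builder.enableHiveSupport()
    return builder.getOrCreate()
```

Calling build_session() from a script and then running a few queries should not need the Derby-backed metastore, so I would not expect derby.log or metastore_db to show up in the working directory. I have not checked that against every 2.0.x build.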

On Mon, Nov 14, 2016 at 11:44 PM, Mendelson, Assaf <assaf.mendel...@rsa.com> wrote:
> The default generation of spark context is actually a hive context.
>
> I tried to find on the documentation what are the differences between hive
> context and sql context and couldn’t find it for spark 2.0 (I know for
> previous versions there were a couple of functions which required hive
> context as well as window functions but those seem to have all been fixed
> for spark 2.0).
>
> Furthermore, I can’t seem to find a way to configure spark not to use
> hive. I can only find how to compile it without hive (and having to build
> from source each time is not a good idea for a production system).
>
> I would suggest that working without hive should be either a simple
> configuration or even the default and that if there is any missing
> functionality it should be documented.
>
> Assaf.
>
> *From:* Reynold Xin [mailto:r...@databricks.com]
> *Sent:* Tuesday, November 15, 2016 9:31 AM
> *To:* Mendelson, Assaf
> *Cc:* dev@spark.apache.org
> *Subject:* Re: separate spark and hive
>
> I agree with the high level idea, and thus SPARK-15691
> <https://issues.apache.org/jira/browse/SPARK-15691>.
>
> In reality, it's a huge amount of work to create & maintain a custom
> catalog. It might actually make sense to do, but it just seems a lot of
> work to do right now and it'd take a toll on interoperability.
>
> If you don't need persistent catalog, you can just run Spark without Hive
> mode, can't you?
>
> On Mon, Nov 14, 2016 at 11:23 PM, assaf.mendelson <assaf.mendel...@rsa.com>
> wrote:
>
> Hi,
>
> Today, we basically force people to use hive if they want to get the full
> use of spark SQL.
>
> When doing the default installation this means that a derby.log and
> metastore_db directory are created where we run from.
>
> The problem with this is that if we run multiple scripts from the same
> working directory we have a problem.
>
> The solution we employ locally is to always run from different directory
> as we ignore hive in practice (this of course means we lose the ability to
> use some of the catalog options in spark session).
>
> The only other solution is to create a full blown hive installation with
> proper configuration (probably for a JDBC solution).
>
> I would propose that in most cases there shouldn’t be any hive use at all.
> Even for catalog elements such as saving a permanent table, we should be
> able to configure a target directory and simply write to it (doing
> everything file based to avoid the need for locking). Hive should be
> reserved for those who actually use it (probably for backward
> compatibility).
>
> Am I missing something here?
>
> Assaf.