Great to know about the "spark.sql.catalogImplementation" configuration property. I can't find this anywhere but in Jacek Laskowski's "Mastering Apache Spark 2.0" Gitbook.
I guess we should document it on the Spark Configuration page.

On 15 November 2016 at 11:49, Herman van Hövell tot Westerflier <hvanhov...@databricks.com> wrote:

> You can start Spark without Hive support by setting the
> spark.sql.catalogImplementation configuration to in-memory, for example:
>
>> ./bin/spark-shell --master local[*] --conf spark.sql.catalogImplementation=in-memory
>
> I would not change the default from Hive to Spark-only just yet.
>
> On Tue, Nov 15, 2016 at 9:38 AM, assaf.mendelson <assaf.mendel...@rsa.com>
> wrote:
>
>> After looking at the code, I found that spark.sql.catalogImplementation
>> is set to “hive”. I would propose that it be set to “in-memory” by
>> default (or at least that this be documented; the configuration
>> documentation at http://spark.apache.org/docs/latest/configuration.html
>> does not mention Hive at all).
>>
>> Assaf.
>>
>> *From:* Mendelson, Assaf
>> *Sent:* Tuesday, November 15, 2016 10:11 AM
>> *To:* 'rxin [via Apache Spark Developers List]'
>> *Subject:* RE: separate spark and hive
>>
>> Spark shell (and pyspark) by default create the Spark session with Hive
>> support (this is also true when the session is created using getOrCreate,
>> at least in pyspark).
>>
>> At a minimum there should be a way to configure it using
>> spark-defaults.conf.
>>
>> Assaf.
>>
>> *From:* rxin [via Apache Spark Developers List] [[hidden email]]
>> *Sent:* Tuesday, November 15, 2016 9:46 AM
>> *To:* Mendelson, Assaf
>> *Subject:* Re: separate spark and hive
>>
>> If you just start a SparkSession without calling enableHiveSupport it
>> actually won't use the Hive catalog support.
>>
>> On Mon, Nov 14, 2016 at 11:44 PM, Mendelson, Assaf <[hidden email]> wrote:
>>
>> The Spark context created by default is actually a Hive context.
>>
>> I tried to find in the documentation what the differences are between a
>> Hive context and a SQL context, and couldn’t find them for Spark 2.0 (I
>> know that in previous versions a couple of functions, as well as window
>> functions, required a Hive context, but those seem to have all been fixed
>> in Spark 2.0).
>>
>> Furthermore, I can’t seem to find a way to configure Spark not to use
>> Hive. I can only find how to compile it without Hive (and having to build
>> from source each time is not a good idea for a production system).
>>
>> I would suggest that working without Hive should be either a simple
>> configuration option or even the default, and that any missing
>> functionality should be documented.
>>
>> Assaf.
>>
>> *From:* Reynold Xin [mailto:[hidden email]]
>> *Sent:* Tuesday, November 15, 2016 9:31 AM
>> *To:* Mendelson, Assaf
>> *Cc:* [hidden email]
>> *Subject:* Re: separate spark and hive
>>
>> I agree with the high level idea, and thus SPARK-15691
>> <https://issues.apache.org/jira/browse/SPARK-15691>.
>>
>> In reality, it's a huge amount of work to create & maintain a custom
>> catalog. It might actually make sense to do, but it just seems a lot of
>> work to do right now and it'd take a toll on interoperability.
>>
>> If you don't need a persistent catalog, you can just run Spark without
>> Hive mode, can't you?
>>
>> On Mon, Nov 14, 2016 at 11:23 PM, assaf.mendelson <[hidden email]> wrote:
>>
>> Hi,
>>
>> Today, we basically force people to use Hive if they want to get the full
>> use of Spark SQL.
>>
>> With the default installation, this means that a derby.log file and a
>> metastore_db directory are created in the directory we run from.
>>
>> The problem with this is that we run into trouble if we run multiple
>> scripts from the same working directory.
>>
>> The solution we employ locally is to always run from a different
>> directory, as we ignore Hive in practice (this of course means we lose the
>> ability to use some of the catalog options in the Spark session).
>>
>> The only other solution is to create a full-blown Hive installation with
>> proper configuration (probably for a JDBC solution).
>>
>> I would propose that in most cases there shouldn’t be any Hive use at
>> all. Even for catalog elements such as saving a permanent table, we should
>> be able to configure a target directory and simply write to it (doing
>> everything file based to avoid the need for locking). Hive should be
>> reserved for those who actually use it (probably for backward
>> compatibility).
>>
>> Am I missing something here?
>>
>> Assaf.
>>
>> ------------------------------
>>
>> View this message in context: separate spark and hive
>> <http://apache-spark-developers-list.1001551.n3.nabble.com/separate-spark-and-hive-tp19879.html>
>> Sent from the Apache Spark Developers List mailing list archive
>> <http://apache-spark-developers-list.1001551.n3.nabble.com/> at
>> Nabble.com.
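On the request in the thread for a way to set this through spark-defaults.conf: the following is only a minimal sketch, assuming that spark.sql.catalogImplementation is picked up from spark-defaults.conf like other Spark properties (nobody in the thread confirms this) and that entries use the usual whitespace-separated "key value" layout. The helper name set_spark_default is hypothetical and not part of Spark; it only edits the text of a properties file and does not start Spark.

```python
import re

# Property discussed in the thread; "in-memory" is the value Herman suggests.
CATALOG_KEY = "spark.sql.catalogImplementation"


def set_spark_default(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key` set to `value`.

    Hypothetical helper (not a Spark API). An existing entry for `key` is
    replaced in place and later duplicates are dropped; otherwise the entry
    is appended. Comment lines and blank lines are kept untouched.
    """
    new_line = f"{key} {value}"
    result = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_entry = bool(stripped) and not stripped.startswith("#")
        # The property name ends at the first whitespace, '=' or ':'.
        name = re.split(r"[\s=:]", stripped, maxsplit=1)[0] if is_entry else None
        if name == key:
            if not replaced:
                result.append(new_line)
                replaced = True
            continue
        result.append(line)
    if not replaced:
        result.append(new_line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    existing = "# local settings\nspark.master local[*]\n"
    print(set_spark_default(existing, CATALOG_KEY, "in-memory"), end="")
```

The returned text would then be written back to conf/spark-defaults.conf. Passing --conf on the command line, as in Herman's example, remains the simpler option for a one-off run.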