Re: separate spark and hive

Ricardo Almeida Wed, 16 Nov 2016 05:08:49 -0800

Great to know about the "spark.sql.catalogImplementation" configuration
property.
I can't find this anywhere but in Jacek Laskowski's "Mastering Apache Spark
2.0" Gitbook.


I guess we should document it on the Spark Configuration page.
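
Until it is documented, here is a minimal sketch of setting the same property persistently rather than per invocation. It assumes that spark-defaults.conf is honoured for this key in the same way as --conf, which the thread does not confirm. The temporary SPARK_CONF_DIR is purely illustrative; a real installation would normally use the conf directory under the Spark home.

```shell
# Sketch only: write the property into a spark-defaults.conf file.
# SPARK_CONF_DIR is a throwaway directory for illustration (assumption);
# point it at your real Spark conf directory in practice.
SPARK_CONF_DIR="$(mktemp -d)"
CONF_FILE="$SPARK_CONF_DIR/spark-defaults.conf"

# Create the file if needed, then append the property only when it is
# not already present, so repeated runs do not duplicate the line.
touch "$CONF_FILE"
if ! grep -q '^spark\.sql\.catalogImplementation' "$CONF_FILE"; then
  printf '%s\n' 'spark.sql.catalogImplementation  in-memory' >> "$CONF_FILE"
fi

# Show the resulting configuration file.
cat "$CONF_FILE"
```

Exporting SPARK_CONF_DIR before launching ./bin/spark-shell should then make the shell pick up this file, giving the same effect as the --conf flag quoted below.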

On 15 November 2016 at 11:49, Herman van Hövell tot Westerflier <
hvanhov...@databricks.com> wrote:

> You can start Spark without Hive support by setting the
> spark.sql.catalogImplementation configuration to in-memory, for example:
>>
>> ./bin/spark-shell --master local[*] --conf spark.sql.catalogImplementation=in-memory
>
>
> I would not change the default from Hive to Spark-only just yet.
>
> On Tue, Nov 15, 2016 at 9:38 AM, assaf.mendelson <assaf.mendel...@rsa.com>
> wrote:
>
>> After looking at the code, I found that spark.sql.catalogImplementation
>> is set to “hive”. I would propose that it be set to “in-memory” by
>> default (or at least that this be documented; the configuration
>> documentation at http://spark.apache.org/docs/latest/configuration.html
>> makes no mention of Hive at all).
>>
>> Assaf.
>>
>>
>>
>> *From:* Mendelson, Assaf
>> *Sent:* Tuesday, November 15, 2016 10:11 AM
>> *To:* 'rxin [via Apache Spark Developers List]'
>> *Subject:* RE: separate spark and hive
>>
>>
>>
>> Spark shell (and pyspark) by default create the spark session with hive
>> support (also true when the session is created using getOrCreate, at least
>> in pyspark)
>>
>> At a minimum there should be a way to configure it using
>> spark-defaults.conf
>>
>> Assaf.
>>
>>
>>
>> *From:* rxin [via Apache Spark Developers List] [[hidden email]]
>> *Sent:* Tuesday, November 15, 2016 9:46 AM
>> *To:* Mendelson, Assaf
>> *Subject:* Re: separate spark and hive
>>
>>
>>
>> If you just start a SparkSession without calling enableHiveSupport it
>> actually won't use the Hive catalog support.
>>
>>
>>
>>
>>
>> On Mon, Nov 14, 2016 at 11:44 PM, Mendelson, Assaf <[hidden email]> wrote:
>>
>> The default generation of spark context is actually a hive context.
>>
>> I tried to find in the documentation what the differences are between
>> hive context and sql context and couldn’t find them for spark 2.0 (I know
>> that in previous versions a couple of functions, as well as window
>> functions, required hive context, but those seem to have all been fixed
>> for spark 2.0).
>>
>> Furthermore, I can’t seem to find a way to configure spark not to use
>> hive. I can only find how to compile it without hive (and having to build
>> from source each time is not a good idea for a production system).
>>
>>
>>
>> I would suggest that working without hive should be either a simple
>> configuration or even the default and that if there is any missing
>> functionality it should be documented.
>>
>> Assaf.
>>
>>
>>
>>
>>
>> *From:* Reynold Xin [mailto:[hidden email]]
>> *Sent:* Tuesday, November 15, 2016 9:31 AM
>> *To:* Mendelson, Assaf
>> *Cc:* [hidden email]
>> *Subject:* Re: separate spark and hive
>>
>>
>>
>> I agree with the high level idea, and thus SPARK-15691
>> <https://issues.apache.org/jira/browse/SPARK-15691>.
>>
>>
>>
>> In reality, it's a huge amount of work to create & maintain a custom
>> catalog. It might actually make sense to do, but it just seems a lot of
>> work to do right now and it'd take a toll on interoperability.
>>
>>
>>
>> If you don't need a persistent catalog, you can just run Spark without
>> Hive mode, can't you?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Nov 14, 2016 at 11:23 PM, assaf.mendelson <[hidden email]> wrote:
>>
>> Hi,
>>
>> Today, we basically force people to use hive if they want to get the full
>> use of spark SQL.
>>
>> When doing the default installation this means that a derby.log and
>> metastore_db directory are created where we run from.
>>
>> The problem with this is that if we run multiple scripts from the same
>> working directory we have a problem.
>>
>> The solution we employ locally is to always run from a different directory,
>> as we ignore hive in practice (this of course means we lose the ability to
>> use some of the catalog options in spark session).
>>
>> The only other solution is to create a full-blown hive installation with
>> proper configuration (probably for a JDBC solution).
>>
>>
>>
>> I would propose that in most cases there shouldn’t be any hive use at
>> all. Even for catalog elements such as saving a permanent table, we should
>> be able to configure a target directory and simply write to it (doing
>> everything file based to avoid the need for locking). Hive should be
>> reserved for those who actually use it (probably for backward
>> compatibility).
>>
>>
>>
>> Am I missing something here?
>>
>> Assaf.
>>
>>
>> ------------------------------
>>
>> View this message in context: separate spark and hive
>> <http://apache-spark-developers-list.1001551.n3.nabble.com/separate-spark-and-hive-tp19879.html>
>> Sent from the Apache Spark Developers List mailing list archive
>> <http://apache-spark-developers-list.1001551.n3.nabble.com/> at
>> Nabble.com.
>>
>
>
