If you just start a SparkSession without calling enableHiveSupport, it
actually won't use the Hive catalog.
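
For illustration, a minimal sketch of the difference (assuming the stock
Spark 2.0 SparkSession API; the app name and master below are just
placeholders):

    import org.apache.spark.sql.SparkSession

    // A plain session uses the in-memory catalog, so no Hive metastore
    // (and no derby.log / metastore_db in the working directory) gets
    // involved.
    val spark = SparkSession.builder()
      .appName("no-hive-example")   // placeholder name
      .master("local[*]")
      .getOrCreate()

    // Hive catalog support has to be requested explicitly, and it also
    // needs the Hive classes (spark-hive) on the classpath:
    //
    //   val hiveSpark = SparkSession.builder()
    //     .appName("with-hive-example")
    //     .enableHiveSupport()
    //     .getOrCreate()

If I remember correctly the choice also shows up as the
spark.sql.catalogImplementation setting ("in-memory" vs "hive"), but that's
an internal conf, so don't rely on it.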


On Mon, Nov 14, 2016 at 11:44 PM, Mendelson, Assaf <assaf.mendel...@rsa.com>
wrote:

> The spark context that is generated by default is actually a hive context.
>
> I tried to find in the documentation what the differences are between a
> hive context and a sql context, and couldn’t find it for spark 2.0 (I know
> that in previous versions a couple of functions, as well as window
> functions, required a hive context, but those all seem to have been fixed
> for spark 2.0).
>
> Furthermore, I can’t seem to find a way to configure spark not to use
> hive. I can only find how to compile it without hive (and having to build
> from source each time is not a good idea for a production system).
>
>
>
> I would suggest that working without hive should be either a simple
> configuration option or even the default, and that any missing
> functionality should be documented.
>
> Assaf.
>
>
>
>
>
> *From:* Reynold Xin [mailto:r...@databricks.com]
> *Sent:* Tuesday, November 15, 2016 9:31 AM
> *To:* Mendelson, Assaf
> *Cc:* dev@spark.apache.org
> *Subject:* Re: separate spark and hive
>
>
>
> I agree with the high-level idea, and thus SPARK-15691
> <https://issues.apache.org/jira/browse/SPARK-15691>.
>
>
>
> In reality, it's a huge amount of work to create & maintain a custom
> catalog. It might actually make sense to do, but it just seems like a lot
> of work to do right now, and it'd take a toll on interoperability.
>
>
>
> If you don't need a persistent catalog, you can just run Spark without
> Hive mode, can't you?
>
>
>
> On Mon, Nov 14, 2016 at 11:23 PM, assaf.mendelson <assaf.mendel...@rsa.com>
> wrote:
>
> Hi,
>
> Today, we basically force people to use hive if they want to get the full
> use of spark SQL.
>
> With the default installation this means that a derby.log file and a
> metastore_db directory are created in whatever directory we run from.
>
> This becomes a problem when we run multiple scripts from the same working
> directory, since they all try to use the same derby metastore.
>
> The solution we employ locally is to always run from a different
> directory, since we ignore hive in practice (this of course means we lose
> the ability to use some of the catalog options in the spark session).
>
> The only other solution is to create a full-blown hive installation with
> a proper configuration (probably with a JDBC-backed metastore).
>
>
>
> I would propose that in most cases there shouldn’t be any hive use at all.
> Even for catalog features such as saving a permanent table, we should be
> able to configure a target directory and simply write to it (doing
> everything file-based to avoid the need for locking). Hive should be
> reserved for those who actually use it (probably for backward
> compatibility).
>
>
>
> Am I missing something here?
>
> Assaf.
>
>
