Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

Zhan Zhang Fri, 06 Nov 2015 15:03:10 -0800

I agree, with a minor change: add a config that provides the option to init 
either SQLContext or HiveContext, with HiveContext as the default, instead of 
bypassing HiveContext when hitting the exception.
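
A minimal sketch of how such a switch could behave, written without any Spark 
dependency so it runs on its own. The key name spark.shell.sqlContextKind and 
the ShellContextChoice object are hypothetical and exist only for 
illustration; they are not Spark settings or classes. The hiveOnClasspath flag 
stands in for the existing behaviour where HiveContext is used when the Hive 
jars are in the assembly.

```scala
// Sketch only: models the proposed option without depending on Spark.
object ShellContextChoice {
  sealed trait ContextKind
  case object Hive extends ContextKind // stands for HiveContext
  case object Sql extends ContextKind  // stands for plain SQLContext

  // Hypothetical key name; not an existing Spark setting.
  val Key = "spark.shell.sqlContextKind"

  // HiveContext stays the default; SQLContext is chosen only when the user
  // asks for it explicitly or when Hive is not available at all.
  def choose(conf: Map[String, String], hiveOnClasspath: Boolean): ContextKind =
    conf.get(Key).map(_.trim.toLowerCase) match {
      case Some("sql")          => Sql
      case _ if hiveOnClasspath => Hive
      case _                    => Sql
    }
}
```

With this shape the decision is made up front from configuration, so the 
shell never has to attempt a HiveContext and recover from its failure.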


Thanks.

Zhan Zhang

On Nov 6, 2015, at 2:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:

I would suggest adding a config parameter that allows bypassing initialization 
of HiveContext in case of an SQLException.
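
The suggestion above could be sketched roughly as follows. This is not Spark 
code: ContextFallback, bypassOnSqlError, and the primary/fallback arguments 
are hypothetical names, with primary standing in for building a HiveContext 
and fallback for building a plain SQLContext. The sketch assumes the 
SQLException may be wrapped inside other exceptions, so it walks the cause 
chain.

```scala
import java.sql.SQLException
import scala.annotation.tailrec

// Sketch only: a generic "try the primary, fall back on SQL errors" helper.
object ContextFallback {
  // True if the throwable or any of its causes is an SQLException.
  @tailrec
  def causedBySql(t: Throwable): Boolean = t match {
    case null            => false
    case _: SQLException => true
    case other           => causedBySql(other.getCause)
  }

  // bypassOnSqlError plays the role of the suggested config parameter.
  // Any other failure, or any failure with the flag off, is rethrown.
  def create[A](bypassOnSqlError: Boolean)(primary: => A)(fallback: => A): A =
    try primary
    catch {
      case e: Exception if bypassOnSqlError && causedBySql(e) => fallback
    }
}
```

Note that with this approach the failing initialization is still attempted on 
every start; the flag only controls what happens afterwards.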

Cheers

On Fri, Nov 6, 2015 at 2:50 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:
Hi Jerry,

OK. Here is an ugly workaround.

Put a hive-site.xml with invalid content under $SPARK_HOME/conf. You will get a 
bunch of exceptions because HiveContext initialization fails, but you can then 
initialize your SQLContext on your own.

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4a5cc2e8

scala> import sqlContext.implicits._
import sqlContext.implicits._


For example:

HW11188:spark zzhang$ more conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://zzhang-yarn11:9083</value>
  </property>
</configuration>
HW11188:spark zzhang$

By the way, I don’t know whether there are any caveats to this workaround.

Thanks.

Zhan Zhang





On Nov 6, 2015, at 2:40 PM, Jerry Lam <chiling...@gmail.com> wrote:

Hi Zhan,

I don’t use HiveContext features at all. I mostly use the DataFrame API. It is 
sexier and much less typo-prone. :)
Also, HiveContext requires a metastore database to be set up (Derby by 
default). The problem is that I cannot have two spark-shell sessions running at 
the same time on the same host (e.g. from the /home/jerry directory). It gives 
me an exception like the one below.

Since I don’t use HiveContext, I don’t see the need to maintain a database.

What is interesting is that the pyspark shell is able to start more than one 
session at the same time. I wonder what pyspark does better than spark-shell?

Best Regards,

Jerry

On Nov 6, 2015, at 5:28 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:

If your assembly jar has the Hive jars included, HiveContext will be used. 
Typically, HiveContext has more functionality than SQLContext. In what case do 
you have to use SQLContext for something that cannot be done with HiveContext?

Thanks.

Zhan Zhang

On Nov 6, 2015, at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote:

What is interesting is that the pyspark shell works fine with multiple sessions 
on the same host even though multiple HiveContexts have been created. What does 
pyspark do differently in terms of starting up the shell?

On Nov 6, 2015, at 12:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:

In SQLContext.scala:
    // After we have populated SQLConf, we call setConf to populate other confs
    // in the subclass (e.g. hiveconf in HiveContext).
    properties.foreach {
      case (key, value) => setConf(key, value)
    }

I don't see a config for skipping the above call.
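
To illustrate why the loop quoted above matters, here is a self-contained 
model of the pattern, not the Spark source. BaseContext, MetastoreContext, and 
LockedMetastoreClient are hypothetical stand-ins. The model assumes what the 
stack trace in Jerry's original message suggests: HiveContext.setConf touches 
a lazily created metastore client, so the base-class constructor loop is 
enough to force that client into existence.

```scala
import scala.collection.mutable

// Stand-in for a metastore client that cannot be created, e.g. because
// another session already holds the database (hypothetical class).
class LockedMetastoreClient {
  throw new RuntimeException("Unable to instantiate metastore client")
  def set(key: String, value: String): Unit = ()
}

// Models SQLContext: the constructor replays every property through setConf.
class BaseContext(properties: Map[String, String]) {
  protected val conf = mutable.Map.empty[String, String]
  def setConf(key: String, value: String): Unit = conf.update(key, value)
  properties.foreach { case (key, value) => setConf(key, value) }
}

// Models HiveContext: setConf also forwards to a lazy client, so the first
// property seen during base-class construction triggers client creation.
class MetastoreContext(properties: Map[String, String])
    extends BaseContext(properties) {
  lazy val client = new LockedMetastoreClient
  override def setConf(key: String, value: String): Unit = {
    super.setConf(key, value)
    client.set(key, value)
  }
}
```

In this model, constructing BaseContext with a property succeeds, while 
constructing MetastoreContext with the same property throws from inside the 
base-class constructor, before any user code has asked for the client.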

FYI

On Fri, Nov 6, 2015 at 8:53 AM, Jerry Lam <chiling...@gmail.com> wrote:
Hi spark users and developers,

Is it possible to disable HiveContext from being instantiated when using 
spark-shell? I get the following errors when more than one session is started. 
Since I don't use HiveContext, it would be great if I could have more than one 
spark-shell running at the same time.

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
        at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
        at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:226)
        at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
        at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
        at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:235)
        at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:234)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:234)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:72)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
        at org.apache.spark.repl.SparkILoopExt.importSpark(SparkILoopExt.scala:154)
        at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply$mcZ$sp(SparkILoopExt.scala:127)
        at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)
        at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)

Best Regards,

Jerry
