Hi Zhan,

Thank you for providing a workaround! I will try it out, but I agree with Ted: there should be a better way, namely to capture the exception, handle it by initializing a plain SQLContext instead of a HiveContext, and WARN the user that something is wrong with their Hive setup.
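Roughly the kind of fallback I have in mind (just a sketch, not the actual spark-shell code; the real initialization happens inside SparkILoop.createSQLContext, which shows up in the stack trace below):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    // Sketch: try the Hive-backed context first; if the metastore client cannot
    // be created (e.g. another spark-shell already holds the derby lock), warn
    // the user and fall back to a plain SQLContext instead of failing the shell.
    def createSqlContext(sc: SparkContext): SQLContext =
      try {
        val clazz = Class.forName("org.apache.spark.sql.hive.HiveContext")
        clazz.getConstructor(classOf[SparkContext])
          .newInstance(sc).asInstanceOf[SQLContext]
      } catch {
        case e: Exception =>
          println(s"WARN: could not create HiveContext (${e.getMessage}); " +
            "check your hive setup, falling back to SQLContext")
          new SQLContext(sc)
      }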
Having a spark.sql.hive.enabled=false configuration would be lovely too. :) An additional bonus is that we save some driver-side memory (~100-200 MB, from a rough observation) if we don't use HiveContext.

Thanks and have a nice weekend!

Jerry

> On Nov 6, 2015, at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> I would suggest adding a config parameter that allows bypassing
> initialization of HiveContext in case of SQLException.
>
> Cheers
>
> On Fri, Nov 6, 2015 at 2:50 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:
> Hi Jerry,
>
> OK. Here is an ugly workaround.
>
> Put a hive-site.xml with invalid content under $SPARK_HOME/conf. You will get
> a bunch of exceptions because the Hive context initialization fails, but you
> can then initialize your own SQLContext:
>
> scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4a5cc2e8
>
> scala> import sqlContext.implicits._
> import sqlContext.implicits._
>
> For example:
>
> HW11188:spark zzhang$ more conf/hive-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>hive.metastore.uris</name>
>     <value>thrift://zzhang-yarn11:9083</value>
>   </property>
> </configuration>
> HW11188:spark zzhang$
>
> By the way, I don't know whether there are any caveats to this workaround.
>
> Thanks.
>
> Zhan Zhang
>
> On Nov 6, 2015, at 2:40 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi Zhan,
>>
>> I don't use HiveContext features at all. I mostly use the DataFrame API; it is
>> sexier and needs much less typing. :)
>> Also, HiveContext requires a metastore database setup (derby by default). The
>> problem is that I cannot have two spark-shell sessions running at the same
>> time on the same host (e.g. from the /home/jerry directory). It gives me an
>> exception like the one below.
>>
>> Since I don't use HiveContext, I don't see the need to maintain a database.
>>
>> What is interesting is that the pyspark shell is able to start more than one
>> session at the same time. I wonder what pyspark does better than spark-shell?
>>
>> Best Regards,
>>
>> Jerry
>>
>>> On Nov 6, 2015, at 5:28 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:
>>>
>>> If your assembly jar has the Hive jars included, the HiveContext will be used.
>>> Typically, HiveContext has more functionality than SQLContext. In what case
>>> do you have to use SQLContext for something that cannot be done with HiveContext?
>>>
>>> Thanks.
>>>
>>> Zhan Zhang
>>>
>>> On Nov 6, 2015, at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote:
>>>
>>>> What is interesting is that the pyspark shell works fine with multiple sessions
>>>> on the same host even though multiple HiveContexts are created. What does
>>>> pyspark do differently when starting up the shell?
>>>>
>>>>> On Nov 6, 2015, at 12:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>> In SQLContext.scala:
>>>>>
>>>>>   // After we have populated SQLConf, we call setConf to populate other
>>>>>   // confs in the subclass (e.g. hiveconf in HiveContext).
>>>>>   properties.foreach {
>>>>>     case (key, value) => setConf(key, value)
>>>>>   }
>>>>>
>>>>> I don't see a config for skipping the above call.
>>>>>
>>>>> FYI
>>>>>
>>>>> On Fri, Nov 6, 2015 at 8:53 AM, Jerry Lam <chiling...@gmail.com> wrote:
>>>>> Hi spark users and developers,
>>>>>
>>>>> Is it possible to disable HiveContext from being instantiated when using
>>>>> spark-shell? I get the following errors when more than one session starts.
>>>>> Since I don't use HiveContext, it would be great if I could have more than
>>>>> one spark-shell running at the same time.
>>>>>
>>>>> Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>>>     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>>>     at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>>     at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
>>>>>     at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
>>>>>     at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:226)
>>>>>     at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
>>>>>     at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
>>>>>     at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:235)
>>>>>     at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:234)
>>>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>>>>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>>>>>     at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:234)
>>>>>     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:72)
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>>     at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
>>>>>     at org.apache.spark.repl.SparkILoopExt.importSpark(SparkILoopExt.scala:154)
>>>>>     at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply$mcZ$sp(SparkILoopExt.scala:127)
>>>>>     at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)
>>>>>     at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Jerry