I agree, with a minor change: add a config that provides the option to initialize either SQLContext or HiveContext, with HiveContext as the default, instead of bypassing initialization when the exception is hit.
Thanks.

Zhan Zhang

On Nov 6, 2015, at 2:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:

I would suggest adding a config parameter that allows bypassing initialization of HiveContext in case of SQLException.

Cheers

On Fri, Nov 6, 2015 at 2:50 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:

Hi Jerry,

OK. Here is an ugly workaround. Put a hive-site.xml with invalid content under $SPARK_HOME/conf. You will get a bunch of exceptions because Hive context initialization fails, but you can then initialize your SQLContext on your own.

    scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4a5cc2e8

    scala> import sqlContext.implicits._
    import sqlContext.implicits._

For example:

    HW11188:spark zzhang$ more conf/hive-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://zzhang-yarn11:9083</value>
      </property>
    </configuration>
    HW11188:spark zzhang$

By the way, I don’t know whether there is any caveat for this workaround.

Thanks.

Zhan Zhang

On Nov 6, 2015, at 2:40 PM, Jerry Lam <chiling...@gmail.com> wrote:

Hi Zhan,

I don’t use HiveContext features at all. I mostly use the DataFrame API. It is sexier and much less typo-prone. :) Also, HiveContext requires a metastore database setup (Derby by default). The problem is that I cannot have two spark-shell sessions running at the same time on the same host (e.g. in the /home/jerry directory). It gives me an exception like the one below. Since I don’t use HiveContext, I don’t see the need to maintain a database.

What is interesting is that the pyspark shell is able to start more than one session at the same time. I wonder what pyspark does better than spark-shell?
Best Regards,

Jerry

On Nov 6, 2015, at 5:28 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:

If your assembly jar includes the Hive jars, HiveContext will be used. Typically, HiveContext has more functionality than SQLContext. In what case do you need SQLContext for something that HiveContext cannot do?

Thanks.

Zhan Zhang

On Nov 6, 2015, at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote:

What is interesting is that the pyspark shell works fine with multiple sessions on the same host, even though multiple HiveContexts have been created. What does pyspark do differently when starting up the shell?

On Nov 6, 2015, at 12:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:

In SQLContext.scala:

    // After we have populated SQLConf, we call setConf to populate other confs in the subclass
    // (e.g. hiveconf in HiveContext).
    properties.foreach {
      case (key, value) => setConf(key, value)
    }

I don't see a config for skipping the above call.

FYI

On Fri, Nov 6, 2015 at 8:53 AM, Jerry Lam <chiling...@gmail.com> wrote:

Hi spark users and developers,

Is it possible to disable HiveContext from being instantiated when using spark-shell? I get the following errors when I start more than one session. Since I don't use HiveContext, it would be great if I could have more than one spark-shell running at the same time.

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
    at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:226)
    at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
    at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
    at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:235)
    at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:234)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:72)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
    at org.apache.spark.repl.SparkILoopExt.importSpark(SparkILoopExt.scala:154)
    at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply$mcZ$sp(SparkILoopExt.scala:127)
    at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)
    at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)

Best Regards,

Jerry
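
For readers of this thread: the config option proposed in the latest reply (choose SQLContext or HiveContext, with HiveContext as the default) could be sketched roughly as follows. This is an illustration only, not Spark's actual code. The key name spark.shell.useHiveContext, the object ShellContextSketch, and the function createShellSQLContext are all hypothetical. The sketch assumes Spark 1.x with spark-sql on the classpath, and it loads HiveContext by name so that it compiles without the Hive module.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object ShellContextSketch {
  // Hypothetical key: not an existing Spark setting.
  val UseHiveKey = "spark.shell.useHiveContext"

  /** Builds the shell's SQL context according to the (hypothetical) flag.
    * HiveContext stays the default, as proposed in the thread. */
  def createShellSQLContext(sc: SparkContext): SQLContext = {
    val useHive = sc.getConf.getBoolean(UseHiveKey, true)
    if (useHive) {
      // Load by name so this file compiles without the spark-hive module.
      // Any failure here propagates; nothing is silently bypassed.
      Class.forName("org.apache.spark.sql.hive.HiveContext")
        .getConstructor(classOf[SparkContext])
        .newInstance(sc)
        .asInstanceOf[SQLContext]
    } else {
      // Plain SQLContext: the Hive classes are never loaded on this path.
      new SQLContext(sc)
    }
  }
}
```

With a flag like this, a second spark-shell on the same host could be started with --conf spark.shell.useHiveContext=false (again, a hypothetical key), which would skip HiveContext and the metastore setup it requires.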