Hi, I am trying to query a Spark RDD via the HiveThriftServer2.startWithContext functionality and am getting the following exception:
15/05/19 13:26:43 WARN thrift.ThriftCLIService: Error executing statement:
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:84)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
    at com.sun.proxy.$Proxy27.executeStatementAsync(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:237)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:392)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.conf.HiveConf.getIntVar(HiveConf.java:1259)
    at org.apache.hive.service.cli.log.LogManager.createNewOperationLog(LogManager.java:101)
    at org.apache.hive.service.cli.log.LogManager.getOperationLogByOperation(LogManager.java:156)
    at org.apache.hive.service.cli.log.LogManager.registerCurrentThread(LogManager.java:120)
    at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:714)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
    ... 19 more

Code:

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.SparkContext._
import org.apache.spark.sql.hive._

object FederatedQueryTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("FederatedQueryTest")
    val sc = new SparkContext(sparkConf)

    ...

    val hoursAug = sqlContext.sql(
      "SELECT H.Hour, H.DataStore, H.SchemaName, H.TableName, H.ColumnName, H.EventAffectedCount, H.EventCount, " +
      "U.USERNAME, U.USERGROUP, U.LOCATION, U.DEPARTMENT " +
      "FROM HOURS H " +
      "JOIN USERS U " +
      "ON H.User = U.USERNAME")

    hoursAug.registerTempTable("HOURS_AUGM")
    hoursAug.show(100)

    import org.apache.spark.sql.hive.thriftserver._
    HiveThriftServer2.startWithContext(sqlContext)
  }
}

Environment:
- CDH 5.3
- Spark 1.3.0 (upgraded from the default Spark 1.2.0 on CDH 5.3)
- Hive Metastore is in MySQL

Configuration steps:
1. Rebuilt Spark with Hive support using the command:
   mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
2. Replaced the Spark assembly jar with the result of the build.
3. Placed hive-site.xml into the Spark conf directory.
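For reference, the Beeline session in step 4 was along these lines (a sketch, not the exact commands; the host, port, and user are assumptions, 10000 being the default HiveServer2 Thrift port, overridable via hive.server2.thrift.port in hive-site.xml):

```shell
# Connect Beeline to the Thrift server started by startWithContext.
# localhost:10000 is an assumed endpoint (the HiveServer2 default).
beeline -u jdbc:hive2://localhost:10000

# Inside the session, statements of this form trigger the NPE above:
#   show tables;
#   select * from HOURS_AUGM;
```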
4. Used Beeline to work with the Spark Thrift Server. The "connect" command passes successfully, but any "select" or "show tables" statement fails with the NullPointerException and stack trace shown above.

However, when I start the Spark Thrift Server from the command line using /usr/lib/spark/sbin/start-thriftserver.sh, I am able to see and query Hive tables.

Can you please help me understand why the HiveThriftServer2.startWithContext functionality is not working?

Thanks!
Dmitriy Fingerman

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Problem-querying-RDD-using-HiveThriftServer2-startWithContext-functionality-tp22947.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.