Hi, I am trying to query a Spark RDD via the HiveThriftServer2.startWithContext functionality and am getting the following exception:
15/05/19 13:26:43 WARN thrift.ThriftCLIService: Error executing statement:
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:84)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
    at com.sun.proxy.$Proxy27.executeStatementAsync(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:237)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:392)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.conf.HiveConf.getIntVar(HiveConf.java:1259)
    at org.apache.hive.service.cli.log.LogManager.createNewOperationLog(LogManager.java:101)
    at org.apache.hive.service.cli.log.LogManager.getOperationLogByOperation(LogManager.java:156)
    at org.apache.hive.service.cli.log.LogManager.registerCurrentThread(LogManager.java:120)
    at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:714)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
    ... 19 more

Code:

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.SparkContext._
import org.apache.spark.sql.hive._

object FederatedQueryTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("FederatedQueryTest")
    val sc = new SparkContext(sparkConf)

    ...

    val hoursAug = sqlContext.sql(
      "SELECT H.Hour, H.DataStore, H.SchemaName, H.TableName, H.ColumnName, H.EventAffectedCount, H.EventCount, " +
      "U.USERNAME, U.USERGROUP, U.LOCATION, U.DEPARTMENT " +
      "FROM HOURS H " +
      "JOIN USERS U " +
      "ON H.User = U.USERNAME")

    hoursAug.registerTempTable("HOURS_AUGM")
    hoursAug.show(100)

    import org.apache.spark.sql.hive.thriftserver._
    HiveThriftServer2.startWithContext(sqlContext)
  }
}

Environment:
- CDH 5.3
- Spark 1.3.0 (upgraded from the default Spark 1.2.0 on CDH 5.3)
- Hive Metastore is in MySQL

Configuration steps:
1. Rebuilt Spark with Hive support using the command:
   mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
2. Replaced the Spark assembly jar with the result of the build.
3. Placed hive-site.xml into the Spark conf directory.
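For reference, the Beeline session in step 4 was along these lines (a sketch, not the exact commands; the host, port, and user are assumptions, 10000 being the default HiveServer2 Thrift port, overridable via hive.server2.thrift.port in hive-site.xml):

```shell
# Connect Beeline to the Thrift server started by startWithContext.
# localhost:10000 is an assumed endpoint (the HiveServer2 default).
beeline -u jdbc:hive2://localhost:10000

# Inside the session, statements of this form trigger the NPE above:
#   show tables;
#   select * from HOURS_AUGM;
```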
4. Used Beeline to work with the Spark Thrift Server. The "connect" command passes successfully, but any "select" or "show tables" statement fails with the NullPointerException and stack trace shown above.

However, when I start the Spark Thrift Server from the command line using /usr/lib/spark/sbin/start-thriftserver.sh, I am able to see and query Hive tables.

Can you please help me understand why the HiveThriftServer2.startWithContext functionality is not working?

Thanks!
Dmitriy Fingerman

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Problem-querying-RDD-using-HiveThriftServer2-startWithContext-functionality-tp22947.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.