Thanks Yin, here are the logs:
INFO SparkContext - Added JAR file:/home/jegreen1/mms/zookeeper-3.4.6.jar at http://10.39.65.122:38933/jars/zookeeper-3.4.6.jar with timestamp 1453907484092
INFO SparkContext - Added JAR file:/home/jegreen1/mms/mms-http-0.2-SNAPSHOT.jar at http://10.39.65.122:38933/jars/mms-http-0.2-SNAPSHOT.jar with timestamp 1453907484093
INFO Executor - Starting executor ID driver on host localhost
INFO Utils - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41220.
INFO NettyBlockTransferService - Server created on 41220
INFO BlockManagerMaster - Trying to register BlockManager
INFO BlockManagerMasterEndpoint - Registering block manager localhost:41220 with 511.1 MB RAM, BlockManagerId(driver, localhost, 41220)
INFO BlockManagerMaster - Registered BlockManager
INFO HiveContext - Initializing execution hive, version 1.2.1
INFO ClientWrapper - Inspected Hadoop version: 2.6.0
INFO ClientWrapper - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
WARN HiveConf - HiveConf of name hive.enable.spark.execution.engine does not exist
INFO HiveMetaStore - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
INFO ObjectStore - ObjectStore, initialize called
INFO Persistence - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
INFO Persistence - Property datanucleus.cache.level2 unknown - will be ignored
WARN HiveConf - HiveConf of name hive.enable.spark.execution.engine does not exist
INFO ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
INFO Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
INFO Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
INFO Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
INFO Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
INFO MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
INFO ObjectStore - Initialized ObjectStore
WARN ObjectStore - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
WARN ObjectStore - Failed to get database default, returning NoSuchObjectException
INFO HiveMetaStore - Added admin role in metastore
INFO HiveMetaStore - Added public role in metastore
INFO HiveMetaStore - No user is added in admin role, since config is empty
INFO HiveMetaStore - 0: get_all_databases
INFO audit - ugi=jegreen1 ip=unknown-ip-addr cmd=get_all_databases
INFO HiveMetaStore - 0: get_functions: db=default pat=*
INFO audit - ugi=jegreen1 ip=unknown-ip-addr cmd=get_functions: db=default pat=*
INFO Datastore - The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO SessionState - Created local directory: /tmp/9b102c97-c3f4-4d92-b722-0a2e257d3b5b_resources
INFO SessionState - Created HDFS directory: /tmp/hive/jegreen1/9b102c97-c3f4-4d92-b722-0a2e257d3b5b
INFO SessionState - Created local directory: /tmp/jegreen1/9b102c97-c3f4-4d92-b722-0a2e257d3b5b
INFO SessionState - Created HDFS directory: /tmp/hive/jegreen1/9b102c97-c3f4-4d92-b722-0a2e257d3b5b/_tmp_space.db
WARN HiveConf - HiveConf of name hive.enable.spark.execution.engine does not exist
INFO HiveContext - default warehouse location is /user/hive/warehouse
INFO HiveContext - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
INFO ClientWrapper - Inspected Hadoop version: 2.6.0
INFO ClientWrapper - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
WARN HiveConf - HiveConf of name hive.enable.spark.execution.engine does not exist
INFO metastore - Trying to connect to metastore with URI thrift://dkclusterm2.imp.net:9083
INFO metastore - Connected to metastore.
INFO SessionState - Created local directory: /tmp/7e230580-37af-47d3-81cc-eb4829b8da62_resources
INFO SessionState - Created HDFS directory: /tmp/hive/jegreen1/7e230580-37af-47d3-81cc-eb4829b8da62
INFO SessionState - Created local directory: /tmp/jegreen1/7e230580-37af-47d3-81cc-eb4829b8da62
INFO SessionState - Created HDFS directory: /tmp/hive/jegreen1/7e230580-37af-47d3-81cc-eb4829b8da62/_tmp_space.db
INFO ParquetRelation - Listing hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208 on driver
INFO SparkContext - Starting job: parquet at ThriftTest.scala:39
INFO DAGScheduler - Got job 0 (parquet at ThriftTest.scala:39) with 32 output partitions
INFO DAGScheduler - Final stage: ResultStage 0 (parquet at ThriftTest.scala:39)
INFO DAGScheduler - Parents of final stage: List()
INFO DAGScheduler - Missing parents: List()
INFO DAGScheduler - Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at ThriftTest.scala:39), which has no missing parents
INFO MemoryStore - Block broadcast_0 stored as values in memory (estimated size 65.5 KB, free 65.5 KB)
INFO MemoryStore - Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 88.3 KB)
INFO BlockManagerInfo - Added broadcast_0_piece0 in memory on localhost:41220 (size: 22.9 KB, free: 511.1 MB)
INFO SparkContext - Created broadcast 0 from broadcast at DAGScheduler.scala:1006
INFO DAGScheduler - Submitting 32 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at ThriftTest.scala:39)
INFO TaskSchedulerImpl - Adding task set 0.0 with 32 tasks
INFO TaskSetManager - Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 6528 bytes)
INFO TaskSetManager - Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 6528 bytes)
INFO TaskSetManager - Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,PROCESS_LOCAL, 6528 bytes)
INFO TaskSetManager - Starting task 3.0 in stage 0.0 (TID 3, localhost, partition 3,PROCESS_LOCAL, 6528 bytes)
INFO TaskSetManager - Starting task 4.0 in stage 0.0 (TID 4, localhost, partition 4,PROCESS_LOCAL, 6528 bytes)
INFO TaskSetManager - Starting task 5.0 in stage 0.0 (TID 5, localhost, partition 5,PROCESS_LOCAL, 6528 bytes)

From: Yin Huai [mailto:yh...@databricks.com]
Sent: 26 January 2016 17:48
To: Green, James (UK Guildford)
Cc: dev@spark.apache.org
Subject: Re: spark hivethriftserver problem on 1.5.0 -> 1.6.0 upgrade

Can you post more logs, especially the lines around "Initializing execution hive ..." (this is for an internally used fake metastore, and it is derby) and "Initializing HiveMetastoreConnection version ..." (this is for the real metastore; it should be your remote one)?

Also, those temp tables are stored in memory and are associated with a HiveContext. If you cannot see temp tables, it usually means that the HiveContext used for JDBC was different from the one used to create the temp table. However, in your case you are using HiveThriftServer2.startWithContext(hiveContext), so it will be good to see more logs and work out what happened.
Thanks,
Yin

On Tue, Jan 26, 2016 at 1:33 AM, james.gre...@baesystems.com <james.gre...@baesystems.com> wrote:

Hi

I posted this on the user list yesterday; I am posting it here now because, on further investigation, I am pretty sure this is a bug.

On upgrade from 1.5.0 to 1.6.0 I have a problem with the HiveThriftServer2. I have this code:

val hiveContext = new HiveContext(SparkContext.getOrCreate(conf))
val thing = hiveContext.read.parquet("hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")
thing.registerTempTable("thing")
HiveThriftServer2.startWithContext(hiveContext)

When I start things up on the cluster my hive-site.xml is found - I can see that the metastore connects:

INFO metastore - Trying to connect to metastore with URI thrift://dkclusterm2.imp.net:9083
INFO metastore - Connected to metastore.

But later on the thrift server seems not to connect to the remote hive metastore, and instead starts a derby instance:

INFO AbstractService - Service:CLIService is started.
INFO ObjectStore - ObjectStore, initialize called
INFO Query - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
INFO MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
INFO ObjectStore - Initialized ObjectStore
INFO HiveMetaStore - 0: get_databases: default
INFO audit - ugi=jegreen1 ip=unknown-ip-addr cmd=get_databases: default
INFO HiveMetaStore - 0: Shutting down the object store...
INFO audit - ugi=jegreen1 ip=unknown-ip-addr cmd=Shutting down the object store...
INFO HiveMetaStore - 0: Metastore shutdown complete.
INFO audit - ugi=jegreen1 ip=unknown-ip-addr cmd=Metastore shutdown complete.
INFO AbstractService - Service:ThriftBinaryCLIService is started.
INFO AbstractService - Service:HiveServer2 is started.

On 1.5.0 the same bit of the log reads:

INFO AbstractService - Service:CLIService is started.
INFO metastore - Trying to connect to metastore with URI thrift://dkclusterm2.imp.net:9083    ******* i.e. 1.5.0 connects to remote hive
INFO metastore - Connected to metastore.
INFO AbstractService - Service:ThriftBinaryCLIService is started.
INFO AbstractService - Service:HiveServer2 is started.
INFO ThriftCLIService - Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads

So if I connect to this with JDBC I can see all the tables on the hive server - but not any temporary tables; I guess they are going to derby. I see someone on the Databricks website is also having this problem.

Thanks

James

Please consider the environment before printing this email. This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory. The contents of this email may relate to dealings with other companies under the control of BAE Systems Applied Intelligence Limited, details of which can be found at http://www.baesystems.com/Businesses/index.htm.
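[For readers following the thread, here is a minimal sketch of the pattern under discussion: registering an in-memory temp table and starting the thrift server against the same HiveContext, which is what makes the temp table visible over JDBC. The object name and SparkConf settings are illustrative; the HDFS path follows James's snippet. This is a sketch assuming Spark 1.6.x with spark-hive and spark-hive-thriftserver on the classpath, not a definitive reproduction.]

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Hypothetical driver object; illustrates the startWithContext pattern only.
object ThriftTempTableSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("thrift-temp-table")
    val sc = SparkContext.getOrCreate(conf)
    val hiveContext = new HiveContext(sc)

    // The temp table lives in memory and is tied to THIS HiveContext.
    val thing = hiveContext.read.parquet(
      "hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")
    thing.registerTempTable("thing")

    // Pass the same context; a thrift server holding a different
    // HiveContext would not see "thing" over JDBC.
    HiveThriftServer2.startWithContext(hiveContext)
  }
}
```

The point Yin makes above is that temp-table visibility hinges on context identity: if the thrift server internally constructs a second HiveContext (e.g. one backed by the local derby execution metastore rather than the remote thrift metastore), JDBC clients will see the permanent Hive tables but not the temp table.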