On upgrading from 1.5.0 to 1.6.0 I have a problem with HiveThriftServer2. I have this code:

    val hiveContext = new HiveContext(SparkContext.getOrCreate(conf))
    val thing = hiveContext.read.parquet("hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")
    thing.registerTempTable("thing")
    HiveThriftServer2.startWithContext(hiveContext)

When I start things up on the cluster my hive-site.xml is found – I can see that the metastore connects:

    INFO metastore - Trying to connect to metastore with URI thrift://dkclusterm2.imp.net:9083
    INFO metastore - Connected to metastore.

But then later on the thrift server seems not to connect to the remote hive metastore but to start a derby instance instead:

    INFO AbstractService - Service:CLIService is started.
    INFO ObjectStore - ObjectStore, initialize called
    INFO Query - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
    INFO MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
    INFO ObjectStore - Initialized ObjectStore
    INFO HiveMetaStore - 0: get_databases: default
    INFO audit - ugi=jegreen1 ip=unknown-ip-addr cmd=get_databases: default
    INFO HiveMetaStore - 0: Shutting down the object store...
    INFO audit - ugi=jegreen1 ip=unknown-ip-addr cmd=Shutting down the object store...
    INFO HiveMetaStore - 0: Metastore shutdown complete.
    INFO audit - ugi=jegreen1 ip=unknown-ip-addr cmd=Metastore shutdown complete.
    INFO AbstractService - Service:ThriftBinaryCLIService is started.
    INFO AbstractService - Service:HiveServer2 is started.

So if I connect to this with JDBC I can see all the tables on the hive server – but not anything temporary – I guess they are going to derby.

I see someone on the databricks website is also having this problem.

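One setting that may be worth checking here, offered as an assumption rather than a confirmed diagnosis: the Spark SQL migration notes for 1.6 say the Thrift server runs in multi-session mode by default, so each JDBC connection gets its own temporary-table registry, and that `spark.sql.hive.thriftServer.singleSession` restores the old shared behaviour. The sketch below sets that option on the `SparkConf` before the context is created. The object name and app name are hypothetical, the `conf` from the original message is not shown so a fresh one is built, and whether this actually makes `thing` visible over JDBC on this cluster is untested.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object ThriftServerSingleSessionSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical configuration; the original `conf` is not shown in the message.
    val conf = new SparkConf()
      .setAppName("thrift-temp-table-sketch")
      // Assumption: single-session mode lets JDBC clients share the temporary
      // tables registered on this HiveContext. It is set before the SparkContext
      // exists, because getOrCreate ignores `conf` if a context is already running.
      .set("spark.sql.hive.thriftServer.singleSession", "true")

    val hiveContext = new HiveContext(SparkContext.getOrCreate(conf))

    // Same steps as in the message above.
    val thing = hiveContext.read.parquet("hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")
    thing.registerTempTable("thing")
    HiveThriftServer2.startWithContext(hiveContext)
  }
}
```

The same option can also be passed on the command line with `--conf` when the job is submitted, which avoids changing the code.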
Thanks
James

From: patcharee [mailto:patcharee.thong...@uni.no]
Sent: 25 January 2016 14:31
To: user@spark.apache.org
Cc: Eirik Thorsnes
Subject: streaming textFileStream problem - got only ONE line

Hi,

My streaming application receives data from the file system and just prints the input count at every 1-second interval, as in the code below:

    val sparkConf = new SparkConf()
    val ssc = new StreamingContext(sparkConf, Milliseconds(interval_ms))
    val lines = ssc.textFileStream(args(0))
    lines.count().print()

The problem is that sometimes the data received from ssc.textFileStream is ONLY ONE line, although the new file found in that interval actually contains multiple lines. See the log below, which shows three intervals. In the 2nd interval, the new file is hdfs://helmhdfs/user/patcharee/cerdata/datetime_19617.txt. This file contains 6288 lines, but ssc.textFileStream returns ONLY ONE line (the header).

Any ideas or suggestions as to what the problem is?

-------------------------------------------
SPARK LOG
-------------------------------------------
16/01/25 15:11:11 INFO FileInputDStream: Cleared 1 old files that were older than 1453731011000 ms: 1453731010000 ms
16/01/25 15:11:11 INFO FileInputDStream: Cleared 0 old files that were older than 1453731011000 ms:
16/01/25 15:11:12 INFO FileInputDStream: Finding new files took 4 ms
16/01/25 15:11:12 INFO FileInputDStream: New files at time 1453731072000 ms:
hdfs://helmhdfs/user/patcharee/cerdata/datetime_19616.txt
-------------------------------------------
Time: 1453731072000 ms
-------------------------------------------
6288

16/01/25 15:11:12 INFO FileInputDStream: Cleared 1 old files that were older than 1453731012000 ms: 1453731011000 ms
16/01/25 15:11:12 INFO FileInputDStream: Cleared 0 old files that were older than 1453731012000 ms:
16/01/25 15:11:13 INFO FileInputDStream: Finding new files took 4 ms
16/01/25 15:11:13 INFO FileInputDStream: New files at time 1453731073000 ms:
hdfs://helmhdfs/user/patcharee/cerdata/datetime_19617.txt
-------------------------------------------
Time: 1453731073000 ms
-------------------------------------------
1

16/01/25 15:11:13 INFO FileInputDStream: Cleared 1 old files that were older than 1453731013000 ms: 1453731012000 ms
16/01/25 15:11:13 INFO FileInputDStream: Cleared 0 old files that were older than 1453731013000 ms:
16/01/25 15:11:14 INFO FileInputDStream: Finding new files took 3 ms
16/01/25 15:11:14 INFO FileInputDStream: New files at time 1453731074000 ms:
hdfs://helmhdfs/user/patcharee/cerdata/datetime_19618.txt
-------------------------------------------
Time: 1453731074000 ms
-------------------------------------------
6288

Thanks,
Patcharee
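
One possible cause of the one-line count, offered as an assumption rather than a diagnosis: the Spark Streaming guide says files must appear in the monitored directory by being atomically moved or renamed into it. If the producer writes datetime_19617.txt directly into the watched directory, the stream could pick the file up when only the header has been written. The sketch below shows the write-then-move pattern on a local filesystem using java.nio. The object name, method name, directory layout and file name are all hypothetical, and on HDFS the analogous step would presumably be a rename through the Hadoop FileSystem API rather than `Files.move`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

object AtomicPublish {
  // Write the complete file into a staging directory first, then move it into
  // the watched directory in a single step, so a reader polling the watched
  // directory never observes a partially written file.
  def publish(lines: Seq[String], stagingDir: Path, watchedDir: Path, name: String): Path = {
    val staged = stagingDir.resolve(name)
    Files.write(staged, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    // ATOMIC_MOVE requires both directories to be on the same filesystem.
    Files.move(staged, watchedDir.resolve(name), StandardCopyOption.ATOMIC_MOVE)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical directories created under a temp folder for illustration.
    val base = Files.createTempDirectory("cerdata-sketch")
    val staging = Files.createDirectory(base.resolve("staging"))
    val watched = Files.createDirectory(base.resolve("watched"))

    val lines = "header" +: (1 to 5).map(i => s"row $i")
    val target = publish(lines, staging, watched, "datetime_sample.txt")
    println(s"published ${target.getFileName} with ${Files.readAllLines(target, StandardCharsets.UTF_8).size} lines")
  }
}
```

If the producer already moves finished files into the directory, this would not explain the behaviour and the cause lies elsewhere.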