[ https://issues.apache.org/jira/browse/HIVE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexandre Linte updated HIVE-14631:
-----------------------------------
    Description:
I have a cluster secured with Kerberos, and Hive is configured to work with Tez by default. Everything works well through hive-cli and beeline; however, I'm seeing strange behavior through Hue.
There can be many client connections (up to 600), and after a day client connections start to fail. Not every connection attempt fails, however.
When a connection fails, HiveServer2 logs the following:
{noformat}
Aug 3 09:28:04 hiveserver2.bigdata.fr Executing command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112): INSERT INTO TABLE shfs3453.camille_test VALUES ('coucou')
Aug 3 09:28:04 hiveserver2.bigdata.fr Query ID = hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112
Aug 3 09:28:04 hiveserver2.bigdata.fr Total jobs = 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Launching Job 1 out of 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Starting task [Stage-1:MAPRED] in parallel
Aug 3 09:28:04 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:04 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:04 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:05 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:05 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:05 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:06 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:06 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:06 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:08 hiveserver2.bigdata.fr FAILED: Execution Error, return code -1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Aug 3 09:28:08 hiveserver2.bigdata.fr Completed executing command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112); Time taken: 4.002 seconds
{noformat}
At the same time, the Metastore logs show:
{noformat}
Aug 3 09:28:03 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:03 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:04 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:04 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:04 metastore01.bigdata.fr Error occurred during processing of message.
Aug 3 09:28:05 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:05 metastore01.bigdata.fr Error occurred during processing of message.
Aug 3 09:28:06 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:06 metastore01.bigdata.fr Error occurred during processing of message.
{noformat}
To work around the connection issue, I have to restart HiveServer2.
Note: I also created a JIRA for Hue: https://issues.cloudera.org/browse/HUE-4748

> HiveServer2 regularly fails to connect to metastore
> ---------------------------------------------------
>
>                 Key: HIVE-14631
>                 URL: https://issues.apache.org/jira/browse/HIVE-14631
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 1.2.1, 2.0.0, 2.1.0
>         Environment: Hive 2.1.0, Hue 3.10.0, Hadoop 2.7.2, Tez 0.8.3
>            Reporter: Alexandre Linte

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
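
For anyone trying to confirm the same pattern on their own cluster, the two log excerpts in the description can be lined up by timestamp. The following is only a minimal sketch, assuming syslog-style lines in the "Mon D HH:MM:SS host message" layout shown in the description; the function names (count_by_second, correlate) are hypothetical helpers and not part of Hive or Hue.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts: "Aug 3 09:28:04 <host> <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def count_by_second(lines, needle):
    """Count lines whose message contains `needle`, keyed by timestamp string."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group(3):
            counts[match.group(1)] += 1
    return counts


def correlate(hs2_lines, metastore_lines):
    """Return the timestamps (sorted as strings) at which a HiveServer2
    connect failure and a Metastore SASL failure were logged in the same second."""
    hs2 = count_by_second(hs2_lines, "Failed to connect to the MetaStore Server")
    metastore = count_by_second(metastore_lines, "SASL negotiation failure")
    return sorted(set(hs2) & set(metastore))


# Sample lines copied from the logs in the description.
hs2_sample = [
    "Aug 3 09:28:04 hiveserver2.bigdata.fr Total jobs = 1",
    "Aug 3 09:28:04 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...",
    "Aug 3 09:28:05 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...",
]
metastore_sample = [
    "Aug 3 09:28:03 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test",
    "Aug 3 09:28:04 metastore01.bigdata.fr SASL negotiation failure",
    "Aug 3 09:28:05 metastore01.bigdata.fr SASL negotiation failure",
]

print(correlate(hs2_sample, metastore_sample))
# prints ['Aug 3 09:28:04', 'Aug 3 09:28:05']
```

Matching only to the second is a deliberate simplification: it is enough to show that each failed connection attempt on HiveServer2 coincides with a SASL failure on the Metastore, but it does not prove which client connection caused which failure.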