Hi All,

Updating further on this issue: I did some more debugging and identified some of the threads that are using high CPU. The operation I tried is listPartitions on a table that has a huge number of partitions (around 1.5 million).
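For anyone who wants to reproduce this kind of analysis: the usual way to map a high-CPU thread to a stack like the one below is to take the decimal thread id from `top -H`, convert it to hex, and look for the matching `nid=` in the jstack output. A rough sketch (the `pgrep` pattern is an assumption; adjust it for your deployment):

```shell
# Find the metastore JVM pid (the process pattern is an assumption;
# adjust it to match how your Thrift/metastore server is launched)
PID=$(pgrep -f HiveMetaStore | head -1)

# Show the busiest native threads of that JVM (the TID column is decimal)
if [ -n "$PID" ]; then
  top -b -H -n 1 -p "$PID" | head -20
fi

# Convert the decimal TID from top into the hex nid that jstack prints,
# e.g. 766 -> 0x2fe, which is the nid of the thread in the dump below
NID=$(printf '0x%x' 766)
echo "$NID"

# Pull the matching thread's stack out of a full thread dump
if [ -n "$PID" ]; then
  jstack "$PID" | grep -A 20 "nid=$NID"
fi
```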
One of the threads that was using a large chunk of CPU:

"pool-1-thread-141" prio=10 tid=0x00007fa51c8d3800 nid=0x2fe runnable [0x00007fa4f8b8a000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x0000000564aba418> (a java.io.BufferedInputStream)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
        at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

Regards,
Manish

On Tue, Jan 20, 2015 at 5:58 PM, Manish Malhotra <manish.hadoop.w...@gmail.com> wrote:

> Hi All,
>
> I'm using Hive Thrift Server in Production, which at peak handles around
> 500 req/min.
> After a certain point the Hive Thrift Server goes into no-response mode
> and throws the following exception:
>
> "org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out"
>
> As the metastore backend we are using MySQL, which is accessed by the
> Thrift server. The design / architecture is like this:
>
> Oozie --> Hive Action --> ELB (AWS) --> Hive Thrift (2 servers) -->
> MySQL (Master) --> MySQL (Slave)
>
> Software versions:
>
> Hive: 0.10.0
> Hadoop: 1.2.1
>
> It looks like when the load goes beyond some threshold for certain
> operations, the server has trouble responding.
> Because Hive jobs sometimes fail due to this issue, we also have an
> auto-restart check: if the Thrift server is not responding, it stops /
> kills and restarts the service.
>
> Other tuning done:
>
> Thrift Server: given an 11 GB heap, and configured the CMS GC algorithm.
>
> MySQL: tuned the innodb_buffer, tmp_table, and max_heap parameters.
>
> So, can somebody please help me understand what the root cause could be,
> or has somebody faced a similar issue?
>
> I found one related JIRA:
> https://issues.apache.org/jira/browse/HCATALOG-541
>
> But that JIRA reports an OOM error in the Hive Thrift Server, whereas I
> didn't see any OOM error in my case.
>
> Regards,
> Manish
>
> Full Exception Stack:
>
> at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
> at $Proxy7.getDatabase(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
> at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
> at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 34 more
>
> 2015-01-20 22:44:12,978 ERROR exec.Task (SessionState.java:printError(401)) - FAILED: Error in metadata:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
> org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
> at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1114)
> at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
> at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
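One knob that may be worth checking for the read timeouts in the quoted trace is the metastore client socket timeout: a huge listPartitions call can easily run longer than the default before the client gives up. A sketch of raising it in hive-site.xml (the 600-second value is only an illustration, not a recommendation from this thread; verify the property name and units against your Hive version):

```xml
<property>
  <!-- Metastore Thrift client read timeout, in seconds on Hive 0.x.
       Long-running metadata calls (e.g. listing ~1.5M partitions)
       can exceed the default and surface as SocketTimeoutException. -->
  <name>hive.metastore.client.socket.timeout</name>
  <value>600</value>
</property>
```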