[jira] [Commented] (HIVE-9469) Hive Thrift Server throws Socket Timeout Exception: Read time out

Manish Malhotra (JIRA) Tue, 10 Feb 2015 14:42:54 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315109#comment-14315109
 ]


Manish Malhotra commented on HIVE-9469:
---------------------------------------

Continuing on the same thread.
Below is the work I did for load testing and for fixing some of the issues.
It would be great if somebody could review this and point out anything I'm 
missing.
I still see the SocketTimeoutException from time to time, but the ETL jobs are 
not failing.


I am currently running a load test with the following operations, issued 
through the Hive Client APIs:

a.      Create Partition - 10 threads 
b.      ListPartition - 30 threads 
c.      Show tables - 100 threads

Load on the server was around 1200 requests per minute, and for this test the 
Thrift Server + MySQL setup looks good.
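
For anyone who wants to reproduce a similar load shape, here is a minimal 
sketch of the thread layout above. It is not the harness used in this test: 
the class name MetastoreLoadSketch, the operation labels, and the callback 
that stands in for the real Hive client call are all hypothetical, and a real 
run would plug a metastore client call into that callback.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

public class MetastoreLoadSketch {

    /**
     * Starts the requested number of worker threads per operation. Each worker
     * invokes the callback callsPerThread times with the operation label.
     * Returns the number of completed calls per operation.
     * The thread counts must add up to at least 1.
     */
    public static Map<String, Long> run(Map<String, Integer> threadsPerOp,
                                        int callsPerThread,
                                        Consumer<String> call) throws InterruptedException {
        int totalThreads = 0;
        for (int n : threadsPerOp.values()) {
            totalThreads += n;
        }
        ExecutorService pool = Executors.newFixedThreadPool(totalThreads);
        Map<String, AtomicLong> counters = new TreeMap<>();

        for (Map.Entry<String, Integer> e : threadsPerOp.entrySet()) {
            final String op = e.getKey();
            final AtomicLong counter = new AtomicLong();
            counters.put(op, counter);
            for (int t = 0; t < e.getValue(); t++) {
                pool.submit(() -> {
                    // A call that throws ends this worker early; the counter
                    // then simply reports fewer completed calls.
                    for (int i = 0; i < callsPerThread; i++) {
                        call.accept(op);
                        counter.incrementAndGet();
                    }
                });
            }
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, AtomicLong> e : counters.entrySet()) {
            result.put(e.getKey(), e.getValue().get());
        }
        return result;
    }
}
```

Calling run with the map {createPartition=10, listPartitions=30, 
showTables=100} gives the same 10 / 30 / 100 thread split as the test above.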

The tuning and findings are:

Thrift Server

1.      JVM tuning: (JVM profiling showed that with the default settings, Full 
GCs were happening too frequently.)

Young Generation GC Algo: Parallel

Old Generation GC Algo: CMS

Max_Heap: 11 GB

SurvivorRatio : 6


Graph before optimization:


-- Attaching as separate files.


Graph after optimization:

-- Attaching as separate files.
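
Since the graphs are attached separately, here is a small sketch of how GC 
frequency can be read from inside a JVM using the standard 
java.lang.management API. The class name GcSnapshot is hypothetical. On a 
HotSpot JVM of that era, the settings above would typically map to flags such 
as -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -Xmx11g and -XX:SurvivorRatio=6, 
but the exact flags are not listed in this comment, so treat that mapping as 
an assumption.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcSnapshot {

    /**
     * Returns collector name -> cumulative collection count for this JVM.
     * A count of -1 means the collector does not report this value.
     */
    public static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print count and accumulated pause time for each collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these counters at a fixed interval, before and after the tuning, 
gives a rough view of how often the old-generation collector runs.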
 


2.      Database Connection Pooling: 

Thrift Server uses the DataNucleus framework for DB operations, and 
DataNucleus uses DBCP for connection pooling. The default config for DBCP is 
maxConnections = 10.
I changed it to 30, as the pool size was the basic bottleneck to serving more 
requests. 
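
To show why the pool size caps throughput, here is a back-of-the-envelope 
sketch based on Little's law (busy connections = arrival rate x time each 
request holds a connection). The class name PoolCapacity is hypothetical, and 
the 500 ms hold time in the usage note is an assumed figure for illustration, 
not something measured in this test.

```java
public class PoolCapacity {

    /**
     * Upper bound on requests per minute when every request holds exactly one
     * pooled connection for avgHoldMillis on average.
     */
    public static double maxRequestsPerMinute(int poolSize, double avgHoldMillis) {
        return poolSize * (60_000.0 / avgHoldMillis);
    }

    /**
     * Little's law: average number of connections in use at a given request
     * rate and average hold time.
     */
    public static double busyConnections(double requestsPerMinute, double avgHoldMillis) {
        return (requestsPerMinute / 60_000.0) * avgHoldMillis;
    }

    public static void main(String[] args) {
        System.out.println(maxRequestsPerMinute(10, 500.0));
        System.out.println(maxRequestsPerMinute(30, 500.0));
        System.out.println(busyConnections(1200.0, 500.0));
    }
}
```

Under that assumed hold time, 10 connections top out at about 1200 requests 
per minute and 30 connections at about 3600. A single slow query that holds a 
connection for many seconds lowers this ceiling sharply.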


Database: 

1.      innodb_buffer = 8 GB, and tmp_table_space, max_heap_space = 256 MB. 


The other problem I unearthed was that one of the Hive tables had more than 1 
million rows, and in PROD ListPartition was being called on this table. When 
that happened, it took a long time to get the response from the DB, as there 
are too many rows in the PARTITION table.
So these calls started blocking threads in the Thrift Server, each one holding 
on to a DB connection. Eventually the server got into a state where all the DB 
connections were in use, and new requests could not get a DB connection and 
started timing out. 
This problem was eventually causing our Hive queries to fail and restart. 
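
The exhaustion pattern described above can be reproduced in miniature without 
a database. The sketch below models the connection pool as a 
java.util.concurrent.Semaphore; it is only an illustration of the mechanism, 
not DBCP or metastore code, and the class name PoolExhaustionDemo and all 
timings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionDemo {

    /**
     * Models a pool of poolSize connections. slowRequests long-running
     * requests (think: ListPartition on a huge table) each try to take a
     * connection and hold it for slowMillis. Returns true if a later request
     * can obtain a connection within waitMillis, false if it times out.
     */
    public static boolean fastRequestGetsConnection(int poolSize, int slowRequests,
                                                    long slowMillis, long waitMillis)
            throws InterruptedException {
        final Semaphore pool = new Semaphore(poolSize);
        final CountDownLatch started = new CountDownLatch(slowRequests);
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < slowRequests; i++) {
            Thread t = new Thread(() -> {
                boolean acquired = pool.tryAcquire(); // take a connection if one is free
                started.countDown();
                if (!acquired) {
                    return;
                }
                try {
                    Thread.sleep(slowMillis);         // the slow DB call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    pool.release();                   // hand the connection back
                }
            });
            workers.add(t);
            t.start();
        }

        started.await(); // every slow request has attempted to take a connection
        boolean served = pool.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        if (served) {
            pool.release();
        }

        // Stop the simulated slow calls so the method returns promptly.
        for (Thread t : workers) {
            t.interrupt();
            t.join();
        }
        return served;
    }
}
```

With a pool of 3 and 3 slow requests, the later request times out; with only 
2 slow requests it is served immediately. This is the same effect a client 
sees as "Read timed out" when every metastore DB connection is tied up by 
slow ListPartition calls.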

After solving this problem, the throughput of the Thrift Server increased a 
lot, and the number of failed Hive jobs dropped a lot. 

So, please let me know whether these changes, and the fix for the 
ListPartition problem on the big table, look good, or whether there are other 
things we should take care of.


Regards,
Manish


---------------------------------------------------------
Following are the details of the PROD Infrastructure
---------------------------------------------------------


Load  = 500 req/min.

Exception: "org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out" 

We are using MySQL as the metastore database, which is accessed by the Thrift 
server. 
The flow is like this: 

Oozie --> Hive Action --> ELB (AWS) --> Hive Thrift (2 servers) --> MySQL (Master) --> MySQL (Slave).

Software versions: 

   Hive version : 0.10.0
   Hadoop: 1.2.1


I found one related JIRA: https://issues.apache.org/jira/browse/HCATALOG-541

That JIRA reports an OOM error in the Hive Thrift Server, but I didn't see any 
OOM error in my case.


Regards,
Manish

Full Exception Stack:  ( The exception comes when the server is loaded and new 
requests are timing out )

    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
    at $Proxy7.getDatabase(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
    at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 34 more
2015-01-20 22:44:12,978 ERROR exec.Task (SessionState.java:printError(401)) - FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1114)
    at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)


> Hive Thrift Server throws Socket Timeout Exception: Read time out
> -----------------------------------------------------------------
>
>                 Key: HIVE-9469
>                 URL: https://issues.apache.org/jira/browse/HIVE-9469
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.10.0
>         Environment: 4 core cpu, 15gb memory. 2 thrift server behind load 
> balancer
>            Reporter: Manish Malhotra
>
> Hi All,
> Please review the following problem, I also posted same in the hive-user 
> group, but didnt got any response yet. 
> This is happening quite frequently in our environment. 
> So, it would be great if somebody can see and advise. 
> I'm using Hive Thrift Server in Production which at peak handles around 500 
> req/min.
> After certain point the Hive Thrift Server is going into the no response mode 
> and throws 
> Following exception 
> "org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out" 
> As the metastore we are using MySQL, that is being used by Thrift server. 
> The design / architecture is like this: 
> Oozie -- > Hive Action --> ELB (AWS) --> Hive Thrift ( 2 servers) --> MySQL 
> (Master) -- > MySQL (Slave).
> Software versions: 
>    Hive version : 0.10.0
>    Hadoop: 1.2.1
> Looks like when the load is beyond some threshold for certain operations it 
> is having problem in responding. 
> As the hive jobs sometimes fails because of this issue, we also have a 
> auto-restart check to see if the Thrift server is not responding, it stops / 
> kills and restart the service. 
> Other tuning done: 
> Thrift Server: 
> Given 11gb heap, and configured CMS GC algo. 
> MySQL: 
> Tuned innodb_buffer, tmp_table and max_heap parameters.
> So, can somebody please help to understand, what could be the root cause for 
> this or somebody faced the similar issue. 
> I found one related JIRA :https://issues.apache.org/jira/browse/HCATALOG-541
> But this JIRA shows that Hive Thrift Server shows OOM error, but in my case I 
> didnt see any OOM error in my case.
> Regards,
> Manish
> Full Exception Stack: 
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
>     at $Proxy7.getDatabase(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
>     at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
>     at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.read(SocketInputStream.java:150)
>     at java.net.SocketInputStream.read(SocketInputStream.java:121)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>     ... 34 more
> 2015-01-20 22:44:12,978 ERROR exec.Task (SessionState.java:printError(401)) - 
> FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>     at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1114)
>     at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
>     at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
