Re: HBase stucks from time to time

Ted Yu Wed, 22 Apr 2015 11:31:49 -0700

The image didn't come through.

FYI


On Wed, Apr 22, 2015 at 11:23 AM, Serega Sheypak <[email protected]>
wrote:

> > major compaction runs daily.
>
> >What else do you see in the RS logs?
> no error, only *Slow sync cost *
>
> >How iostat looks like?
> please see image. 12.00 - 12.30 is a time when reading/writing stopped
> [image: Встроенное изображение 1]
>
>
> >Can you share the logs around the time this occurs?
> 2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":20128,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"
> 5.9.75.155:58344
> ","starttimems":1429692769868,"queuetimems":28034,"class":"HRegionServer","responsesize":8,"method":"Multi"}
> 2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":21434,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"
> 46.4.0.110:60149
> ","starttimems":1429692768562,"queuetimems":44263,"class":"HRegionServer","responsesize":2,"method":"Mutate"}
> 2015-04-22 12:53:10,093 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":17997,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"46.4.0.
>
> 015-04-22 12:53:10,270 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":18175,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"
> 144.76.218.107:48620
> ","starttimems":1429692772095,"queuetimems":49253,"class":"HRegionServer","responsesize":2,"method":"Mutate"}
> 2015-04-22 12:53:10,315 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:10,315 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:10,315 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:10,315 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:10,315 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":20231,"call":"Mutate(org.apache.hadoop.hbase.
>
> 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Added
> hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/c/a93405e51088418ba95241cf53456011,
> entries=926, sequenceid=43403085, filesize=258.5 K
> 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memstore flush of ~3.8 M/3985504, currentsize=12.6 K/12912 for
> region
> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a.
> in 857ms, sequenceid=43403085, compaction requested=true
> 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on c in region
> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a.
> 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Starting compaction of 3 file(s) in c of
> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a.
> into
> tmpdir=hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/.tmp,
> totalSize=8.0 M
> 2015-04-22 12:53:11,917 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 417 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:11,939 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:11,939 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:11,939 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:11,939 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:12,199 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 282 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,132 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,132 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,132 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,132 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,247 INFO
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed,
> sequenceid=41975630, memsize=3.6 M, hasBloomFilter=true, into tmp file
> hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/.tmp/3495a2b487fa448bb9e0b50050b11c3a
> 2015-04-22 12:53:13,301 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Added
> hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/c/3495a2b487fa448bb9e0b50050b11c3a,
> entries=886, sequenceid=41975630, filesize=272.9 K
>
> 2015-04-22 19:54 GMT+02:00 Ted Yu <[email protected]>:
>
>> Serega:
>> How often is major compaction run in your cluster ?
>>
>> Have you configured offpeak compaction ?
>> See related parameters in:
>> http://hbase.apache.org/book.html#compaction.parameters
>>
>> Cheers
>>
>> On Wed, Apr 22, 2015 at 10:39 AM, Serega Sheypak <
>> [email protected]>
>> wrote:
>>
>> > Hi, we have 10 nodes cluster running HBase 0.98 CDH 5.2.1
>> > Sometimes HBase stucks.
>> > We have several apps constantly writing/reading data from it. Sometimes
>> we
>> > see that apps response time dramatically increases. It means that app
>> > spends seconds to read/write from/to HBase. in 99% of time it takes
>> 20ms.
>> >
>> > I suppose that compactions/major compactions could be the root cause. I
>> see
>> > that compactions start at the same time when we have problems with app.
>> > Could it be so?
>> > So HBase can't write to WAL because compactions consumes all IO and apps
>> > stops to write data?
>> >
>>
>
>

Re: HBase stucks from time to time

Reply via email to