The image didn't come through. FYI
On Wed, Apr 22, 2015 at 11:23 AM, Serega Sheypak <[email protected]> wrote: > > major compaction runs daily. > > >What else do you see in the RS logs? > no error, only *Slow sync cost * > > >How iostat looks like? > please see image. 12.00 - 12.30 is a time when reading/writing stopped > [image: Встроенное изображение 1] > > > >Can you share the logs around the time this occurs? > 2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer: > (responseTooSlow): > {"processingtimems":20128,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":" > 5.9.75.155:58344 > ","starttimems":1429692769868,"queuetimems":28034,"class":"HRegionServer","responsesize":8,"method":"Multi"} > 2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer: > (responseTooSlow): > {"processingtimems":21434,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":" > 46.4.0.110:60149 > ","starttimems":1429692768562,"queuetimems":44263,"class":"HRegionServer","responsesize":2,"method":"Mutate"} > 2015-04-22 12:53:10,093 WARN org.apache.hadoop.ipc.RpcServer: > (responseTooSlow): > {"processingtimems":17997,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"46.4.0. > > 015-04-22 12:53:10,270 WARN org.apache.hadoop.ipc.RpcServer: > (responseTooSlow): > {"processingtimems":18175,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":" > 144.76.218.107:48620 > ","starttimems":1429692772095,"queuetimems":49253,"class":"HRegionServer","responsesize":2,"method":"Mutate"} > 2015-04-22 12:53:10,315 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:10,315 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:10,315 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:10,315 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:10,315 WARN org.apache.hadoop.ipc.RpcServer: > (responseTooSlow): > {"processingtimems":20231,"call":"Mutate(org.apache.hadoop.hbase. > > 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore: > Added > hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/c/a93405e51088418ba95241cf53456011, > entries=926, sequenceid=43403085, filesize=258.5 K > 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Finished memstore flush of ~3.8 M/3985504, currentsize=12.6 K/12912 for > region > cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. > in 857ms, sequenceid=43403085, compaction requested=true > 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Starting compaction on c in region > cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. > 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore: > Starting compaction of 3 file(s) in c of > cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. > into > tmpdir=hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/.tmp, > totalSize=8.0 M > 2015-04-22 12:53:11,917 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 417 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:11,939 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:11,939 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:11,939 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:11,939 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:12,199 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 282 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:13,132 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:13,132 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:13,132 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:13,132 INFO > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010] > 2015-04-22 12:53:13,247 INFO > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, > sequenceid=41975630, memsize=3.6 M, hasBloomFilter=true, into tmp file > hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/.tmp/3495a2b487fa448bb9e0b50050b11c3a > 2015-04-22 12:53:13,301 INFO org.apache.hadoop.hbase.regionserver.HStore: > Added > hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/c/3495a2b487fa448bb9e0b50050b11c3a, > entries=886, sequenceid=41975630, filesize=272.9 K > > 2015-04-22 19:54 GMT+02:00 Ted Yu <[email protected]>: > >> Serega: >> How often is major compaction run in your cluster ? >> >> Have you configured offpeak compaction ? >> See related parameters in: >> http://hbase.apache.org/book.html#compaction.parameters >> >> Cheers >> >> On Wed, Apr 22, 2015 at 10:39 AM, Serega Sheypak < >> [email protected]> >> wrote: >> >> > Hi, we have 10 nodes cluster running HBase 0.98 CDH 5.2.1 >> > Sometimes HBase stucks. >> > We have several apps constantly writing/reading data from it. Sometimes >> we >> > see that apps response time dramatically increases. It means that app >> > spends seconds to read/write from/to HBase. in 99% of time it takes >> 20ms. >> > >> > I suppose that compactions/major compactions could be the root cause. I >> see >> > that compactions start at the same time when we have problems with app. >> > Could it be so? >> > So HBase can't write to WAL because compactions consumes all IO and apps >> > stops to write data? >> > >> > >
