Sorry, I missed the fact that you were talking about the OOME issue
(the exceptions shown were SocketTimeoutExceptions).
Can you share the log snippet from where it OOME'd? I'd like to explore this
use case :)

You have about 200 regions per server, and with each region configured at
500MB that makes about 100GB of data per server.
A region is considered open once the index blocks of all its StoreFiles have
been read, and the default HFile block size is 64KB. Using a larger block
size will help reduce the index size for each StoreFile. As Chris said,
looking at the RS metrics will give a lot of useful info, such as the
storefile index size and block cache usage.
I think that increasing the region size to 500MB alone will not reduce the
memory footprint (apart from reducing region splits and some entries in
'.META.'), as one still has to deal with the StoreFiles eventually. Yes,
reducing the size of the KeyValue coordinates will help limit the index size
(I am sure you already have an optimised schema).
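To make those two points concrete, here is a back-of-envelope sketch of the
index size and per-cell overhead. The 64-byte average index entry and the
example key/qualifier lengths are my assumptions, not measured numbers; the
fixed KeyValue field sizes follow the serialized layout used by HBase's
KeyValue class:

```java
// Rough heap estimates; entry and key sizes here are illustrative assumptions.
public class HeapEstimates {

    // StoreFile index size: roughly one index entry per HFile block.
    static long indexBytes(long dataBytes, long blockSize, long bytesPerEntry) {
        return (dataBytes / blockSize) * bytesPerEntry;
    }

    // Serialized KeyValue size: 4B keylen + 4B vallen + 2B rowlen
    // + 1B familylen + 8B timestamp + 1B type, plus the coordinates
    // (row, family, qualifier) stored with every single cell.
    static long kvBytes(int row, int family, int qualifier, int value) {
        return 4 + 4 + 2 + 1 + 8 + 1 + row + family + qualifier + value;
    }

    public static void main(String[] args) {
        long data = 100L * 1024 * 1024 * 1024;  // ~100GB per server
        System.out.println("64KB blocks:  "
                + indexBytes(data, 64 * 1024, 64) / (1024 * 1024) + " MB of index");
        System.out.println("256KB blocks: "
                + indexBytes(data, 256 * 1024, 64) / (1024 * 1024) + " MB of index");
        // 16-byte row, 8-byte value; 10-char vs 1-char family/qualifier names:
        System.out.println("long names:  " + kvBytes(16, 10, 10, 8) + " bytes/cell");
        System.out.println("short names: " + kvBytes(16, 1, 1, 8) + " bytes/cell");
    }
}
```

So quadrupling the block size cuts this rough index estimate to a quarter, and
shortening family/qualifier names shaves a fixed number of bytes off every
cell you store.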

Your GC-log snapshot says that CMS failed to free even 1 byte and then fell
back to a "stop-the-world" GC. Does that mean there were literally no garbage
objects in the heap during that time window? Or maybe your app was writing
heavily and concurrently to the RS. Since not even a single byte was freed,
using MSLAB will not help (if you haven't enabled it yet); it exists to avoid
fragmentation of the freed space, because CMS doesn't do any compaction on
its own.
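For what it's worth, one common knob for CMS concurrent-mode failures is
starting the concurrent cycle earlier. A sketch of what that might look like
in hbase-env.sh (the 70% threshold is an assumption to tune for your load,
and it won't help if the heap really is full of live objects):

```shell
# Hypothetical hbase-env.sh fragment: kick off CMS at 70% old-gen occupancy
# instead of waiting until the heap is nearly full, so the concurrent cycle
# has a chance to finish before allocation outruns it.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```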

What did you eventually do to sort out this error, Oleg? Did bumping the RS
heap fix it? Are you using compression while writing to HBase?
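For reference, both compression and block size are per-column-family
settings. In the hbase shell that would look something like this (the table
and family names here are hypothetical, and LZO has to be installed on the
cluster separately):

```
disable 'mytable'
alter 'mytable', {NAME => 'cf', COMPRESSION => 'LZO', BLOCKSIZE => 262144}
enable 'mytable'
```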

Thanks,
Himanshu

On Thu, Aug 25, 2011 at 9:41 AM, Chris Tarnas <[email protected]> wrote:

>
> On Aug 25, 2011, at 1:55 AM, Oleg Ruchovets wrote:
>
> > Thank you very much for your post; it is very similar to what is
> > happening in our environment.
> >
> > Before we increase the heap size, we want to do some tuning of the HBase
> > memstore and related components.
> >
>
> We run with 12-15GB of heap; 4GB was only enough for us on very small test
> DBs.
>
> > 1) Currently our memstore-related configuration parameters are at their
> > defaults:
> >
> >   -- hbase.regionserver.global.memstore.upperLimit=0.4
> >   -- hbase.regionserver.global.memstore.lowerLimit=0.35
> >   -- hbase.hregion.memstore.flush.size=67108864
> >   -- hbase.hregion.memstore.block.multiplier=2
> >   -- hbase.hstore.compactionThreshold=3
> >   -- hbase.hstore.blockingStoreFiles=7
> >
> > 1. Could you recommend an alternative configuration that is more suitable
> > for heavy loads?
> >
>
> Hard to say without more details - off hand you might want to lower the
> flush size and the upper/lower limits. You should monitor the cluster
> during loads, using the web UI to see each regionserver's memory usage. If
> you click through to an individual regionserver you can see how the heap is
> being used.
>
> > 2. We still don't understand why we get a region server OOME only once
> > every few days (and not every day, since each day we insert the same
> > amount of data), and why the region server heap size is growing
> > constantly. We expect that after a memstore flush the heap will go back
> > to normal, but this doesn't happen until we restart HBase.
> >
>
> I would suspect it is happening when a regionserver has a large
> StoreFileIndex and is hosting a particularly hot region that is getting lots
> of updates. When those events coincide on a single server it OOMEs.
>
> > 3. We know the exact start time of our HBase job; can we force a memstore
> > flush before starting the job?
>
> From the hbase shell, run:
>
> flush 'table_name'
>
> I would highly recommend looking at how the region servers are using heap
> when you first start them up and see how large your StoreFileIndex is.
>
> -chris
>
>
> >
> > On Wed, Aug 24, 2011 at 7:06 PM, Chris Tarnas <[email protected]> wrote:
> >
> >>
> >>
> >> We had a similar OOME problem, and we solved it by allocating more heap
> >> space. The underlying cause for us was that as the table grew, the
> >> StoreFileIndex grew, taking up a larger and larger chunk of heap.
> >>
> >> What caused this to be a problem is that the Memstore grows rapidly
> >> during inserts and its size limits are not StoreFileIndex-aware. After
> >> doing some heavy inserts, the Memstore + StoreFileIndex is more than the
> >> heap. If you restart the regionserver then the Memstore is flushed, you
> >> are well under heap, and all appears well. Something similar could
> >> happen with the BlockCache too, but we didn't see that directly.
> >>
> >> We fixed this by allocating more heap and reducing the StoreFileIndex
> >> size by increasing the HFile block size and using shorter keys and
> >> column/column family names.
> >>
> >> -chris
> >>
> >>
> >> On Aug 24, 2011, at 12:35 AM, Oleg Ruchovets wrote:
> >>
> >>> Thanks for your feedback.
> >>>
> >>> The point is that once we restart HBase the memory footprint is far
> >>> below 4GB. The system runs well for a couple of days and then the heap
> >>> reaches 4GB, which causes the region server to crash.
> >>>
> >>> This may indicate a memory leak, since once we restart HBase the
> >>> problem is solved (or maybe it's just a configuration problem?).
> >>>
> >>> I'm afraid that giving more memory to the region server (8GB) will only
> >>> postpone the problem, meaning the region server will still crash, just
> >>> less frequently.
> >>>
> >>> How do you think we should tackle this problem?
> >>>
> >>> Best,
> >>> Oleg
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Aug 24, 2011 at 6:52 AM, Michael Segel <
> >> [email protected]>wrote:
> >>>
> >>>>
> >>>> I won't say you're crazy, but .5 GB per mapper?
> >>>>
> >>>> I would say tune conservatively, like you are suggesting with 1GB for
> >>>> the OS, but I'd also suggest tuning to 80% utilization instead of
> >>>> 100%.
> >>>>
> >>>>> From: [email protected]
> >>>>> To: [email protected]
> >>>>> Date: Tue, 23 Aug 2011 16:35:22 -0700
> >>>>> Subject: RE: how to make tuning for hbase (every couple of days hbase
> >>>> region sever/s crashe)
> >>>>>
> >>>>> So, if you use 0.5 GB / mapper and 1 GB / reducer, your total memory
> >>>>> consumption (minus HBase) on a slave node should be:
> >>>>> 4 GB M/R tasks
> >>>>> 1 GB OS -- just a guess
> >>>>> 1 GB datanode
> >>>>> 1 GB tasktracker
> >>>>> Leaving you with up to 9 GB for your region servers. I would suggest
> >>>>> bumping your region server RAM up to 8GB and leaving a GB for OS
> >>>>> caching. [I am sure someone out there will tell me I am crazy]
> >>>>>
> >>>>>
> >>>>> However, it is the log that is the most useful part of your email.
> >>>>> Unfortunately I haven't seen that error before.
> >>>>> Are you using the Multi methods a lot in your code?
> >>>>>
> >>>>> Dave
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Oleg Ruchovets [mailto:[email protected]]
> >>>>> Sent: Tuesday, August 23, 2011 1:38 PM
> >>>>> To: [email protected]
> >>>>> Subject: Re: how to make tuning for hbase (every couple of days hbase
> >>>> region sever/s crashe)
> >>>>>
> >>>>> Thank you for detailed response,
> >>>>>
> >>>>> On Tue, Aug 23, 2011 at 7:49 PM, Buttler, David <[email protected]>
> >>>> wrote:
> >>>>>
> >>>>>> Have you looked at the logs of the region servers?  That is a good
> >>>>>> first place to look.
> >>>>>>
> >>>>>> How many regions are in your system?
> >>>>>
> >>>>>
> >>>>> Region Servers
> >>>>>
> >>>>> Address   Start Code     Load
> >>>>> hadoop01  1314007529600  requests=0, regions=212, usedHeap=3171, maxHeap=3983
> >>>>> hadoop02  1314007496109  requests=0, regions=207, usedHeap=2185, maxHeap=3983
> >>>>> hadoop03  1314008874001  requests=0, regions=208, usedHeap=1955, maxHeap=3983
> >>>>> hadoop04  1314008965432  requests=0, regions=209, usedHeap=2034, maxHeap=3983
> >>>>> hadoop05  1314007496533  requests=0, regions=208, usedHeap=1970, maxHeap=3983
> >>>>> hadoop06  1314008874036  requests=0, regions=208, usedHeap=1987, maxHeap=3983
> >>>>> hadoop07  1314007496927  requests=0, regions=209, usedHeap=2118, maxHeap=3983
> >>>>> hadoop08  1314007497034  requests=0, regions=211, usedHeap=2568, maxHeap=3983
> >>>>> hadoop09  1314007497221  requests=0, regions=209, usedHeap=2148, maxHeap=3983
> >>>>> master    1314008873765  requests=0, regions=208, usedHeap=2007, maxHeap=3962
> >>>>> Total: servers: 10, requests=0, regions=2089
> >>>>>
> >>>>> Most of the time GC succeeds in cleaning up, but every 3-4 days used
> >>>>> memory gets close to 4GB, and there are a lot of exceptions like
> >>>>> this:
> >>>>>
> >>>>> org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
> >>>>> multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
> >>>>> from 10.11.87.73:33737: output error
> >>>>> 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> >>>>> handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
> >>>>>     at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
> >>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
> >>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
> >>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
> >>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
> >>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
> >>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> If you are using MSLAB, it reserves 2MB/region as a buffer -- that
> >>>>>> can add up when you have lots of regions.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Given so little information all my guesses are going to be wild, but
> >>>>>> they might help:
> >>>>>> 4GB may not be enough for your current load.
> >>>>>
> >>>>>> Have you considered changing your memory allocation, giving less to
> >>>>>> your map/reduce jobs and more to HBase?
> >>>>>
> >>>>> Interesting point; can you advise on how M/R memory allocation
> >>>>> relates to the HBase region servers?
> >>>>>
> >>>>> Currently we have 512m for map (4 maps per machine) and 1024m for
> >>>>> reduce (2 reducers per machine).
> >>>>>
> >>>>>
> >>>>>> What is your key distribution like?
> >>>>>> Are you writing to all regions equally, or are you hotspotting on
> >>>>>> one region?
> >>>>>
> >>>>> Every day, before running the job, we manually allocate regions with
> >>>>> lexicographic start and end keys to get a good distribution and to
> >>>>> prevent hot-spots.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Check your cell/row sizes.  Are they really large (e.g. cells > 1
> >>>>>> MB; rows > 100 MB)?  Increasing region size should help here, but
> >>>>>> there may be an issue with your RAM allocation for HBase.
> >>>>>
> >>>>> I'll check, but I'm almost sure that we have no rows > 100MB. We
> >>>>> changed the region size to 500MB to prevent automatic splits (after a
> >>>>> successful insert job we have ~200-250MB files per region), and for
> >>>>> the next day we allocate new regions.
> >>>>>
> >>>>>
> >>>>>> Are you sure that you are not overloading the machine memory? How
> >>>>>> much RAM do you allocate for map reduce jobs?
> >>>>>
> >>>>> 512M -- map
> >>>>> 1024M -- reduce
> >>>>>
> >>>>>
> >>>>>> How do you distribute your processes over machines?  Does your
> >>>>>> master run namenode, hmaster, jobtracker, and zookeeper, while your
> >>>>>> slaves run datanode, tasktracker, and hregionserver?
> >>>>>
> >>>>> Exactly, we have that process distribution.
> >>>>> We have 16GB on the ordinary machines and 48GB RAM on the master, so
> >>>>> I am not sure that I understand your calculation; please clarify.
> >>>>>
> >>>>> If so, then your memory allocation is:
> >>>>>> 4 GB for regionserver
> >>>>>> 1 GB for OS
> >>>>>> 1 GB for datanode
> >>>>>> 1 GB for tasktracker
> >>>>>> 9/6 GB for M/R
> >>>>>> So, are you sure that all of your m/r tasks take less than 1 GB?
> >>>>>>
> >>>>>> Dave
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Oleg Ruchovets [mailto:[email protected]]
> >>>>>> Sent: Tuesday, August 23, 2011 2:15 AM
> >>>>>> To: [email protected]
> >>>>>> Subject: how to make tuning for hbase (every couple of days hbase
> >>>> region
> >>>>>> sever/s crashe)
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Our environment:
> >>>>>> HBase 0.90.2, a 10-machine grid:
> >>>>>>   the master has 48GB RAM
> >>>>>>   the slave machines have 16GB RAM
> >>>>>>   the Region Server process has 4GB RAM
> >>>>>>   the Zookeeper process has 2GB RAM
> >>>>>>   we run 4 maps / 2 reducers per machine
> >>>>>>
> >>>>>>
> >>>>>> We write from an M/R job to HBase (2 jobs a day). For 3 months the
> >>>>>> system worked without any problem, but now a region server crashes
> >>>>>> every 3-4 days.
> >>>>>> What we have done so far:
> >>>>>> 1) We run a major compaction manually once a day.
> >>>>>> 2) We increased the region size to prevent automatic splits.
> >>>>>>
> >>>>>> Questions:
> >>>>>> What is the right way to tune HBase?
> >>>>>> How do we debug such a problem? It is still not clear to me what the
> >>>>>> root cause of the region server crashes is.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> We started from this post:
> >>>>>>
> >>>>>> http://search-hadoop.com/m/HDoK22ikTCI/M%252FR+vs+hbase+problem+in+production&subj=M+R+vs+hbase+problem+in+production
> >>>>>>
> >>>>>> Regards
> >>>>>> Oleg.
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
