How large is your region server heap? What's your setting for hfile.block.cache.size? Can you identify which region is being burned up (i.e., is it META?)
It is possible for a hot region to act as a "death pill" that roams around the cluster. We see this with the meta region with poorly-behaved clients.

On Wed, Feb 25, 2015 at 8:38 AM, Ted Tuttle <[email protected]> wrote:

> Hard to say how balanced the table is.
>
> We have a mixed requirement where we want some locality for timeseries
> queries against "clusters" of information. However, the "clusters" in a
> table should be well distributed if the dataset is large enough.
>
> The query in question killed 5 RSs, so I am inferring either:
>
> 1) the table was spread across these 5 RSs
> 2) the query moved around on the cluster as RSs failed
>
> Perhaps you could tell me if #2 is possible.
>
> We are running v0.94.9
>
> From: Ted Yu [mailto:[email protected]]
> Sent: Wednesday, February 25, 2015 7:24 AM
> To: [email protected]
> Cc: Development
> Subject: Re: Table.get(List<Get>) overwhelms several RSs
>
> Was the underlying table balanced (meaning its regions spread evenly
> across region servers)?
>
> What release of HBase are you using?
>
> Cheers
>
> On Wed, Feb 25, 2015 at 7:08 AM, Ted Tuttle <[email protected]> wrote:
>
> Hello-
>
> In the last week we had multiple times where we lost 5 of 8 RSs in the
> space of a few minutes because of slow GCs.
>
> We traced this back to a client calling Table.get(List<Get> gets) with a
> collection containing ~4000 individual gets.
>
> We've worked around this by limiting the number of Gets we send in a
> single call to Table.get(List<Get>).
>
> Is there some configuration parameter that we are missing here?
>
> Thanks,
> Ted
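The workaround described in the thread, limiting the number of Gets sent in a single call to Table.get(List&lt;Get&gt;), can be sketched as a small client-side batching helper. This is a minimal sketch, not code from the thread: the batch size of 500 is an assumed tuning value, and the commented `table.get(batch)` usage assumes a standard HBase client `Table` handle.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the client-side workaround: instead of one Table.get(List<Get>)
// call carrying ~4000 Gets, split the list into bounded batches so each RPC
// puts a bounded load on the region servers' heaps and block caches.
public class BatchedGets {

    // Split a list into consecutive sub-lists of at most batchSize elements.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    // Hypothetical HBase client usage (table/gets are assumed names, and the
    // batch size of 500 is an assumption to tune, not a value from the thread):
    //
    //   for (List<Get> batch : partition(gets, 500)) {
    //       Result[] results = table.get(batch);  // bounded load per call
    //       // ... consume results ...
    //   }
}
```

The same effect can be had with a utility such as Guava's Lists.partition; the point is simply that no single multi-get carries an unbounded number of Gets.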
