Re: mslab enabled jvm crash

2011-06-06 Thread Jack Levin
We have two production clusters, and we don't do rolling restarts on either. We also have days and days with no CMF reported. Here is my config that works great for us: export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=2G" # Uncomment below to enable java garbage collection logging.
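For reference, here is roughly what that fragment of hbase-env.sh would look like; the commented-out GC-logging line is along the lines of the stock example, with an illustrative log path:

    # hbase-env.sh (sketch of the settings quoted above)
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=2G"
    # Uncomment below to enable java garbage collection logging.
    # export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"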

Re: mslab enabled jvm crash

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 10:06 AM, Wayne wrote: > I had a 25 sec CMF failure this morning... it looks like bulk inserts are required > along with possibly weekly/daily scheduled rolling restarts. Do most > production clusters run rolling restarts on a regular basis to give the JVM > a fresh start? > We d

Re: mslab enabled jvm crash

2011-06-06 Thread Wayne
I had a 25 sec CMF failure this morning... it looks like bulk inserts are required along with possibly weekly/daily scheduled rolling restarts. Do most production clusters run rolling restarts on a regular basis to give the JVM a fresh start? Thanks. On Thu, Jun 2, 2011 at 1:56 PM, Wayne wrote: > JVM

Re: mslab enabled jvm crash

2011-06-02 Thread Wayne
JVM w/ 10g heap, settings below. Once we are "bored" with stability we will try to up the 65 to 70, which seems to be standard. -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=65 -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:+Use
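Assembled into a single hbase-env.sh line, the flags above would look roughly like this; a sketch only, assuming the flag cut off at the end is -XX:+UseParNewGC and using the 10g heap mentioned in the message:

    export HBASE_OPTS="$HBASE_OPTS -Xms10g -Xmx10g \
      -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
      -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=65 \
      -XX:+CMSParallelRemarkEnabled \
      -XX:NewSize=128m -XX:MaxNewSize=128m"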

Re: mslab enabled jvm crash

2011-06-02 Thread Wayne
Our storefile index was pushing 3g. We used the HFile tool to see that we had very large keys (50-70 bytes) and small values (5-7 bytes). Jack pointed me to a great JIRA about this: https://issues.apache.org/jira/browse/HBASE-3551 . We HAD to increase from the default and we picked 256k to reduce t
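For context, the per-column-family block size being discussed is set from the HBase shell; a minimal sketch with a hypothetical table and family name (on 0.90 the table has to be disabled before altering):

    disable 'mytable'
    alter 'mytable', {NAME => 'cf', BLOCKSIZE => '262144'}   # 256k, up from the 64k default
    enable 'mytable'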

Re: mslab enabled jvm crash

2011-06-02 Thread Stack
Thanks for writing back to the list Wayne. Hopefully this message hits you before the next CMF does. Would you mind pasting your final JVM args and any other configs you think one of us could use writing up your war story for the 'book' as per Jeff Whiting's suggestion? Good stuff, St.Ack On T

Re: mslab enabled jvm crash

2011-06-02 Thread Erik Onnen
I'd be particularly interested in how you came to the conclusion to increase the block size and how you arrived at the size you chose. For example, what metrics were you looking at that indicated the block size was too small, and what tests did you run to arrive at 256k as the correct size?

Re: mslab enabled jvm crash

2011-06-02 Thread Jeff Whiting
Is there any information from this thread that we should make sure gets into the HBase book? It seems like Wayne went through a lot of work to get good performance, and it would be nice if all the information he gleaned from the community were recorded somewhere. If it doesn't make sense to put

Re: mslab enabled jvm crash

2011-06-02 Thread Wayne
I have finally been able to spend enough time to digest and test all the recommendations and get this under control. I wanted to thank Stack, Jack Levin, and Ted Dunning for their input. Basically our memory was being pushed to the limit, and the JVM does not like / cannot handle this. We are successfully u

Re: mslab enabled jvm crash

2011-05-26 Thread Erik Onnen
On Thu, May 26, 2011 at 11:01 AM, Stack wrote: > What JVM configs are you running Erik? > St.Ack Omitting some of the irrelevant ones... JAVA_OPTS="-XX:+UseLargePages -Xms8192M -Xmx8192M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -Xlogg

Re: mslab enabled jvm crash

2011-05-26 Thread Ted Dunning
Bulk load is just another front door. It is very reasonable to have an adaptive policy that throttles uploads and switches to fairly frequent bulk loading when the load gets very high. Whether this is an option depends on your real-time SLAs. On Thu, May 26, 2011 at 10:55 AM, Wayne wrote: > >

Re: mslab enabled jvm crash

2011-05-26 Thread Stack
On Thu, May 26, 2011 at 10:42 AM, Wayne wrote: > I left parnew alone (did not add any settings). I also did not increase the > heap. 8g with 50% for memstore. Below are the JVM settings. > > The errors I pasted occurred after running for only maybe 12 hours. The > cluster as a whole has been runni

Re: mslab enabled jvm crash

2011-05-26 Thread Jack Levin
It might sound crazy, but if you have plenty of CPU, consider lowering your NewSize to something like 30MB; if you do that your ParNews will be more frequent, but hitting a CMS failure will be less likely. This is what we have seen. -Jack On Thu, May 26, 2011 at 10:51 AM, Jack Levin wrote: > Wayne, we get CMS fa
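In hbase-env.sh terms, Jack's suggestion would look something like the line below (value illustrative); pinning a small new generation trades more frequent, short ParNew pauses for a lower chance of promotion failure:

    export HBASE_OPTS="$HBASE_OPTS -XX:NewSize=30m -XX:MaxNewSize=30m"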

Re: mslab enabled jvm crash

2011-05-26 Thread Stack
What JVM configs are you running Erik? St.Ack On Wed, May 25, 2011 at 6:31 PM, Erik Onnen wrote: > On Wed, May 25, 2011 at 2:44 PM, Wayne wrote: >> What are your write levels? We are pushing 30-40k writes/sec/node on 10 >> nodes for 24-36-48-72 hours straight. We have only 4 writers per node so

Re: mslab enabled jvm crash

2011-05-26 Thread Wayne
On Thu, May 26, 2011 at 1:41 PM, Stack wrote: > On Thu, May 26, 2011 at 6:08 AM, Wayne wrote: > > I think our problem is the load pattern. Since we use a very controlled q > > based method to do work our Python code is relentless in terms of keeping > > the pressure up. In our testing we will Q

Re: mslab enabled jvm crash

2011-05-26 Thread Jack Levin
Wayne, we get CMS failures also, I am pretty sure they are fragmentation related: 2011-05-26T09:20:00.304-0700: 206371.599: [GC 206371.599: [ParNew (promotion failed): 76633K->76023K(76672K), 0.0924180 secs]206371.692: [CMS: 11452308K->7142504K(12202816K), 13.5870310 secs] 11525447K->7142504K(122

Re: mslab enabled jvm crash

2011-05-26 Thread Wayne
I left ParNew alone (did not add any settings). I also did not increase the heap: 8g with 50% for memstore. Below are the JVM settings. The errors I pasted occurred after running for only maybe 12 hours. The cluster as a whole has been running for 24 hours without dropping a node, but short time span

Re: mslab enabled jvm crash

2011-05-26 Thread Stack
On Thu, May 26, 2011 at 6:08 AM, Wayne wrote: > I think our problem is the load pattern. Since we use a very controlled q > based method to do work our Python code is relentless in terms of keeping > the pressure up. In our testing we will Q up 500k messages with 10k writes > per message that all

Re: mslab enabled jvm crash

2011-05-26 Thread Stack
On Thu, May 26, 2011 at 9:00 AM, Wayne wrote: > Looking more closely I can see that we are still > getting Concurrent Mode Failures on some of the nodes but they are only > lasting for 10s so the nodes don't go away. Is this considered "normal"? > With CMSInitiatingOccupancyFraction=65 I would sus

Re: mslab enabled jvm crash

2011-05-26 Thread Wayne
Looking more closely I can see that we are still getting Concurrent Mode Failures on some of the nodes but they are only lasting for 10s so the nodes don't go away. Is this considered "normal"? With CMSInitiatingOccupancyFraction=65 I would suspect this is not normal?? Here is a link to some GC lo

Re: mslab enabled jvm crash

2011-05-26 Thread Wayne
Attached is our memstore size graph... not sure it will make it to the post. Ours is definitely not as graceful as yours. You can see where we last restarted, 16 hours ago. We have not had any issues since, but we usually don't have problems until 24-48 hours into loads. Stack, yes the 65% seems to h

Re: mslab enabled jvm crash

2011-05-26 Thread Jack Levin
Wayne, I think you are hitting fragmentation; how often do you flush? Can you share memstore flush graphs? Here is ours: http://img851.yfrog.com/img851/9814/screenshot20110526at124.png We run at 12G heap, 20% memstore size, 50% blockcache, and have recently added incremental mode to combat too frequ
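As a sketch of the proportions Jack quotes (not his actual files), the 0.90-era hbase-site.xml keys involved are the global memstore limit and the block cache fraction; the 12G heap itself is set in hbase-env.sh:

    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.2</value>  <!-- 20% of heap for memstores -->
    </property>
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.5</value>  <!-- 50% of heap for the block cache -->
    </property>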

Re: mslab enabled jvm crash

2011-05-25 Thread Stack
Python is great. If you can hold your nose a little longer: you are either almost there, or it's a lost cause, so bear with us a little longer. Did the configs above make a difference? (Initiating CMS at 65% is conservative -- you'll be burning lots of CPU -- but probably good to start her

Re: mslab enabled jvm crash

2011-05-25 Thread Erik Onnen
On Wed, May 25, 2011 at 2:44 PM, Wayne wrote: > What are your write levels? We are pushing 30-40k writes/sec/node on 10 > nodes for 24-36-48-72 hours straight. We have only 4 writers per node so we > are hardly overwhelming the nodes. Disk utilization runs at 10-20%, load is > max 50% including so

Re: mslab enabled jvm crash

2011-05-25 Thread Wayne
That may be the best advice I ever got... although I would say 9 months ago we didn't have one line of Python, and now we have a fantastic MPP framework built with Python, with a team most of whom never wrote a line of Python before. But... Java is not Python... We have shredded our relational past and fr

Re: mslab enabled jvm crash

2011-05-25 Thread Ted Dunning
This may be the most important detail of all. It is important to go with your deep skills. I would be a round peg in your square shop and you would be a square one in my round one. On Wed, May 25, 2011 at 5:55 PM, Wayne wrote: > We are not a Java shop, and do not want to become one. I think to

Re: mslab enabled jvm crash

2011-05-25 Thread Wayne
We are using standard Thrift from Python. All writes are batched, usually 30k writes per batch. The writes are small double/varchar(100)-type values. Our current write performance is fine for our needs... our concern is that they are not sustainable over time given the GC timeouts. Per the 4 items a
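A minimal sketch of the batched writes described here, assuming the standard HBase Thrift gateway and Thrift-generated Python bindings; the table name, column name, and produce_rows() helper are hypothetical:

    # Sketch: batched puts through the HBase Thrift gateway (names illustrative).
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase                      # modules generated by the Thrift compiler
    from hbase.ttypes import BatchMutation, Mutation

    BATCH_SIZE = 30000                           # roughly 30k writes per batch, as described above

    transport = TTransport.TBufferedTransport(TSocket.TSocket('thrift-host', 9090))
    client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    batch = []
    for row_key, value in produce_rows():        # produce_rows() stands in for the app's work queue
        batch.append(BatchMutation(row=row_key,
                                   mutations=[Mutation(column='cf:col', value=value)]))
        if len(batch) >= BATCH_SIZE:
            client.mutateRows('mytable', batch)  # one round trip per batch
            batch = []
    if batch:
        client.mutateRows('mytable', batch)
    transport.close()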

Re: mslab enabled jvm crash

2011-05-25 Thread Ted Dunning
How large are these writes? Are you using asynchbase or another alternative client implementation? Are you batching updates? On Wed, May 25, 2011 at 2:44 PM, Wayne wrote: > What are your write levels? We are pushing 30-40k writes/sec/node on 10 > nodes for 24-36-48-72 hours straight. We have onl

Re: mslab enabled jvm crash

2011-05-25 Thread Ted Dunning
We know several things that your HBase and your Cassandra have in common: a) the JVM, b) the machines, c) the OS, d) the (necessary) prejudices of the implementors and ops staff. On the other hand, we know of other HBase (and Cassandra) installations running similar volumes on the same JVM. I

Re: mslab enabled jvm crash

2011-05-25 Thread Wayne
What are your write levels? We are pushing 30-40k writes/sec/node on 10 nodes for 24-36-48-72 hours straight. We have only 4 writers per node so we are hardly overwhelming the nodes. Disk utilization runs at 10-20%, load is max 50% including some app code, and memory is the 8g JVM out of 24G. We ru

Re: mslab enabled jvm crash

2011-05-25 Thread Erik Onnen
On Wed, May 25, 2011 at 11:39 AM, Ted Dunning wrote: > It should be recognized that your experiences are a bit out of the norm > here. Many HBase installations use more recent JVMs without problems. Indeed, we run u25 on CentOS 5.6 and over several days of uptime it's common to never see a full GC

Re: mslab enabled jvm crash

2011-05-25 Thread Wayne
I have restarted with CMS kicking in earlier (65%) and with incremental mode turned off. We have an 8g heap... should we go to 10g (24g in the box)? More memory for the JVM has never seemed to be better... though maybe with lots of hot regions and our flush size we might be pushing it? Should we up the 50% for mem

Re: mslab enabled jvm crash

2011-05-25 Thread Stack
On Wed, May 25, 2011 at 11:08 AM, Wayne wrote: > I tried to turn off all the special JVM settings we have tried in the past. > Below are links to the requested configs. I will try to find more logs for > the full GC. We just made the switch and on this node it has > only occurred once in the scope of t

Re: mslab enabled jvm crash

2011-05-25 Thread Wayne
Most HBase installations also seem to recommend bulk inserts for loading data. We are pushing it more than most in terms of actually using the client API to load large volumes of data. We keep delaying putting HBase into production, as nodes going awol for as much as 2+ minutes we cannot accept as

Re: mslab enabled jvm crash

2011-05-25 Thread Ted Dunning
Wayne, it should be recognized that your experiences are a bit out of the norm here. Many HBase installations use more recent JVMs without problems. As such, it may be premature to point the finger at the JVM as opposed to the workload or environmental factors. Such a premature diagnosis can m

Re: mslab enabled jvm crash

2011-05-25 Thread Wayne
We have the line commented out with the new ratio. I will turn off the incremental mode. We do have caching turned off at the table level and have it set to 1% for .META. only. We do not use the block cache. I will keep testing. Frankly u25 scares us, as older JVMs seem much better based on previou

Re: mslab enabled jvm crash

2011-05-25 Thread Todd Lipcon
For your GC settings:
- I wouldn't tune NewRatio or SurvivorRatio at all
- if you want to tame your young GC pauses, use -Xmn to pick a new size, e.g. -Xmn256m
- turn off CMS Incremental Mode if you're running on real server hardware
HBase settings:
- 1% of heap to block cache seems strange. maybe
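Taken together, those suggestions amount to something like the following hbase-env.sh fragment; a sketch only, assuming ParNew + CMS as elsewhere in the thread and leaving the heap size out:

    export HBASE_OPTS="$HBASE_OPTS -Xmn256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
    # i.e. no -XX:NewRatio / -XX:SurvivorRatio tuning and no -XX:+CMSIncrementalMode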

Re: mslab enabled jvm crash

2011-05-25 Thread Wayne
I tried to turn off all the special JVM settings we have tried in the past. Below are links to the requested configs. I will try to find more logs for the full GC. We just made the switch, and on this node it has only occurred once in the scope of the current log (it may have rolled?). Thanks. http://p

Re: mslab enabled jvm crash

2011-05-25 Thread Todd Lipcon
Hi Wayne, looks like your RAM might be oversubscribed. Could you paste your hbase-site.xml and hbase-env.sh files? It also looks like you have some strange GC settings on (e.g. perm gen collection, which we don't really need). If you can paste a larger segment of GC logs (enough to include at least two

Re: mslab enabled jvm crash

2011-05-25 Thread Wayne
We switched to u25 and reverted the JVM settings to those recommended. Now we have concurrent mode failures that occur lasting more than 60 seconds while hardly under any load. Below are the entries from the JVM log. Of course we can up the zookeeper timeout to 2 min, or 10 min for that matt
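The ZooKeeper timeout Wayne mentions is the zookeeper.session.timeout property in hbase-site.xml; a sketch of the 2-minute variant:

    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>  <!-- 2 minutes, in milliseconds -->
    </property>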

Re: mslab enabled jvm crash

2011-05-23 Thread Stack
On Mon, May 23, 2011 at 8:42 AM, Wayne wrote: > Our experience with any newer JVM was that fragmentation was much much worse > and Concurrent Mode Failures were rampant. We kept moving back in releases >  to get to what we use now. We are on CentOS 5.5. We will try to use u24. > CMS's you should

Re: mslab enabled jvm crash

2011-05-23 Thread Wayne
gt;> > >> Also I'm going to assume that you're not running your ZK on the same > nodes > >> as your data nodes, but you know what they say about assumptions... > >> > >> > >> > From: tdunn...@maprtech.com > >> > Date: Mon, 23 May 2011 07:33

Re: mslab enabled jvm crash

2011-05-23 Thread Stack
>> ...but you know what they say about assumptions... > Do you have

Re: mslab enabled jvm crash

2011-05-23 Thread Wayne
he same nodes > as your data nodes, but you know what they say about assumptions... > > > > From: tdunn...@maprtech.com > > Date: Mon, 23 May 2011 07:33:05 -0700 > > Subject: Re: mslab enabled jvm crash > > To: user@hbase.apache.org > > > > Do you have the

Re: mslab enabled jvm crash

2011-05-23 Thread Wayne
We have not used a more recent JVM with mslab enabled. In the past (pre-0.90.1) we had a TON of problems (CMF) with more recent JVMs, so we avoid them. What is the recommended JVM / settings to use with mslab enabled? Thanks. On Mon, May 23, 2011 at 10:33 AM, Ted Dunning wrote: > Do you ha

RE: mslab enabled jvm crash

2011-05-23 Thread Michael Segel
> Do you have the same problem with a more recent JVM? > > On Mon, May 23, 2011 at 4:52 AM, Wayne wrote: > > > I have switched

Re: mslab enabled jvm crash

2011-05-23 Thread Ted Dunning
Do you have the same problem with a more recent JVM? On Mon, May 23, 2011 at 4:52 AM, Wayne wrote: > I have switched to using the mslab-enabled java setting to try to avoid GC > causing nodes to go awol, but it almost appears to be worse. Below is the > latest problem, with the JVM apparently actu

mslab enabled jvm crash

2011-05-23 Thread Wayne
I have switched to using the mslab-enabled java setting to try to avoid GC causing nodes to go awol, but it almost appears to be worse. Below is the latest problem, with the JVM apparently actually crashing. I am using 0.90.1 with an 8GB heap. Is there a recommended JVM and recommended settings to be
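For reference, the mslab switch referred to in the subject line is an hbase-site.xml property; a minimal sketch, with the chunk size shown at its usual default purely for illustration:

    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.hregion.memstore.mslab.chunksize</name>
      <value>2097152</value>  <!-- 2MB chunks (the default) -->
    </property>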