Re: major hdfs issues

2011-03-30 Thread Andrew Purtell
client load. Added item #20 to http://wiki.apache.org/hadoop/Hbase/Troubleshooting. Sorry I didn't get to this sooner, Jack. - Andy --- On Wed, 3/30/11, Jack Levin wrote: > From: Jack Levin > Subject: Re: major hdfs issues > To: user@hbase.apache.org > Cc: "Suraj Varma

Re: major hdfs issues

2011-03-30 Thread Stack
Thanks for updating the list, Jack. I added a note to our 'book' on nproc and referenced your email below (will push the changes to the website later). Good stuff, St.Ack On Wed, Mar 30, 2011 at 7:31 PM, Jack Levin wrote: > Thanks to everyone chiming in to help me fix this issue... It has now > b

Re: major hdfs issues

2011-03-30 Thread Jack Levin
Thanks to everyone chiming in to help me fix this issue... It has now been resolved. JD and I spent some time looking at thread limits and apparently our userid 'hadoop' had its nproc limit set to the default of 1024; this of course caused the issue of running out of threads every time we were under load,
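
For anyone hitting the same wall, the usual fix is to raise nproc for the user running the daemons. A minimal sketch; the values below are illustrative, not the ones from this thread:

    # Check the effective per-user process/thread limit for the hadoop user:
    su -s /bin/bash hadoop -c 'ulimit -u'

    # Raise it in /etc/security/limits.conf (log back in or restart the
    # daemons for the new limits to take effect):
    #   hadoop  soft  nproc  32000
    #   hadoop  hard  nproc  32000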

Re: major hdfs issues

2011-03-13 Thread Todd Lipcon
On Sun, Mar 13, 2011 at 1:33 PM, Jack Levin wrote: > we are running at 128000 ulimit -n. I am pretty sure the culprit is the > thrift server, it opens up 20k threads under load, and crashes all other > servers by taking away RAM. > Do you guys disable tcp cookies also? In regards to iptables,

Re: major hdfs issues

2011-03-13 Thread Jack Levin
we are running at 128000 ulimit -n. I am pretty sure the culprit is the thrift server; it opens up 20k threads under load and crashes all other servers by taking away RAM. Do you guys disable tcp cookies also? In regards to iptables, what is the best way to disable it? -Jack On Sat, Mar 12, 20

Re: major hdfs issues

2011-03-12 Thread Todd Lipcon
You may also want to look at the value set for ulimit -u - it's unlimited on many OSes, but RHEL6 in particular sets it way too low, which will cause the "unable to create new native thread" error. What OS are you running? The conntrack error has to do with ip_conntrack, which is an iptables module that keeps
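
For reference, a sketch of inspecting and sidestepping conntrack; proc paths and module names differ across kernel generations (ip_conntrack on older kernels, nf_conntrack on newer), and the number below is an example, not a recommendation:

    # How big is the connection-tracking table?
    cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max

    # Either raise it...
    sysctl -w net.ipv4.netfilter.ip_conntrack_max=262144

    # ...or keep stateful rules off the storage network entirely. Unloading
    # the module only works once no rules or dependent modules reference it:
    iptables -F
    rmmod ip_conntrack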

Re: major hdfs issues

2011-03-12 Thread Jack Levin
Awesome, thanks... This is similar to the mysql max-conn setting. -Jack On Sat, Mar 12, 2011 at 11:29 AM, Stack wrote: > I opened HBASE-3628 to expose the TThreadPoolServer options on the > command-line for thrift server. > St.Ack > > On Sat, Mar 12, 2011 at 11:20 AM, Stack wrote: > > Via Bryan (a

Re: major hdfs issues

2011-03-12 Thread Stack
I opened HBASE-3628 to expose the TThreadPoolServer options on the command-line for thrift server. St.Ack On Sat, Mar 12, 2011 at 11:20 AM, Stack wrote: > Via Bryan (and J-D), by default we use the thread pool server from > Thrift (unless you choose the non-blocking option): > > 978       LOG.inf

Re: major hdfs issues

2011-03-12 Thread Stack
Via Bryan (and J-D), by default we use the thread pool server from Thrift (unless you choose the non-blocking option): 978 LOG.info("starting HBase ThreadPool Thrift server on " + listenAddress + ":" + Integer.toString(listenPort)); 979 server = new TThreadPoolServer(processor, serverT

Re: major hdfs issues

2011-03-12 Thread Stack
I don't see any bounding in the thrift code. Asking Bryan. St.Ack On Sat, Mar 12, 2011 at 10:04 AM, Jack Levin wrote: > So our problem is this: when we restart a region server, or it goes > down, hbase slows down, while we send super high frequency thrift > calls from our PHP front-end APP we

Re: major hdfs issues

2011-03-12 Thread Jack Levin
So our problem is this: when we restart a region server, or it goes down, hbase slows down. While we send super high frequency thrift calls from our PHP front-end APP, we actually spawn up 20k+ threads on thrift, and what this does is destroys all memory on the boxes, and causes DNs just to shut d
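
A quick way to watch the thread explosion described above; the pgrep pattern is a guess, adjust it to however your thrift server shows up in ps:

    # Print the native thread count (NLWP) of the thrift server process:
    ps -o nlwp= -p "$(pgrep -f ThriftServer)"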

Re: major hdfs issues

2011-03-12 Thread Suraj Varma
>> to:java.lang.OutOfMemoryError: unable to create new native thread. This indicates that you are oversubscribed on your RAM to the extent that the JVM doesn't have any space to create native threads (which are allocated outside of the JVM heap). You may actually have to _reduce_ your heap sizes t
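
Concretely: each native thread reserves roughly one -Xss worth of stack outside the heap (commonly 512k to 1m on 64-bit JVMs), so 20k threads can claim 10-20 GB of address space before the heap is even counted. Two illustrative levers, neither a value from this thread:

    # hbase-env.sh: shrink per-thread stacks (example value)...
    export HBASE_OPTS="$HBASE_OPTS -Xss256k"

    # ...and/or lower -Xmx so heap + thread stacks + OS buffers fit in
    # physical RAM instead of swapping.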

Re: major hdfs issues

2011-03-11 Thread Jack Levin
I am noticing the following errors also: 2011-03-11 17:52:00,376 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.103.7.3:50010, storageID=DS-824332190-10.103.7.3-50010-1290043658438, infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due to:java.lang.OutOfMemoryEr
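
A DataXceiver OOM like this usually traces back to thread limits (see above) or to the xceiver cap itself. A hedged sketch of the datanode config; dfs.datanode.max.xcievers (the misspelling is the real property name in this era of HDFS) with 4096 as a common community suggestion, not a value taken from this thread:

    # hdfs-site.xml on each datanode:
    #   <property>
    #     <name>dfs.datanode.max.xcievers</name>
    #     <value>4096</value>
    #   </property>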

Re: major hdfs issues

2011-03-10 Thread Ryan Rawson
Looks like a datanode went down. InterruptedException is how Java interrupts IO in threads; it's similar to the EINTR errno. That means the actual source of the abort is higher up... So back to how InterruptedException works... at some point a thread in the JVM decides that the VM should a

Re: major hdfs issues

2011-03-10 Thread Jack Levin
http://pastebin.com/ZmsyvcVc Here is the regionserver log; they all have similar stuff. On Thu, Mar 10, 2011 at 11:34 AM, Stack wrote: > What's in the regionserver logs? Please put up regionserver and > datanode excerpts. > Thanks Jack, > St.Ack > > On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin

Re: major hdfs issues

2011-03-10 Thread Stack
What's in the regionserver logs? Please put up regionserver and datanode excerpts. Thanks Jack, St.Ack On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin wrote: > All was well, until this happened: > > http://pastebin.com/iM1niwrS > > and all regionservers went down, is this the xciever issue? > > > dfs.dat
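
To tell an xceiver-cap failure apart from a thread/OOM failure, grep the datanode logs; the message text matches this era of HDFS, and the log path below is illustrative:

    grep -c "exceeds the limit of concurrent xcievers" \
        /var/log/hadoop/*datanode*.log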