Hi Greg,

I tried running the task on a physical machine, and it completed without any crashes. However, I still can't figure out why the MDS crashes on the VMs. RAM does not appear to be the bottleneck. Also, simply restarting the MDS gets the code running again, and the MDS then crashes again at fixed intervals.
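For anyone wanting to check the limits other than RAM that Greg mentions, here is a minimal sketch, assuming a Linux host with /proc mounted. It reads the thread count of a process and the limits most likely to make thread creation fail. The function names (`parse_limits`, `parse_thread_count`, `report`) are hypothetical helpers for illustration, not part of Ceph.

```python
import re
from pathlib import Path


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def parse_thread_count(status_text):
    """Return the 'Threads:' value from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def report(pid):
    """Print thread usage and the limits that can block thread creation."""
    proc = Path("/proc") / str(pid)
    limits = parse_limits((proc / "limits").read_text())
    threads = parse_thread_count((proc / "status").read_text())
    threads_max = Path("/proc/sys/kernel/threads-max").read_text().strip()
    print("threads in use     :", threads)
    print("max processes      :", limits.get("Max processes"))
    print("max stack size     :", limits.get("Max stack size"))
    print("system threads-max :", threads_max)
```

Calling `report(pid)` with the PID of the running ceph-mds process shortly before one of the periodic crashes would show whether the thread count is approaching "Max processes" (on Linux this limit counts threads per user). The stack size is included because every thread reserves its stack in the process address space; the addresses in the backtrace suggest a 32-bit build, where that space is small, but that is a guess rather than a confirmed cause.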
Regards,
Varun

On Thu, Apr 25, 2013 at 9:55 PM, Gregory Farnum <g...@inktank.com> wrote:
> On Thu, Apr 25, 2013 at 8:22 AM, Noah Watkins <noah.watk...@inktank.com> wrote:
> >
> > On Apr 25, 2013, at 4:08 AM, Varun Chandramouli <varun....@gmail.com> wrote:
> >
> >> 2013-04-25 13:54:36.182188 bff8cb40 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread bff8cb40 time 2013-04-25 13:54:36.053392
> >> common/Thread.cc: 110: FAILED assert(ret == 0)
> >>
> >>  ceph version 0.58-500-gaf3b163 (af3b16349a49a8aee401e27c1b71fd704b31297c)
> >>  1: (Thread::create(unsigned int)+0xdc) [0x843866c]
> >>  2: (Pipe::start_writer()+0x4e) [0x84d837e]
> >>  3: (Pipe::accept()+0x4955) [0x84ee625]
> >>  4: (Pipe::reader()+0x1758) [0x84f10b8]
> >>  5: (Pipe::Reader::entry()+0x1e) [0x84f2dee]
> >>  6: (Thread::_entry_func(void*)+0xf) [0x843833f]
> >>  7: (()+0x6d4c) [0xb7784d4c]
> >>  8: (clone()+0x5e) [0xb7106ace]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > The assertion failure here doesn't look like any of the MDS problems I
> > was getting with Hadoop, but someone else may recognize the problem. A
> > couple things that might be helpful. First, I think that multi-MDS is less
> > stable right now than running a single MDS. Second, using GDB to run
> > 'thread apply all bt' to the crashed MDS core file would provide a lot more
> > context to help debug.
>
> That assert indicates the MDS tried to create a new thread and got an
> error back. Given that your MDS is already running, this means it's
> not an issue with thread setup; you've run into a resource limit of
> some kind. Since you're in VMs I'll guess you've run out of RAM, but
> it's also possible that the process has exceeded some limitations
> imposed by the kernel.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

--
Varun Chandramouli
Birla Institute of Technology & Science
http://in.linkedin.com/in/chandramoulivarun