Hi Greg,

I tried running it on a physical machine, and the task completed without
any crashes. However, I still cannot work out why the MDS crashes on the
VMs. I don't see RAM being the bottleneck. Also, simply restarting the MDS
restarts the execution of the code, and the MDS crashes again at fixed
intervals.
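
A rough way to check the non-RAM limits mentioned in the quoted reply
below is to compare the MDS thread count against the per-process limits.
This is only a sketch, assuming Linux with /proc mounted. The helper names
and the choice of limit rows are illustrative assumptions, and the thread
does not establish which limit was actually hit.

```python
import sys

# Rows of /proc/<pid>/limits that most often matter for thread creation.
# Assumption: these are the relevant ones here; the thread does not say
# which limit was actually hit.
LIMIT_ROWS = ("Max processes", "Max address space", "Max stack size")


def parse_thread_count(status_text):
    """Return the 'Threads:' value from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def relevant_limits(limits_text):
    """Return the rows of /proc/<pid>/limits named in LIMIT_ROWS."""
    return [line for line in limits_text.splitlines()
            if line.startswith(LIMIT_ROWS)]


def read_text(path):
    """Read a small /proc file, returning None if it is not readable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def main(pid):
    status = read_text(f"/proc/{pid}/status")
    limits = read_text(f"/proc/{pid}/limits")
    threads_max = read_text("/proc/sys/kernel/threads-max")
    if status is None or limits is None:
        print(f"cannot read /proc/{pid}; is this Linux and is the pid valid?")
        return
    print("threads in process:", parse_thread_count(status))
    print("system-wide threads-max:", (threads_max or "unknown").strip())
    for row in relevant_limits(limits):
        print(row)


if __name__ == "__main__":
    # Usage: python check_limits.py <pid of ceph-mds>; defaults to this process.
    main(sys.argv[1] if len(sys.argv) > 1 else "self")
```

Running this against the MDS pid a few times while the job is active
would show whether the thread count climbs toward one of the limits
before each crash. If it does, that would point to a thread or address
space limit rather than physical RAM.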

Regards
Varun


On Thu, Apr 25, 2013 at 9:55 PM, Gregory Farnum <g...@inktank.com> wrote:

> On Thu, Apr 25, 2013 at 8:22 AM, Noah Watkins <noah.watk...@inktank.com> wrote:
> >
> > On Apr 25, 2013, at 4:08 AM, Varun Chandramouli <varun....@gmail.com> wrote:
> >
> >> 2013-04-25 13:54:36.182188 bff8cb40 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread bff8cb40 time 2013-04-25 13:54:36.053392
> >> common/Thread.cc: 110: FAILED assert(ret == 0)
> >>
> >>  ceph version 0.58-500-gaf3b163 (af3b16349a49a8aee401e27c1b71fd704b31297c)
> >>  1: (Thread::create(unsigned int)+0xdc) [0x843866c]
> >>  2: (Pipe::start_writer()+0x4e) [0x84d837e]
> >>  3: (Pipe::accept()+0x4955) [0x84ee625]
> >>  4: (Pipe::reader()+0x1758) [0x84f10b8]
> >>  5: (Pipe::Reader::entry()+0x1e) [0x84f2dee]
> >>  6: (Thread::_entry_func(void*)+0xf) [0x843833f]
> >>  7: (()+0x6d4c) [0xb7784d4c]
> >>  8: (clone()+0x5e) [0xb7106ace]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > The assertion failure here doesn't look like any of the MDS problems I
> > was getting with Hadoop, but someone else may recognize the problem. A
> > couple of things might be helpful. First, I think that multi-MDS is
> > less stable right now than running a single MDS. Second, using GDB to
> > run 'thread apply all bt' on the crashed MDS core file would provide a
> > lot more context to help debug.
>
> That assert indicates the MDS tried to create a new thread and got an
> error back. Given that your MDS is already running, this means it's
> not an issue with thread setup — you've run into a resource limit of
> some kind. Since you're in VMs I'll guess you've run out of RAM, but
> it's also possible that the process has exceeded some limitations
> imposed by the kernel.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>



-- 
Varun Chandramouli
Birla Institute of Technology & Science
http://in.linkedin.com/in/chandramoulivarun