On Mon, Aug 31, 2015 at 9:33 AM, Eino Tuominen <e...@utu.fi> wrote:
> Hello,
>
> I'm getting a segmentation fault error from the monitor of our test
> cluster. The cluster was in a bad state because I had recently removed
> three hosts from it. Now I have started cleaning it up: first I marked
> the removed OSDs as lost (ceph osd lost), and then I tried to remove the
> OSDs from the CRUSH map (ceph osd crush remove). After a few successful
> commands the cluster ceased to respond. One monitor seemed to stay up (it
> was responding through the admin socket), so I stopped it and used
> monmaptool to remove the failed monitor from the monmap. But now the
> second monitor also segfaults when I try to start it.
>
> The cluster does not have any important data, but I'd like to get the
> monitors up as an exercise. How do I debug this further?
>
> Linux cephmon-test-02 3.13.0-24-generic #47-Ubuntu SMP Fri May 2
> 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> The output:
>
>     -2> 2015-08-31 10:28:52.606894 7f8ab493c8c0  0 log_channel(cluster)
> log [INF] : pgmap v1845959: 6288 pgs: 55 inactive, 153 active, 473
> active+clean, 1 stale+active+undersized+degraded+remapped, 455
> stale+incomplete, 272 peering, 145 stale+down+peering, 6
> degraded+remapped, 1 active+recovery_wait+degraded, 70
> undersized+degraded+remapped, 504 incomplete, 206
> active+undersized+degraded+remapped, 2 stale+active+clean+inconsistent,
> 101 down+peering, 59 active+undersized+degraded+remapped+backfilling,
> 294 remapped, 11 active+undersized+degraded+remapped+wait_backfill, 1264
> active+remapped, 5 stale+undersized+degraded, 1 active+undersized+remapped,
> 1 stale+active+undersized+degraded, 23 stale+remapped+incomplete, 297
> remapped+peering, 1 active+remapped+wait_backfill, 1 degraded, 32
> undersized+degraded, 454 active+undersized+degraded, 7
> active+recovery_wait+degraded+remapped, 1134 stale+active+clean, 142
> remapped+incomplete, 115 stale+peering, 3
> active+recovering+degraded+remapped; 10014 GB data, 5508 GB used, 41981
> GB / 47489 GB avail; 33343/19990223 objects degraded (0.167%);
> 45721/19990223 objects misplaced (0.229%)
>     -1> 2015-08-31 10:28:52.606969 7f8ab493c8c0  0 log_channel(cluster)
> log [INF] : mdsmap e1: 0/0/1 up
>      0> 2015-08-31 10:28:52.617974 7f8ab493c8c0 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f8ab493c8c0
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: /usr/bin/ceph-mon() [0x9a98aa]
>  2: (()+0x10340) [0x7f8ab3a3d340]
>  3: (crush_do_rule()+0x292) [0x85ada2]
>  4: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int,
> std::allocator<int> >*, int*, unsigned int*) const+0xeb) [0x7a85cb]
>  5: (OSDMap::pg_to_raw_up(pg_t, std::vector<int, std::allocator<int> >*,
> int*) const+0x94) [0x7a8a64]
>  6: (OSDMap::remove_redundant_temporaries(CephContext*, OSDMap const&,
> OSDMap::Incremental*)+0x317) [0x7ab8f7]
>  7: (OSDMonitor::create_pending()+0xf69) [0x60fdb9]
>  8: (PaxosService::_active()+0x709) [0x6047b9]
>  9: (PaxosService::election_finished()+0x67) [0x604ad7]
>  10: (Monitor::win_election(unsigned int, std::set<int, std::less<int>,
> std::allocator<int> >&, unsigned long, MonCommand const*, int,
> std::set<int, std::less<int>, std::allocator<int> > const*)+0x236)
> [0x5c34a6]
>  11: (Monitor::win_standalone_election()+0x1cc) [0x5c388c]
>  12: (Monitor::bootstrap()+0x9bb) [0x5c42eb]
>  13: (Monitor::init()+0xd5) [0x5c4645]
>  14: (main()+0x2470) [0x5769c0]
>  15: (__libc_start_main()+0xf5) [0x7f8ab1ec7ec5]
>  16: /usr/bin/ceph-mon() [0x5984f7]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>  needed to interpret this.
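
For reference, the cleanup sequence described above corresponds roughly to
the commands below. This is only a sketch: the OSD id (12) and the dead
monitor's name (cephmon-test-01) are placeholders, not names taken from
this cluster, and the monmap edit assumes the surviving mon is stopped
while its map is extracted and re-injected.

    # Mark a removed OSD as lost and drop it from the CRUSH map
    # (placeholder id; repeat for each removed OSD):
    ceph osd lost 12 --yes-i-really-mean-it
    ceph osd crush remove osd.12

    # With the surviving monitor stopped, remove the dead monitor
    # (placeholder name) from its monmap and inject the edited map:
    ceph-mon -i cephmon-test-02 --extract-monmap /tmp/monmap
    monmaptool --rm cephmon-test-01 /tmp/monmap
    ceph-mon -i cephmon-test-02 --inject-monmap /tmp/monmap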
Can you get a core dump, open it in gdb, and provide the output of the
"backtrace" command?

The cluster is for some reason trying to create new PGs, and something is
going wrong; I suspect the monitors aren't handling the loss of PGs
properly. :/

-Greg
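
A minimal way to capture the backtrace Greg is asking for, assuming debug
symbols are available (e.g. the ceph-dbg package on Ubuntu) and using an
illustrative core file path; where the kernel writes the core depends on
the system's core_pattern setting:

    # Allow core dumps in this shell, then reproduce the crash by
    # running the monitor in the foreground (-d logs to stderr):
    ulimit -c unlimited
    ceph-mon -i cephmon-test-02 -d

    # Open the resulting core file in gdb and print the stack:
    gdb /usr/bin/ceph-mon ./core
    (gdb) bt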