Cristian, I'm not sure offhand what's up, but can you increase the logging levels and then rerun the test:
http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/

See the "Runtime" section for injecting the logging arguments after the daemons have started, or change the {cluster}.conf (e.g. /etc/ceph/ceph.conf) settings with a stanza like:

    [osd]
        debug osd = 20/20
        debug journal = 20/20
        debug monc = 20/20
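For the runtime route, something along these lines should do it - a rough sketch only, assuming osd.1 is the daemon that keeps getting marked down (swap in whatever id applies on your cluster):

    # bump the debug levels on the running OSD without restarting it
    ceph tell osd.1 injectargs '--debug-osd 20/20 --debug-journal 20/20 --debug-monc 20/20'

Remember to drop the levels back down once you've caught the failure - 20/20 logging is very chatty and can eat disk quickly.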
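Also, a hunch about the error itself: the monitor marks an OSD down when it hasn't received PG stat reports from it for longer than the report timeout (900 seconds by default, if I remember right), and your 904 seconds is just past that - so osd.1 was most likely hung or badly stalled for those 15 minutes rather than cleanly crashed. While you rerun the test it's worth watching the OSD's own log and the kernel log on that node - a rough checklist, assuming the default paths:

    # on the node hosting osd.1 - follow the OSD's own log
    tail -f /var/log/ceph/ceph-osd.1.log

    # is the ceph-osd process still alive while the mon reports it down?
    ps aux | grep ceph-osd

    # disk errors, controller resets or OOM kills often show up here
    dmesg | tail -n 100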
~~shane

On 6/21/15, 8:22 AM, "ceph-users on behalf of Cristian Falcas" <ceph-users-boun...@lists.ceph.com on behalf of cristi.fal...@gmail.com> wrote:

These are the logs from that moment from "ceph -w":

2015-06-21 14:09:34.542891 mon.0 [INF] pgmap v172617: 512 pgs: 512 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 0 B/s rd, 12695 B/s wr, 3 op/s
2015-06-21 14:09:39.544302 mon.0 [INF] pgmap v172618: 512 pgs: 512 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 103 kB/s rd, 9419 B/s wr, 22 op/s
2015-06-21 14:09:44.544762 mon.0 [INF] pgmap v172619: 512 pgs: 512 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 209 kB/s rd, 9009 B/s wr, 28 op/s
2015-06-21 14:09:38.489980 osd.0 [INF] 8.4c scrub starts
2015-06-21 14:09:39.143548 osd.0 [INF] 8.4c scrub ok
2015-06-21 14:09:39.490283 osd.0 [INF] 8.4d scrub starts
2015-06-21 14:09:40.170572 osd.0 [INF] 8.4d scrub ok
2015-06-21 14:09:41.490652 osd.0 [INF] 8.4e scrub starts
2015-06-21 14:09:42.269054 osd.0 [INF] 8.4e scrub ok
2015-06-21 14:09:44.491206 osd.0 [INF] 8.4f scrub starts
2015-06-21 14:09:45.213658 osd.0 [INF] 8.4f scrub ok
2015-06-21 14:09:49.629596 mon.0 [INF] pgmap v172620: 512 pgs: 512 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 104 kB/s rd, 8528 B/s wr, 7 op/s
2015-06-21 14:09:54.630316 mon.0 [INF] pgmap v172621: 512 pgs: 512 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 0 B/s rd, 20306 B/s wr, 5 op/s
2015-06-21 14:09:55.443987 mon.0 [INF] osd.1 marked down after no pg stats for 904.221819seconds
2015-06-21 14:09:55.453660 mon.0 [INF] osdmap e122: 2 osds: 1 up, 2 in
2015-06-21 14:09:55.458644 mon.0 [INF] pgmap v172622: 512 pgs: 128 stale+active+clean, 384 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 0 B/s rd, 28136 B/s wr, 8 op/s
2015-06-21 14:09:47.491759 osd.0 [INF] 8.50 scrub starts
2015-06-21 14:09:48.574902 osd.0 [INF] 8.50 scrub ok
2015-06-21 14:09:48.575136 osd.0 [INF] 8.50 scrub starts
2015-06-21 14:09:48.678662 osd.0 [INF] 8.50 scrub ok
2015-06-21 14:09:52.575940 osd.0 [INF] 8.51 scrub starts
2015-06-21 14:09:53.314203 osd.0 [INF] 8.51 scrub ok
2015-06-21 14:09:59.650334 mon.0 [INF] pgmap v172623: 512 pgs: 128 stale+active+clean, 384 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 2359 kB/s rd, 12272 B/s wr, 69 op/s
2015-06-21 14:10:04.633154 mon.0 [INF] pgmap v172624: 512 pgs: 128 stale+active+clean, 384 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 1286 kB/s rd, 20523 B/s wr, 41 op/s
2015-06-21 14:10:00.578299 osd.0 [INF] 8.52 scrub starts
2015-06-21 14:10:01.172525 osd.0 [INF] 8.52 scrub ok
2015-06-21 14:10:02.578690 osd.0 [INF] 8.53 scrub starts
2015-06-21 14:10:03.178836 osd.0 [INF] 8.53 scrub ok
2015-06-21 14:10:09.634306 mon.0 [INF] pgmap v172625: 512 pgs: 128 stale+active+clean, 384 active+clean; 502 GB data, 183 GB used, 2279 GB / 2469 GB avail; 0 B/s rd, 24171 B/s wr, 4

On Sun, Jun 21, 2015 at 6:19 PM, Cristian Falcas <cristi.fal...@gmail.com> wrote:

Hello,

When doing a fio test on a VM, after some time the OSD goes down with this error:

osd.1 marked down after no pg stats for 904.221819seconds

Can anyone help me with this error? I can't find any errors on the physical machine at that time. Only one VM is running, the one with the fio test. This is also repeatable: if I reboot the VM and restart the test, the OSD goes down again after a while.

Version used:

# rpm -qa | grep ceph | sort
ceph-0.94.2-0.el7.centos.x86_64
ceph-common-0.94.2-0.el7.centos.x86_64
libcephfs1-0.94.2-0.el7.centos.x86_64
python-ceph-compat-0.94.2-0.el7.centos.x86_64
python-cephfs-0.94.2-0.el7.centos.x86_64

I don't know if it matters, but the physical machine is an all-in-one ceph + OpenStack installation.

Thank you,
Cristian Falcas