Re: [ceph-users] osd down after server failure

2013-10-14 Thread Dong Yuan
From your information, the osd log ended with "2013-10-14 06:21:26.727681 7f02690f9780 10 osd.47 43203 load_pgs 3.df1_TEMP clearing temp". That means the osd is loading all PG directories from the disk. If there is any I/O error (disk or xfs error), the process couldn't finish. Suggest resta
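If a disk or xfs error is the suspect, a quick way to look for one before restarting the osd might be something like this (a hedged sketch: the log path assumes the default Ceph log location, and the grep patterns are only illustrative):

```shell
# Look for recent block-device or xfs errors in the kernel log
dmesg | grep -iE 'xfs|I/O error' | tail -n 20
# See where load_pgs stalled in the osd's own log (default path assumed)
tail -n 50 /var/log/ceph/ceph-osd.47.log
```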

Re: [ceph-users] osd down after server failure

2013-10-14 Thread Sage Weil
Is osd.47 the one with the bad disk? It should not start. If there are other osds on the same host that aren't started by 'service ceph start', you may have to mention them by name (the old version of the script would stop on the first error instead of continuing), e.g., service ceph start
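The suggestion above can be sketched as follows (the osd ids other than osd.47 are placeholders for whatever osds actually live on that host):

```shell
# Start the healthy osds on this host explicitly by name,
# skipping osd.47 (the one with the bad disk)
service ceph start osd.48 osd.49
```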

Re: [ceph-users] osd down after server failure

2013-10-14 Thread Dominik Mostowiec
Hi, I have found something. After the restart, the time on the server was wrong (+2 hours) before ntp fixed it. I restarted these 3 osds - it didn't help. Is it possible that ceph banned these osds? Or could starting with the wrong time have broken an osd's filestore? -- Regards Dominik 2013/10/14 Dominik Mostowiec : > H
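To check whether the cluster is complaining about the clock, rather than having "banned" the osds, something like this might help (standard ceph CLI assumed; the osd id is illustrative):

```shell
# Monitors report clock skew in the health output
ceph health detail | grep -i 'clock'
# Confirm ntp has actually converged on this host
ntpq -p
# Check whether the restarted osds are marked down/out
ceph osd tree | grep 'osd.47'
```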

[ceph-users] osd down after server failure

2013-10-13 Thread Dominik Mostowiec
Hi, I had a server failure that started with one disk failure:
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023986] sd 4:2:26:0: [sdaa] Unhandled error code
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023990] sd 4:2:26:0: [sdaa] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Oct 14 03:25:04 s
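A possible follow-up for a drive throwing "Unhandled error code" would be a SMART check (device name taken from the kernel log above; requires smartmontools and root):

```shell
# Query the failing drive's SMART health and error counters
smartctl -a /dev/sdaa | grep -iE 'health|reallocated|pending|uncorrect'
```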