[ceph-users] ceph-fuse hanging on df with ceph luminous >= 12.1.3
Hi,

when trying to use df on a ceph-fuse mounted cephfs filesystem with ceph luminous >= 12.1.3 I'm getting hangs, with the following kind of message in the logs:

2017-08-22 02:20:51.094704 7f80addb7700 0 client.174216 ms_handle_reset on 192.168.0.10:6789/0

The logs show only this type of message and nothing more useful. The only way to resume operations is to kill ceph-fuse and remount. Only df hangs, though, while file operations like copy/rm/ls work as expected.

This behavior only shows up with ceph >= 12.1.3; ceph-fuse on 12.1.2, for example, works. Has anyone seen the same problem? Any help is highly appreciated.

Thanks,

    Alessandro
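A minimal sketch of the kill-and-remount workaround described above, assuming a hypothetical mount point /mnt/cephfs; the monitor address is the one shown in the log line:

    pkill -f 'ceph-fuse.*/mnt/cephfs'            # kill the hung ceph-fuse for this mount
    fusermount -u -z /mnt/cephfs                 # lazy-unmount the stale mount point
    ceph-fuse -m 192.168.0.10:6789 /mnt/cephfs   # remount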
[ceph-users] cephfs degraded on ceph luminous 12.2.2
Hi,

I'm running ceph luminous 12.2.2 and my cephfs suddenly degraded. I have 2 active mds instances and 1 standby. All the active instances are now in replay state and show the same error in the logs:

mds1

2018-01-08 16:04:15.765637 7fc2e92451c0 0 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 164
starting mds.mds1 at -
2018-01-08 16:04:15.785849 7fc2e92451c0 0 pidfile_write: ignore empty --pid-file
2018-01-08 16:04:20.168178 7fc2e1ee1700 1 mds.mds1 handle_mds_map standby
2018-01-08 16:04:20.278424 7fc2e1ee1700 1 mds.1.20635 handle_mds_map i am now mds.1.20635
2018-01-08 16:04:20.278432 7fc2e1ee1700 1 mds.1.20635 handle_mds_map state change up:boot --> up:replay
2018-01-08 16:04:20.278443 7fc2e1ee1700 1 mds.1.20635 replay_start
2018-01-08 16:04:20.278449 7fc2e1ee1700 1 mds.1.20635 recovery set is 0
2018-01-08 16:04:20.278458 7fc2e1ee1700 1 mds.1.20635 waiting for osdmap 21467 (which blacklists prior instance)

mds2

2018-01-08 16:04:16.870459 7fd8456201c0 0 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 295
starting mds.mds2 at -
2018-01-08 16:04:16.881616 7fd8456201c0 0 pidfile_write: ignore empty --pid-file
2018-01-08 16:04:21.274543 7fd83e2bc700 1 mds.mds2 handle_mds_map standby
2018-01-08 16:04:21.314438 7fd83e2bc700 1 mds.0.20637 handle_mds_map i am now mds.0.20637
2018-01-08 16:04:21.314459 7fd83e2bc700 1 mds.0.20637 handle_mds_map state change up:boot --> up:replay
2018-01-08 16:04:21.314479 7fd83e2bc700 1 mds.0.20637 replay_start
2018-01-08 16:04:21.314492 7fd83e2bc700 1 mds.0.20637 recovery set is 1
2018-01-08 16:04:21.314517 7fd83e2bc700 1 mds.0.20637 waiting for osdmap 21467 (which blacklists prior instance)
2018-01-08 16:04:21.393307 7fd837aaf700 0 mds.0.cache creating system inode with ino:0x100
2018-01-08 16:04:21.397246 7fd837aaf700 0 mds.0.cache creating system inode with ino:0x1

The cluster is recovering as we are changing some of the osds, and there are a few slow/stuck requests, but I'm not sure if this is the cause, as there is apparently no data loss (until now).

How can I force the MDSes to quit the replay state?

Thanks for any help,

    Alessandro
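For reference, a few standard luminous commands that show what the MDS map and the cluster report in a situation like this (a sketch, with nothing cluster-specific assumed):

    ceph -s              # overall health, MDS summary and PG states
    ceph mds stat        # compact MDS map: which ranks are up:replay / up:active / standby
    ceph health detail   # details on slow/stuck requests and degraded PGs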
Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2
Thanks Lincoln,

indeed, as I said the cluster is recovering, so there are pending ops:

    pgs: 21.034% pgs not active
         1692310/24980804 objects degraded (6.774%)
         5612149/24980804 objects misplaced (22.466%)
         458 active+clean
         329 active+remapped+backfill_wait
         159 activating+remapped
         100 active+undersized+degraded+remapped+backfill_wait
         58  activating+undersized+degraded+remapped
         27  activating
         22  active+undersized+degraded+remapped+backfilling
         6   active+remapped+backfilling
         1   active+recovery_wait+degraded

If it's just a matter of waiting for the system to complete the recovery that's fine, I'll deal with that, but I was wondering if there is a more subtle problem here.

OK, I'll wait for the recovery to complete and see what happens, thanks.

Cheers,

    Alessandro

On 08/01/18 17:36, Lincoln Bryant wrote:
> Hi Alessandro,
>
> What is the state of your PGs? Inactive PGs have blocked CephFS
> recovery on our cluster before. I'd try to clear any blocked ops and
> see if the MDSes recover.
>
> --Lincoln
>
> On Mon, 2018-01-08 at 17:21 +0100, Alessandro De Salvo wrote:
>> Hi,
>> I'm running ceph luminous 12.2.2 and my cephfs suddenly degraded.
>> [...]
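A sketch of the checks Lincoln suggests (inactive PGs and blocked ops); NN below is a placeholder OSD id:

    ceph pg dump_stuck inactive            # PGs stuck in activating/peering/etc.
    ceph daemon osd.NN dump_blocked_ops    # run on the host of a suspect OSD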
Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2
Hi,

it took quite some time to recover the pgs, and indeed the problem with the mds instances was due to the activating pgs. Once they were cleared the fs went back to its original state. I had to restart some OSDs a few times, though, in order to get all the pgs activated. I didn't hit the limit on the max pgs per OSD, but I'm close to it, so I have set it to 300 just to be safe (AFAIK that was the limit in prior releases of ceph, I'm not sure why it was lowered to 200 now).

Thanks,

    Alessandro

On Tue, 2018-01-09 at 09:01 +0100, Burkhard Linke wrote:
> Hi,
>
> On 01/08/2018 05:40 PM, Alessandro De Salvo wrote:
>> Thanks Lincoln,
>>
>> indeed, as I said the cluster is recovering, so there are pending ops:
>> [...]
>
> The blocked MDS might be caused by the 'activating' PGs. Do you have a
> warning about too many PGs per OSD? If that is the case,
> activating/creating/peering/whatever on the affected OSDs is blocked,
> which leads to blocked requests etc.
>
> You can resolve this by increasing the number of allowed PGs per OSD
> ('mon_max_pg_per_osd'). AFAIK it needs to be set for mon, mgr and osd
> instances. There has also been some discussion about this setting on the
> mailing list in the last weeks.
>
> Regards,
> Burkhard
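A sketch of how that setting could be applied, using the value 300 chosen above; since mon, mgr and osd all need it (as Burkhard notes), the simplest place is the [global] section of ceph.conf on all nodes, followed by a daemon restart:

    [global]
    mon_max_pg_per_osd = 300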
[ceph-users] Luminous 12.2.2 OSDs with Bluestore crashing randomly
Hi,

several times a day we have different OSDs running Luminous 12.2.2 and Bluestore crashing with errors like this:

starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2018-01-30 13:45:28.440883 7f1e193cbd00 -1 osd.2 107082 log_to_monitors {default=true}
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc: In function 'void PrimaryLogPG::hit_set_trim(PrimaryLogPG::OpContextUPtr&, unsigned int)' thread 7f1dfd734700 time 2018-01-30 13:45:29.498133
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc: 12819: FAILED assert(obc)
ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x556c6df51550]
2: (PrimaryLogPG::hit_set_trim(std::unique_ptrstd::default_delete >&, unsigned int)+0x3b6) [0x556c6db5e106]
3: (PrimaryLogPG::hit_set_persist()+0xb67) [0x556c6db61fb7]
4: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2389) [0x556c6db78d39]
5: (PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0xeba) [0x556c6db368aa]
6: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9) [0x556c6d9c0899]
7: (PGQueueable::RunVis::operator()(boost::intrusive_ptr const&)+0x57) [0x556c6dc38897]
8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x556c6d9ee43e]
9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x556c6df57069]
10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x556c6df59000]
11: (()+0x7e25) [0x7f1e16c17e25]
12: (clone()+0x6d) [0x7f1e15d0b34d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

2018-01-30 13:45:29.505317 7f1dfd734700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc: In function 'void PrimaryLogPG::hit_set_trim(PrimaryLogPG::OpContextUPtr&, unsigned int)' thread 7f1dfd734700 time 2018-01-30 13:45:29.498133
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc: 12819: FAILED assert(obc)
ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x556c6df51550]
2: (PrimaryLogPG::hit_set_trim(std::unique_ptrstd::default_delete >&, unsigned int)+0x3b6) [0x556c6db5e106]
3: (PrimaryLogPG::hit_set_persist()+0xb67) [0x556c6db61fb7]
4: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2389) [0x556c6db78d39]
5: (PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0xeba) [0x556c6db368aa]
6: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9) [0x556c6d9c0899]
7: (PGQueueable::RunVis::operator()(boost::intrusive_ptr const&)+0x57) [0x556c6dc38897]
8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x556c6d9ee43e]
9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x556c6df57069]
10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x556c6df59000]
11: (()+0x7e25) [0x7f1e16c17e25]
12: (clone()+0x6d) [0x7f1e15d0b34d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

Is it a known issue? How can we fix that?

Thanks,

    Alessandro
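As the NOTE in the dump says, interpreting the addresses needs a disassembly of the binary. A sketch, assuming the standard RPM install path of the OSD binary on these CentOS 7 hosts:

    objdump -rdS /usr/bin/ceph-osd > ceph-osd.objdump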
Re: [ceph-users] Luminous 12.2.2 OSDs with Bluestore crashing randomly
Hi Greg,

many thanks. This is a new cluster, created initially with luminous 12.2.0. I'm not sure the instructions for jewel really apply to my case, and all the machines have ntp enabled, but I'll have a look, many thanks for the link.

All machines are set to CET, although I'm running in docker containers which use UTC internally, but they are all consistent.

At the moment, after setting 5 of the osds out, the cluster has resumed, and now I'm recreating those osds to be on the safe side.

Thanks,

    Alessandro

On 31/01/18 19:26, Gregory Farnum wrote:
> On Tue, Jan 30, 2018 at 5:49 AM Alessandro De Salvo
> <alessandro.desa...@roma1.infn.it> wrote:
>> Hi,
>> we have several times a day different OSDs running Luminous 12.2.2 and
>> Bluestore crashing with errors like this:
>> [...]
>> Is it a known issue? How can we fix that?
>
> Hmm, it looks a lot like http://tracker.ceph.com/issues/19185, but that wasn't suppo
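A sketch of the set-out-and-recreate step mentioned above, using osd.2 from the crash log as an example id and assuming the data is allowed to drain off before the OSD is destroyed:

    ceph osd out osd.2                        # let data migrate off the OSD (watch ceph -s)
    systemctl stop ceph-osd@2
    ceph osd purge 2 --yes-i-really-mean-it   # luminous: remove it from crush, auth and the osdmap
    # then redeploy the OSD with the usual provisioning tool (e.g. ceph-volume or ceph-deploy)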
[ceph-users] MDS damaged
Hi,

after the upgrade to luminous 12.2.6 today, all our MDSes have been marked as damaged. Trying to restart the instances only results in standby MDSes. We currently have 2 filesystems active, with 2 MDSes each.

I found the following error messages in the mon:

mds.0 :6800/2412911269 down:damaged
mds.1 :6800/830539001 down:damaged
mds.0 :6800/4080298733 down:damaged

Whenever I try to force the repaired state with

ceph mds repaired :

I get something like this in the MDS logs:

2018-07-11 13:20:41.597970 7ff7e010e700 0 mds.1.journaler.mdlog(ro) error getting journal off disk
2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log [ERR] : Error recovering journal 0x201: (5) Input/output error

Any attempt at running the journal export results in errors, like this one:

cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1 Header 200. is unreadable
2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`

The same happens for recover_dentries:

cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header 200. is unreadable
Errors: 0

Is there something I could try to do to get the cluster back?

I was able to dump the contents of the metadata pool with rados export -p cephfs_metadata and I'm currently trying the procedure described in http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery but I'm not sure if it will work, as it's apparently doing nothing at the moment (maybe it's just very slow).

Any help is appreciated, thanks!

    Alessandro
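For context, two related commands (a sketch: the role cephfs:0 is illustrative, matching the rank used with cephfs-journal-tool above; with two filesystems it would have to be repeated per filesystem and rank):

    cephfs-journal-tool --rank=cephfs:0 journal inspect   # check journal integrity without exporting it
    ceph mds repaired cephfs:0                            # clear the damaged flag for that rank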
Re: [ceph-users] MDS damaged
Hi Gregory,

thanks for the reply. I have the dump of the metadata pool, but I'm not sure what to check there: is that what you mean?

The cluster was operational until today at noon, when a full restart of the daemons was issued, like many other times in the past.

I was trying to issue the repaired command to get a real error in the logs, but that was apparently not the case.

Thanks,

    Alessandro

On 11/07/18 18:22, Gregory Farnum wrote:
> Have you checked the actual journal objects as the "journal export" suggested?
> Did you identify any actual source of the damage before issuing the "repaired" command?
> What is the history of the filesystems on this cluster?
>
> On Wed, Jul 11, 2018 at 8:10 AM Alessandro De Salvo
> <alessandro.desa...@roma1.infn.it> wrote:
>> Hi,
>> after the upgrade to luminous 12.2.6 today, all our MDSes have been
>> marked as damaged. Trying to restart the instances only results in
>> standby MDSes.
>> [...]
Re: [ceph-users] MDS damaged
Hi John,

in fact I get an I/O error by hand too:

rados get -p cephfs_metadata 200. 200.
error getting cephfs_metadata/200.: (5) Input/output error

Can this be recovered somehow?

Thanks,

    Alessandro

On 11/07/18 18:33, John Spray wrote:
> On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo wrote:
>> Hi,
>> after the upgrade to luminous 12.2.6 today, all our MDSes have been
>> marked as damaged.
>> [...]
>> Whenever I try to force the repaired state with
>> ceph mds repaired :
>> I get something like this in the MDS logs:
>> 2018-07-11 13:20:41.597970 7ff7e010e700 0 mds.1.journaler.mdlog(ro) error getting journal off disk
>> 2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log [ERR] : Error recovering journal 0x201: (5) Input/output error
>
> An EIO reading the journal header is pretty scary. The MDS itself
> probably can't tell you much more about this: you need to dig down
> into the RADOS layer.
>
> Try reading the 200. object (that happens to be the rank 0 journal
> header, every CephFS filesystem should have one) using the `rados`
> command line tool.
>
> John
>
>> [...]
Re: [ceph-users] MDS damaged
OK, I found where the object is:

ceph osd map cephfs_metadata 200.
osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)

So, looking at the logs of osds 23, 35 and 18, I see:

osd.23:

2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.:head

osd.35:

2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.:head

osd.18:

2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.:head

So, basically the same error everywhere.

I'm trying to issue a repair of pg 10.14, but I'm not sure if it will help.

There are no SMART errors (the fileservers are SANs, in RAID6 + LVM volumes) and no disk problems anywhere. No relevant errors in the syslogs; the hosts are just fine. I cannot exclude an error on the RAID controllers, but 2 of the OSDs with 10.14 are on one SAN system and one is on a different one, so I would tend to exclude that they both had (silent) errors at the same time.

Thanks,

    Alessandro

On 11/07/18 18:56, John Spray wrote:
> On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo wrote:
>> Hi John,
>> in fact I get an I/O error by hand too:
>> rados get -p cephfs_metadata 200. 200.
>> error getting cephfs_metadata/200.: (5) Input/output error
>
> Next step would be to go look for corresponding errors on your OSD
> logs, system logs, and possibly also check things like the SMART
> counters on your hard drives for possible root causes.
>
> John
>
>> [...]
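The scrub and repair mentioned above would typically be issued like this (a sketch, using the PG identified above):

    ceph pg deep-scrub 10.14   # schedule a deep scrub of the PG
    ceph pg repair 10.14       # ask the primary to repair any inconsistency it finds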
Re: [ceph-users] MDS damaged
> On 11 Jul 2018, at 23:25, Gregory Farnum wrote:
>
>> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo wrote:
>> OK, I found where the object is:
>> [...]
>> So, basically the same error everywhere.
>> [...]
>
> That's fairly distressing. At this point I'd probably try extracting the
> object using ceph-objectstore-tool and seeing if it decodes properly as an
> mds journal. If it does, you might risk just putting it back in place to
> overwrite the crc.

Ok, I guess I know how to extract the object from a given OSD, but I'm not sure how to check whether it decodes as an mds journal, is there a procedure for this? However, if extracting all the copies from all the osds gives the same object md5sum, I believe I can try directly to overwrite the object, as it cannot get worse than this, correct?

Also, I'd need a confirmation of the procedure to follow in this case, when possibly all copies of an object are wrong. I would try the following:

- set the noout
- bring down all the osds where the object is present
- replace the object in all the stores
- bring the osds up again
- unset the noout

Correct?

> However, I'm also quite curious how it ended up that way, with a checksum
> mismatch but identical data (and identical checksums!) across the three
> replicas. Have you previously done some kind of scrub repair on the metadata
> pool?

No, at least not on this pg; I only remember a repair, but it was on a different pool.

> Did the PG perhaps get backfilled due to cluster changes?

That might be the case, as we have to reboot the osds sometimes when they crash. Also, yesterday we rebooted all of them, but this always happens in sequence, one by one, not all at the same time.

Thanks for the help,

    Alessandro

> -Greg
>
>> [...]
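A sketch of that procedure at the objectstore level, under these assumptions: the OSD ids are the ones found above (23, 35, 18), '<journal-header-object>' is a placeholder for the 200.* journal header object discussed in this thread, and each OSD is stopped while ceph-objectstore-tool touches its store:

    ceph osd set noout
    systemctl stop ceph-osd@23        # repeat per OSD holding a copy (23, 35, 18)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 --pgid 10.14 \
        '<journal-header-object>' get-bytes /tmp/journal-header.bin
    # compare the extracted copies (md5sum), then write the chosen copy back:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 --pgid 10.14 \
        '<journal-header-object>' set-bytes /tmp/journal-header.bin
    systemctl start ceph-osd@23
    ceph osd unset noout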
Re: [ceph-users] MDS damaged
On 12/07/18 10:58, Dan van der Ster wrote:
> On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum wrote:
>> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo wrote:
>>> OK, I found where the object is:
>>> [...]
>>> So, basically the same error everywhere.
>>
>> That's fairly distressing. At this point I'd probably try extracting the
>> object using ceph-objectstore-tool and seeing if it decodes properly as an
>> mds journal. If it does, you might risk just putting it back in place to
>> overwrite the crc.
>
> Wouldn't it be easier to scrub repair the PG to fix the crc?

This is what I already instructed the cluster to do, a deep scrub, but I'm not sure it can repair in case all replicas are bad, as seems to be the case here.

> Alessandro, did you already try a deep-scrub on pg 10.14?

I'm waiting for the cluster to do that, I sent it earlier this morning.

> I expect it'll show an inconsistent object. Though, I'm unsure if repair
> will correct the crc given that in this case *all* replicas have a bad crc.

Exactly, this is what I wonder too.

Cheers,

    Alessandro

> --Dan
>
>> [...]
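If a deep scrub does flag the PG inconsistent, the usual way to list what it found is (a sketch):

    rados list-inconsistent-obj 10.14 --format=json-pretty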
Re: [ceph-users] MDS damaged
On 12/07/18 11:20, Alessandro De Salvo wrote:
> On 12/07/18 10:58, Dan van der Ster wrote:
>> [...]
>> Wouldn't it be easier to scrub repair the PG to fix the crc?
>
> This is what I already instructed the cluster to do, a deep scrub, but
> I'm not sure it can repair in case all replicas are bad, as seems to be
> the case here.

I finally managed (with the help of Dan) to perform the deep-scrub on pg 10.14, but the deep scrub did not detect anything wrong. Trying to repair 10.14 also has no effect.

Still, when trying to access the object I get this in the OSDs:

2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.:head

Was deep-scrub supposed to detect the wrong crc? If yes, then it sounds like a bug. Can I force the repair somehow?

Thanks,

    Alessandro

> Alessandro, did you already try a deep-scrub on pg 10.14?
> [...]
Re: [ceph-users] MDS damaged
Unfortunately yes, all the OSDs were restarted a few times, but there was no change.

Thanks,

    Alessandro

On 12/07/18 15:55, Paul Emmerich wrote:
> This might seem like a stupid suggestion, but: have you tried to restart the OSDs?
>
> I've also encountered some random CRC errors that only showed up when
> trying to read an object, but not on scrubbing, and that magically
> disappeared after restarting the OSD.
>
> However, in my case it was clearly related to
> https://tracker.ceph.com/issues/22464 which doesn't seem to be the issue here.
>
> Paul
>
> 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo
> <alessandro.desa...@roma1.infn.it>:
>> I finally managed (with the help of Dan) to perform the deep-scrub on
>> pg 10.14, but the deep scrub did not detect anything wrong.
>> [...]
Re: [ceph-users] MDS damaged
Some progress, and more pain...

I was able to recover the 200. object using ceph-objectstore-tool on one of the OSDs (all copies are identical), but trying to re-inject it just with rados put gave no error while the get was still giving the same I/O error. The solution was to rm the object and then put it again: that worked.

However, after restarting one of the MDSes and setting it to repaired, I've hit another, similar problem:

2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] : error reading table object 'mds0_inotable' -5 ((5) Input/output error)

Can I safely try to do the same as for object 200.? Should I check something before trying it? Again, the copies of this object have identical md5sums on all the replicas.

Thanks,

    Alessandro

On 12/07/18 16:46, Alessandro De Salvo wrote:
> Unfortunately yes, all the OSDs were restarted a few times, but there was no change.
>
> On 12/07/18 15:55, Paul Emmerich wrote:
>> This might seem like a stupid suggestion, but: have you tried to restart the OSDs?
>> [...]
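For the record, a sketch of the rm-and-put re-injection step that worked above, assuming a good copy of the object has already been extracted to a local file (e.g. via ceph-objectstore-tool):

    rados -p cephfs_metadata rm mds0_inotable
    rados -p cephfs_metadata put mds0_inotable /tmp/mds0_inotable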
Re: [ceph-users] MDS damaged
Thanks all,

100..inode, mds_snaptable and 1..inode were not corrupted, so I left them as they were. I have re-injected all the bad objects, for all mdses (2 per filesystem) and all the filesystems I had (2), and after setting the mdses to repaired my filesystems are back!

However, I cannot reduce the number of mdses anymore. I used to do that with e.g.:

ceph fs set cephfs max_mds 1

Trying this with 12.2.6 apparently has no effect: I am left with 2 active mdses. Is this another bug?

Thanks,

    Alessandro

On 13/07/18 15:54, Yan, Zheng wrote:
> On Thu, Jul 12, 2018 at 11:39 PM Alessandro De Salvo wrote:
>> Some progress, and more pain...
>> [...]
>> Can I safely try to do the same as for object 200.? Should I check
>> something before trying it? Again, the copies of this object have
>> identical md5sums on all the replicas.
>
> Yes, it should be safe. You also need to do the same for several other
> objects. The full object list is:
>
> 200.
> mds0_inotable
> 100..inode
> mds_snaptable
> 1..inode
>
> The first three objects are per-mds-rank. If you have enabled
> multi-active mds, you also need to update the objects of the other
> ranks. For mds.1, the object names are 201., mds1_inotable and
> 101..inode.
>
>> [...]
Re: [ceph-users] MDS damaged
Hi Dan,

you're right, I was following the mimic instructions (which indeed worked on my mimic testbed), but luminous is different and I missed the additional step. It works now, thanks!

    Alessandro

On 13/07/18 17:51, Dan van der Ster wrote:
> On Fri, Jul 13, 2018 at 4:07 PM Alessandro De Salvo wrote:
>> However, I cannot reduce the number of mdses anymore. I used to do
>> that with e.g.:
>>
>> ceph fs set cephfs max_mds 1
>>
>> Trying this with 12.2.6 apparently has no effect: I am left with 2
>> active mdses. Is this another bug?
>
> Are you following this procedure?
> http://docs.ceph.com/docs/luminous/cephfs/multimds/#decreasing-the-number-of-ranks
>
> i.e. you need to deactivate after decreasing max_mds. (Mimic does this
> automatically, OTOH.)
>
> -- dan
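In luminous that procedure boils down to something like the following (a sketch; rank 1 and the filesystem name cephfs are taken from this thread):

    ceph fs set cephfs max_mds 1
    ceph mds deactivate cephfs:1   # luminous: explicitly stop the extra rank; mimic does this automatically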
[ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster
Hi,

I'm trying to migrate a cephfs data pool to a different one in order to reconfigure it with new pool parameters. I've found some hints but no specific documentation on migrating pools.

I'm currently trying rados export + import, but I get errors like these:

Write #-9223372036854775808::::11e1007.:head#
omap_set_header failed: (95) Operation not supported

The command I'm using is the following:

rados export -p cephfs_data | rados import -p cephfs_data_new -

So, I have a few questions:

1) would it work to swap the cephfs data pools by renaming them while the fs cluster is down?

2) how can I copy the old data pool into a new one without errors like the ones above?

3) a plain copy from one fs to another would also work, but I didn't find a way to tell the ceph fuse clients how to mount different filesystems in the same cluster; is there any documentation on it?

4) even if I found a way to mount different filesystems belonging to the same cluster via fuse, is this feature stable enough or is it still super-experimental?

Thanks,

    Alessandro
Re: [ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster
Hi,

On 13/06/18 14:40, Yan, Zheng wrote:
> On Wed, Jun 13, 2018 at 7:06 PM Alessandro De Salvo wrote:
>> Hi,
>> I'm trying to migrate a cephfs data pool to a different one in order to
>> reconfigure it with new pool parameters.
>> [...]
>> 1) would it work to swap the cephfs data pools by renaming them while
>> the fs cluster is down?
>> 2) how can I copy the old data pool into a new one without errors like
>> the ones above?
>
> This won't work as you expect: some cephfs metadata records the ID of the data pool.

This is what I was suspecting too, hence the question, so thanks for confirming it. Basically, once a cephfs filesystem is created, the pools and the structure are immutable. That is not good, though.

>> 3) a plain copy from one fs to another would also work, but I didn't
>> find a way to tell the ceph fuse clients how to mount different
>> filesystems in the same cluster; is there any documentation on it?
>
> ceph-fuse /mnt/ceph --client_mds_namespace=cephfs_name

In the meantime I also found the same option for fuse and tried it. It works with fuse, but it seems it's not possible to export multiple filesystems via nfs-ganesha. Has anyone tried it?

>> 4) even if I found a way to mount different filesystems belonging to
>> the same cluster via fuse, is this feature stable enough or is it still
>> super-experimental?
>
> very stable

Very good!

Thanks,

    Alessandro
Re: [ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster
Hi,

On 14/06/18 06:13, Yan, Zheng wrote:
> On Wed, Jun 13, 2018 at 9:35 PM Alessandro De Salvo wrote:
>> On 13/06/18 14:40, Yan, Zheng wrote:
>>> ceph-fuse /mnt/ceph --client_mds_namespace=cephfs_name
>>
>> In the meantime I also found the same option for fuse and tried it. It
>> works with fuse, but it seems it's not possible to export multiple
>> filesystems via nfs-ganesha. Has anyone tried it?
>
> put the client_mds_namespace option in the client section of ceph.conf
> (on the machine that runs ganesha)

Yes, that would work, but then I need a (set of) exporter(s) for every cephfs filesystem. That sounds reasonable though, as it's the same situation as for the mds services.

Thanks for the hint,

    Alessandro
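A sketch of that setup, assuming one ganesha instance per filesystem and a filesystem named cephfs_name as in the example above; export IDs and pseudo paths are placeholders:

    # ceph.conf on the ganesha host
    [client]
        client_mds_namespace = cephfs_name

    # ganesha.conf export block using the Ceph FSAL
    EXPORT {
        Export_ID = 1;
        Path = "/";
        Pseudo = "/cephfs_name";
        Access_Type = RW;
        FSAL {
            Name = CEPH;
        }
    }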