Hi Frank

> I'm not sure if my hypothesis can be correct. Ceph sends an acknowledgement of a
> write only after all copies are on disk. In other words, if PGs end up on
> different versions after a power outage, one always needs to roll back. Since
> you have two healthy OSDs in the PG and the PG is active (successfully
> peered), it might just be a broken disk and read/write errors. I would focus
> on that.

I tried to revert the PG as follows:

# ceph pg 3.b query | grep version
    "last_user_version": 2263481,
    "version": "4825'2264303",
    "last_user_version": 2263481,
    "version": "4825'2264301",
    "last_user_version": 2263481,
    "version": "4825'2264301",

# ceph pg 3.b list_unfound
{
    "num_missing": 0,
    "num_unfound": 0,
    "objects": [],
    "more": false
}

# ceph pg 3.b mark_unfound_lost revert
pg has no unfound objects

# ceph pg 3.b revert
Invalid command: revert not in query
pg <pgid> query :  show details of a specific pg
Error EINVAL: invalid command

How do I revert or roll back a PG?

> Another question, do you have write caches enabled (disk cache and controller
> cache)? This is known to cause problems on power outages and also degraded
> performance with ceph. You should check and disable any caches if necessary.

No. The HDD is directly connected to the motherboard.

Thank you
Sagara
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io