On Wed, Oct 14, 2015 at 7:20 PM, Francois Lafont <flafdiv...@free.fr> wrote:
> Hi,
>
> On 14/10/2015 06:45, Gregory Farnum wrote:
>
>>> Ok, however during my tests I had been careful to replace the correct
>>> file with a bad file of *exactly* the same size (the content of the
>>> file was just a little string and I changed it to a string of exactly
>>> the same size). I had been careful to undo the mtime update too (I had
>>> restored the mtime of the file before the change). Despite this, the
>>> "repair" command worked well. Tested twice: 1. with the change on the
>>> primary OSD and 2. on the secondary OSD. And I was surprised because I
>>> thought test 1 (on the primary OSD) would fail.
>>
>> Hm. I'm a little confused by that, actually. Exactly what was the path
>> to the files you changed, and do you have before-and-after comparisons
>> on the content and metadata?
>
> I didn't remember exactly the process I had used, so I have just retried
> it today. Here is my process. I have a healthy cluster of 3 nodes (Ubuntu
> Trusty) running Ceph Hammer (version 0.94.3), and I have mounted cephfs
> on /mnt on one of the nodes.
>
> ~# cat /mnt/file.txt  # yes, it's a little file. ;)
> 123456
>
> ~# ls -i /mnt/file.txt
> 1099511627776 /mnt/file.txt
>
> ~# printf "%x\n" 1099511627776
> 10000000000
>
> ~# rados -p data ls - | grep 10000000000
> 10000000000.00000000
>
> Now I have the name of the object mapped to my "file.txt".
>
> ~# ceph osd map data 10000000000.00000000
> osdmap e76 pool 'data' (3) object '10000000000.00000000' -> pg 3.f0b56f30 (3.30) -> up ([1,2], p1) acting ([1,2], p1)
>
> So my object is on the primary OSD (OSD-1) and on the secondary OSD
> (OSD-2). I open a terminal on the node which hosts the primary OSD,
> OSD-1, and then:
>
> ~# cat /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
> 123456
>
> ~# ll /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
> -rw-r--r-- 1 root root 7 Oct 15 03:46 /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
>
> Now, I change the content with this script, called "change_content.sh",
> which preserves the mtime across the change:
>
> -----------------------------
> #!/bin/sh
>
> f="$1"
> f_tmp="${f}.tmp"
> content="$2"
> cp --preserve=all "$f" "$f_tmp"
> echo "$content" >"$f"
> touch -r "$f_tmp" "$f" # to restore the mtime after the change
> rm "$f_tmp"
> -----------------------------
>
> So, let's go: I replace the content with new content of exactly the same
> size (i.e. "ABCDEF" in this example):
>
> ~# ./change_content.sh /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3 ABCDEF
>
> ~# cat /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
> ABCDEF
>
> ~# ll /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
> -rw-r--r-- 1 root root 7 Oct 15 03:46 /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
>
> Now the secondary OSD contains the good version of the object and the
> primary a bad version. So I launch a "ceph pg repair":
>
> ~# ceph pg repair 3.30
> instructing pg 3.30 on osd.1 to repair
>
> # I'm on the primary OSD and the file below has been repaired correctly.
> ~# cat /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
> 123456
>
> As you can see, the repair command worked well. Maybe my little test is
> too trivial?
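(Side note: the object-location steps in the quoted test could be collected
into one small helper script. This is only a sketch under assumptions not
spelled out in the thread: a filestore OSD layout like the paths above, a
file small enough to live in a single object, and a pool named "data"; the
script name and its arguments are made up for illustration.)

-----------------------------
#!/bin/sh
# locate_object.sh (hypothetical): map a small CephFS file to its RADOS
# object and to the on-disk replica on a local filestore OSD.
file="$1"                 # e.g. /mnt/file.txt
pool="${2:-data}"         # pool backing the CephFS data
osd_id="$3"               # id of an OSD hosted on this node, e.g. 1

ino=$(ls -i "$file" | awk '{print $1}')
obj="$(printf '%x' "$ino").00000000"    # e.g. 10000000000.00000000
echo "object: $obj"

# Show the PG and the acting OSDs for this object.
ceph osd map "$pool" "$obj"

# The on-disk name carries a hash suffix (e.g. __head_F0B56F30__3), so
# search for the object name rather than building the path by hand.
find "/var/lib/ceph/osd/ceph-$osd_id/current" -name "${obj}__head_*" 2>/dev/null
-----------------------------

Something like "./locate_object.sh /mnt/file.txt data 1" would then print the
object name, its PG/OSD mapping, and the on-disk path of the local replica.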
Hmm, maybe David has some idea.

>>> Greg, if I understand you well, I shouldn't have too much confidence
>>> in the "ceph pg repair" command, is that correct?
>>>
>>> But, if yes, what is the good way to repair a PG?
>>
>> Usually what we recommend is for those with 3 copies to find the
>> differing copy, delete it, and run a repair; then you know it'll
>> repair from a good version. But yeah, it's not as reliable as we'd
>> like it to be on its own.
>
> I would like to be sure I understand correctly. The process would be (in
> the case where size == 3):
>
> 1. On each of the 3 OSDs where my object is stored, run:
>
>    md5sum /var/lib/ceph/osd/ceph-$id/current/${pg_id}_head/${object_name}*
>
> 2. Normally, I will get the same result on 2 OSDs, and on the other OSD,
>    let's call it OSD-X, the result will be different. So, on OSD-X, I run:
>
>    rm /var/lib/ceph/osd/ceph-$id/current/${pg_id}_head/${object_name}*
>
> 3. And now I can run the "ceph pg repair" command without risk:
>
>    ceph pg repair $pg_id
>
> Is this the correct process?

Yes, I would expect this to work.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
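(Appendix: the three steps above, which Greg confirms, could be scripted
roughly as below. This is only a sketch for a replicated pool with size == 3
on filestore OSDs; the OSD id, PG id, and object name are placeholders taken
from the example earlier in the thread, and the destructive steps are
deliberately left commented out.)

-----------------------------
#!/bin/sh
# Manual repair sketch for a size == 3 replicated pool on filestore OSDs.
# All values below are placeholders; adjust them per node before use.
id=1                                  # OSD id local to this node
pg_id=3.30                            # inconsistent PG
object_name=10000000000.00000000      # object to check

# 1. Run this on each of the 3 nodes holding a replica and compare by hand:
md5sum /var/lib/ceph/osd/ceph-${id}/current/${pg_id}_head/${object_name}*

# 2. Only on the one OSD whose checksum differs (OSD-X), remove the bad copy:
#rm /var/lib/ceph/osd/ceph-${id}/current/${pg_id}_head/${object_name}*

# 3. Then trigger the repair, which now only has good copies to choose from:
#ceph pg repair ${pg_id}
-----------------------------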