On 09/15/2016 01:24 PM, Ilya Dryomov wrote:
> On Thu, Sep 15, 2016 at 10:22 AM, Nikolay Borisov
> <[email protected]> wrote:
>>
>>
>> On 09/15/2016 09:22 AM, Nikolay Borisov wrote:
>>>
>>>
>>> On 09/14/2016 05:53 PM, Ilya Dryomov wrote:
>>>> On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
>>>>>> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 09/14/2016 09:55 AM, Adrian Saul wrote:
>>>>>>>>
>>>>>>>> I found I could ignore the XFS issues and just mount it with the
>>>>>>>> appropriate options (below from my backup scripts):
>>>>>>>>
>>>>>>>> #
>>>>>>>> # Mount with nouuid (conflicting XFS UUID) and norecovery (ro snapshot)
>>>>>>>> #
>>>>>>>> if ! mount -o ro,nouuid,norecovery $SNAPDEV /backup${FS}; then
>>>>>>>>     echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - cleaning up"
>>>>>>>>     rbd unmap $SNAPDEV
>>>>>>>>     rbd snap rm ${RBDPATH}@${DATESTAMP}
>>>>>>>>     exit 3
>>>>>>>> fi
>>>>>>>> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>>>>>>>>
>>>>>>>> Without using clones, it's impossible to mount the snapshot without norecovery.
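A minimal sketch of the clone-based alternative hinted at above (pool, image, snapshot and mount point names are made up). A writable clone lets XFS replay its journal at mount time, so norecovery is no longer needed, though nouuid still is:

# the snapshot must be protected before it can be cloned
rbd snap create rbd/myimage@backup
rbd snap protect rbd/myimage@backup
rbd clone rbd/myimage@backup rbd/myimage-backup

# map the writable clone and mount it; XFS can replay its journal here,
# but nouuid is still required because the clone keeps the original UUID
CLONEDEV=$(rbd map rbd/myimage-backup)
mount -o nouuid "$CLONEDEV" /backup/myimage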
>>>>>>>
>>>>>>> But shouldn't freezing the fs and doing a snapshot constitute a "clean
>>>>>>> unmount" hence no need to recover on the next mount (of the snapshot) -
>>>>>>> Ilya?
>>>>>>
>>>>>> I *thought* it should (well, except for orphan inodes), but now I'm not
>>>>>> sure. Have you tried reproducing with loop devices yet?
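For what it's worth, a minimal sketch of such a loop-device reproduction (paths, sizes, and mount points are made up; the "snapshot" here is simply a copy of the backing file taken while the filesystem is frozen):

# throw-away ext4 image on a loop device
truncate -s 1G /tmp/base.img
LOOP=$(losetup -f --show /tmp/base.img)
mkfs.ext4 -q "$LOOP"
mkdir -p /mnt/test /mnt/snap
mount "$LOOP" /mnt/test

# dirty the filesystem, freeze it, and "snapshot" it by copying the backing file
dd if=/dev/urandom of=/mnt/test/junk bs=1M count=64
fsfreeze -f /mnt/test
cp --sparse=always /tmp/base.img /tmp/snap.img
fsfreeze -u /mnt/test

# if the freeze really left a clean on-disk image, the copy should not need
# journal recovery and should mount read-only without norecovery
SNAP=$(losetup -f --show -r /tmp/snap.img)
file -s "$SNAP"
mount -o ro "$SNAP" /mnt/snap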
>>>>>
>>>>> Here is what the checksum tests showed:
>>>>>
>>>>> fsfreeze -f /mountpoint
>>>>> md5sum /dev/rbd0
>>>>> f33c926373ad604da674bcbfbe6460c5 /dev/rbd0
>>>>> rbd snap create xx@xxx && rbd snap protect xx@xxx
>>>>> rbd map xx@xxx
>>>>> md5sum /dev/rbd1
>>>>> 6f702740281874632c73aeb2c0fcf34a /dev/rbd1
>>>>>
>>>>> where rbd1 is a snapshot of the rbd0 device. So the checksums are indeed different, which is worrying.
>>>>
>>>> Sorry, for the filesystem device you should do
>>>>
>>>> md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)
>>>>
>>>> to get what's actually on disk, so that it's apples to apples.
>>>
>>> root@alxc13:~# rbd showmapped |egrep "device|c11579"
>>> id pool image snap device
>>> 47 rbd c11579 - /dev/rbd47
>>> root@alxc13:~# fsfreeze -f /var/lxc/c11579
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 617.815 s, 174 MB/s
>>> 2ddc99ce1b3ef51da1945d9da25ac296 /dev/fd/63 <--- Checksum after freeze
>>> root@alxc13:~# rbd snap create rbd/c11579@snap_test
>>> root@alxc13:~# rbd map c11579@snap_test
>>> /dev/rbd1
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 610.043 s, 176 MB/s
>>> 2ddc99ce1b3ef51da1945d9da25ac296 /dev/fd/63 <--- Check sum of snapshot
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 592.164 s, 181 MB/s
>>> 2ddc99ce1b3ef51da1945d9da25ac296 /dev/fd/63 <--- Checksum of original device, not changed - GOOD
>>> root@alxc13:~# file -s /dev/rbd1
>>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files)
>>> root@alxc13:~# fsfreeze -u /var/lxc/c11579
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
>>> 92b7182591d7d7380435cfdea79a8897 /dev/fd/63 <--- After unfreeze checksum is different - OK
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
>>> bc3b68f0276c608d9435223f89589962 /dev/fd/63 <--- Why the heck is the checksum of the snapshot different after unfreeze? BAD?
>>> root@alxc13:~# file -s /dev/rbd1
>>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery) (extents) (large files) (huge files)
>>> root@alxc13:~#
>>>
>>
>> And something even more peculiar - taking an md5sum some hours after the
>> above test produced this:
>>
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 636.836 s, 169 MB/s
>> e68e41616489d41544cd873c73defb08 /dev/fd/63
>>
>> Meaning the read-only snapshot has somehow "mutated". That is, it wasn't
>> recreated - it's the same old snapshot. Is this normal?
>
> Hrm, I wonder if it missed a snapshot context update. Please pastebin
> entire dmesg for that boot.
The machine has been up for more than 2 and the dmesg buffer has wrapped
several times in that period. The node is also rather busy, so there's
plenty of irrelevant stuff in dmesg. I grepped for rbd1/0 and found no
lines mentioning them, so it's unlikely you'd get anything useful.
>
> Have those devices been remapped or alxc13 rebooted since then? If
> not, what's the output of
>
> $ rados -p rbd listwatchers $(rbd info c11579 | grep block_name_prefix
> | awk '{ print $2 }' | sed 's/rbd_data/rbd_header/')
watcher=xx.xxx.xxx.xx:0/3416829538 client.157729 cookie=673
watcher=xx.xxx.xxx.xx:0/3416829538 client.157729 cookie=676
>
> and can you check whether that snapshot is continuing to mutate as the
> image is mutated - freeze /var/lxc/c11579 again and check rbd47 and
> rbd1?
That would take a bit more time, since it involves downtime for production
workloads.
Btw, are you on IRC in ceph/ceph-devel?
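For reference, the re-check being asked for amounts to repeating the earlier freeze-and-compare, something like:

fsfreeze -f /var/lxc/c11579
md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)   # the original image
md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)    # the snap_test snapshot
fsfreeze -u /var/lxc/c11579

If the rbd1 checksum keeps changing as rbd47 changes, the read-only snapshot really is still tracking writes to the image.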
>
> Thanks,
>
> Ilya
>