Hello guys,

I found data loss when flattening a cloned image on giant (0.87.2). The problem can be easily reproduced by running the following script:

ceph osd pool create wuxingyi 1 1
rbd create --image-format 2 wuxingyi/disk1.img --size 8
#writing "FOOBAR" at offset 0
python writetooffset.py disk1.img 0 FOOBAR
rbd snap create wuxingyi/disk1.img@SNAPSHOT
rbd snap protect wuxingyi/disk1.img@SNAPSHOT
echo "start cloning"
rbd clone wuxingyi/disk1.img@SNAPSHOT wuxingyi/CLONEIMAGE
#writing "WUXINGYI" at offset 4M of cloned image
python writetooffset.py CLONEIMAGE $((4*1048576)) WUXINGYI
rbd snap create wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT
#modify at offset 4M of cloned image
python writetooffset.py CLONEIMAGE $((4*1048576)) HEHEHEHE
echo "start flattening CLONEIMAGE"
rbd flatten wuxingyi/CLONEIMAGE
echo "before rollback"
rbd export wuxingyi/CLONEIMAGE && hexdump -C CLONEIMAGE
rm -f CLONEIMAGE
rbd snap rollback wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT
echo "after rollback"
rbd export wuxingyi/CLONEIMAGE && hexdump -C CLONEIMAGE
rm -f CLONEIMAGE

where writetooffset.py is a simple Python script that writes the given data at the given offset of the image:

#!/usr/bin/python
#coding=utf-8
# usage: writetooffset.py IMAGE OFFSET DATA
import sys
import rbd
import rados
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('wuxingyi')
rbd_inst = rbd.RBD()
image = rbd.Image(ioctx, sys.argv[1])
image.write(sys.argv[3], int(sys.argv[2]))

The output is something like:

before rollback
Exporting image: 100% complete...done.
00000000  46 4f 4f 42 41 52 00 00  00 00 00 00 00 00 00 00  |FOOBAR..........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000  48 45 48 45 48 45 48 45  00 00 00 00 00 00 00 00  |HEHEHEHE........|
00400010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00800000
Rolling back to snapshot: 100% complete...done.
after rollback
Exporting image: 100% complete...done.
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000  57 55 58 49 4e 47 59 49  00 00 00 00 00 00 00 00  |WUXINGYI........|
00400010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00800000

We can easily see that the first object of the image is lost after the rollback: "FOOBAR" at offset 0 is gone. I found that the data loss happens during flattening. After the flatten there is only a "head" version of the first object, whereas a "snapid" version of the object should also have been created and written when flattening.

But when running this script on upstream code, I cannot hit this problem. I looked through the upstream code but could not find which commit fixes this bug. I also found that the whole state machine dealing with RBD layering has changed a lot since the giant release.

Could you please give me some hints on which commits I should backport?

Thanks~~~~
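
P.S. In case it helps anyone reproducing this: instead of comparing the hexdump by eye, the "after rollback" export can be checked with a few lines of Python. This is only a sketch. The names check_export, EXPECTED_AT_SNAPSHOT and object_index are made up for illustration, and OBJECT_SIZE assumes the default 4 MiB RBD object size, which matches the 00400000 boundary in the hexdump output.

```python
# Assumption: default RBD object size of 4 MiB (order 22).
OBJECT_SIZE = 4 * 1048576

# What CLONEIMAGE@CLONEDSNAPSHOT should contain according to the
# reproduction script: FOOBAR inherited from the parent at offset 0,
# and WUXINGYI written to the clone at offset 4M before the snapshot.
EXPECTED_AT_SNAPSHOT = {
    0: b"FOOBAR",
    OBJECT_SIZE: b"WUXINGYI",
}


def check_export(data, expected):
    """Compare exported image bytes against expected content.

    Returns a list of (offset, expected_bytes, actual_bytes) tuples,
    one per offset whose content does not match. An empty list means
    the export holds everything that was expected.
    """
    mismatches = []
    for offset, want in sorted(expected.items()):
        got = data[offset:offset + len(want)]
        if got != want:
            mismatches.append((offset, want, got))
    return mismatches


def object_index(offset, object_size=OBJECT_SIZE):
    """Return the index of the RBD object that covers a byte offset."""
    return offset // object_size


# Example use against the file written by "rbd export wuxingyi/CLONEIMAGE":
#
#   with open("CLONEIMAGE", "rb") as f:
#       for off, want, got in check_export(f.read(), EXPECTED_AT_SNAPSHOT):
#           print("object %d, offset %d: expected %r, got %r"
#                 % (object_index(off), off, want, got))
```

On the giant output shown in this mail, such a check should report a single mismatch at offset 0, that is, in object 0, while the upstream build should report none.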
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com