Re: Snapshots on KVM corrupting disk images

Ivan Kudryavtsev Tue, 22 Jan 2019 18:16:51 -0800

I have seen CloudStack + KVM + QCOW2 + snapshots lead to corrupted images,
mostly on 4.3 with NFS, but I thought that CloudStack stops the VM just
before it takes the snapshot. At least the VM's behavior while a VM
snapshot is being created (it freezes) suggests that it does, which is why
this looks strange. In general, though, I agree that this combination leads
to data corruption, especially when the storage is under IO pressure. We
recommend that our customers avoid running snapshots on such a setup if
possible.
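
For anyone who wants to verify an image after a snapshot run, here is a
minimal sketch that wraps `qemu-img check` and interprets the exit codes
documented in the qemu-img man page. The function names are my own
(hypothetical), this is not something CloudStack does itself, and it should
only be pointed at an image that no running VM has open, for the reason
given in the Red Hat comment quoted below.

```python
import subprocess

# Exit codes documented in the qemu-img man page for the "check" subcommand.
CHECK_EXIT_CODES = {
    0: "no errors found",
    1: "check not completed (internal error)",
    2: "image is corrupted",
    3: "leaked clusters, but image is not corrupted",
    63: "checks not supported by this image format",
}


def describe_check_result(returncode):
    """Translate a qemu-img check exit code into a readable status."""
    return CHECK_EXIT_CODES.get(returncode, "unknown exit code %d" % returncode)


def check_image(path):
    """Run `qemu-img check` on an image that is NOT in use by a running VM.

    Returns (exit code, readable status, combined qemu-img output).
    """
    proc = subprocess.run(
        ["qemu-img", "check", path],
        capture_output=True,
        text=True,
    )
    status = describe_check_result(proc.returncode)
    return proc.returncode, status, proc.stdout + proc.stderr
```

Looking at the exit code rather than only the printed text is deliberate:
the message alone can be misleading when the tool hits an internal problem
part-way through.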


On Wed, 23 Jan 2019 at 05:06, Wei ZHOU <ustcweiz...@gmail.com> wrote:

> Hi Sean,
>
> (Recurring) volume snapshots on running VMs should be disabled in
> CloudStack.
>
> According to some discussions (for example
> https://bugzilla.redhat.com/show_bug.cgi?id=920020), the image might be
> corrupted by the concurrent read/write operations that a volume snapshot
> performs (via qemu-img snapshot).
>
> ```
>
> qcow2 images must not be used in read-write mode from two processes at the
> same time. You can either have them opened either by one read-write process
> or by many read-only processes. Having one (paused) read-write process (the
> running VM) and additional read-only processes (copying out a snapshot with
> qemu-img) may happen to work in practice, but you're on your own and we
> won't give support for such attempts.
>
> ```
> The safe way to take a volume snapshot of a running VM is:
> (1) take a VM snapshot (the VM will be paused)
> (2) then create a volume snapshot from the VM snapshot
>
> -Wei
>
>
>
> On Tue, 22 Jan 2019 at 5:30 PM, Sean Lair <sl...@ippathways.com> wrote:
>
> > Hi all,
> >
> > We have had some instances where VM disks became corrupted when using KVM
> > snapshots.  We are running CloudStack 4.9.3 with KVM on CentOS 7.
> >
> > The first time was when someone mass-enabled scheduled snapshots on a
> > large number of VMs and secondary storage filled up.  We had to restore
> > all those VM disks...  But we believed it was just our fault for letting
> > secondary storage fill up.
> >
> > Today we had an instance where a snapshot failed and now the disk image is
> > corrupted and the VM can't boot.  Here is the output of some commands:
> >
> > -----------------------
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -----------------------
> >
> > We tried restoring to before the snapshot failure, but still have strange
> > errors:
> >
> > ----------------------
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > file format: qcow2
> > virtual size: 50G (53687091200 bytes)
> > disk size: 73G
> > cluster_size: 65536
> > Snapshot list:
> > ID        TAG                 VM SIZE                DATE       VM CLOCK
> > 1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95   3.7G 2018-12-23 11:01:43 3099:35:55.242
> > 2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd   3.8G 2019-01-06 11:03:16 3431:52:23.942
> > Format specific information:
> >     compat: 1.1
> >     lazy refcounts: false
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > tcmalloc: large alloc 1539750010880 bytes == (nil) @  0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
> > No errors were found on the image.
> >
> > [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
> > Snapshot list:
> > ID        TAG                 VM SIZE                DATE       VM CLOCK
> > 1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95   3.7G 2018-12-23 11:01:43 3099:35:55.242
> > 2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd   3.8G 2019-01-06 11:03:16 3431:52:23.942
> > --------------------------
> >
> > Everyone is now extremely hesitant to use snapshots in KVM....  We tried
> > deleting the snapshots in the restored disk image, but it errors out...
> >
> >
> > Does anyone else have issues with KVM snapshots?  We are considering just
> > disabling this functionality now...
> >
> > Thanks
> > Sean
>
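
Wei's two-step procedure quoted above could be sketched roughly as follows.
This only illustrates the ordering: `client` and both of its method names
are hypothetical placeholders that I made up for the example, not real
CloudStack API calls, so map them onto whatever your tooling provides.

```python
def safe_volume_snapshot(client, vm_id, volume_id):
    """Take a volume snapshot of a running VM by going through a VM snapshot.

    `client` is a hypothetical wrapper object; its two methods are
    placeholders, not real CloudStack API names.
    """
    # Step 1: take a VM snapshot first (per Wei, the VM is paused meanwhile).
    vm_snapshot_id = client.create_vm_snapshot(vm_id)
    # Step 2: derive the volume snapshot from that VM snapshot instead of
    # snapshotting the live volume directly.
    return client.create_volume_snapshot_from_vm_snapshot(
        vm_snapshot_id, volume_id
    )
```

The point of the wrapper is simply that step 2 can never run without the ID
returned by step 1, so a volume snapshot is never taken straight from a disk
that the running VM is still writing to.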


-- 
With best regards, Ivan Kudryavtsev
Bitworks LLC
Cell RU: +7-923-414-1515
Cell USA: +1-201-257-1512
WWW: http://bitworks.software/ <http://bw-sw.com/>
