=================================
new findings / it's working now
=================================
I cloned the faulty system in order to play with it, and the cloned VM *boots with no problem at all*. So there's clearly an issue with moving a disk that has a snapshot between NFS and iSCSI SDs. I have both VMs now, and if anyone is interested I can keep troubleshooting.
If not, I would like to give a very special ***THANK YOU*** to the whole Red Hat + oVirt team for their daily outstanding work and their excellent products. Thanks for your time!
JP

qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/36678881-9686-48b5-b39c-16fafece5c5a/1bbf1375-7469-426a-b68d-adbc3446d51e
image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/36678881-9686-48b5-b39c-16fafece5c5a/1bbf1375-7469-426a-b68d-adbc3446d51e
file format: raw
virtual size: 1.1T (1181116006400 bytes)
disk size: 0

2018-05-14 14:58 GMT-03:00 Juan Pablo <[email protected]>:

> I'm still wondering why oVirt moved an image and left it with errors, but
> reported the move as successful in the GUI. If you think it's related to
> the snapshot, it's really strange, as it's the first time I see this odd
> behavior; I never got an issue like this when moving images + snapshots.
>
> On the other side, if it moved the 700G image (and not the extra 400G), why
> is it inconsistent? It should be merely old, but nevertheless in good
> shape. Is there anything else to try?
>
> Thanks in advance!
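When comparing the images in this thread, the rounded sizes (1.1T, 700G) hide small differences; it helps to diff the exact byte counts that `qemu-img info` prints in parentheses. A quick sketch that pulls them out of the text output (the helper name is mine, not an oVirt or QEMU tool):

```python
import re

# Sample taken from the qemu-img output above (path shortened here)
INFO = """image: .../1bbf1375-7469-426a-b68d-adbc3446d51e
file format: raw
virtual size: 1.1T (1181116006400 bytes)
disk size: 0
"""

def virtual_size_bytes(info_text: str) -> int:
    """Extract the exact byte count from `qemu-img info` text output."""
    m = re.search(r"virtual size: \S+ \((\d+) bytes\)", info_text)
    if m is None:
        raise ValueError("no virtual size found")
    return int(m.group(1))

print(virtual_size_bytes(INFO))  # 1181116006400
```

Running this over the base and snapshot outputs below makes the 751619276800 vs. 1181116006400 mismatch obvious at a glance.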
> Regards,
>
> 2018-05-14 12:01 GMT-03:00 Juan Pablo <[email protected]>:
>
>> Hi Nir, thanks for the reply, here's the output:
>>
>> *(BASE)*
>> [root@node02 ~]# qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
>> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
>> file format: raw
>> virtual size: 700G (751619276800 bytes)
>> disk size: 0
>>
>> *(with snapshot)*
>> [root@node02 ~]# qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/86a6fdb9-b9a4-4b78-8e3d-940f83cedc5a
>> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/86a6fdb9-b9a4-4b78-8e3d-940f83cedc5a
>> file format: qcow2
>> virtual size: 1.1T (1181116006400 bytes)
>> disk size: 0
>> cluster_size: 65536
>> backing file: 52532d05-970e-4643-9774-96c31796062c (actual path: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c)
>> backing file format: raw
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>>
>> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
>> file format: raw
>> virtual size: 700G (751619276800 bytes)
>> disk size: 0
>>
>> I really appreciate your time helping,
>> regards,
>>
>> 2018-05-14 11:36 GMT-03:00 Nir Soffer <[email protected]>:
>>
>>> On Mon, May 14, 2018 at 5:19 PM Juan Pablo <[email protected]> wrote:
>>>
>>>> OK, so I'm confirming that the image is wrong somehow:
>>>> With no snapshot, from inside the VM the disk size is reported as 750G.
>>>> With a snapshot, from inside the VM the disk size is reported as 1100G.
>>>> Neither has any partitions on it, so I guess oVirt migrated the structure
>>>> of the 750G disk onto a 1100G disk. Any ideas to troubleshoot this and see
>>>> if there's data to recover?
>>>
>>> Maybe you resized the disk after making a snapshot?
>>>
>>> If the base is raw, the size seen by the guest is the size of the image.
>>>
>>> The snapshot is qcow2; the size seen by the guest is the size saved in
>>> the qcow2 header.
>>>
>>> Can you share the output of:
>>>
>>>     qemu-img info --backing-chain /path/to/snapshot
>>>
>>> And:
>>>
>>>     qemu-img info --backing-chain /path/to/base
>>>
>>> You can see the path in the VM XML, either in vdsm.log, or using virsh:
>>>
>>>     virsh -r list
>>>     virsh -r dumpxml vm-id
>>>
>>> Nir
>>>
>>>> regards,
>>>>
>>>> 2018-05-13 15:25 GMT-03:00 Juan Pablo <[email protected]>:
>>>>
>>>>> Two clues:
>>>>> - The original size of the disk was 750G, and it was extended a month
>>>>> ago to 1100G. The system rebooted fine several times and took the new
>>>>> size with no problems.
>>>>>
>>>>> - I ran fdisk from a CentOS 7 rescue CD and /dev/vda reported 750G.
>>>>> Then I took a snapshot of the disk to play with recovery tools, and now
>>>>> fdisk reports 1100G... ¬¬
>>>>>
>>>>> So my guess is that the extend, and the later migration to a different
>>>>> storage domain, caused the issue.
>>>>> I'm currently running testdisk to see if there's any partition to
>>>>> recover.
>>>>>
>>>>> regards,
>>>>>
>>>>> 2018-05-13 12:31 GMT-03:00 Juan Pablo <[email protected]>:
>>>>>
>>>>>> I removed the auto-snapshot and still no luck: no bootable disk
>>>>>> found. =(
>>>>>> Ideas?
>>>>>>
>>>>>> 2018-05-13 12:26 GMT-03:00 Juan Pablo <[email protected]>:
>>>>>>
>>>>>>> Benny, thanks for your reply:
>>>>>>> OK, so the first step is removing the snapshot.
>>>>>>> Then what do you suggest?
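Nir's point that the guest sees the size stored in the qcow2 header can be checked directly: per the qcow2 format spec, the virtual size is a big-endian u64 at byte offset 24 of the header. A minimal sketch against a fabricated header (not a real image, and not how qemu-img does it internally):

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"

def qcow2_virtual_size(header: bytes) -> int:
    """Guest-visible size: big-endian u64 at byte offset 24 of the header."""
    if header[:4] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return struct.unpack_from(">Q", header, 24)[0]

# Toy header mimicking the 1.1T snapshot from this thread:
# version=3, backing_file_offset=0, backing_file_size=0,
# cluster_bits=16 (i.e. cluster_size 65536), size=1181116006400
fake = QCOW2_MAGIC + struct.pack(">IQIIQ", 3, 0, 0, 16, 1181116006400)
print(qcow2_virtual_size(fake))  # 1181116006400
```

Reading the first 32 bytes of the real snapshot volume and feeding them to this function should reproduce the 1181116006400 that `qemu-img info` reports.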
>>>>>>> 2018-05-12 15:23 GMT-03:00 Nir Soffer <[email protected]>:
>>>>>>>
>>>>>>>> On Sat, 12 May 2018, 11:32 Benny Zlotnik, <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Using the auto-generated snapshot is generally a bad idea as it's
>>>>>>>>> inconsistent,
>>>>>>>>
>>>>>>>> What do you mean by inconsistent?
>>>>>>>>
>>>>>>>>> you should remove it before moving further
>>>>>>>>>
>>>>>>>>> On Fri, May 11, 2018 at 7:25 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I rebooted it with no luck, then I used the auto-generated
>>>>>>>>>> snapshot; same luck.
>>>>>>>>>> Attaching the logs in Google Drive.
>>>>>>>>>>
>>>>>>>>>> thanks in advance
>>>>>>>>>>
>>>>>>>>>> 2018-05-11 12:50 GMT-03:00 Benny Zlotnik <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> I see here a failed attempt:
>>>>>>>>>>> 2018-05-09 16:00:20,129-03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-67)
>>>>>>>>>>> [bd8eeb1d-f49a-4f91-a521-e0f31b4a7cbd] EVENT_ID:
>>>>>>>>>>> USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz
>>>>>>>>>>> have failed to move disk mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>>
>>>>>>>>>>> Then another:
>>>>>>>>>>> 2018-05-09 16:15:06,998-03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-34) []
>>>>>>>>>>> EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz
>>>>>>>>>>> have failed to move disk mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>> Here I see a successful attempt:
>>>>>>>>>>> 2018-05-09 21:58:42,628-03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>>>>>>>> (default task-50) [940b051c-8c63-4711-baf9-f3520bb2b825] EVENT_ID:
>>>>>>>>>>> USER_MOVED_DISK(2,008), User admin@internal-authz moving disk
>>>>>>>>>>> mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>>
>>>>>>>>>>> Then, in the last attempt, I see the move was successful but
>>>>>>>>>>> live merge failed:
>>>>>>>>>>> 2018-05-11 03:37:59,509-03 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-commandCoordinator-Thread-2)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Failed to live merge,
>>>>>>>>>>> still in volume chain: [5d9d2958-96bc-49fa-9100-2f33a3ba737f,
>>>>>>>>>>> 52532d05-970e-4643-9774-96c31796062c]
>>>>>>>>>>> 2018-05-11 03:38:01,495-03 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-51)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command 'LiveMigrateDisk'
>>>>>>>>>>> (id: '115fc375-6018-4d59-b9f2-51ee05ca49f8') waiting on child
>>>>>>>>>>> command id: '26bc52a4-4509-4577-b342-44a679bc628f'
>>>>>>>>>>> type: 'RemoveSnapshot' to complete
>>>>>>>>>>> 2018-05-11 03:38:01,501-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-51)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command id:
>>>>>>>>>>> '4936d196-a891-4484-9cf5-fceaafbf3364' failed child command
>>>>>>>>>>> status for step 'MERGE_STATUS'
>>>>>>>>>>> 2018-05-11 03:38:01,501-03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-51)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command
>>>>>>>>>>> 'RemoveSnapshotSingleDiskLive' id: '4936d196-a891-4484-9cf5-fceaafbf3364'
>>>>>>>>>>> child commands '[8da5f261-7edd-4930-8d9d-d34f232d84b3,
>>>>>>>>>>> 1c320f4b-7296-43c4-a3e6-8a868e23fc35,
>>>>>>>>>>> a0e9e70c-cd65-4dfb-bd00-076c4e99556a]' executions were
>>>>>>>>>>> completed, status 'FAILED'
>>>>>>>>>>> 2018-05-11 03:38:02,513-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-2)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Merging of snapshot
>>>>>>>>>>> '319e8bbb-9efe-4de4-a9a6-862e3deb891f' images
>>>>>>>>>>> '52532d05-970e-4643-9774-96c31796062c'..'5d9d2958-96bc-49fa-9100-2f33a3ba737f'
>>>>>>>>>>> failed. Images have been marked illegal and can no longer be
>>>>>>>>>>> previewed or reverted to. Please retry Live Merge on the snapshot
>>>>>>>>>>> to complete the operation.
>>>>>>>>>>> 2018-05-11 03:38:02,519-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-2)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Ending command
>>>>>>>>>>> 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand'
>>>>>>>>>>> with failure.
>>>>>>>>>>> 2018-05-11 03:38:03,530-03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-37)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command 'RemoveSnapshot'
>>>>>>>>>>> id: '26bc52a4-4509-4577-b342-44a679bc628f' child commands
>>>>>>>>>>> '[4936d196-a891-4484-9cf5-fceaafbf3364]' executions were
>>>>>>>>>>> completed, status 'FAILED'
>>>>>>>>>>> 2018-05-11 03:38:04,548-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-66)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Ending command
>>>>>>>>>>> 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand'
>>>>>>>>>>> with failure.
>>>>>>>>>>> 2018-05-11 03:38:04,557-03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-66)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Lock freed to object
>>>>>>>>>>> 'EngineLock:{exclusiveLocks='[4808bb70-c9cc-4286-aa39-16b5798213ac=LIVE_STORAGE_MIGRATION]',
>>>>>>>>>>> sharedLocks=''}'
>>>>>>>>>>>
>>>>>>>>>>> I do not see the merge attempt in the vdsm.log, so please send
>>>>>>>>>>> vdsm logs for node02.phy.eze.ampgn.com.ar from that time.
>>>>>>>>>>>
>>>>>>>>>>> Also, did you use the auto-generated snapshot to start the VM?
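Picking the disk-move events out of a large engine.log by hand is tedious; the excerpts above all follow the same `timestamp ... EVENT_ID: NAME(code)` shape, so a small script can list them. A rough sketch over log lines condensed from this thread (not an oVirt tool, and the regex assumes this exact engine.log layout):

```python
import re

# Two engine.log lines condensed from the excerpts above
LOG = """2018-05-09 16:00:20,129-03 ERROR [...] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk mail02-int_Disk1 to domain 2penLA.
2018-05-09 21:58:42,628-03 INFO [...] EVENT_ID: USER_MOVED_DISK(2,008), User admin@internal-authz moving disk mail02-int_Disk1 to domain 2penLA.
"""

def move_events(log: str):
    """Pair each engine.log timestamp with its EVENT_ID name."""
    pat = re.compile(r"^([\d-]+ [\d:,]+-\d+).*?EVENT_ID: (\w+)\(", re.M)
    return pat.findall(log)

for ts, event in move_events(LOG):
    print(ts, event)
```

Running it across the whole engine.log shows at a glance which attempts failed (USER_MOVED_DISK_FINISHED_FAILURE) and which were reported successful.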
>>>>>>>>>>> On Fri, May 11, 2018 at 6:11 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> After the xfs_repair, it says: "Sorry, I could not find valid
>>>>>>>>>>>> secondary superblock"
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-05-11 12:09 GMT-03:00 Juan Pablo <[email protected]>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> Alias: mail02-int_Disk1
>>>>>>>>>>>>> Description:
>>>>>>>>>>>>> ID: 65ec515e-0aae-4fe6-a561-387929c7fb4d
>>>>>>>>>>>>> Alignment: Unknown
>>>>>>>>>>>>> Disk Profile:
>>>>>>>>>>>>> Wipe After Delete: No
>>>>>>>>>>>>>
>>>>>>>>>>>>> That one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018-05-11 11:12 GMT-03:00 Benny Zlotnik <[email protected]>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I looked at the logs and I see some disks have moved
>>>>>>>>>>>>>> successfully and some failed. Which disk is causing the problems?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 11, 2018 at 5:02 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, I just sent you the files via Drive. Attaching some extra
>>>>>>>>>>>>>>> info. Thanks, thanks and thanks:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From inside the migrated VM I had the attached dmesg output
>>>>>>>>>>>>>>> before rebooting.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> regards and thanks again for the help,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2018-05-11 10:45 GMT-03:00 Benny Zlotnik <[email protected]>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dropbox or Google Drive, I guess. Also, can you attach
>>>>>>>>>>>>>>>> engine.log?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, May 11, 2018 at 4:43 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> vdsm is too big for Gmail... any other way I can share it
>>>>>>>>>>>>>>>>> with you?
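xfs_repair's secondary-superblock search is essentially a scan for the XFS magic bytes ("XFSB") at candidate offsets; when it gives up, a manual scan of the raw image can at least confirm whether any superblock copies survive before reaching for testdisk. A rough sketch on a toy buffer (a brute-force fallback, not a substitute for xfs_repair):

```python
XFS_MAGIC = b"XFSB"

def find_xfs_superblocks(data: bytes, step: int = 512) -> list:
    """Scan a raw image for the XFS superblock magic at sector boundaries.

    Real secondary superblocks sit at allocation-group starts; scanning
    every sector is a slower but simpler brute-force search.
    """
    return [off for off in range(0, len(data) - 3, step)
            if data[off:off + 4] == XFS_MAGIC]

# Toy image with one magic planted at offset 4096 (sector 8)
img = bytearray(16 * 512)
img[4096:4100] = XFS_MAGIC
print(find_xfs_superblocks(bytes(img)))  # [4096]
```

On the real 1.1T volume you would read it in chunks rather than loading it whole; an empty result would match xfs_repair's "could not find valid secondary superblock" and suggest the data simply isn't there.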
>>>>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>>>>> From: Juan Pablo <[email protected]>
>>>>>>>>>>>>>>>>> Date: 2018-05-11 10:40 GMT-03:00
>>>>>>>>>>>>>>>>> Subject: Re: [ovirt-users] strange issue: vm lost info on disk
>>>>>>>>>>>>>>>>> To: Benny Zlotnik <[email protected]>
>>>>>>>>>>>>>>>>> Cc: users <[email protected]>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Benny, thanks for your reply! It was a live migration.
>>>>>>>>>>>>>>>>> Sorry, it was from NFS to iSCSI, not the other way around. I
>>>>>>>>>>>>>>>>> have rebooted the VM for rescue and it does not detect any
>>>>>>>>>>>>>>>>> partitions with fdisk. I'm running xfs_repair with -n and it
>>>>>>>>>>>>>>>>> found a corrupted primary superblock; it's still running...
>>>>>>>>>>>>>>>>> (so... maybe there's info on the disk?)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Attaching logs, let me know if those are the ones.
>>>>>>>>>>>>>>>>> thanks again!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2018-05-11 9:45 GMT-03:00 Benny Zlotnik <[email protected]>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you provide the logs? engine and vdsm.
>>>>>>>>>>>>>>>>>> Did you perform a live migration (the VM was running) or cold?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, May 11, 2018 at 2:49 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi! I'm struggling with an ongoing problem:
>>>>>>>>>>>>>>>>>>> After migrating a VM's disk from an iSCSI domain to an NFS
>>>>>>>>>>>>>>>>>>> one, with oVirt reporting the migration as successful, I see
>>>>>>>>>>>>>>>>>>> there's no data 'inside' the VM's disk. We never had these
>>>>>>>>>>>>>>>>>>> issues with oVirt, so I'm puzzled about the root cause and
>>>>>>>>>>>>>>>>>>> whether there's a chance of recovering the information.
>>>>>>>>>>>>>>>>>>> Can you please help me troubleshoot this one? I would
>>>>>>>>>>>>>>>>>>> really appreciate it =)
>>>>>>>>>>>>>>>>>>> Running oVirt 4.2.1 here!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> thanks in advance,
>>>>>>>>>>>>>>>>>>> JP
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

