=================================
new findings / it's working now
=================================
I cloned the faulty system in order to play with it, and the cloned VM *boots with no problem at all*. So there's clearly an issue with moving a disk that has a snapshot between NFS and iSCSI SDs. I have both VMs now, and if anyone is interested I can keep troubleshooting.
If not, I would like to give a very special ***THANK YOU*** to the whole Red Hat + oVirt team for their daily outstanding work and their excellent products. Thanks for your time!
JP

qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/36678881-9686-48b5-b39c-16fafece5c5a/1bbf1375-7469-426a-b68d-adbc3446d51e
image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/36678881-9686-48b5-b39c-16fafece5c5a/1bbf1375-7469-426a-b68d-adbc3446d51e
file format: raw
virtual size: 1.1T (1181116006400 bytes)
disk size: 0

2018-05-14 14:58 GMT-03:00 Juan Pablo <[email protected]>:

> I'm still wondering why oVirt moved an image and left it with errors, but
> reported the move as successful in the GUI. If you think it's related to
> the snapshot, it's really strange, as it's the first time I see this odd
> behavior; I never got an issue like this when moving images + snapshots.
>
> On the other side, if it moved the 700G image (and not the extra 400G), why
> is it inconsistent? It should be merely old, but nevertheless in good
> shape. Is there anything else to try?
>
> Thanks in advance!
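When comparing the images in this thread, the rounded sizes (1.1T, 700G) hide small differences; it helps to diff the exact byte counts that `qemu-img info` prints in parentheses. A quick sketch that pulls them out of the text output (the helper name is mine, not an oVirt or QEMU tool):

```python
import re

# Sample taken from the qemu-img output above (path shortened here)
INFO = """image: .../1bbf1375-7469-426a-b68d-adbc3446d51e
file format: raw
virtual size: 1.1T (1181116006400 bytes)
disk size: 0
"""

def virtual_size_bytes(info_text: str) -> int:
    """Extract the exact byte count from `qemu-img info` text output."""
    m = re.search(r"virtual size: \S+ \((\d+) bytes\)", info_text)
    if m is None:
        raise ValueError("no virtual size found")
    return int(m.group(1))

print(virtual_size_bytes(INFO))  # 1181116006400
```

Running this over the base and snapshot outputs below makes the 751619276800 vs. 1181116006400 mismatch obvious at a glance.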
> Regards,
>
> 2018-05-14 12:01 GMT-03:00 Juan Pablo <[email protected]>:
>
>> Hi Nir, thanks for the reply, here's the output:
>>
>> *(BASE)*
>> [root@node02 ~]# qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
>> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
>> file format: raw
>> virtual size: 700G (751619276800 bytes)
>> disk size: 0
>>
>> *(with snapshot)*
>> [root@node02 ~]# qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/86a6fdb9-b9a4-4b78-8e3d-940f83cedc5a
>> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/86a6fdb9-b9a4-4b78-8e3d-940f83cedc5a
>> file format: qcow2
>> virtual size: 1.1T (1181116006400 bytes)
>> disk size: 0
>> cluster_size: 65536
>> backing file: 52532d05-970e-4643-9774-96c31796062c (actual path: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c)
>> backing file format: raw
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>>
>> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
>> file format: raw
>> virtual size: 700G (751619276800 bytes)
>> disk size: 0
>>
>> I really appreciate your time helping,
>> regards,
>>
>> 2018-05-14 11:36 GMT-03:00 Nir Soffer <[email protected]>:
>>
>>> On Mon, May 14, 2018 at 5:19 PM Juan Pablo <[email protected]> wrote:
>>>
>>>> OK, so I'm confirming that the image is wrong somehow:
>>>> With no snapshot, from inside the VM the disk size is reported as 750G.
>>>> With a snapshot, from inside the VM the disk size is reported as 1100G.
>>>> Neither has any partitions on it, so I guess oVirt migrated the structure
>>>> of the 750G disk onto a 1100G disk. Any ideas to troubleshoot this and see
>>>> if there's data to recover?
>>>
>>> Maybe you resized the disk after making a snapshot?
>>>
>>> If the base is raw, the size seen by the guest is the size of the image.
>>>
>>> The snapshot is qcow2; the size seen by the guest is the size saved in
>>> the qcow2 header.
>>>
>>> Can you share the output of:
>>>
>>>     qemu-img info --backing-chain /path/to/snapshot
>>>
>>> And:
>>>
>>>     qemu-img info --backing-chain /path/to/base
>>>
>>> You can see the path in the VM XML, either in vdsm.log, or using virsh:
>>>
>>>     virsh -r list
>>>     virsh -r dumpxml vm-id
>>>
>>> Nir
>>>
>>>> regards,
>>>>
>>>> 2018-05-13 15:25 GMT-03:00 Juan Pablo <[email protected]>:
>>>>
>>>>> Two clues:
>>>>> - The original size of the disk was 750G, and it was extended a month
>>>>> ago to 1100G. The system rebooted fine several times and took the new
>>>>> size with no problems.
>>>>>
>>>>> - I ran fdisk from a CentOS 7 rescue CD and /dev/vda reported 750G.
>>>>> Then I took a snapshot of the disk to play with recovery tools, and now
>>>>> fdisk reports 1100G... ¬¬
>>>>>
>>>>> So my guess is that the extend, and the later migration to a different
>>>>> storage domain, caused the issue.
>>>>> I'm currently running testdisk to see if there's any partition to
>>>>> recover.
>>>>>
>>>>> regards,
>>>>>
>>>>> 2018-05-13 12:31 GMT-03:00 Juan Pablo <[email protected]>:
>>>>>
>>>>>> I removed the auto-snapshot and still no luck: no bootable disk
>>>>>> found. =(
>>>>>> Ideas?
>>>>>>
>>>>>> 2018-05-13 12:26 GMT-03:00 Juan Pablo <[email protected]>:
>>>>>>
>>>>>>> Benny, thanks for your reply:
>>>>>>> OK, so the first step is removing the snapshot.
>>>>>>> Then what do you suggest?
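Nir's point that the guest sees the size stored in the qcow2 header can be checked directly: per the qcow2 format spec, the virtual size is a big-endian u64 at byte offset 24 of the header. A minimal sketch against a fabricated header (not a real image, and not how qemu-img does it internally):

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"

def qcow2_virtual_size(header: bytes) -> int:
    """Guest-visible size: big-endian u64 at byte offset 24 of the header."""
    if header[:4] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return struct.unpack_from(">Q", header, 24)[0]

# Toy header mimicking the 1.1T snapshot from this thread:
# version=3, backing_file_offset=0, backing_file_size=0,
# cluster_bits=16 (i.e. cluster_size 65536), size=1181116006400
fake = QCOW2_MAGIC + struct.pack(">IQIIQ", 3, 0, 0, 16, 1181116006400)
print(qcow2_virtual_size(fake))  # 1181116006400
```

Reading the first 32 bytes of the real snapshot volume and feeding them to this function should reproduce the 1181116006400 that `qemu-img info` reports.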
>>>>>>> 2018-05-12 15:23 GMT-03:00 Nir Soffer <[email protected]>:
>>>>>>>
>>>>>>>> On Sat, 12 May 2018, 11:32 Benny Zlotnik, <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Using the auto-generated snapshot is generally a bad idea as it's
>>>>>>>>> inconsistent,
>>>>>>>>
>>>>>>>> What do you mean by inconsistent?
>>>>>>>>
>>>>>>>>> you should remove it before moving further
>>>>>>>>>
>>>>>>>>> On Fri, May 11, 2018 at 7:25 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I rebooted it with no luck, then I used the auto-generated
>>>>>>>>>> snapshot; same luck.
>>>>>>>>>> Attaching the logs in Google Drive.
>>>>>>>>>>
>>>>>>>>>> thanks in advance
>>>>>>>>>>
>>>>>>>>>> 2018-05-11 12:50 GMT-03:00 Benny Zlotnik <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> I see here a failed attempt:
>>>>>>>>>>> 2018-05-09 16:00:20,129-03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-67)
>>>>>>>>>>> [bd8eeb1d-f49a-4f91-a521-e0f31b4a7cbd] EVENT_ID:
>>>>>>>>>>> USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz
>>>>>>>>>>> have failed to move disk mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>>
>>>>>>>>>>> Then another:
>>>>>>>>>>> 2018-05-09 16:15:06,998-03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-34) []
>>>>>>>>>>> EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz
>>>>>>>>>>> have failed to move disk mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>> Here I see a successful attempt:
>>>>>>>>>>> 2018-05-09 21:58:42,628-03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>>>>>>>> (default task-50) [940b051c-8c63-4711-baf9-f3520bb2b825] EVENT_ID:
>>>>>>>>>>> USER_MOVED_DISK(2,008), User admin@internal-authz moving disk
>>>>>>>>>>> mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>>
>>>>>>>>>>> Then, in the last attempt, I see the move was successful but
>>>>>>>>>>> live merge failed:
>>>>>>>>>>> 2018-05-11 03:37:59,509-03 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-commandCoordinator-Thread-2)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Failed to live merge,
>>>>>>>>>>> still in volume chain: [5d9d2958-96bc-49fa-9100-2f33a3ba737f,
>>>>>>>>>>> 52532d05-970e-4643-9774-96c31796062c]
>>>>>>>>>>> 2018-05-11 03:38:01,495-03 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-51)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command 'LiveMigrateDisk'
>>>>>>>>>>> (id: '115fc375-6018-4d59-b9f2-51ee05ca49f8') waiting on child
>>>>>>>>>>> command id: '26bc52a4-4509-4577-b342-44a679bc628f'
>>>>>>>>>>> type: 'RemoveSnapshot' to complete
>>>>>>>>>>> 2018-05-11 03:38:01,501-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-51)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command id:
>>>>>>>>>>> '4936d196-a891-4484-9cf5-fceaafbf3364' failed child command
>>>>>>>>>>> status for step 'MERGE_STATUS'
>>>>>>>>>>> 2018-05-11 03:38:01,501-03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-51)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command
>>>>>>>>>>> 'RemoveSnapshotSingleDiskLive' id: '4936d196-a891-4484-9cf5-fceaafbf3364'
>>>>>>>>>>> child commands '[8da5f261-7edd-4930-8d9d-d34f232d84b3,
>>>>>>>>>>> 1c320f4b-7296-43c4-a3e6-8a868e23fc35,
>>>>>>>>>>> a0e9e70c-cd65-4dfb-bd00-076c4e99556a]' executions were
>>>>>>>>>>> completed, status 'FAILED'
>>>>>>>>>>> 2018-05-11 03:38:02,513-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-2)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Merging of snapshot
>>>>>>>>>>> '319e8bbb-9efe-4de4-a9a6-862e3deb891f' images
>>>>>>>>>>> '52532d05-970e-4643-9774-96c31796062c'..'5d9d2958-96bc-49fa-9100-2f33a3ba737f'
>>>>>>>>>>> failed. Images have been marked illegal and can no longer be
>>>>>>>>>>> previewed or reverted to. Please retry Live Merge on the snapshot
>>>>>>>>>>> to complete the operation.
>>>>>>>>>>> 2018-05-11 03:38:02,519-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-2)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Ending command
>>>>>>>>>>> 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand'
>>>>>>>>>>> with failure.
>>>>>>>>>>> 2018-05-11 03:38:03,530-03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-37)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command 'RemoveSnapshot'
>>>>>>>>>>> id: '26bc52a4-4509-4577-b342-44a679bc628f' child commands
>>>>>>>>>>> '[4936d196-a891-4484-9cf5-fceaafbf3364]' executions were
>>>>>>>>>>> completed, status 'FAILED'
>>>>>>>>>>> 2018-05-11 03:38:04,548-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-66)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Ending command
>>>>>>>>>>> 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand'
>>>>>>>>>>> with failure.
>>>>>>>>>>> 2018-05-11 03:38:04,557-03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand]
>>>>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-66)
>>>>>>>>>>> [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Lock freed to object
>>>>>>>>>>> 'EngineLock:{exclusiveLocks='[4808bb70-c9cc-4286-aa39-16b5798213ac=LIVE_STORAGE_MIGRATION]',
>>>>>>>>>>> sharedLocks=''}'
>>>>>>>>>>>
>>>>>>>>>>> I do not see the merge attempt in the vdsm.log, so please send
>>>>>>>>>>> vdsm logs for node02.phy.eze.ampgn.com.ar from that time.
>>>>>>>>>>>
>>>>>>>>>>> Also, did you use the auto-generated snapshot to start the VM?
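Picking the disk-move events out of a large engine.log by hand is tedious; the excerpts above all follow the same `timestamp ... EVENT_ID: NAME(code)` shape, so a small script can list them. A rough sketch over log lines condensed from this thread (not an oVirt tool, and the regex assumes this exact engine.log layout):

```python
import re

# Two engine.log lines condensed from the excerpts above
LOG = """2018-05-09 16:00:20,129-03 ERROR [...] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk mail02-int_Disk1 to domain 2penLA.
2018-05-09 21:58:42,628-03 INFO [...] EVENT_ID: USER_MOVED_DISK(2,008), User admin@internal-authz moving disk mail02-int_Disk1 to domain 2penLA.
"""

def move_events(log: str):
    """Pair each engine.log timestamp with its EVENT_ID name."""
    pat = re.compile(r"^([\d-]+ [\d:,]+-\d+).*?EVENT_ID: (\w+)\(", re.M)
    return pat.findall(log)

for ts, event in move_events(LOG):
    print(ts, event)
```

Running it across the whole engine.log shows at a glance which attempts failed (USER_MOVED_DISK_FINISHED_FAILURE) and which were reported successful.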
>>>>>>>>>>> On Fri, May 11, 2018 at 6:11 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> After the xfs_repair, it says: "Sorry, I could not find valid
>>>>>>>>>>>> secondary superblock"
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-05-11 12:09 GMT-03:00 Juan Pablo <[email protected]>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> Alias: mail02-int_Disk1
>>>>>>>>>>>>> Description:
>>>>>>>>>>>>> ID: 65ec515e-0aae-4fe6-a561-387929c7fb4d
>>>>>>>>>>>>> Alignment: Unknown
>>>>>>>>>>>>> Disk Profile:
>>>>>>>>>>>>> Wipe After Delete: No
>>>>>>>>>>>>>
>>>>>>>>>>>>> That one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018-05-11 11:12 GMT-03:00 Benny Zlotnik <[email protected]>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I looked at the logs and I see some disks have moved
>>>>>>>>>>>>>> successfully and some failed. Which disk is causing the problems?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 11, 2018 at 5:02 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, I just sent you the files via Drive. Attaching some extra
>>>>>>>>>>>>>>> info. Thanks, thanks and thanks:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From inside the migrated VM I had the attached dmesg output
>>>>>>>>>>>>>>> before rebooting.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> regards and thanks again for the help,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2018-05-11 10:45 GMT-03:00 Benny Zlotnik <[email protected]>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dropbox or Google Drive, I guess. Also, can you attach
>>>>>>>>>>>>>>>> engine.log?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, May 11, 2018 at 4:43 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> vdsm is too big for Gmail... any other way I can share it
>>>>>>>>>>>>>>>>> with you?
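xfs_repair's secondary-superblock search is essentially a scan for the XFS magic bytes ("XFSB") at candidate offsets; when it gives up, a manual scan of the raw image can at least confirm whether any superblock copies survive before reaching for testdisk. A rough sketch on a toy buffer (a brute-force fallback, not a substitute for xfs_repair):

```python
XFS_MAGIC = b"XFSB"

def find_xfs_superblocks(data: bytes, step: int = 512) -> list:
    """Scan a raw image for the XFS superblock magic at sector boundaries.

    Real secondary superblocks sit at allocation-group starts; scanning
    every sector is a slower but simpler brute-force search.
    """
    return [off for off in range(0, len(data) - 3, step)
            if data[off:off + 4] == XFS_MAGIC]

# Toy image with one magic planted at offset 4096 (sector 8)
img = bytearray(16 * 512)
img[4096:4100] = XFS_MAGIC
print(find_xfs_superblocks(bytes(img)))  # [4096]
```

On the real 1.1T volume you would read it in chunks rather than loading it whole; an empty result would match xfs_repair's "could not find valid secondary superblock" and suggest the data simply isn't there.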
>>>>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>>>>> From: Juan Pablo <[email protected]>
>>>>>>>>>>>>>>>>> Date: 2018-05-11 10:40 GMT-03:00
>>>>>>>>>>>>>>>>> Subject: Re: [ovirt-users] strange issue: vm lost info on disk
>>>>>>>>>>>>>>>>> To: Benny Zlotnik <[email protected]>
>>>>>>>>>>>>>>>>> Cc: users <[email protected]>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Benny, thanks for your reply! It was a live migration.
>>>>>>>>>>>>>>>>> Sorry, it was from NFS to iSCSI, not the other way around. I
>>>>>>>>>>>>>>>>> have rebooted the VM for rescue and it does not detect any
>>>>>>>>>>>>>>>>> partitions with fdisk. I'm running xfs_repair with -n and it
>>>>>>>>>>>>>>>>> found a corrupted primary superblock; it's still running...
>>>>>>>>>>>>>>>>> (so... maybe there's info on the disk?)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Attaching logs, let me know if those are the ones.
>>>>>>>>>>>>>>>>> thanks again!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2018-05-11 9:45 GMT-03:00 Benny Zlotnik <[email protected]>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you provide the logs? engine and vdsm.
>>>>>>>>>>>>>>>>>> Did you perform a live migration (the VM was running) or cold?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, May 11, 2018 at 2:49 PM, Juan Pablo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi! I'm struggling with an ongoing problem:
>>>>>>>>>>>>>>>>>>> After migrating a VM's disk from an iSCSI domain to an NFS
>>>>>>>>>>>>>>>>>>> one, with oVirt reporting the migration as successful, I see
>>>>>>>>>>>>>>>>>>> there's no data 'inside' the VM's disk. We never had these
>>>>>>>>>>>>>>>>>>> issues with oVirt, so I'm puzzled about the root cause and
>>>>>>>>>>>>>>>>>>> whether there's a chance of recovering the information.
>>>>>>>>>>>>>>>>>>> Can you please help me troubleshoot this one? I would
>>>>>>>>>>>>>>>>>>> really appreciate it =)
>>>>>>>>>>>>>>>>>>> Running oVirt 4.2.1 here!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> thanks in advance,
>>>>>>>>>>>>>>>>>>> JP
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

