Re: [Urgent]: corrupt DB after VM live migration with storage migration

ilya Wed, 04 May 2016 21:55:36 -0700

Yiping,

We've dealt with many corruptions in the past. They were mostly around
VMware, as it would eat up disks from time to time, or someone would move
the VM out of band by doing a storage or cluster vMotion.


The solution you described should work.

However, if you want to be extra paranoid:

step 1, take a full db backup
step 2, back up the root and data disks under some other file name, just in
case

Then proceed with your proposed solution.

As long as you have proper backups, you should be ok. If the VM fails to
start, the logs will tell you where cloudstack expects the volume to be;
you can either move the volume there or update the cloudstack volumes
table and point it to the correct pool_id.
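
For the last option, a minimal sketch of the manual fix might look like the
following. It assumes the default `cloud` database name and reuses the id,
pool_id and VDI uuid from your DATA-97 example quoted below; verify each
value against `xe vdi-list` first, and only run it after the backups above,
with cloudstack-management stopped.

```sql
-- Sketch only; values are taken from the DATA-97 example quoted below.
USE cloud;  -- assumption: default CloudStack database name

START TRANSACTION;

-- Check the row before touching it.
SELECT id, name, path, pool_id, removed FROM volumes WHERE id = 125;

-- Point the volume at the new primary storage and at the writable VDI.
UPDATE volumes
   SET pool_id = 8,
       path    = 'bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722'
 WHERE id = 125;

-- Confirm the row now shows the expected pool_id and path.
SELECT id, name, path, pool_id, removed FROM volumes WHERE id = 125;

COMMIT;  -- or ROLLBACK; if the second SELECT is not what you expect
```

Wrapping the change in a transaction lets you look at the result before it
becomes permanent. Note that the path value is a string and has to be quoted.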

Regards
ilya


On 5/4/16 8:49 PM, Yiping Zhang wrote:
> Before I try the direct DB modifications, I would first:
> 
> * shut down the VM instances
> * stop the cloudstack-management service
> * do a DB backup with mysqldump
> 
> What I worry about the most is that the volumes on the new cluster’s primary 
> storage device are marked as “removed”, so if I shut down the instances, 
> cloudstack may kick off a storage cleanup job to remove them from the new 
> cluster’s primary storage before I can get the fixes in.
> 
> Is there a way to temporarily disable storage cleanups?
> 
> Yiping
> 
> 
> 
> 
> On 5/4/16, 3:22 PM, "Yiping Zhang" <[email protected]> wrote:
> 
>> Hi, all:
>>
>> I am in a situation where I need some help:
>>
>> I did a live migration, with storage migration required, of a production VM 
>> instance from one cluster to another.  The first migration attempt failed 
>> after some time, but the second attempt succeeded. During all this time the 
>> VM instance was accessible (and it is still up and running).  However, when I 
>> use my api script to query volumes, it still reports that the volumes are on 
>> the old cluster’s primary storage.  If I shut down this VM, I am afraid 
>> that it won’t start again as it would try to use non-existent volumes.
>>
>> Checking the database, sure enough, the DB still has old info about these 
>> volumes:
>>
>>
>> mysql> select id,name from storage_pool where id=1 or id=8;
>> +----+------------------+
>> | id | name             |
>> +----+------------------+
>> |  1 | abprod-primary1  |
>> |  8 | abprod-p1c2-pri1 |
>> +----+------------------+
>> 2 rows in set (0.01 sec)
>>
>>
>> Here the old cluster’s primary storage has id=1, and the new cluster’s 
>> primary storage has id=8.
>>
>>
>> Here are the entries with wrong info in the volumes table:
>>
>>
>> mysql> select id,name, uuid, path,pool_id, removed from volumes where name='ROOT-97' or name='DATA-97';
>> +-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
>> | id  | name    | uuid                                 | path                                 | pool_id | removed             |
>> +-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
>> | 124 | ROOT-97 | 224bf673-fda8-4ccc-9c30-fd1068aee005 | 5d1ab4ef-2629-4384-a56a-e2dc1055d032 |       1 | NULL                |
>> | 125 | DATA-97 | d385d635-9230-4130-8d1f-702dbcf0f22c | 6b75496d-5907-46c3-8836-5618f11dac8e |       1 | NULL                |
>> | 316 | ROOT-97 | 691b5c12-7ec4-408d-b66f-1ff041f149c1 | NULL                                 |       8 | 2016-05-03 06:10:40 |
>> | 317 | ROOT-97 | 8ba29fcf-a81a-4ca0-9540-0287230f10c7 | NULL                                 |       8 | 2016-05-03 06:10:45 |
>> +-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
>> 4 rows in set (0.01 sec)
>>
>> On the XenServer host of the old cluster, the volumes do not exist:
>>
>>
>> [root@abmpc-hv01 ~]# xe vdi-list name-label='ROOT-97'
>> [root@abmpc-hv01 ~]# xe vdi-list name-label='DATA-97'
>> [root@abmpc-hv01 ~]#
>>
>> But the volumes are on the new cluster’s primary storage:
>>
>>
>> [root@abmpc-hv04 ~]# xe vdi-list name-label=ROOT-97
>> uuid ( RO)                : a253b217-8cdc-4d4a-a111-e5b6ad48a1d5
>>          name-label ( RW): ROOT-97
>>    name-description ( RW):
>>             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
>>        virtual-size ( RO): 34359738368
>>            sharable ( RO): false
>>           read-only ( RO): true
>>
>> uuid ( RO)                : c46b7a61-9e82-4ea1-88ca-692cd4a9204b
>>          name-label ( RW): ROOT-97
>>    name-description ( RW):
>>             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
>>        virtual-size ( RO): 34359738368
>>            sharable ( RO): false
>>           read-only ( RO): false
>>
>> [root@abmpc-hv04 ~]# xe vdi-list name-label=DATA-97
>> uuid ( RO)                : bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722
>>          name-label ( RW): DATA-97
>>    name-description ( RW):
>>             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
>>        virtual-size ( RO): 107374182400
>>            sharable ( RO): false
>>           read-only ( RO): false
>>
>> uuid ( RO)                : a8c187cc-2ba0-4928-8acf-2afc012c036c
>>          name-label ( RW): DATA-97
>>    name-description ( RW):
>>             sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
>>        virtual-size ( RO): 107374182400
>>            sharable ( RO): false
>>           read-only ( RO): true
>>
>>
>> The following is how I plan to fix the corrupted DB entries. Note: I am using 
>> the uuid of the VDI with read/write access as the path value:
>>
>>
>> 1. for ROOT-97 volume:
>>
>> Update volumes set removed=NOW() where id=124;
>> Update volumes set removed=NULL where id=317;
>> Update volumes set path='c46b7a61-9e82-4ea1-88ca-692cd4a9204b' where id=317;
>>
>>
>> 2. for DATA-97 volume:
>>
>> Update volumes set pool_id=8 where id=125;
>> Update volumes set path='bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722' where id=125;
>>
>>
>> Would this work?
>>
>>
>> Thanks for any help anyone can provide.  I have a total of 4 VM 
>> instances with 8 volumes in this situation that need to be fixed.
>>
>>
>> Yiping
