Actually, I think I answered both of your questions with these two prior 
e-mails. Please let me know if you need further clarification. Thanks!

On 7/16/18, 2:17 PM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:

    Allow me to correct what I said here:
    
    “If getDefaultMappingOfVolumesAndStoragePoolForMigration is invoked, we
    silently ignore the (faulty) input (which is a new storage pool) from the
    user and keep the volume in its same managed storage pool (the user may
    wonder why it wasn’t migrated if they don’t get an error message back
    telling them this is not allowed).”
    
    I should have said the following:
    
    If getDefaultMappingOfVolumesAndStoragePoolForMigration is invoked on a VM
    that is using managed storage that is only at the cluster level (managed
    storage can be at either the zone or cluster level) and we are trying to
    migrate the VM from one cluster to another, this operation should fail (as
    the old code detects). The new code tries to keep the volume in the same
    storage pool (but that storage pool will not be visible to the hosts in
    the destination compute cluster).
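
    To illustrate the kind of check the old code effectively performs here, a
    rough sketch might look like the following (purely hypothetical; this
    method does not exist in the codebase, although ScopeType, isManaged(),
    and getClusterId() do):

        // Hypothetical sketch: fail fast when a cluster-scoped managed pool cannot
        // follow the VM to a different compute cluster.
        private void validateManagedPoolForCrossClusterMigration(StoragePoolVO currentPool, Host targetHost) {
            if (!currentPool.isManaged()) {
                return;
            }
            if (ScopeType.CLUSTER.equals(currentPool.getScope())
                    && !currentPool.getClusterId().equals(targetHost.getClusterId())) {
                throw new CloudRuntimeException("Cluster-scoped managed storage cannot be used from the destination cluster; " +
                        "either leave the volume unmapped or map it to a pool the target host can access.");
            }
            // Zone-wide managed storage is visible from every cluster in the zone, so it can stay put.
        }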
    
    On 7/16/18, 2:10 PM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
    
        Let me answer the questions in two separate e-mails.
        
        This answer deals with what you wrote about this code:
        
            > if (destPool.getId() == currentPool.getId()) {
            >     volumeToPoolObjectMap.put(volume, currentPool);
            > } else {
            >     throw new CloudRuntimeException("Currently, a volume on managed storage can only be 'migrated' to itself.");
            > }
            >
        
        The code above is invoked if the user tries to migrate a volume that’s
        on managed storage to another storage pool. At present, such volumes
        can be migrated when a VM is migrated from one compute cluster to
        another, but those volumes have to remain on the same managed storage.
        
        Here’s an example:
        
        Let’s say VM_1 is in Cluster_1. VM_1 has a root (or data) disk on
        managed storage. We try to migrate the VM from Cluster_1 to Cluster_2
        and specify a new storage pool for the volume. This case should fail.
        To make it work, you need to either 1) not specify a new storage pool
        or 2) specify the same storage pool the volume is already in. If the
        managed storage in question is zone wide, then it can be used from
        both Cluster_1 and Cluster_2.
        
        The new code might call getDefaultMappingOfVolumesAndStoragePoolForMigration
        (if no storage pools at all are passed in to the API) or it might call
        createMappingVolumeAndStoragePoolEnteredByUser.
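
        To make that concrete, here is a rough, hypothetical sketch (IDs are
        purely illustrative) of the volume-to-pool map the API layer hands
        down in the two allowed cases from the example above:

            // Hypothetical sketch: the Map<Long, Long> of volume ID -> storage pool ID
            // that reaches the migration code.
            Long rootVolumeId = 101L;         // illustrative volume ID
            Long currentManagedPoolId = 7L;   // illustrative ID of the pool the volume is already on
            Map<Long, Long> volumeToPool = new HashMap<Long, Long>();

            // Case 1: no pool is specified at all -> the map stays empty and
            // getDefaultMappingOfVolumesAndStoragePoolForMigration is used.

            // Case 2: the volume is explicitly mapped to the managed pool it already
            // lives on -> createMappingVolumeAndStoragePoolEnteredByUser is used and
            // should simply keep it where it is.
            volumeToPool.put(rootVolumeId, currentManagedPoolId);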
        
        If getDefaultMappingOfVolumesAndStoragePoolForMigration is invoked, we
        silently ignore the (faulty) input (which is a new storage pool) from
        the user and keep the volume in its same managed storage pool (the
        user may wonder why it wasn’t migrated if they don’t get an error
        message back telling them this is not allowed).
        
        If createMappingVolumeAndStoragePoolEnteredByUser is invoked, we seem
        to have a bigger problem (code is below):
        
        I do not believe you are required to pass in a new storage pool for
        each and every volume of the VM. If the VM has, say, three volumes,
        you may only try to migrate two of the volumes to new storage pools.
        This logic seems to assume if you want to migrate one of the VM’s
        volumes, then you necessarily want to migrate all of the VM’s volumes.
        I believe it’s possible for targetPool to come back null and later
        throw a NullPointerException. The old code walks through each volume
        of the VM and checks if there is a new storage pool specified for it.
        If so, do one thing; else, do something else.
        
            private Map<Volume, StoragePool> createMappingVolumeAndStoragePoolEnteredByUser(VirtualMachineProfile profile, Host host, Map<Long, Long> volumeToPool) {
                Map<Volume, StoragePool> volumeToPoolObjectMap = new HashMap<Volume, StoragePool>();
                for (Long volumeId : volumeToPool.keySet()) {
                    VolumeVO volume = _volsDao.findById(volumeId);

                    Long poolId = volumeToPool.get(volumeId);
                    StoragePoolVO targetPool = _storagePoolDao.findById(poolId);
                    StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());

                    if (_poolHostDao.findByPoolHost(targetPool.getId(), host.getId()) == null) {
                        throw new CloudRuntimeException(String.format("Cannot migrate the volume [%s] to the storage pool [%s] while migrating VM [%s] to target host [%s]. The host does not have access to the storage pool entered.", volume.getUuid(), targetPool.getUuid(), profile.getUuid(), host.getUuid()));
                    }
                    if (currentPool.getId() == targetPool.getId()) {
                        s_logger.info(String.format("The volume [%s] is already allocated in storage pool [%s].", volume.getUuid(), targetPool.getUuid()));
                    }
                    volumeToPoolObjectMap.put(volume, targetPool);
                }
                return volumeToPoolObjectMap;
            }
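
        Just to illustrate the concern (a rough, hypothetical sketch, not a
        proposed patch; handleVolumeWithoutUserSpecifiedPool is a made-up
        helper name), a null check on targetPool plus a walk over all of the
        VM’s volumes, rather than only the ones the user mapped, could look
        roughly like this:

            // Hypothetical sketch: guard against a missing/invalid pool ID and handle
            // volumes the user did not map, instead of assuming every volume of the
            // VM appears in volumeToPool.
            for (VolumeVO volume : _volsDao.findUsableVolumesForInstance(profile.getId())) {
                Long poolId = volumeToPool.get(volume.getId());

                if (poolId == null) {
                    // The user did not ask to migrate this volume; decide separately
                    // (e.g. keep it where it is or pick a pool the target host can see).
                    handleVolumeWithoutUserSpecifiedPool(profile, host, volumeToPoolObjectMap, volume);
                    continue;
                }

                StoragePoolVO targetPool = _storagePoolDao.findById(poolId);

                if (targetPool == null) {
                    throw new CloudRuntimeException(String.format("Storage pool [%s] entered for volume [%s] does not exist.", poolId, volume.getUuid()));
                }

                // ... the existing host-access and managed-storage checks would go here ...
                volumeToPoolObjectMap.put(volume, targetPool);
            }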
        
        On 7/16/18, 5:13 AM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:

            Ok, I see what happened there with the migration to cluster. When
            I re-did the code I did not have this case. And therefore, in the
            old code, I was not seeing this use case (convoluted code, lack of
            documentation, and so on; we all know the story). I will fix it.
            
            Regarding the managed storage issue, can you describe the “special
            handling” you need?
            
            Are you talking about this:
            
            > if (destPool.getId() == currentPool.getId()) {
            >     volumeToPoolObjectMap.put(volume, currentPool);
            > } else {
            >     throw new CloudRuntimeException("Currently, a volume on managed storage can only be 'migrated' to itself.");
            > }
            >
            
            
            That is a simple validation, right? A validation to throw an
            exception if the user tries to migrate the volume to some other
            storage pool. Is that it? If that is the case, the default method
            “getDefaultMappingOfVolumesAndStoragePoolForMigration” already
            takes care of this. Meaning, it will not try to move the volume to
            another storage pool.
            
            On the other hand, we need to add a validation in the
            “createMappingVolumeAndStoragePoolEnteredByUser” method then.
            I will wait for your feedback before starting to code. Thanks for
            spotting this issue.
            
            On Sun, Jul 15, 2018 at 9:11 PM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:
            
            > Hi Rafael,
            >
            > Thanks for your time on this.
            >
            > Here is an example where the new code deviates from the old code
            > in a critical fashion (code right below is new):
            >
            >     private Map<Volume, StoragePool> getDefaultMappingOfVolumesAndStoragePoolForMigration(VirtualMachineProfile profile, Host targetHost) {
            >         Map<Volume, StoragePool> volumeToPoolObjectMap = new HashMap<Volume, StoragePool>();
            >         List<VolumeVO> allVolumes = _volsDao.findUsableVolumesForInstance(profile.getId());
            >         for (VolumeVO volume : allVolumes) {
            >             StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());
            >             if (ScopeType.HOST.equals(currentPool.getScope())) {
            >                 createVolumeToStoragePoolMappingIfNeeded(profile, targetHost, volumeToPoolObjectMap, volume, currentPool);
            >             } else {
            >                 volumeToPoolObjectMap.put(volume, currentPool);
            >             }
            >         }
            >         return volumeToPoolObjectMap;
            >     }
            >
            > What happens in the new code (above) is if the user didn’t pass
            > in a storage pool to migrate the virtual disk to (but the VM is
            > being migrated to a new cluster), this code just assigns the
            > virtual disk to its current storage pool (which is not going to
            > be visible to any of the hosts in the new compute cluster).
            >
            > In the old code (I’m looking at 4.11.3 here), you could look
            > around line 2337 for the following code (in the
            > VirtualMachineManagerImpl.getPoolListForVolumesForMigration method):
            >
            >                     // Find a suitable pool for the volume. Call the storage pool allocator to find the list of pools.
            >
            >                     final DiskProfile diskProfile = new DiskProfile(volume, diskOffering, profile.getHypervisorType());
            >                     final DataCenterDeployment plan = new DataCenterDeployment(host.getDataCenterId(), host.getPodId(), host.getClusterId(),
            >                             host.getId(), null, null);
            >
            >                     final List<StoragePool> poolList = new ArrayList<>();
            >                     final ExcludeList avoid = new ExcludeList();
            >
            >                     for (final StoragePoolAllocator allocator : _storagePoolAllocators) {
            >                         final List<StoragePool> poolListFromAllocator = allocator.allocateToPool(diskProfile, profile, plan, avoid, StoragePoolAllocator.RETURN_UPTO_ALL);
            >
            >                         if (poolListFromAllocator != null && !poolListFromAllocator.isEmpty()) {
            >                             poolList.addAll(poolListFromAllocator);
            >                         }
            >                     }
            >
            > This old code would find an applicable storage pool in the
            > destination cluster (one that can be seen by the hosts in that
            > compute cluster).
            >
            > I think the main error in the new logic is the assumption that a
            > VM can only be migrated to a host in the same compute cluster.
            > For XenServer (perhaps for other hypervisor types?), we support
            > cross-cluster VM migration.
            >
            > The other issue I noticed is that there is no logic in the new
            > code that checks for managed-storage use cases. If you look in
            > the VirtualMachineManagerImpl.getPoolListForVolumesForMigration
            > method in the old code, there is special handling for managed
            > storage. I don’t see this reproduced in the new logic.
            >
            > I sympathize with your point that all tests passed yet this
            > issue was not uncovered. Unfortunately, I suspect we have a
            > fairly low % coverage of automated tests on CloudStack. If we
            > ever did get to a high % of automated test coverage, we might be
            > able to spin up new releases more frequently. As the case stands
            > today, however, there are probably many untested use cases when
            > it comes to our automated suite of tests.
            >
            > Thanks again!
            > Mike
            >
            > On 7/15/18, 4:19 PM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:
            >
            >     Mike, are you able to pin-point in the old/replaced code the
            >     bit that was handling your use case? I took the most care
            >     not to break anything. Also, your test case, isn't it in the
            >     ACS' integration test suite? In theory, all tests passed
            >     when we merged the PR.
            >
            >     I sure can take a look at it. Can you detail your use case?
            >     I mean, the high level execution flow. What API methods you
            >     call, what you expected to happen, and what is happening
            >     today.
            >
            >     On Sun, Jul 15, 2018 at 3:25 AM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:
            >
            >     > It looks like this is the problematic PR:
            >     >
            >     > https://github.com/apache/cloudstack/pull/2425/
            >     >
            >     > On 7/15/18, 12:20 AM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
            >     >
            >     >     Hi,
            >     >
            >     >     While running managed-storage regression tests tonight,
            >     >     I noticed a problem that is not related to managed
            >     >     storage.
            >     >
            >     >     CLOUDSTACK-10240 is a ticket asking that we allow the
            >     >     migration of a virtual disk that’s on local storage to
            >     >     shared storage. In the process of enabling this feature,
            >     >     the VirtualMachineManagerImpl.getPoolListForVolumesForMigration
            >     >     method was re-written in a way that completely breaks
            >     >     at least one use case: Migrating a VM across compute
            >     >     clusters (at least supported in XenServer). If, say, a
            >     >     virtual disk resides on shared storage in the source
            >     >     compute cluster, we must be able to copy this virtual
            >     >     disk to shared storage in the destination compute
            >     >     cluster.
            >     >
            >     >     As the code is currently written, this is no longer
            >     >     possible. It also seems that the managed-storage logic
            >     >     has been dropped for some reason in the new
            >     >     implementation.
            >     >
            >     >     Rafael – It seems that you worked on this feature.
            >     >     Would you be able to look into this and create a PR?
            >     >
            >     >     Thanks,
            >     >     Mike
            >     >
            >     >
            >     >
            >
            >
            >     --
            >     Rafael Weingärtner
            >
            >
            >
            
            
            -- 
            Rafael Weingärtner
            
        
        
    
    
