Hi Rafael,

Thanks for your time on this.

Here is an example where the new code deviates from the old code in a critical 
fashion (code right below is new):

    private Map<Volume, StoragePool> 
getDefaultMappingOfVolumesAndStoragePoolForMigration(VirtualMachineProfile 
profile, Host targetHost) {
        Map<Volume, StoragePool> volumeToPoolObjectMap = new HashMap<Volume, 
StoragePool>();
        List<VolumeVO> allVolumes = 
_volsDao.findUsableVolumesForInstance(profile.getId());
        for (VolumeVO volume : allVolumes) {
            StoragePoolVO currentPool = 
_storagePoolDao.findById(volume.getPoolId());
            if (ScopeType.HOST.equals(currentPool.getScope())) {
                createVolumeToStoragePoolMappingIfNeeded(profile, targetHost, 
volumeToPoolObjectMap, volume, currentPool);
            } else {
                volumeToPoolObjectMap.put(volume, currentPool);
            }
        }
        return volumeToPoolObjectMap;
    }

What happens in the new code (above) is if the user didn’t pass in a storage 
pool to migrate the virtual disk to (but the VM is being migrated to a new 
cluster), this code just assigns the virtual disk to its current storage pool 
(which is not going to be visible to any of the hosts in the new compute 
cluster).

In the old code (I’m looking at 4.11.3 here), you could look around line 2337 
for the following code (in the 
VirtualMachineManagerImpl.getPoolListForVolumesForMigration method):

                    // Find a suitable pool for the volume. Call the storage 
pool allocator to find the list of pools.

                    final DiskProfile diskProfile = new DiskProfile(volume, 
diskOffering, profile.getHypervisorType());
                    final DataCenterDeployment plan = new 
DataCenterDeployment(host.getDataCenterId(), host.getPodId(), 
host.getClusterId(),
                            host.getId(), null, null);

                    final List<StoragePool> poolList = new ArrayList<>();
                    final ExcludeList avoid = new ExcludeList();

                    for (final StoragePoolAllocator allocator : 
_storagePoolAllocators) {
                        final List<StoragePool> poolListFromAllocator = 
allocator.allocateToPool(diskProfile, profile, plan, avoid, 
StoragePoolAllocator.RETURN_UPTO_ALL);

                        if (poolListFromAllocator != null && 
!poolListFromAllocator.isEmpty()) {
                            poolList.addAll(poolListFromAllocator);
                        }
                    }

This old code would find an applicable storage pool in the destination cluster 
(one that can be seen by the hosts in that compute cluster).

I think the main error in the new logic is the assumption that a VM can only be 
migrated to a host in the same computer cluster. For XenServer (perhaps for 
other hypervisor types?), we support cross-cluster VM migration.

The other issue I noticed is that there is no logic in the new code that checks 
for managed-storage use cases. If you look in the 
VirtualMachineManagerImpl.getPoolListForVolumesForMigration method in the old 
code, there is special handling for managed storage. I don’t see this 
reproduced in the new logic.

I sympathize with your point that all tests passed yet this issue was not 
uncovered. Unfortunately, I suspect we have a fairly low % coverage of 
automated tests on CloudStack. If we ever did get to a high % of automated test 
coverage, we might be able to spin up new releases more frequently. As the case 
stands today, however, there are probably many un-tested use cases when it 
comes to our automated suite of tests.

Thanks again!
Mike

On 7/15/18, 4:19 PM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:

    Mike, are you able to pin-point in the old/replaced code the bit that was
    handling your use case?  I took the most care not to break anything.
    Also, your test case, isn't it in the ACS' integration test suite? In
    theory, all test passed when we merged the PR.
    
    I sure can take a look at it. Can you detail your use case? I mean, the
    high level execution flow. What API methods you do, what you expected to
    happen, and what is happening today.
    
    On Sun, Jul 15, 2018 at 3:25 AM, Tutkowski, Mike <mike.tutkow...@netapp.com>
    wrote:
    
    > It looks like this is the problematic PR:
    >
    > https://github.com/apache/cloudstack/pull/2425/
    >
    > On 7/15/18, 12:20 AM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
    >
    >     Hi,
    >
    >     While running managed-storage regression tests tonight, I noticed a
    > problem that is not related to managed storage.
    >
    >     CLOUDSTACK-10240 is a ticket asking that we allow the migration of a
    > virtual disk that’s on local storage to shared storage. In the process of
    > enabling this feature, the VirtualMachineManagerImpl.
    > getPoolListForVolumesForMigration method was re-written in a way that
    > completely breaks at least one use case: Migrating a VM across compute
    > clusters (at least supported in XenServer). If, say, a virtual disk 
resides
    > on shared storage in the source compute cluster, we must be able to copy
    > this virtual disk to shared storage in the destination compute cluster.
    >
    >     As the code is currently written, this is no longer possible. It also
    > seems that the managed-storage logic has been dropped for some reason in
    > the new implementation.
    >
    >     Rafael – It seems that you worked on this feature. Would you be able
    > to look into this and create a PR?
    >
    >     Thanks,
    >     Mike
    >
    >
    >
    
    
    -- 
    Rafael Weingärtner
    

Reply via email to