Ok, I see what happened there with the cross-cluster migration. When I re-did the code, I did not have this case in mind, and therefore I did not notice that the old code was handling it (convoluted code, lack of documentation, and so on; we all know the story). I will fix it.
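Just so we are on the same page about the cross-cluster part, what I have in mind is roughly the sketch below. It is untested and only illustrative; the names mirror the snippets in your message (quoted below), and the diskOffering parameter is only there for the sketch (in the real code it would be looked up from the volume). The idea is to keep the current pool when the target host can still reach it, and otherwise fall back to the storage pool allocators, as the old getPoolListForVolumesForMigration did:

    // Rough sketch only: keep the current pool when it is usable from the target
    // cluster; otherwise ask the allocators for a pool in the destination cluster,
    // as the old getPoolListForVolumesForMigration did.
    private void createVolumeToStoragePoolMappingIfNeeded(VirtualMachineProfile profile, Host targetHost,
            Map<Volume, StoragePool> volumeToPoolObjectMap, VolumeVO volume, StoragePoolVO currentPool,
            DiskOfferingVO diskOffering) {
        // A cluster-scoped pool in another cluster is not visible to the target host.
        boolean currentPoolUsable = !ScopeType.CLUSTER.equals(currentPool.getScope())
                || targetHost.getClusterId().equals(currentPool.getClusterId());
        if (currentPoolUsable) {
            volumeToPoolObjectMap.put(volume, currentPool);
            return;
        }
        final DiskProfile diskProfile = new DiskProfile(volume, diskOffering, profile.getHypervisorType());
        final DataCenterDeployment plan = new DataCenterDeployment(targetHost.getDataCenterId(), targetHost.getPodId(),
                targetHost.getClusterId(), targetHost.getId(), null, null);
        final ExcludeList avoid = new ExcludeList();
        for (final StoragePoolAllocator allocator : _storagePoolAllocators) {
            final List<StoragePool> poolList = allocator.allocateToPool(diskProfile, profile, plan, avoid,
                    StoragePoolAllocator.RETURN_UPTO_ALL);
            if (poolList != null && !poolList.isEmpty()) {
                // Pick one of the pools that the hosts in the target cluster can see.
                volumeToPoolObjectMap.put(volume, poolList.get(0));
                return;
            }
        }
        throw new CloudRuntimeException("Unable to find a suitable storage pool in the target cluster for volume " + volume);
    }

That way the default mapping keeps today's behavior for intra-cluster migrations and only calls the allocators for the cross-cluster case.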
Regarding the managed storage issue, can you describe the “special handling” you need? Are you talking about this?

> if (destPool.getId() == currentPool.getId()) {
>     volumeToPoolObjectMap.put(volume, currentPool);
> } else {
>     throw new CloudRuntimeException("Currently, a volume on managed storage can only be 'migrated' to itself.");
> }

That is a simple validation, right? A validation that throws an exception if the user tries to migrate the volume to some other storage pool. Is that it? If that is the case, the default method “getDefaultMappingOfVolumesAndStoragePoolForMigration” already takes care of this, meaning it will not try to move the volume to another storage pool. On the other hand, we then need to add a validation in the “createMappingVolumeAndStoragePoolEnteredByUser” method. I will wait for your feedback before starting to code. Thanks for spotting this issue.

On Sun, Jul 15, 2018 at 9:11 PM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:

> Hi Rafael,
>
> Thanks for your time on this.
>
> Here is an example where the new code deviates from the old code in a
> critical fashion (code right below is new):
>
>     private Map<Volume, StoragePool> getDefaultMappingOfVolumesAndStoragePoolForMigration(VirtualMachineProfile profile, Host targetHost) {
>         Map<Volume, StoragePool> volumeToPoolObjectMap = new HashMap<Volume, StoragePool>();
>         List<VolumeVO> allVolumes = _volsDao.findUsableVolumesForInstance(profile.getId());
>         for (VolumeVO volume : allVolumes) {
>             StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());
>             if (ScopeType.HOST.equals(currentPool.getScope())) {
>                 createVolumeToStoragePoolMappingIfNeeded(profile, targetHost, volumeToPoolObjectMap, volume, currentPool);
>             } else {
>                 volumeToPoolObjectMap.put(volume, currentPool);
>             }
>         }
>         return volumeToPoolObjectMap;
>     }
>
> What happens in the new code (above) is that if the user didn’t pass in a
> storage pool to migrate the virtual disk to (but the VM is being migrated
> to a new cluster), this code just assigns the virtual disk to its current
> storage pool (which is not going to be visible to any of the hosts in the
> new compute cluster).
>
> In the old code (I’m looking at 4.11.3 here), you could look around line
> 2337 for the following code (in the
> VirtualMachineManagerImpl.getPoolListForVolumesForMigration method):
>
>     // Find a suitable pool for the volume. Call the storage pool allocator to find the list of pools.
>     final DiskProfile diskProfile = new DiskProfile(volume, diskOffering, profile.getHypervisorType());
>     final DataCenterDeployment plan = new DataCenterDeployment(host.getDataCenterId(), host.getPodId(), host.getClusterId(),
>             host.getId(), null, null);
>
>     final List<StoragePool> poolList = new ArrayList<>();
>     final ExcludeList avoid = new ExcludeList();
>
>     for (final StoragePoolAllocator allocator : _storagePoolAllocators) {
>         final List<StoragePool> poolListFromAllocator = allocator.allocateToPool(diskProfile, profile, plan, avoid, StoragePoolAllocator.RETURN_UPTO_ALL);
>
>         if (poolListFromAllocator != null && !poolListFromAllocator.isEmpty()) {
>             poolList.addAll(poolListFromAllocator);
>         }
>     }
>
> This old code would find an applicable storage pool in the destination
> cluster (one that can be seen by the hosts in that compute cluster).
>
> I think the main error in the new logic is the assumption that a VM can
> only be migrated to a host in the same compute cluster.
> For XenServer (perhaps for other hypervisor types?), we support
> cross-cluster VM migration.
>
> The other issue I noticed is that there is no logic in the new code that
> checks for managed-storage use cases. If you look in the
> VirtualMachineManagerImpl.getPoolListForVolumesForMigration method in the
> old code, there is special handling for managed storage. I don’t see this
> reproduced in the new logic.
>
> I sympathize with your point that all tests passed yet this issue was not
> uncovered. Unfortunately, I suspect we have a fairly low % coverage of
> automated tests on CloudStack. If we ever did get to a high % of automated
> test coverage, we might be able to spin up new releases more frequently.
> As the case stands today, however, there are probably many untested use
> cases when it comes to our automated suite of tests.
>
> Thanks again!
> Mike
>
> On 7/15/18, 4:19 PM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:
>
> Mike, are you able to pin-point in the old/replaced code the bit that was
> handling your use case? I took the most care not to break anything.
> Also, your test case, isn't it in the ACS integration test suite? In
> theory, all tests passed when we merged the PR.
>
> I sure can take a look at it. Can you detail your use case? I mean, the
> high-level execution flow: what API methods you call, what you expected to
> happen, and what is happening today.
>
> On Sun, Jul 15, 2018 at 3:25 AM, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:
>
> > It looks like this is the problematic PR:
> >
> > https://github.com/apache/cloudstack/pull/2425/
> >
> > On 7/15/18, 12:20 AM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
> >
> > Hi,
> >
> > While running managed-storage regression tests tonight, I noticed a
> > problem that is not related to managed storage.
> >
> > CLOUDSTACK-10240 is a ticket asking that we allow the migration of a
> > virtual disk that’s on local storage to shared storage. In the process of
> > enabling this feature, the
> > VirtualMachineManagerImpl.getPoolListForVolumesForMigration method was
> > re-written in a way that completely breaks at least one use case:
> > migrating a VM across compute clusters (at least supported in XenServer).
> > If, say, a virtual disk resides on shared storage in the source compute
> > cluster, we must be able to copy this virtual disk to shared storage in
> > the destination compute cluster.
> >
> > As the code is currently written, this is no longer possible. It also
> > seems that the managed-storage logic has been dropped for some reason in
> > the new implementation.
> >
> > Rafael – It seems that you worked on this feature. Would you be able
> > to look into this and create a PR?
> >
> > Thanks,
> > Mike
>
> --
> Rafael Weingärtner


--
Rafael Weingärtner