DaanHoogland commented on code in PR #11143:
URL: https://github.com/apache/cloudstack/pull/11143#discussion_r2189919921


##########
engine/schema/src/main/java/com/cloud/gpu/VGPUTypesVO.java:
##########
@@ -40,19 +40,19 @@ public class VGPUTypesVO implements InternalIdentity {
     private String vgpuType;
 
     @Column(name="video_ram")
-    private long videoRam;
+    private Long videoRam;

Review Comment:
   why should these no longer be basic types?



##########
server/src/main/java/com/cloud/agent/manager/allocator/impl/FirstFitAllocator.java:
##########
@@ -338,7 +342,21 @@ protected List<Host> allocateTo(DeploymentPlan plan, 
ServiceOffering offering, V
             }
 
             // Check if GPU device is required by offering and host has the 
availability
-            if ((offeringDetails   = 
_serviceOfferingDetailsDao.findDetail(serviceOfferingId, 
GPU.Keys.vgpuType.toString())) != null) {
+            if (offering.getVgpuProfileId() != null) {
+                VgpuProfileVO vgpuProfile = 
vgpuProfileDao.findById(offering.getVgpuProfileId());
+                if (vgpuProfile == null) {
+                    logger.debug("Adding host [{}] to avoid set, because this 
host does not have GPU devices available.", host);
+                    avoid.addHost(host.getId());
+                    continue;
+                }
+                int gpuCount = offering.getGpuCount() != null ? 
offering.getGpuCount() : 1;
+
+                if(!_resourceMgr.isGPUDeviceAvailable(host, vmProfile.getId(), 
vgpuProfile, gpuCount)) {
+                    logger.debug("Adding host [{}] to avoid set, because this 
host does not have required GPU devices available.", host);
+                    avoid.addHost(host.getId());
+                    continue;
+                }

Review Comment:
   separate method(s)?



##########
server/src/main/java/com/cloud/server/ManagementServerImpl.java:
##########
@@ -1537,6 +1542,14 @@ public Ternary<Pair<List<? extends Host>, Integer>, 
List<? extends Host>, Map<Ho
             throw new InvalidParameterValueException("Unsupported operation, 
VM uses Local storage, cannot migrate");
         }
 
+        ServiceOffering serviceOffering = vmProfile.getServiceOffering();
+        if (serviceOffering.getVgpuProfileId() != null) {
+            VgpuProfileVO vgpuProfile = 
vgpuProfileDao.findById(serviceOffering.getVgpuProfileId());
+            if (vgpuProfile == null || 
"passthrough".equals(vgpuProfile.getName())) {
+                throw new InvalidParameterValueException("Unsupported 
operation, VM uses host passthrough, cannot migrate");
+            }
+        }

Review Comment:
   `idemDito(..)`



##########
server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java:
##########
@@ -3423,6 +3427,21 @@ public ServiceOffering createServiceOffering(final 
CreateServiceOfferingCmd cmd)
         Integer leaseDuration = cmd.getLeaseDuration();
         VMLeaseManager.ExpiryAction leaseExpiryAction = 
validateAndGetLeaseExpiryAction(leaseDuration, cmd.getLeaseExpiryAction());
 
+        final Long vgpuProfileId = cmd.getVgpuProfileId();
+        Integer gpuCount = cmd.getGpuCount();
+        if (vgpuProfileId != null) {
+            VgpuProfileVO vgpuProfile = vgpuProfileDao.findById(vgpuProfileId);
+            if (vgpuProfile == null) {
+                throw new InvalidParameterValueException("Please specify a 
valid vgpu profile.");
+            }
+            if (gpuCount != null && gpuCount < 1) {
+                throw new InvalidParameterValueException("GPU count must be 
greater than 0.");
+            }
+            if (gpuCount == null) {
+                gpuCount = 1;
+            }
+        }

Review Comment:
   method?



##########
server/src/main/java/com/cloud/api/query/dao/AccountJoinDaoImpl.java:
##########
@@ -220,6 +220,15 @@ public void setResourceLimits(AccountJoinVO account, 
boolean fullView, ResourceL
         response.setMemoryTotal(memoryTotal);
         response.setMemoryAvailable(memoryAvail);
 
+        //get resource limits for gpus
+        long gpuLimit = 
ApiDBUtils.findCorrectResourceLimit(account.getGpuLimit(), account.getId(), 
ResourceType.gpu);
+        String gpuLimitDisplay = (fullView || gpuLimit == -1) ? 
Resource.UNLIMITED : String.valueOf(gpuLimit);
+        long gpuTotal = (account.getGpuTotal() == null) ? 0 : 
account.getGpuTotal();
+        String gpuAvail = (fullView || gpuLimit == -1) ? Resource.UNLIMITED : 
String.valueOf(gpuLimit - gpuTotal);
+        response.setGpuLimit(gpuLimitDisplay);
+        response.setGpuTotal(gpuTotal);
+        response.setGpuAvailable(gpuAvail);

Review Comment:
   ```
   long gpuLimit = getResourceLimitsForGpus(…)
   ```



##########
server/src/main/java/com/cloud/api/query/QueryManagerImpl.java:
##########
@@ -1526,6 +1526,18 @@ private Pair<List<Long>, Integer> 
searchForUserVMIdsAndCount(ListVMsCmd cmd) {
                     userVmSearchBuilder.entity().getId(), 
JoinBuilder.JoinType.INNER);
         }
 
+        if (cmd.getGpuEnabled() != null) {
+            SearchBuilder<ServiceOfferingVO> serviceOfferingSearch;
+            if (cmd.getGpuEnabled()) {
+                serviceOfferingSearch = _srvOfferingDao.createSearchBuilder();
+                serviceOfferingSearch.and("vgpuProfileId", 
serviceOfferingSearch.entity().getVgpuProfileId(), Op.NNULL);
+            } else {
+                serviceOfferingSearch = _srvOfferingDao.createSearchBuilder();
+                serviceOfferingSearch.and("vgpuProfileId", 
serviceOfferingSearch.entity().getVgpuProfileId(), Op.NULL);
+            }
+            userVmSearchBuilder.join("serviceOffering", serviceOfferingSearch, 
serviceOfferingSearch.entity().getId(), 
userVmSearchBuilder.entity().getServiceOfferingId(), 
JoinBuilder.JoinType.INNER);
+        }

Review Comment:
   new method(s)?



##########
engine/schema/src/main/java/com/cloud/gpu/dao/HostGpuGroupsDao.java:
##########
@@ -16,43 +16,48 @@
 // under the License.
 package com.cloud.gpu.dao;
 
-import java.util.List;
-
 import com.cloud.gpu.HostGpuGroupsVO;
 import com.cloud.utils.db.GenericDao;
 
+import java.util.List;
+
 public interface HostGpuGroupsDao extends GenericDao<HostGpuGroupsVO, Long> {
 
     /**
      * Find host device by hostId and groupName
-     * @param hostId the host
+     *

Review Comment:
   javadoc is ignoring this



##########
server/src/main/java/com/cloud/vm/UserVmManagerImpl.java:
##########
@@ -2803,6 +2808,13 @@ protected void verifyVmLimits(UserVmVO vmInstance, 
Map<String, String> details)
         } else if (newMemory > 0 && currentMemory > newMemory){
             
_resourceLimitMgr.decrementVmMemoryResourceCount(owner.getAccountId(), 
vmInstance.isDisplay(), svcOffering, template, currentMemory - newMemory);
         }
+        Long currentGpu = currentServiceOffering.getGpuCount() != null ? 
Long.valueOf(currentServiceOffering.getGpuCount()) : 0L;
+        Long newGpu = svcOffering.getGpuCount() != null ? 
Long.valueOf(svcOffering.getGpuCount()) : 0L;
+        if (newGpu > currentGpu) {
+            
_resourceLimitMgr.incrementVmGpuResourceCount(owner.getAccountId(), 
vmInstance.isDisplay(), svcOffering, template, newGpu - currentGpu);
+        } else if (newGpu > 0 && currentGpu > newGpu){
+            
_resourceLimitMgr.decrementVmGpuResourceCount(owner.getAccountId(), 
vmInstance.isDisplay(), svcOffering, template, currentGpu - newGpu);
+        }

Review Comment:
   method



##########
server/src/main/java/com/cloud/hypervisor/HypervisorGuruBase.java:
##########
@@ -325,7 +329,13 @@ protected VirtualMachineTO 
toVirtualMachineTO(VirtualMachineProfile vmProfile) {
 
         // Set GPU details
         ServiceOfferingDetailsVO offeringDetail = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.vgpuType.toString());
-        if (offeringDetail != null) {
+        if (offering.getVgpuProfileId() != null) {
+            VgpuProfileVO vgpuProfile = 
vgpuProfileDao.findById(offering.getVgpuProfileId());
+            if (vgpuProfile != null) {
+                int gpuCount = offering.getGpuCount() != null ? 
offering.getGpuCount() : 1;
+                to.setGpuDevice(_resourceMgr.getGPUDevice(vm, vgpuProfile, 
gpuCount));
+            }
+        } else if (offeringDetail != null) {
             ServiceOfferingDetailsVO groupName = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.pciDevice.toString());
             to.setGpuDevice(_resourceMgr.getGPUDevice(vm.getHostId(), 
groupName.getValue(), offeringDetail.getValue()));
         }

Review Comment:
   extract()



##########
server/src/main/java/com/cloud/resource/ResourceManagerImpl.java:
##########
@@ -3129,6 +3141,13 @@ protected HostVO createHostVO(final StartupCommand[] 
cmds, final ServerResource
 
         if (newHost) {
             host = _hostDao.persist(host);
+            // Check for GPU devices again because we couldn't persist the GPU 
devices earlier due to missing host ID
+            if (startup instanceof StartupRoutingCommand &&
+                CollectionUtils.isNotEmpty(((StartupRoutingCommand) 
startup).getGpuDevices())) {
+                // Add GPU devices to the host
+                StartupRoutingCommand ssCmd = ((StartupRoutingCommand) 
startup);
+                gpuService.addGpuDevicesToHost(host, ssCmd.getGpuDevices());
+            }
         } else {
             _hostDao.update(host.getId(), host);
         }

Review Comment:
   `persistNewHost()`?



##########
engine/schema/src/main/java/com/cloud/gpu/dao/HostGpuGroupsDao.java:
##########


Review Comment:
   none of the changes in this file are of consequence.



##########
engine/schema/src/main/java/com/cloud/gpu/dao/VGPUTypesDao.java:
##########


Review Comment:
   none of the changes in this file are of consequence.



##########
server/src/main/java/com/cloud/api/query/QueryManagerImpl.java:
##########
@@ -4049,6 +4063,18 @@ private Pair<List<Long>, Integer> 
searchForServiceOfferingIdsAndCount(ListServic
             serviceOfferingSearch.and("state", 
serviceOfferingSearch.entity().getState(), Op.EQ);
         }
 
+        if (vgpuProfileId != null) {
+            serviceOfferingSearch.and("vgpuProfileId", 
serviceOfferingSearch.entity().getVgpuProfileId(), Op.EQ);
+        }
+
+        if (gpuEnabled != null) {
+            if (gpuEnabled) {
+                serviceOfferingSearch.and("gpuEnabledTrue", 
serviceOfferingSearch.entity().getVgpuProfileId(), Op.NNULL);
+            } else {
+                serviceOfferingSearch.and("gpuEnabledTrue", 
serviceOfferingSearch.entity().getVgpuProfileId(), Op.NULL);
+            }
+        }

Review Comment:
   new methods?



##########
server/src/main/java/com/cloud/deploy/DeploymentPlanningManagerImpl.java:
##########
@@ -582,14 +586,30 @@ private boolean canUseLastHost(HostVO host, ExcludeList 
avoids, DeploymentPlan p
             return false;
         }
 
-        ServiceOfferingDetailsVO offeringDetails = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.vgpuType.toString());
-        ServiceOfferingDetailsVO groupName = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.pciDevice.toString());
-        if (offeringDetails != null && 
!_resourceMgr.isGPUDeviceAvailable(host, groupName.getValue(), 
offeringDetails.getValue())) {
-            logger.debug("Cannot deploy VM [{}] in the last host [{}] because 
this host does not have the required GPU devices available. Skipping this and 
trying other available hosts.",
-                    vm, host);
-            return false;
+        if (offering.getVgpuProfileId() != null) {
+            VgpuProfileVO vgpuProfile = 
vgpuProfileDao.findById(offering.getVgpuProfileId());
+            if (vgpuProfile == null) {
+                logger.debug("Cannot deploy VM [{}] in the last host [{}] 
because the vgpu profile is not found. Skipping this and trying other available 
hosts.",
+                        vm, host);
+                return false;
+            }
+            int gpuCount = offering.getGpuCount() != null ? 
offering.getGpuCount() : 1;
+            if (!_resourceMgr.isGPUDeviceAvailable(host, vm.getId(), 
vgpuProfile, gpuCount)) {
+                logger.debug("Cannot deploy VM [{}] in the last host [{}] 
because this host does not have GPU devices. Skipping this and trying other 
available hosts.",
+                        vm, host);
+                return false;
+            }
+        } else {
+            ServiceOfferingDetailsVO offeringDetails = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.vgpuType.toString());
+            ServiceOfferingDetailsVO groupName = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.pciDevice.toString());
+            if (offeringDetails != null && 
!_resourceMgr.isGPUDeviceAvailable(host, groupName.getValue(), 
offeringDetails.getValue())) {
+                logger.debug("Cannot deploy VM [{}] in the last host [{}] 
because this host does not have the required GPU devices available. Skipping 
this and trying other available hosts.",
+                        vm, host);
+                return false;
+            }

Review Comment:
   extra methods?



##########
engine/schema/src/main/java/com/cloud/gpu/dao/HostGpuGroupsDaoImpl.java:
##########


Review Comment:
   none of the changes in this file are of consequence.



##########
server/src/main/java/com/cloud/vm/UserVmManagerImpl.java:
##########
@@ -2788,6 +2788,11 @@ protected void verifyVmLimits(UserVmVO vmInstance, 
Map<String, String> details)
             if (newMemory > currentMemory) {
                 _resourceLimitMgr.checkVmMemoryResourceLimit(owner, 
vmInstance.isDisplay(), svcOffering, template, newMemory - currentMemory);
             }
+            Long currentGpu = currentServiceOffering.getGpuCount() != null ? 
Long.valueOf(currentServiceOffering.getGpuCount()) : 0L;
+            Long newGpu = svcOffering.getGpuCount() != null ? 
Long.valueOf(svcOffering.getGpuCount()) : 0L;
+            if (newGpu > currentGpu) {
+                _resourceLimitMgr.checkVmGpuResourceLimit(owner, 
vmInstance.isDisplay(), svcOffering, template, newGpu - currentGpu);
+            }

Review Comment:
   method



##########
server/src/main/java/com/cloud/resource/ResourceManagerImpl.java:
##########
@@ -3543,10 +3562,19 @@ public HostVO fillRoutingHostVO(final HostVO host, 
final StartupRoutingCommand s
         host.setSpeed(ssCmd.getSpeed());
         host.setHypervisorType(hyType);
         host.setHypervisorVersion(ssCmd.getHypervisorVersion());
-        host.setGpuGroups(ssCmd.getGpuGroupDetails());
+        // The below method needs the host to be persisted in the DB to save 
the GPU devices for the host
+        if (host.getId() > 0) {
+            gpuService.addGpuDevicesToHost(host, ssCmd.getGpuDevices());
+        }
+        if (CollectionUtils.isNotEmpty(ssCmd.getGpuDevices())) {
+            
host.setGpuGroups(gpuService.getGpuGroupDetailsFromGpuDevicesOnHost(host));
+        } else {
+            host.setGpuGroups(ssCmd.getGpuGroupDetails());
+        }

Review Comment:
   methoid()



##########
server/src/main/java/com/cloud/api/query/dao/DomainJoinDaoImpl.java:
##########
@@ -195,6 +195,15 @@ public void setResourceLimits(DomainJoinVO domain, boolean 
fullView, ResourceLim
         response.setMemoryTotal(memoryTotal);
         response.setMemoryAvailable(memoryAvail);
 
+        //get resource limits for gpus
+        long gpuLimit = 
ApiDBUtils.findCorrectResourceLimitForDomain(domain.getGpuLimit(), 
ResourceType.gpu, domain.getId());
+        String gpuLimitDisplay = (fullView || gpuLimit == -1) ? 
Resource.UNLIMITED : String.valueOf(gpuLimit);
+        long gpuTotal = (domain.getGpuTotal() == null) ? 0 : 
domain.getGpuTotal();
+        String gpuAvail = (fullView || gpuLimit == -1) ? Resource.UNLIMITED : 
String.valueOf(gpuLimit - gpuTotal);
+        response.setGpuLimit(gpuLimitDisplay);
+        response.setGpuTotal(gpuTotal);
+        response.setGpuAvailable(gpuAvail);

Review Comment:
   ```
   long gpuLimit = getResourceLimitsForGpus(…)
   ```



##########
server/src/main/java/com/cloud/resource/ResourceManagerImpl.java:
##########
@@ -4239,7 +4317,16 @@ public HashMap<String, HashMap<String, VgpuTypesInfo>> 
getGPUStatistics(final Ho
         } else {
             // now construct the result object
             if (answer instanceof GetGPUStatsAnswer) {
-                return ((GetGPUStatsAnswer)answer).getGroupDetails();
+                GetGPUStatsAnswer gpuStatsAnswer = (GetGPUStatsAnswer) answer;
+                HashMap<String, HashMap<String, VgpuTypesInfo>> groupDetails;
+                gpuService.addGpuDevicesToHost(host, 
gpuStatsAnswer.getGpuDevices());
+                if 
(CollectionUtils.isNotEmpty(gpuStatsAnswer.getGpuDevices())) {
+                    groupDetails = 
gpuService.getGpuGroupDetailsFromGpuDevicesOnHost(host);
+                } else {
+                    groupDetails = gpuStatsAnswer.getGroupDetails();
+                }
+
+                return groupDetails;

Review Comment:
   `return getGroupDetails()`?



##########
server/src/main/java/com/cloud/deploy/DeploymentPlanningManagerImpl.java:
##########
@@ -582,14 +586,30 @@ private boolean canUseLastHost(HostVO host, ExcludeList 
avoids, DeploymentPlan p
             return false;
         }
 
-        ServiceOfferingDetailsVO offeringDetails = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.vgpuType.toString());
-        ServiceOfferingDetailsVO groupName = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.pciDevice.toString());
-        if (offeringDetails != null && 
!_resourceMgr.isGPUDeviceAvailable(host, groupName.getValue(), 
offeringDetails.getValue())) {
-            logger.debug("Cannot deploy VM [{}] in the last host [{}] because 
this host does not have the required GPU devices available. Skipping this and 
trying other available hosts.",
-                    vm, host);
-            return false;
+        if (offering.getVgpuProfileId() != null) {
+            VgpuProfileVO vgpuProfile = 
vgpuProfileDao.findById(offering.getVgpuProfileId());
+            if (vgpuProfile == null) {
+                logger.debug("Cannot deploy VM [{}] in the last host [{}] 
because the vgpu profile is not found. Skipping this and trying other available 
hosts.",
+                        vm, host);
+                return false;
+            }
+            int gpuCount = offering.getGpuCount() != null ? 
offering.getGpuCount() : 1;
+            if (!_resourceMgr.isGPUDeviceAvailable(host, vm.getId(), 
vgpuProfile, gpuCount)) {
+                logger.debug("Cannot deploy VM [{}] in the last host [{}] 
because this host does not have GPU devices. Skipping this and trying other 
available hosts.",
+                        vm, host);
+                return false;
+            }
+        } else {
+            ServiceOfferingDetailsVO offeringDetails = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.vgpuType.toString());
+            ServiceOfferingDetailsVO groupName = 
_serviceOfferingDetailsDao.findDetail(offering.getId(), 
GPU.Keys.pciDevice.toString());
+            if (offeringDetails != null && 
!_resourceMgr.isGPUDeviceAvailable(host, groupName.getValue(), 
offeringDetails.getValue())) {
+                logger.debug("Cannot deploy VM [{}] in the last host [{}] 
because this host does not have the required GPU devices available. Skipping 
this and trying other available hosts.",
+                        vm, host);
+                return false;
+            }
         }
 
+

Review Comment:
   ```suggestion
   ```



##########
server/src/main/java/com/cloud/resourcelimit/ResourceLimitManagerImpl.java:
##########
@@ -1872,19 +1907,27 @@ private void 
updateVmResourceCountForServiceOfferingAndTemplateChange(long accou
                 } else if (newMemory - currentMemory < 0) {
                     decrementResourceCountWithTag(accountId, 
ResourceType.memory, tag, currentMemory - newMemory);
                 }
+
+                if (newGpu - currentGpu > 0) {
+                    incrementResourceCountWithTag(accountId, ResourceType.gpu, 
tag, newGpu - currentGpu);
+                } else if (newGpu - currentGpu < 0) {
+                    decrementResourceCountWithTag(accountId, ResourceType.gpu, 
tag, currentGpu - newGpu);
+                }

Review Comment:
   `adjustResourceCount()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cloudstack.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to