Re: [PR] Add support for nvidia vGPU support with vendor specific framework [cloudstack]

via GitHub Tue, 12 Aug 2025 00:33:11 -0700


Copilot commented on code in PR #11432:
URL: https://github.com/apache/cloudstack/pull/11432#discussion_r2268929745



##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtGpuDef.java:
##########
@@ -47,7 +47,7 @@ public String toString() {
 
     private void generateMdevXml(StringBuilder gpuBuilder) {
         String mdevUuid = vgpuType.getBusAddress(); // For MDEV devices, 
busAddress contains the UUID
-        String displayAttribute = vgpuType.isDisplay() ? "on" : "off";
+        String displayAttribute = vgpuType.isDisplay() ? "on' ramfb='on" : 
"off";
 
         gpuBuilder.append("<hostdev mode='subsystem' type='mdev' 
model='vfio-pci' display='").append(displayAttribute).append("'>\n");

Review Comment:
   The display attribute value is malformed - it contains an unescaped single 
quote that will break the XML structure. It should be "on" or "off" only, as 
ramfb is handled separately in the XML generation.
   ```suggestion
           String displayAttribute = vgpuType.isDisplay() ? "on" : "off";
           boolean ramfb = vgpuType.isDisplay(); // Assuming ramfb should be 
enabled when display is on
   
           gpuBuilder.append("<hostdev mode='subsystem' type='mdev' 
model='vfio-pci' display='").append(displayAttribute).append("'");
           if (ramfb) {
               gpuBuilder.append(" ramfb='on'");
           }
           gpuBuilder.append(">\n");
   ```



##########
scripts/vm/hypervisor/kvm/gpudiscovery.sh:
##########
@@ -349,6 +349,130 @@ json_escape() {
 # Cache for nodedev XML data to avoid repeated virsh calls
 declare -A nodedev_cache
 
+# Cache for nvidia-smi vgpu profile data
+declare -A nvidia_vgpu_profiles
+
+# Parse nvidia-smi vgpu -s -v output and populate profile cache
+parse_nvidia_vgpu_profiles() {
+       local gpu_address=""
+       local profile_id=""
+       local profile_name=""
+       local max_instances=""
+       local fb_memory=""
+       local max_heads=""
+       local max_x_res=""
+       local max_y_res=""
+
+       # Function to store current profile data
+       store_profile_data() {
+               if [[ -n "$gpu_address" && -n "$profile_id" && -n 
"$profile_name" ]]; then
+                       local key="${gpu_address}:${profile_id}"
+                       
nvidia_vgpu_profiles["$key"]="$profile_name|${max_instances:-0}|${fb_memory:-0}|${max_heads:-0}|${max_x_res:-0}|${max_y_res:-0}"
+               fi
+       }
+
+       # Skip if nvidia-smi is not available
+       if ! command -v nvidia-smi >/dev/null 2>&1; then
+               return
+       fi
+
+       while IFS= read -r line; do
+               # Match GPU address line
+               if [[ $line =~ ^GPU[[:space:]]+([0-9A-Fa-f:]+\.[0-9A-Fa-f]+) 
]]; then
+                       # Store previous profile data before starting new GPU
+                       store_profile_data
+
+                       gpu_address="${BASH_REMATCH[1]}"
+                       # Convert from format like 00000000:AF:00.0 to AF:00.0 
and normalize to lowercase
+                       if [[ $gpu_address =~ 
[0-9A-Fa-f]+:([0-9A-Fa-f]+:[0-9A-Fa-f]+\.[0-9A-Fa-f]+) ]]; then
+                               gpu_address="${BASH_REMATCH[1],,}"
+                       else
+                               gpu_address="${gpu_address,,}"
+                       fi
+                       # Reset profile variables for new GPU
+                       profile_id=""
+                       profile_name=""
+                       max_instances=""
+                       fb_memory=""
+                       max_heads=""
+                       max_x_res=""
+                       max_y_res=""
+               elif [[ $line =~ 
^[[:space:]]*vGPU[[:space:]]+Type[[:space:]]+ID[[:space:]]*:[[:space:]]*0x([0-9A-Fa-f]+)
 ]]; then
+                       # Store previous profile data before starting new 
profile
+                       store_profile_data
+
+                       # Normalize to lowercase hex without 0x prefix
+                       profile_id="${BASH_REMATCH[1],,}"
+                       # Reset profile-specific variables
+                       profile_name=""
+                       max_instances=""
+                       fb_memory=""
+                       max_heads=""
+                       max_x_res=""
+                       max_y_res=""
+               elif [[ $line =~ 
^[[:space:]]*Name[[:space:]]*:[[:space:]]*(.+)$ ]]; then
+                       profile_name="${BASH_REMATCH[1]}"
+               elif [[ $line =~ 
^[[:space:]]*Max[[:space:]]+Instances[[:space:]]*:[[:space:]]*([0-9]+) ]]; then
+                       max_instances="${BASH_REMATCH[1]}"
+               elif [[ $line =~ 
^[[:space:]]*FB[[:space:]]+Memory[[:space:]]*:[[:space:]]*([0-9]+)[[:space:]]*MiB
 ]]; then
+                       fb_memory="${BASH_REMATCH[1]}"
+               elif [[ $line =~ 
^[[:space:]]*Display[[:space:]]+Heads[[:space:]]*:[[:space:]]*([0-9]+) ]]; then
+                       max_heads="${BASH_REMATCH[1]}"
+               elif [[ $line =~ 
^[[:space:]]*Maximum[[:space:]]+X[[:space:]]+Resolution[[:space:]]*:[[:space:]]*([0-9]+)
 ]]; then
+                       max_x_res="${BASH_REMATCH[1]}"
+               elif [[ $line =~ 
^[[:space:]]*Maximum[[:space:]]+Y[[:space:]]+Resolution[[:space:]]*:[[:space:]]*([0-9]+)
 ]]; then
+                       max_y_res="${BASH_REMATCH[1]}"
+               fi
+       done < <(nvidia-smi vgpu -s -v 2>/dev/null || true)
+
+       # Store the last profile data after processing all lines
+       store_profile_data
+}
+
+# Get current vGPU type ID for a VF from sysfs
+get_current_vgpu_type() {
+       local vf_path="$1"
+       local current_type_file="$vf_path/nvidia/current_vgpu_type"
+
+       if [[ -f "$current_type_file" ]]; then
+               local type_id
+               type_id=$(<"$current_type_file")
+
+               # Remove any whitespace
+               type_id="${type_id// /}"
+
+               # Handle different input formats and normalize to lowercase hex 
without 0x
+               if [[ $type_id =~ ^0x([0-9A-Fa-f]+)$ ]]; then
+                       # Input is hex with 0x prefix (e.g., "0x252")
+                       echo "${BASH_REMATCH[1],,}"
+               elif [[ $type_id =~ ^[0-9]+$ ]]; then
+                       # Input is decimal (e.g., "594")
+                       printf "%x" "$type_id"
+               elif [[ $type_id =~ ^[0-9A-Fa-f]+$ ]]; then
+                       # Input is hex without 0x prefix (e.g., "252")
+                       echo "${type_id,,}"
+               else
+                       # Fallback for unknown format
+                       echo "0"
+               fi
+       else
+               echo "0"
+       fi
+}
+
+# Get profile information from nvidia-smi cache
+get_nvidia_profile_info() {
+       local gpu_address="$1"
+       local profile_id="$2"
+       local key="${gpu_address}:${profile_id}"
+
+       if [[ -n "${nvidia_vgpu_profiles[$key]:-}" ]]; then
+               echo "${nvidia_vgpu_profiles[$key]}"
+       else
+               echo "|0|0|0|0|0"  # Default empty values

Review Comment:
   [nitpick] The hardcoded pipe-separated default values make the code fragile. 
Consider using named constants or a more structured approach to define these 
default values to match the expected format.
   ```suggestion
                echo "$DEFAULT_NVIDIA_PROFILE_INFO"  # Default empty values
   ```



##########
scripts/vm/hypervisor/kvm/gpudiscovery.sh:
##########
@@ -677,7 +825,7 @@ for LINE in "${LINES[@]}"; do
                        USED_JSON=$(to_json_vm "$raw")
 
                        flist+=(
-                               
"{\"vf_pci_address\":\"$VF_BDF\",\"vf_profile\":$VF_PROFILE_JSON,\"libvirt_address\":{\"domain\":\"$DOMAIN\",\"bus\":\"$BUS\",\"slot\":\"$SLOT\",\"function\":\"$FUNC\"},\"used_by_vm\":$USED_JSON}")
+                               
"{\"vf_pci_address\":\"$VF_BDF\",\"vf_profile\":$VF_PROFILE_JSON,\"max_instances\":$VF_MAX_INSTANCES,\"video_ram\":$VF_VIDEO_RAM,\"max_heads\":$VF_MAX_HEADS,\"max_resolution_x\":$VF_MAX_RESOLUTION_X,\"max_resolution_y\":$VF_MAX_RESOLUTION_Y,\"libvirt_address\":{\"domain\":\"$DOMAIN\",\"bus\":\"$BUS\",\"slot\":\"$SLOT\",\"function\":\"$FUNC\"},\"used_by_vm\":$USED_JSON}")
                done

Review Comment:
   [nitpick] This JSON string construction is extremely long and difficult to 
read/maintain. Consider breaking it into multiple lines or using a function to 
build the JSON object incrementally for better readability.
   ```suggestion
                                "{"
                                "  \"vf_pci_address\": \"$VF_BDF\","
                                "  \"vf_profile\": $VF_PROFILE_JSON,"
                                "  \"max_instances\": $VF_MAX_INSTANCES,"
                                "  \"video_ram\": $VF_VIDEO_RAM,"
                                "  \"max_heads\": $VF_MAX_HEADS,"
                                "  \"max_resolution_x\": $VF_MAX_RESOLUTION_X,"
                                "  \"max_resolution_y\": $VF_MAX_RESOLUTION_Y,"
                                "  \"libvirt_address\": {"
                                "    \"domain\": \"$DOMAIN\","
                                "    \"bus\": \"$BUS\","
                                "    \"slot\": \"$SLOT\","
                                "    \"function\": \"$FUNC\""
                                "  },"
                                "  \"used_by_vm\": $USED_JSON"
                                "}"
                        )
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cloudstack.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] Add support for nvidia vGPU support with vendor specific framework [cloudstack]

Reply via email to