Thanks for the review! On Wed, Oct 30, 2024 at 11:13:36AM GMT, Dominik Csapak wrote: > Hi, > > thanks for tackling this! > > a few comments inline > > On 10/28/24 12:31, Christoph Heiss wrote: > > [..] > > And FWIW, these properties could also be retrieved without going through > > nvidia-smi using the NVML API directly [0], the same API nvidia-smi uses > > anyway under the hood. > > > > But that would require either using something like e.g. DynaLoader in > > perl [1] or calling it from Rust using e.g. the nvml-wrapper-sys [2] and > > wrapping it using perlmod. > > > > Both ways would be a bit involved of course, but also a lot more > > future-proof than parsing the human-readable output from `nvidia-smi`. > > If preferred I'd be happy to re-write it in some way or another. > > another alternative could be to wrap the few necessary calls in a small > c binary that we can call in place of nvidia-smi. > This tool could then output more structured (e.g. json) output, > though not sure if it's worth that,
Sounds like a good idea, TBH. Although that should then be probably be packaged separately from libpve-common-perl, so there is still that "overhead". > since I hope that the nvidia-smi output is relatively stable I'd hope too, but I (personally, at least) wouldn't want to count on it. It is still a proprietary tool after all, so really could be changed any way seen fit w/o notice. > > (we can still change this method should parsing nvidia-smi output > turn out to be too brittle) Using it as a stop-gap measure basically would work for me, so that this isn't blocked unnecessarily long and gives us time to properly implement something like this. > > > [..] > > diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm > > index 0bde6d7..fc6282d 100644 > > --- a/src/PVE/SysFSTools.pm > > +++ b/src/PVE/SysFSTools.pm > > @@ -4,8 +4,10 @@ use strict; > > [..] > > + > > + my $command = ['nvidia-smi', 'vgpu', '--creatable', '--verbose']; > > IMO we should not use 'creatable' here but 'available' (or however it's named) FWIW, for reference, would be `--supported`. > AFAIR the 'creatable' are only the ones currently allowed to be created, > not the ones the cards generally support. > > This could become a problem e.g. when a vm is started with some profile, > then 'creatable' will only return models compatible to that, and even if > the vm is stopped after, since we cached the output, will not show the > description of the others) Oh I see, yeah, that makes sense then. I'll change it for v2, thanks! > > Of course we could instead change the caching logic to reparse it, whenever > we don't find the particular id in the cached config, but that > seems too much work if we can just have a list of all 'available' > > > + my $parsefn = sub { > > + my ($line) = @_; > > + return if $line =~ m/^GPU/; > > + > > + my @parts = split(':', $line); > > + return if scalar(@parts) != 2; > > + > > + my ($key, $value) = @parts; > > + > > + $key =~ s/^\s+|\s+$//g; # trim whitespace from start and end > > + $value =~ s/\s+//g; # trim all whitespace > > + $value =~ s/,/-/g; # replace any commas with dashes > > + > > + if ($key eq 'vGPU Type ID') { > > + $cur_id = hex($value); > > + } elsif (defined($generic_propmap->{$key}) && $value ne 'N/A') { > > + $configs->{$cur_id}->{$generic_propmap->{$key}} = $value; > > + } > > + > > + # `nvidia-smi` prints these keys/values in a deterministic order, > > + # so the order they appear in can be relied upon. > > + if ($key eq 'Maximum X Resolution') { > > + $configs->{$cur_id}->{'max-resolution'} = $value; > > + } elsif ($key eq 'Maximum Y Resolution') { > > + $configs->{$cur_id}->{'max-resolution'} .= "x$value"; > > + } > > + }; > > it would be great to have some tests here with some example output, that > way we can compare output if it changes, and it makes reasoning about the > parsing logic a bit easier. Sure, will add some tests with the next revision! > > > + > > + eval { > > + PVE::Tools::run_command($command, outfunc => $parsefn); > > + }; > > + > > + if (my $err = $@) { > > + warn "failed to run nvidia-smi: $err\n"; > > + return undef; > > + } > > + > > + for my $k (keys %$configs) { > > + $configs->{$k} = PVE::JSONSchema::print_property_string($configs->{$k}, > > $prop_schema); > > + } > > since the schema above is mostly emptly besides defining which keys there are, > is there a specific reason you don't just use something like (untested!): > > join(",", map { $_ ."=". $configs->{$_} } keys $configs->%*); > > ? > that way you could omit the prop_schema entirely No particular reason, other than of not thinking of it :^) I'll rework it, since the property schema is indeed mostly redundant w.r.t to the property map above it. _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel