> DERUMIER, Alexandre <alexandre.derum...@groupe-cyllene.com> hat am 23.10.2024 > 14:59 CEST geschrieben: > > > Hi Fabian, > > thanks for the review ! > > >>-------- Message initial -------- > >>De: Fabian Grünbichler <f.gruenbich...@proxmox.com> > >>À: Proxmox VE development discussion <pve-devel@lists.proxmox.com> > >>Cc: Alexandre Derumier <alexandre.derum...@groupe-cyllene.com> > >>Objet: Re: [pve-devel] [PATCH v2 pve-storage 1/2] add external > >>snasphot support > >>Date: 23/10/2024 12:12:46 > >> > >>some high level comments: > >> > >>I am not sure how much we gain here with the raw support? > > They are really qcow2 overhead, mostly with big disk. > as for good performance, the qcow2 l2-cache-size need to be keeped in > memory (and it's 1MB by disk) > https://events.static.linuxfound.org/sites/events/files/slides/kvm-forum-2017-slides.pdf > > Hopefully, they are improvments with the "new" sub-cluster feature > https://people.igalia.com/berto/files/kvm-forum-2020-slides.pdf > I'm already using it at snapshot create, but I think we should also use > it for main qcow2 volume. > > > But even with that, you can still have performance impact. > So yes, I think they are really usecase for workload when you only need > snapshot time to time (before an upgrade for example), but max > performance with no snaphot exist.
my main point here is - all other storages treat snapshots as "cheap". if you combine raw+qcow2 snapshot overlays, suddenly performance will get worse if you keep a snapshot around for whatever reason.. > >> it's a bit confusing to have a volid ending with raw, with the > >>current volume and all but the first snapshot actually being stored > >>in qcow2 files, with the raw file being the "oldest" snapshot in the > >>chain.. > if it's too confusing, we could use for example an .snap extension. > (as we known that it's qcow2 behind) I haven't thought yet about how to encode the snapshot name into the snapshot file name, but yeah, maybe something like that would be good. or maybe snap-VMID-disk-DISK.qcow2 ? > if possible, I'd be much happier with the snapshot name in the snapshot > file being a 1:1 match, see comments inline > > >>- makes it a lot easier to understand (admin wants to manually remove > >>snapshot "foo", if "foo" was the last snapshot then right now the > >>volume called "foo" is actually the current contents!) > > This part is really difficult, because you can't known in advance the > name of the snapshot you'll take in the future. The only way could be > to create a "current" volume , rename it when you took another > snasphot (I'm not sure it's possible to do it live, > and this could break link chain too) > > Also, I don't known how to manage the main volume, when you take the > first snapshot ? we should rename it too. I mean, if we don't allow .raw files to be snapshotted then this problem doesn't exist ;) > so "vm-disk-100-disk-0.raw|qcow2" , become "vm-disk-100-disk-0- > snap1.(raw|qcow2)" + new "vm-disk-100-disk-0-current.qcow2" ? the volid changing on snapshot seems like it would require a lot of adaption.. OTOH, the volid containing a wrong format might also break things. > I'll try to do test again to see what is possible. > > > > > >>- means we don't have to do lookups via the full snapshot list all > >>the time (e.g., if I want to do a full clone from a snapshot "foo", I > >>can just pass the snap-foo volume to qemu-img) > > ok got it > > > > >>the naming scheme for snapshots needs to be adapted to not clash with > >>regular volumes: > > >>$ pvesm alloc extsnap 131314 vm-131314-disk-foobar.qcow2 2G > >>Formatting '/mnt/pve/ext4/extsnap/images/131314/vm-131314-disk- > >>foobar.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off > >>preallocation=off compression_type=zlib size=2147483648 > >>lazy_refcounts=off refcount_bits=16 > >>successfully created 'extsnap:131314/vm-131314-disk-foobar.qcow2' > >>$ qm rescan --vmid 131314 > >>rescan volumes... > >>can't parse snapname from path at > >>/usr/share/perl5/PVE/Storage/Plugin.pm line 1934. > > any preference for naming scheme ? for lvm external snap, I have used > "vm-131314-disk-0-snap-<foobar>"; see above > >>storage_migrate needs to handle external snapshots, or at least error > >>out. > it should already work. (I have tested move_disk, and live migration + > storage migration). qemu_img_convert offline and qemu block job for > live. but don't all of those lose the snapshots? did you test it with snapshots and rollback afterwards? > >>I haven't tested that part or linked clones or a lot of other > >>advanced related actions at all ;) > > For linked clone, we can't have a base image with snapshots (other than > _base_). so It'll be safe. ack > > Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am > > 30.09.2024 13:31 CEST geschrieben: > > Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe- > > cyllene.com> > > --- > > src/PVE/Storage/DirPlugin.pm | 1 + > > src/PVE/Storage/Plugin.pm | 225 +++++++++++++++++++++++++++++++-- > > -- > > 2 files changed, 201 insertions(+), 25 deletions(-) > > > > diff --git a/src/PVE/Storage/DirPlugin.pm > > b/src/PVE/Storage/DirPlugin.pm > > index 2efa8d5..2bef673 100644 > > --- a/src/PVE/Storage/DirPlugin.pm > > +++ b/src/PVE/Storage/DirPlugin.pm > > @@ -80,6 +80,7 @@ sub options { > > is_mountpoint => { optional => 1 }, > > bwlimit => { optional => 1 }, > > preallocation => { optional => 1 }, > > + snapext => { optional => 1 }, > > }; > > } > > > > diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm > > index 6444390..5e5197a 100644 > > --- a/src/PVE/Storage/Plugin.pm > > +++ b/src/PVE/Storage/Plugin.pm > > @@ -214,6 +214,11 @@ my $defaultData = { > > maximum => 65535, > > optional => 1, > > }, > > + 'snapext' => { > > + type => 'boolean', > > + description => 'enable external snapshot.', > > + optional => 1, > > + }, > > }, > > }; > > > > @@ -695,7 +700,7 @@ sub get_subdir { > > } > > > > sub filesystem_path { > > - my ($class, $scfg, $volname, $snapname) = @_; > > + my ($class, $scfg, $volname, $snapname, $current_snap) = @_; > > see comment below > > > > > my ($vtype, $name, $vmid, undef, undef, $isBase, $format) = > > $class->parse_volname($volname); > > @@ -703,7 +708,7 @@ sub filesystem_path { > > # Note: qcow2/qed has internal snapshot, so path is always > > # the same (with or without snapshot => same file). > > die "can't snapshot this image format\n" > > - if defined($snapname) && $format !~ m/^(qcow2|qed)$/; > > + if defined($snapname) && !$scfg->{snapext} && $format !~ > > m/^(qcow2|qed)$/; > > > > my $dir = $class->get_subdir($scfg, $vtype); > > > > @@ -711,13 +716,22 @@ sub filesystem_path { > > > > my $path = "$dir/$name"; > > > > + if($scfg->{snapext}) { > > + my $snappath = get_snap_path($path, $snapname); > > + if($snapname) { > > + $path = $snappath; > > + } elsif ($current_snap) { > > + $path = $current_snap->{file}; > > + } > > + } > > see commente below > > > return wantarray ? ($path, $vmid, $vtype) : $path; > > } > > > > sub path { > > my ($class, $scfg, $volname, $storeid, $snapname) = @_; > > > > - return $class->filesystem_path($scfg, $volname, $snapname); > > + my $current_snapshot = $class->get_current_snapshot($scfg, > > $storeid, $volname); > > >>this is pretty expensive, and would only be needed if $snapname is > >>not set.. > > The main problem is when you start a vm on a specific snasphot, > we don't send the $snapname param. > > One way could be that qemu-server check the current snapshot from > config when doing specific action like start. if we manage to find a way to make the volid always point at the top overlay, then that wouldn't be needed.. > > + return $class->filesystem_path($scfg, $volname, $snapname, > > $current_snapshot); > > >>couldn't we avoid extending the signature of filesystem_path and just > pass the name of the current snapshot as $snapname? > > I need to redo test, I don't remember why I have splitted them, but you > are right, it should be cleaner. > > > } > > > > sub create_base { > > @@ -1074,13 +1088,31 @@ sub volume_resize { > > sub volume_snapshot { > > my ($class, $scfg, $storeid, $volname, $snap) = @_; > > > > - die "can't snapshot this image format\n" if $volname !~ > > m/\.(qcow2|qed)$/; > > + die "can't snapshot this image format\n" if $volname !~ > > m/\.(raw|qcow2|qed)$/; > > > > - my $path = $class->filesystem_path($scfg, $volname); > > + die "external snapshot need to be enabled to snapshot .raw > > volumes\n" if !$scfg->{snapext}; > > >>this condition is definitely wrong - it means no more snapshotting > >>unless external snapshot support is enabled.. > > oops, sorry. > > > > > - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path]; > > + if($scfg->{snapext}) { > > > > - run_command($cmd); > > + my $path = $class->path($scfg, $volname, $storeid); > > + > > + my $snappath = get_snap_path($path, $snap); > > + my $format = ($class->parse_volname($volname))[6]; > > + > > + my $cmd = ['/usr/bin/qemu-img', 'create', '-b', $path, > > + '-F', $format, '-f', 'qcow2', $snappath]; > > >>see comments on qemu-server, but.. wouldn't it be better if the file > >>with $snap in its name would be the one storing that snapshot's data? > >>i.e., rename the "current" volume to be called ...-$snap... , and > >>then create a new "current" file without a suffix with the renamed > >>volume as backing file? > > I'll try it ! > > > + > > + my $options = "extended_l2=on,"; > > + $options .= preallocation_cmd_option($scfg, 'qcow2'); > > + push @$cmd, '-o', $options; > > + run_command($cmd); > > + > > + } else { > > + > > + my $path = $class->filesystem_path($scfg, $volname); > > + my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path]; > > + run_command($cmd); > > + } > > > > return undef; > > } > > @@ -1091,19 +1123,39 @@ sub volume_snapshot { > > sub volume_rollback_is_possible { > > my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_; > > > > + if ($scfg->{snapext}) { > > + #technically, we could manage multibranch, we it need lot more work > > for snapshot delete > > >>would multibranch be easier if there is a simple 1:1 correspondence > >>between snapshots and their filenames? > >> > >>switching to a different part of the "hierarchy" is then just > >>- delete current volume > >>- create new current volume using rollback target as backing file > the rollback/branch switch is not too difficult, maybe 1:1 naming could > help. > > >>I guess deletion does become harder then, since it potentially > >>requires multiple rebases.. > > yes, The biggest difficulty is snapshot delete, as you need to create a > block-stream job, mergin/writing to each branch child, and you need to > do it atomically with a transaction with multiple jobs. > So yes, it's possible, but I wanted to keep it easy for now. sure, this restriction could be lifted in a follow-up! > > + my $path = $class->filesystem_path($scfg, $volname); > > + my $snappath = get_snap_path($path, $snap); > > + > > + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, > > $volname); > > + my $currentpath = $snapshots->{current}->{file}; > > + return 1 if !-e $snappath || $currentpath eq $snappath; > > + > > + die "can't rollback, '$snap' is not most recent snapshot on > > '$volname'\n"; > > + } > > + > > return 1; > > } > > > > sub volume_snapshot_rollback { > > my ($class, $scfg, $storeid, $volname, $snap) = @_; > > > > - die "can't rollback snapshot this image format\n" if $volname !~ > > m/\.(qcow2|qed)$/; > > + die "can't rollback snapshot this image format\n" if $volname !~ > > m/\.(raw|qcow2|qed)$/; > > > > - my $path = $class->filesystem_path($scfg, $volname); > > + die "external snapshot need to be enabled to rollback snapshot > > .raw volumes\n" if $volname =~ m/\.(raw)$/ && !$scfg->{snapext}; > > > > - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path]; > > + my $path = $class->filesystem_path($scfg, $volname); > > > > - run_command($cmd); > > + if ($scfg->{snapext}) { > > + #simply delete the current snapshot and recreate it > > + my $snappath = get_snap_path($path, $snap); > > + unlink($snappath); > > + $class->volume_snapshot($scfg, $storeid, $volname, $snap); > > this *reads* so weird ;) it is right given the current semantics > (current snapshot == live image, snapshot data actually stored in > parent snapshot) > > > + } else { > > + my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path]; > > + run_command($cmd); > > + } > > > > return undef; > > } > > @@ -1111,17 +1163,50 @@ sub volume_snapshot_rollback { > > sub volume_snapshot_delete { > > my ($class, $scfg, $storeid, $volname, $snap, $running) = @_; > > > > - die "can't delete snapshot for this image format\n" if $volname > > !~ m/\.(qcow2|qed)$/; > > + die "can't delete snapshot for this image format\n" if $volname > > !~ m/\.(raw|qcow2|qed)$/; > > + > > + die "external snapshot need to be enabled to delete snapshot of > > .raw volumes\n" if !$scfg->{snapext}; > > > > return 1 if $running; > > > > - my $path = $class->filesystem_path($scfg, $volname); > > + my $cmd = ""; > > + if ($scfg->{snapext}) { > > + > > + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, > > $volname); > > + my $snappath = $snapshots->{$snap}->{file}; > > + return if !-e $snappath; #already deleted ? > > + > > + my $parentsnap = $snapshots->{$snap}->{parent}; > > + my $childsnap = $snapshots->{$snap}->{child}; > > + die "error: can't find a parent for this snapshot" if > > !$parentsnap; > > > > - $class->deactivate_volume($storeid, $scfg, $volname, $snap, {}); > > + my $parentpath = $snapshots->{$parentsnap}->{file}; > > + my $parentformat = $snapshots->{$parentsnap}->{'format'} if > > $parentsnap; > > + my $childpath = $snapshots->{$childsnap}->{file} if $childsnap; > > + my $childformat = $snapshots->{$childsnap}->{'format'} if > > $childsnap; > > > > - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path]; > > + print "merge snapshot $snap to $parentsnap\n"; > > + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath]; > > + run_command($cmd); > > + > > + #if we delete an intermediate snapshot, we need to link upper > > snapshot to base snapshot > > + if($childpath && -e $childpath) { > > + die "missing parentsnap snapshot to rebase child $childpath\n" > > if !$parentpath; > > + print "link $childsnap to $parentsnap\n"; > > + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath, > > '-F', $parentformat, '-f', $childformat, $childpath]; > > + run_command($cmd); > > + } > > >>wouldn't a regular safe rebase work just as well, instead of commit + > >>unsafe rebase? if there is no parent, passing in "" as "new" backing > >>file should work.. > > I'll test it, but I'm pretty sure this is the correct way. > > > + > > + #delete the snapshot > > + unlink($snappath); > > + } else { > > + my $path = $class->filesystem_path($scfg, $volname); > > > > - run_command($cmd); > > + $class->deactivate_volume($storeid, $scfg, $volname, $snap, {}); > > + > > + $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path]; > > + run_command($cmd); > > + } > > > > return undef; > > } > > @@ -1140,10 +1225,6 @@ sub volume_has_feature { > > my ($class, $scfg, $feature, $storeid, $volname, $snapname, > > $running, $opts) = @_; > > > > my $features = { > > - snapshot => { > > - current => { qcow2 => 1 }, > > - snap => { qcow2 => 1 }, > > - }, > > clone => { > > base => { qcow2 => 1, raw => 1, vmdk => 1 }, > > }, > > @@ -1159,11 +1240,23 @@ sub volume_has_feature { > > base => { qcow2 => 1, raw => 1, vmdk => 1 }, > > current => { qcow2 => 1, raw => 1, vmdk => 1 }, > > }, > > - rename => { > > - current => {qcow2 => 1, raw => 1, vmdk => 1}, > > - }, > > + 'rename' => { > > + current => { qcow2 => 1, raw => 1, vmdk => 1}, > > + } > > }; > > > > + if ($scfg->{snapext}) { > > + $features->{snapshot} = { > > + current => { raw => 1, qcow2 => 1 }, > > + snap => { raw => 1, qcow2 => 1 }, > > + } > > + } else { > > + $features->{snapshot} = { > > + current => { qcow2 => 1 }, > > + snap => { qcow2 => 1 }, > > + }; > > + } > > + > > >>this could just leave $features as it is, and add the "raw" bits: > >> > >>if ($scfg->{snapext}) { > >> $features->{snapshot}->{current}->{raw} = 1; > >> $features->{snapshot}->{snap}->{raw} = 1; > >>} > > ok ! > > if ($feature eq 'clone') { > > if ( > > defined($opts->{valid_target_formats}) > > @@ -1222,7 +1315,9 @@ sub list_images { > > } > > > > if ($vollist) { > > - my $found = grep { $_ eq $volid } @$vollist; > > + my $search_volid = $volid; > > + $search_volid =~ s/-snap-.*\./\./; > > + my $found = grep { $_ eq $search_volid } @$vollist; > > next if !$found; > > } > > > > @@ -1380,7 +1475,53 @@ sub status { > > sub volume_snapshot_info { > > my ($class, $scfg, $storeid, $volname) = @_; > > > > - die "volume_snapshot_info is not implemented for $class"; > > + die "volume_snapshot_info is not implemented for $class" if > > !$scfg->{snapext}; > > + > > + my $path = $class->filesystem_path($scfg, $volname); > > + > > + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, > > $format) = $class->parse_volname($volname); > > + > > + my $basevolname = $volname; > > + $basevolname =~ s/\.(raw|qcow2)$//; > > + > > + my $snapshots = $class->list_images($storeid, $scfg, $vmid); > > + my $info = {}; > > + for my $snap (@$snapshots) { > > + > > + my $volid = $snap->{volid}; > > + next if ($volid !~ m/$basevolname/); > > >>this regex is broken w.r.t. partial matching! > >> > >>e.g., if a VM has both a disk -1.qcow2 and -11.qcow2 and I attempt to > >>snapshot it using external snapshots: > ok ! > > > snapshotting 'drive-scsi0' (extsnap:131314/vm-131314-disk-0.raw) > Formatting '/mnt/pve/ext4/extsnap/images/131314/vm-131314-disk-0-snap- > test2.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=on > preallocation=off compression_type=zlib size=200704 > backing_file=/mnt/pve/ext4/extsnap/images/131314/vm-131314-disk-0-snap- > test.qcow2 backing_fmt=raw lazy_refcounts=off refcount_bits=16 > snapshotting 'drive-scsi1' (extsnap:131314/vm-131314-disk-1.qcow2) > Formatting '/mnt/pve/ext4/extsnap/images/131314/vm-131314-disk-11-snap- > test2.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=on > preallocation=off compression_type=zlib size=2147483648 > backing_file=/mnt/pve/ext4/extsnap/images/131314/vm-131314-disk- > 11.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 > snapshotting 'drive-scsi2' (extsnap:131314/vm-131314-disk-11.qcow2) > qemu-img: /mnt/pve/ext4/extsnap/images/131314/vm-131314-disk-11-snap- > test2.qcow2: Error: Trying to create an image with the same filename as > the backing file > snapshot create failed: starting cleanup > merge snapshot test2 to test > Image committed. > merge snapshot test2 to base > Image committed. > TASK ERROR: command '/usr/bin/qemu-img create -b > /mnt/pve/ext4/extsnap/images/131314/vm-131314-disk-11-snap-test2.qcow2 > -F qcow2 -f qcow2 /mnt/pve/ext4/extsnap/images/131314/vm-131314-disk- > 11-snap-test2.qcow2 -o 'extended_l2=on,preallocation=off'' failed: exit > code 1 > > > + > > + my (undef, $snapvolname) = parse_volume_id($volid); > > + my $snapname = get_snapname_from_path($volid); > > + my $snapfile = $class->filesystem_path($scfg, $snapvolname, > > $snapname); > > + $snapname = 'base' if !$snapname; > > + > > + my $format = $snap->{'format'}; > > + my $parentfile = $snap->{parent}; > > + my $parentname = get_snapname_from_path($parentfile) if > > $parentfile; > > + $parentname = 'base' if !$parentname && $parentfile; > > + > > + $info->{$snapname}->{file} = $snapfile; > > + $info->{$snapname}->{volid} = $volid; > > + $info->{$snapname}->{'format'} = $format; > > + $info->{$snapname}->{parent} = $parentname if $parentname; > > + $info->{$parentname}->{child} = $snapname if $parentname; > > + } > > + > > + my $current = undef; > > + for my $id (keys %$info) { > > + my $snap = $info->{$id}; > > + die "error: snap $id: you can't have multiple current snapshot: > > current:$current\n" if !$snap->{child} && $current; > > + $current = $id if !$snap->{child}; > > + } > > + > > + if ($current) { > > + $info->{current}->{file} = $info->{$current}->{file}; > > + $info->{current}->{'format'} = $info->{$current}->{'format'}; > > + $info->{current}->{parent} = $info->{$current}->{parent}; > > + } > > + > > + return $info; > > } > > > > sub activate_storage { > > @@ -1764,4 +1905,38 @@ sub config_aware_base_mkdir { > > } > > } > > > > +sub get_snap_path { > > + my ($path, $snap) = @_; > > + > > + my $basepath = ""; > > + my $baseformat = ""; > > + if ($path =~ m/^((.*)(vm-(\d+)-disk-(\d+)))(-snap- > > (.*))?\.(raw|qcow2)/) { > > >>this regex is wrong - volumes can have arbitrary names after the - > >>disk- part.. > > ah sorry. do you have some example where it's used ? (maybe for efi or > other specific disk ?) no, any vdisk can have (almost) anything after the -disk- part. you can allocate such volumes using `pvesm alloc` or the API (we just are not very good at keeping those custom suffixes when moving/migrating/.. ;)) > > + $basepath = $1; > > + $baseformat = $8; > > + } > > + my $format = $snap ? 'qcow2' : $baseformat; > > + my $snappath = $snap ? $basepath."-snap-$snap.$format" : undef; > > + > > + return $snappath; > > +} > > + > > +sub get_snapname_from_path { > > + my ($path) = @_; > > + > > + if ($path =~ m/^((.*)(vm-(\d+)-disk-(\d+)))(-snap- > > (.*))?\.(raw|qcow2)/) { > > >>here as well.. and this whole helper is just used twice in > >>volume_snapshot_info, maybe it could be inlined or made private > ok ! > > > > + my $snapname = $7; > > + return $snapname; > > + } > > + die "can't parse snapname from path"; > > +} > > + > > +sub get_current_snapshot { > > + my ($class, $scfg, $storeid, $volname) = @_; > > + #IMPROVE ME: faster way to find current snapshot? (search the > > most recent created snapshot file ? need to works with lvm volume > > too) > > + > > + return if !$scfg->{snapext}; > > + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, > > $volname); > > + return $snapshots->{current}; > > +} > > + > > 1; > > -- > > 2.39.2 _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel