Re: [pve-devel] [PATCH docs 3/6] ceph: troubleshooting: revise and add frequently needed information
On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote: > Existing information is slightly modified and retained. > > Add information: > * List which logs are usually helpful for troubleshooting > * Explain how to acknowledge listed Ceph crashes and view details > * List common causes of Ceph problems and link to recommendations for a > healthy cluster > * Briefly describe the common problem "OSDs down/crashed" > > Signed-off-by: Alexander Zeidler > --- > pveceph.adoc | 72 ++-- > 1 file changed, 64 insertions(+), 8 deletions(-) > > diff --git a/pveceph.adoc b/pveceph.adoc > index 90bb975..4e1c1e2 100644 > --- a/pveceph.adoc > +++ b/pveceph.adoc > @@ -1150,22 +1150,78 @@ The following Ceph commands can be used to see if the > cluster is healthy > ('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors > ('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands > below will also give you an overview of the current events and actions to > take. > +To stop their execution, press CTRL-C. > > > -# single time output > -pve# ceph -s > -# continuously output status changes (press CTRL+C to stop) > -pve# ceph -w > +# Continuously watch the cluster status > +pve# watch ceph --status > + > +# Print the cluster status once (not being updated) > +# and continuously append lines of status events > +pve# ceph --watch > > > +[[pve_ceph_ts]] > +Troubleshooting > +~~~ > + > +This section includes frequently used troubleshooting information. > +More information can be found on the official Ceph website under > +Troubleshooting > +footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/]. > + > +[[pve_ceph_ts_logs]] > +.Relevant Logs on Affected Node > + > +* xref:_disk_health_monitoring[Disk Health Monitoring] For some reason, the "_disk_health_monitoring" anchor above breaks building the docs for me -- "make update" exits with an error, complaining that it can't find the anchor. The one-page docs ("pve-admin-guide.html") seems to build just fine, though. The anchor works there too, so I'm not sure what's going wrong there exactly. > +* __System -> System Log__ (or, for example, > + `journalctl --since "2 days ago"`) > +* IPMI and RAID controller logs > + > +Ceph service crashes can be listed and viewed in detail by running > +`ceph crash ls` and `ceph crash info `. Crashes marked as > +new can be acknowledged by running, for example, > +`ceph crash archive-all`. > + > To get a more detailed view, every Ceph service has a log file under > `/var/log/ceph/`. If more detail is required, the log level can be > adjusted footnote:[Ceph log and debugging > {cephdocs-url}/rados/troubleshooting/log-and-debug/]. > > -You can find more information about troubleshooting > -footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/] > -a Ceph cluster on the official website. > - > +[[pve_ceph_ts_causes]] > +.Common Causes of Ceph Problems > + > +* Network problems like congestion, a faulty switch, a shut down > +interface or a blocking firewall. Check whether all {pve} nodes are > +reliably reachable on the xref:_cluster_network[corosync] network and Would personally prefer "xref:_cluster_network[corosync network]" above, but no hard opinions there. > +on the xref:pve_ceph_install_wizard[configured] Ceph public and > +cluster network. Would also prefer [configured Ceph public and cluster network] as a whole here. > + > +* Disk or connection parts which are: > +** defective > +** not firmly mounted > +** lacking I/O performance under higher load (e.g. 
when using HDDs, > +consumer hardware or xref:pve_ceph_recommendation_raid[inadvisable] > +RAID controllers) Same here; I would prefer to highlight [inadvisable RAID controllers] as a whole. > + > +* Not fulfilling the xref:pve_ceph_recommendation[recommendations] for > +a healthy Ceph cluster. > + > +[[pve_ceph_ts_problems]] > +.Common Ceph Problems > + :: > + > +OSDs `down`/crashed::: > +A faulty OSD will be reported as `down` and mostly (auto) `out` 10 > +minutes later. Depending on the cause, it can also automatically > +become `up` and `in` again. To try a manual activation via web > +interface, go to __Any node -> Ceph -> OSD__, select the OSD and click > +on **Start**, **In** and **Reload**. When using the shell, run on the > +affected node `ceph-volume lvm activate --all`. > ++ > +To activate a failed OSD, it may be necessary to > +xref:ha_manager_node_maintenance[safely] reboot the respective node And again here: Would personally prefer [safely reboot] in the anchor ref. > +or, as a last resort, to > +xref:pve_ceph_osd_replace[recreate or replace] the OSD. > > ifdef::manvolnum[] > include::pve-copyright.adoc[] Note: The only thing that really stood out to me was the "_disk_health_monitoring" refusing to build on my system; the other comments here are just tiny style suggestions. If you disagree with them, no hard feelin
Re: [pve-devel] [PATCH docs 4/6] ceph: osd: revise and expand the section "Destroy OSDs"
On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote: > Existing information is slightly modified and retained. > > Add information: > * Mention and link to the sections "Troubleshooting" and "Replace OSDs" > * CLI commands (pveceph) must be executed on the affected node > * Check in advance the "Used (%)" of OSDs to avoid blocked I/O > * Check and wait until the OSD can be stopped safely > * Use `pveceph stop` instead of `systemctl stop ceph-osd@.service` > * Explain cleanup option a bit more > > Signed-off-by: Alexander Zeidler > --- > pveceph.adoc | 58 > 1 file changed, 31 insertions(+), 27 deletions(-) > > diff --git a/pveceph.adoc b/pveceph.adoc > index 4e1c1e2..754c401 100644 > --- a/pveceph.adoc > +++ b/pveceph.adoc > @@ -502,33 +502,37 @@ ceph-volume lvm create --filestore --data /dev/sd[X] > --journal /dev/sd[Y] > Destroy OSDs > > > -To remove an OSD via the GUI, first select a {PVE} node in the tree view and > go > -to the **Ceph -> OSD** panel. Then select the OSD to destroy and click the > **OUT** > -button. Once the OSD status has changed from `in` to `out`, click the > **STOP** > -button. Finally, after the status has changed from `up` to `down`, select > -**Destroy** from the `More` drop-down menu. > - > -To remove an OSD via the CLI run the following commands. > - > -[source,bash] > - > -ceph osd out > -systemctl stop ceph-osd@.service > - > - > -NOTE: The first command instructs Ceph not to include the OSD in the data > -distribution. The second command stops the OSD service. Until this time, no > -data is lost. > - > -The following command destroys the OSD. Specify the '-cleanup' option to > -additionally destroy the partition table. > - > -[source,bash] > - > -pveceph osd destroy > - > - > -WARNING: The above command will destroy all data on the disk! > +If you experience problems with an OSD or its disk, try to > +xref:pve_ceph_mon_and_ts[troubleshoot] them first to decide if a > +xref:pve_ceph_osd_replace[replacement] is needed. > + > +To destroy an OSD: > + > +. Either open the web interface and select any {pve} node in the tree > +view, or open a shell on the node where the OSD to be deleted is > +located. > + > +. Go to the __Ceph -> OSD__ panel (`ceph osd df tree`). If the OSD to > +be deleted is still `up` and `in` (non-zero value at `AVAIL`), make > +sure that all OSDs have their `Used (%)` value well below the > +`nearfull_ratio` of default `85%`. In this way you can reduce the risk > +from the upcoming rebalancing, which may cause OSDs to run full and > +thereby blocking I/O on Ceph pools. > + > +. If the deletable OSD is not `out` yet, select the OSD and click on > +**Out** (`ceph osd out `). This will exclude it from data > +distribution and starts a rebalance. > + > +. Click on **Stop**, and if a warning appears, click on **Cancel** and > +try again shortly afterwards. When using the shell, check if it is What kind of warning can appear in this case here, though? Is there something that the user could perhaps miss, if they just proceed to click on Cancel? > +safe to stop by reading the output from `ceph osd ok-to-stop `, > +once true, run `pveceph stop --service osd.` . > + > +. **Attention, this step removes the OSD from Ceph and deletes all Would prefer to keep the "WARNING: " here instead of "Attention, ", personally. > +disk data.** To continue, first click on **More -> Destroy**. Use the > +cleanup option to clean up the partition table and similar, enabling > +an immediate reuse of the disk in {pve}. 
Finally, click on **Remove** > +(`pveceph osd destroy [--cleanup]`). > > > [[pve_ceph_pools]] ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section
On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote: > Signed-off-by: Alexander Zeidler > --- Some high-level feedback (see comments inline and in patches otherwise): - The writing style is IMO quite clear and straightforward, nice work! - In patch 03, the "_disk_health_monitoring" anchor reference seems to break my build for some reason. Does this also happen on your end? The single-page docs ("pve-admin-guide.html") seem to build just fine otherwise. - Regarding implicitly / auto-generated anchors, is it fine to break those in general or not? See my other comments inline here. - There are a few tiny style things I personally would correct, but if you disagree with them, feel free to leave them as they are. All in all this seems pretty solid; the stuff regarding the anchors needs to be clarified first (whether it's okay to break auto-generated ones & the one anchor that makes my build fail). Otherwise, pretty good! > pveceph.adoc | 8 > 1 file changed, 8 insertions(+) > > diff --git a/pveceph.adoc b/pveceph.adoc > index da39e7f..93c2f8d 100644 > --- a/pveceph.adoc > +++ b/pveceph.adoc > @@ -82,6 +82,7 @@ and vocabulary > footnote:[Ceph glossary {cephdocs-url}/glossary]. > > > +[[pve_ceph_recommendation]] > Recommendations for a Healthy Ceph Cluster > -- AsciiDoc automatically generated an anchor for the heading above already, and it's "_recommendations_for_a_healthy_ceph_cluster" apparently. So, there's no need to provide one here explicitly, since it already exists; it also might break old links that refer to the documentation. Though, perhaps in a separate series, you could look for all implicitly defined anchors and set them explicitly..? Not sure if that's something we want, though. > > @@ -95,6 +96,7 @@ NOTE: The recommendations below should be seen as a rough > guidance for choosing > hardware. Therefore, it is still essential to adapt it to your specific > needs. > You should test your setup and monitor health and performance continuously. > > +[[pve_ceph_recommendation_cpu]] > .CPU > Ceph services can be classified into two categories: > > @@ -122,6 +124,7 @@ IOPS load over 100'000 with sub millisecond latency, each > OSD can use multiple > CPU threads, e.g., four to six CPU threads utilized per NVMe backed OSD is > likely for very high performance disks. > > +[[pve_ceph_recommendation_memory]] > .Memory > Especially in a hyper-converged setup, the memory consumption needs to be > carefully planned out and monitored. In addition to the predicted memory > usage > @@ -137,6 +140,7 @@ normal operation, but rather leave some headroom to cope > with outages. > The OSD service itself will use additional memory. The Ceph BlueStore > backend of > the daemon requires by default **3-5 GiB of memory** (adjustable). > > +[[pve_ceph_recommendation_network]] > .Network > We recommend a network bandwidth of at least 10 Gbps, or more, to be used > exclusively for Ceph traffic. A meshed network setup > @@ -172,6 +176,7 @@ high-performance setups: > * one medium bandwidth (1 Gbps) exclusive for the latency sensitive corosync >cluster communication. > > +[[pve_ceph_recommendation_disk]] > .Disks > When planning the size of your Ceph cluster, it is important to take the > recovery time into consideration. Especially with small clusters, recovery > @@ -197,6 +202,7 @@ You also need to balance OSD count and single OSD > capacity. More capacity > allows you to increase storage density, but it also means that a single OSD > failure forces Ceph to recover more data at once. 
> > +[[pve_ceph_recommendation_raid]] > .Avoid RAID > As Ceph handles data object redundancy and multiple parallel writes to disks > (OSDs) on its own, using a RAID controller normally doesn’t improve > @@ -1018,6 +1024,7 @@ to act as standbys. > Ceph maintenance > > > +[[pve_ceph_osd_replace]] > Replace OSDs > This one here is also implicitly defined already, unfortunately. > > @@ -1131,6 +1138,7 @@ ceph osd unset noout > You can now start up the guests. Highly available guests will change their > state > to 'started' when they power on. > > +[[pve_ceph_mon_and_ts]] > Ceph Monitoring and Troubleshooting > --- > So is this one. Actually, now I do wonder: I think it's better to define them in the AsciiDoc code directly, but how would we do that with existing anchors? Just use the automatically generated anchor name? Or are we fine with breaking links? Would be nice if someone could chime in here. (Personally, I think it's fine to break these things, but I stand corrected if that's a no-go.) ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH manager v2] proxy: ui: vm console: autodetect novnc or xtermjs
Am 25.11.24 um 11:04 schrieb Aaron Lauterer: > Some users configure their VMs to use serial as their display. The big > benefit is that in combination with the xtermjs remote console, copy & > paste works a lot better than via novnc. > > Unfortunately, the main console panel defaulted to novnc, not matter > what the guest had configured. One always had to open the console of the > VM in a separate window to make use of xtermjs. > > This patch changes the behavior and lets it autodetect the guest > configuration to either use xtermjs or novnc. > > If we previously selected the console submenu in one VM and then > switched to another VM, the console submenu is the preselect item for > the VM we switched to. But at this point, the default would be used > (novnc), making it an unpleasant experience. If we would use the same > mechanism as for the top right console button - `me.mon()` - it would > work, but only if we (re)select the console submenu after the first API > call to `nodes/{nola}/qemu/{vmid}/status` finished. On the initial > load, if the console is the active submenu, it would still default to > novnc. > > While we probably could have implemented in just in the UI, for example > by waiting until the first call to `status` is done, this would > potentially introduce "laggyness" when opening the console. Wondering why you would need vmstatus() for this? Isn't all the information already present in the VM configuration? > > Another option is to handle it in the backend. The backend can then > check the VM config and override the novnc/xtermjs setting. The result > is, that even when switching VMs in the web UI with the console submenu > selected, it will load xtermjs for the VMs that use it. > > We only do it if the HTTP call has the new 'autodetect' parameter > enabled. Additionally we introduce a permission check for 'VM.Console' > at this level and only adjust the used console if the user does have the > correct permissions. Otherwise we leak the existence of the VM to > unauthorized users if it has 'serial' configured due to the change in > the returned console (noVNC/xtermjs). > > The actual check if the user is allowed to access the console is > happening once the loaded console implementation establishes a > connection to the console API endpoint of the guest. But at this point, > either the noVNC or xtermjs console is already sent to the user's > browser. > > Potential further improvements could be to also check the console > settings in datacenter.cfg and consider it. As that isn't checked in the > inline console panel for CTs and VMs at all, with out without this > patch. > The setting does have an impact on the buttons that open the console in > a new window. > > Signed-off-by: Aaron Lauterer > --- > Another thing that I noticed is that the property we use to decide if we > enable xtermjs for VMs in the top right console button, and for now also > in this patch, only checks if the VM has a serial device configured. > PVE::QemuServer::vmstatus() calls `conf_has_serial()`. > > A better approach would be to have a vmstatus property that actually > tells us if the VM has serial set as display option. As the display > could be set to something else, even if a serial device exists. There > are other use-cases for a serial device in the VM, like passing through > an actual serial port. > > But I didn't want to open that can of worms just yet ;) > Again, can't this be done just via the configuration? 
> diff --git a/PVE/Service/pveproxy.pm b/PVE/Service/pveproxy.pm > index ac108545..4cd281c7 100755 > --- a/PVE/Service/pveproxy.pm > +++ b/PVE/Service/pveproxy.pm > @@ -228,6 +228,22 @@ sub get_index { > my $novnc = defined($args->{console}) && $args->{novnc}; > my $xtermjs = defined($args->{console}) && $args->{xtermjs}; > > +my $rpcenv = PVE::RPCEnvironment::get(); > +if ( > + defined($args->{console}) > + && $args->{console} eq 'kvm' > + && $args->{autodetect} The name 'autodetect' is too generic, maybe 'console-autodetect'? To me, it really feels like the caller should just directly pass in either novnc or xtermjs as the argument depending on what it actually wants. pveproxy calling into vmstatus() feels weird. > + && $username > + && $rpcenv->check_full($username, "/vms/$args->{vmid}", ["VM.Console"], > 1, 1) > +) { > + my $vmid = $args->{vmid}; > + my $vmstatus = PVE::QemuServer::vmstatus($vmid); Missing "use" statement for PVE::QemuServer. > + if (defined($vmstatus->{$vmid}) && $vmstatus->{$vmid}->{serial}) { > + $novnc = 0; > + $xtermjs = 1; > + } > +} > + > my $langfile = -f "$basedirs->{i18n}/pve-lang-$lang.js" ? 1 : 0; > > my $version = PVE::pvecfg::version(); ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server v3 2/6] fix #4225: qemuserver: introduce sub eject_nonrequired_isos
Fiona Ebner writes: > Am 31.01.25 um 14:58 schrieb Daniel Herzig: >> Fiona Ebner writes: >> >>> Am 31.01.25 um 10:36 schrieb Fiona Ebner: Am 30.01.25 um 12:31 schrieb Daniel Herzig: > + > +$drive->{essential} = 1 if !defined($drive->{essential}); This should rather be done when parsing the drive. >>> >>> Or I guess to avoid (potentially) writing it out for every drive, it >>> should only be a local variable here and not set in the drive hash. >> >> Thanks, I'll double check on this for v4. But I'd assume the hash to be >> scoped to >> `config_to_command` here, or am I missing something? >> > > But you don't need the modified version in config_to_command() later, > or? And if yes, that can just check the same way again, i.e. using > default value if not set. If we want to explicitly set the value, that > should happen during parsing the drive string. Most of the time it's > surprising to implicitly modify a passed-in value. Better to either > avoid that or if really necessary, explicitly mention it in the function > description. No, I do not need the modified version of the VM-config later. What I'm trying to achieve is a more 'forgiving' behaviour in the case of accidently server-side-deleted iso file/unavailable server (for whatever reason) attached to a VM. So this is actually aiming at `qm start`, which implicitly calls `config_to_command` -- without modifying the existing VM config at all. If the parameter 'essential' is set to '0', `config_to_command` would -- in case of file unavailability of the iso file -- generate a kvm startup command that contains "-drive 'if=none,media=cdrom,[...]" instead of "-drive 'file=$SOME_PATH_TO_ISO,[..]' when we at this point already know that $SOME_PATH_TO_ISO is unavailable/non-existent. The VM-config itself is not changed, as an eg nfs-server might come back at a point in time later and the user might want to do something with the iso stored there. If the parameter 'essential' is unset, or set to '1', the die would happen before `qm start` leads to an invocation of kvm, as it cannot be expected to lead to a successful action, if $SOME_PATH_TO_ISO is not reachable. This would be the exact behaviour as we have it now, just not letting kvm run into this situation, but detect and exit earlier, while `config_to_commands` iterates over all volumes via `foreach_volume` anyway before producing the 'final' kvm command. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH i18n] de: update translation
On 28.11.24 at 10:12, Shannon Sterz wrote: > Signed-off-by: Shannon Sterz applied, thanks! ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server v3 2/6] fix #4225: qemuserver: introduce sub eject_nonrequired_isos
Am 31.01.25 um 14:58 schrieb Daniel Herzig: > Fiona Ebner writes: > >> Am 31.01.25 um 10:36 schrieb Fiona Ebner: >>> Am 30.01.25 um 12:31 schrieb Daniel Herzig: + +$drive->{essential} = 1 if !defined($drive->{essential}); >>> >>> This should rather be done when parsing the drive. >> >> Or I guess to avoid (potentially) writing it out for every drive, it >> should only be a local variable here and not set in the drive hash. > > Thanks, I'll double check on this for v4. But I'd assume the hash to be > scoped to > `config_to_command` here, or am I missing something? > But you don't need the modified version in config_to_command() later, or? And if yes, that can just check the same way again, i.e. using default value if not set. If we want to explicitly set the value, that should happen during parsing the drive string. Most of the time it's surprising to implicitly modify a passed-in value. Better to either avoid that or if really necessary, explicitly mention it in the function description. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path
On Fri Jan 31, 2025 at 1:23 PM CET, Fiona Ebner wrote: > Am 09.01.25 um 15:48 schrieb Max Carrara: > > +=head3 path_join(@paths) > > + > > +Joins multiple paths together. All kinds of paths are supported. > > + > > +Does not perform any C>>. > > + > > +my $joined = path_join("foo", "bar/baz", "qux.txt"); > > +# foo/bar/baz/qux.txt > > + > > +my $joined = path_join("/", "etc/pve/", "storage.cfg"); > > +# /etc/pve/storage.cfg > > + > > +Similar to C>>, should any of > > the > > +C<@paths> be an absolute path, it I all preceding paths. > > Seems like the docs are somehow broken here, looking at it with perldoc, > it just states 'Similar to "path_push()"' and the rest of the sentence > is missing. My bad; made a typo in the docstring for path_components() (just above this function). Thanks for spotting this! > > Why this kind of behavior with absolute paths? Seems surprising to me. > Wouldn't failing the call be better? I decided to stick with Rust's behaviour [pathbuf] here and in most other places, simply in order to keep the behaviour consistent. I'm not *really* a fan of it either, but I think it's better than failing the call -- IMO having to wrap path_join / path_push into an eval-block would be quite awkward for what's a rather "small" operation :s Perhaps I should provide some additional context here as well so as to make my "design philosophy" more clear: The reason why I modeled PVE::Path (mostly) after Rust's std::path::{Path, PathBuf} is because: 1. Most (newer) developers will be much more likely to be familiar with Rust instead of Perl, so if they should see this library, they'll have a much easier time using it (or at least that's what I *hope* would happen). Sort of as in: "Oh, this is just like the path stuff in Rust, except for a few Perl-esque exceptions!" 2. I wanted the library to be essentially exception-free, except where obvious invariants aren't upheld by the caller (e.g. passing undef). Unless you match the error message with a regex, there's no structurally typed way to differ between errors in Perl, so I'd like the "types" of errors to be mostly the same wherever possible. 3. I didn't want to reinvent the wheel and instead stick to what other popular libraries do as much as possible, while also avoiding any additional abstractions (e.g. introducing an object for paths). With all that being said, since that behaviour is surprising to you, I have no quarrels changing it! :P After all I'd like to have something that doesn't cause any friction when using it. [pathbuf]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push > > > + > > +my $joined = path_join("foo/bar", "/etc", "resolv.conf"); > > +# /etc/resolv.conf > > + > > +my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts"); > > +# /etc/hosts > > + > > +Throws an exception if any of the passed C<@paths> is C. 
> > + > > +=cut > > + > > +sub path_join : prototype(@) { > > +my (@paths) = @_; > > + > > +if (!scalar(@paths)) { > > + return ''; > > +} > > + > > +croak "one of the provided paths is undef" if any { !defined($_) } > > @paths; > > + > > I think the rest could be written more efficiently like (untested): > > my $resulting_path = shift @paths; > for my $path (@paths) { > if ($path =~ m#^/#) { > $resulting_path = $path; > } else { > $resulting_path = path_push($resulting_path, $path); > } > } Note that I'm doing a reverse iteration below; I think doing it either way should be fine, because I doubt we'll ever need to join that many path components where it's actually going to make a difference ;P > > > +# Find the last occurrence of a root directory and start conjoining the > > +# components from there onwards > > +my $index = scalar(@paths) - 1; > > +while ($index > 0) { > > + last if $paths[$index] =~ m#^/#; > > + $index--; > > +} > > + > > +@paths = @paths[$index .. (scalar(@paths) - 1)]; > > + > > +my $resulting_path = shift @paths; > > + > > +for my $path (@paths) { > > + $resulting_path = path_push($resulting_path, $path); > > +} > > + > > +return $resulting_path; > > +} > > + > > +=head3 path_normalize($path) > > + > > +Wrapper for L>. Performs a logical cleanup of the > > given > > +C<$path>. > > + > > +This removes unnecessary components of a path that can be safely > > +removed, such as C<.>, trailing C or repeated occurrences of C. > > + > > +For example, C and C will both become > > +C. > > + > > +B This will I remove components referencing the parent > > directory, > > +i.e. C<..>. For example, C and C will > > therefore > > +remain as they are. However, the parent directory of C is C, so > > +C will be normalized to C. > > + > > +Throws an exception if C<$path> is C or the call to C > > failed. > > + > > +=cut > > + > > +sub path_normalize : prototype($) { > > +my ($path) = @_; > > + > > +croak "\$path
[pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section
Signed-off-by: Alexander Zeidler --- pveceph.adoc | 8 1 file changed, 8 insertions(+) diff --git a/pveceph.adoc b/pveceph.adoc index da39e7f..93c2f8d 100644 --- a/pveceph.adoc +++ b/pveceph.adoc @@ -82,6 +82,7 @@ and vocabulary footnote:[Ceph glossary {cephdocs-url}/glossary]. +[[pve_ceph_recommendation]] Recommendations for a Healthy Ceph Cluster -- @@ -95,6 +96,7 @@ NOTE: The recommendations below should be seen as a rough guidance for choosing hardware. Therefore, it is still essential to adapt it to your specific needs. You should test your setup and monitor health and performance continuously. +[[pve_ceph_recommendation_cpu]] .CPU Ceph services can be classified into two categories: @@ -122,6 +124,7 @@ IOPS load over 100'000 with sub millisecond latency, each OSD can use multiple CPU threads, e.g., four to six CPU threads utilized per NVMe backed OSD is likely for very high performance disks. +[[pve_ceph_recommendation_memory]] .Memory Especially in a hyper-converged setup, the memory consumption needs to be carefully planned out and monitored. In addition to the predicted memory usage @@ -137,6 +140,7 @@ normal operation, but rather leave some headroom to cope with outages. The OSD service itself will use additional memory. The Ceph BlueStore backend of the daemon requires by default **3-5 GiB of memory** (adjustable). +[[pve_ceph_recommendation_network]] .Network We recommend a network bandwidth of at least 10 Gbps, or more, to be used exclusively for Ceph traffic. A meshed network setup @@ -172,6 +176,7 @@ high-performance setups: * one medium bandwidth (1 Gbps) exclusive for the latency sensitive corosync cluster communication. +[[pve_ceph_recommendation_disk]] .Disks When planning the size of your Ceph cluster, it is important to take the recovery time into consideration. Especially with small clusters, recovery @@ -197,6 +202,7 @@ You also need to balance OSD count and single OSD capacity. More capacity allows you to increase storage density, but it also means that a single OSD failure forces Ceph to recover more data at once. +[[pve_ceph_recommendation_raid]] .Avoid RAID As Ceph handles data object redundancy and multiple parallel writes to disks (OSDs) on its own, using a RAID controller normally doesn’t improve @@ -1018,6 +1024,7 @@ to act as standbys. Ceph maintenance +[[pve_ceph_osd_replace]] Replace OSDs @@ -1131,6 +1138,7 @@ ceph osd unset noout You can now start up the guests. Highly available guests will change their state to 'started' when they power on. +[[pve_ceph_mon_and_ts]] Ceph Monitoring and Troubleshooting --- -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH docs 3/6] ceph: troubleshooting: revise and add frequently needed information
Existing information is slightly modified and retained. Add information: * List which logs are usually helpful for troubleshooting * Explain how to acknowledge listed Ceph crashes and view details * List common causes of Ceph problems and link to recommendations for a healthy cluster * Briefly describe the common problem "OSDs down/crashed" Signed-off-by: Alexander Zeidler --- pveceph.adoc | 72 ++-- 1 file changed, 64 insertions(+), 8 deletions(-) diff --git a/pveceph.adoc b/pveceph.adoc index 90bb975..4e1c1e2 100644 --- a/pveceph.adoc +++ b/pveceph.adoc @@ -1150,22 +1150,78 @@ The following Ceph commands can be used to see if the cluster is healthy ('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors ('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands below will also give you an overview of the current events and actions to take. +To stop their execution, press CTRL-C. -# single time output -pve# ceph -s -# continuously output status changes (press CTRL+C to stop) -pve# ceph -w +# Continuously watch the cluster status +pve# watch ceph --status + +# Print the cluster status once (not being updated) +# and continuously append lines of status events +pve# ceph --watch +[[pve_ceph_ts]] +Troubleshooting +~~~ + +This section includes frequently used troubleshooting information. +More information can be found on the official Ceph website under +Troubleshooting +footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/]. + +[[pve_ceph_ts_logs]] +.Relevant Logs on Affected Node + +* xref:_disk_health_monitoring[Disk Health Monitoring] +* __System -> System Log__ (or, for example, + `journalctl --since "2 days ago"`) +* IPMI and RAID controller logs + +Ceph service crashes can be listed and viewed in detail by running +`ceph crash ls` and `ceph crash info `. Crashes marked as +new can be acknowledged by running, for example, +`ceph crash archive-all`. + To get a more detailed view, every Ceph service has a log file under `/var/log/ceph/`. If more detail is required, the log level can be adjusted footnote:[Ceph log and debugging {cephdocs-url}/rados/troubleshooting/log-and-debug/]. -You can find more information about troubleshooting -footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/] -a Ceph cluster on the official website. - +[[pve_ceph_ts_causes]] +.Common Causes of Ceph Problems + +* Network problems like congestion, a faulty switch, a shut down +interface or a blocking firewall. Check whether all {pve} nodes are +reliably reachable on the xref:_cluster_network[corosync] network and +on the xref:pve_ceph_install_wizard[configured] Ceph public and +cluster network. + +* Disk or connection parts which are: +** defective +** not firmly mounted +** lacking I/O performance under higher load (e.g. when using HDDs, +consumer hardware or xref:pve_ceph_recommendation_raid[inadvisable] +RAID controllers) + +* Not fulfilling the xref:pve_ceph_recommendation[recommendations] for +a healthy Ceph cluster. + +[[pve_ceph_ts_problems]] +.Common Ceph Problems + :: + +OSDs `down`/crashed::: +A faulty OSD will be reported as `down` and mostly (auto) `out` 10 +minutes later. Depending on the cause, it can also automatically +become `up` and `in` again. To try a manual activation via web +interface, go to __Any node -> Ceph -> OSD__, select the OSD and click +on **Start**, **In** and **Reload**. When using the shell, run on the +affected node `ceph-volume lvm activate --all`. 
++ +To activate a failed OSD, it may be necessary to +xref:ha_manager_node_maintenance[safely] reboot the respective node +or, as a last resort, to +xref:pve_ceph_osd_replace[recreate or replace] the OSD. ifdef::manvolnum[] include::pve-copyright.adoc[] -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
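For reference, the crash-handling workflow described in the patch above condenses to a few commands on the affected node (a sketch only; <crash_id> is a placeholder taken from the `ceph crash ls` output):

----
# Gather recent context on the affected node
journalctl --since "2 days ago"

# List recorded Ceph crashes and inspect a specific one
ceph crash ls
ceph crash info <crash_id>

# Acknowledge all crashes still marked as new
ceph crash archive-all

# Try to bring failed OSDs on this node back up
ceph-volume lvm activate --all
----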
[pve-devel] [PATCH docs 4/6] ceph: osd: revise and expand the section "Destroy OSDs"
Existing information is slightly modified and retained. Add information: * Mention and link to the sections "Troubleshooting" and "Replace OSDs" * CLI commands (pveceph) must be executed on the affected node * Check in advance the "Used (%)" of OSDs to avoid blocked I/O * Check and wait until the OSD can be stopped safely * Use `pveceph stop` instead of `systemctl stop ceph-osd@.service` * Explain cleanup option a bit more Signed-off-by: Alexander Zeidler --- pveceph.adoc | 58 1 file changed, 31 insertions(+), 27 deletions(-) diff --git a/pveceph.adoc b/pveceph.adoc index 4e1c1e2..754c401 100644 --- a/pveceph.adoc +++ b/pveceph.adoc @@ -502,33 +502,37 @@ ceph-volume lvm create --filestore --data /dev/sd[X] --journal /dev/sd[Y] Destroy OSDs -To remove an OSD via the GUI, first select a {PVE} node in the tree view and go -to the **Ceph -> OSD** panel. Then select the OSD to destroy and click the **OUT** -button. Once the OSD status has changed from `in` to `out`, click the **STOP** -button. Finally, after the status has changed from `up` to `down`, select -**Destroy** from the `More` drop-down menu. - -To remove an OSD via the CLI run the following commands. - -[source,bash] - -ceph osd out -systemctl stop ceph-osd@.service - - -NOTE: The first command instructs Ceph not to include the OSD in the data -distribution. The second command stops the OSD service. Until this time, no -data is lost. - -The following command destroys the OSD. Specify the '-cleanup' option to -additionally destroy the partition table. - -[source,bash] - -pveceph osd destroy - - -WARNING: The above command will destroy all data on the disk! +If you experience problems with an OSD or its disk, try to +xref:pve_ceph_mon_and_ts[troubleshoot] them first to decide if a +xref:pve_ceph_osd_replace[replacement] is needed. + +To destroy an OSD: + +. Either open the web interface and select any {pve} node in the tree +view, or open a shell on the node where the OSD to be deleted is +located. + +. Go to the __Ceph -> OSD__ panel (`ceph osd df tree`). If the OSD to +be deleted is still `up` and `in` (non-zero value at `AVAIL`), make +sure that all OSDs have their `Used (%)` value well below the +`nearfull_ratio` of default `85%`. In this way you can reduce the risk +from the upcoming rebalancing, which may cause OSDs to run full and +thereby blocking I/O on Ceph pools. + +. If the deletable OSD is not `out` yet, select the OSD and click on +**Out** (`ceph osd out `). This will exclude it from data +distribution and starts a rebalance. + +. Click on **Stop**, and if a warning appears, click on **Cancel** and +try again shortly afterwards. When using the shell, check if it is +safe to stop by reading the output from `ceph osd ok-to-stop `, +once true, run `pveceph stop --service osd.` . + +. **Attention, this step removes the OSD from Ceph and deletes all +disk data.** To continue, first click on **More -> Destroy**. Use the +cleanup option to clean up the partition table and similar, enabling +an immediate reuse of the disk in {pve}. Finally, click on **Remove** +(`pveceph osd destroy [--cleanup]`). [[pve_ceph_pools]] -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
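For the CLI-only path, the steps above roughly condense to the following sequence, run on the node hosting the OSD (a sketch; <id> is a placeholder for the OSD number, and the ok-to-stop check may need to be repeated until it reports that stopping is safe):

----
# Check utilization; all OSDs should stay well below the nearfull_ratio (default 85%)
ceph osd df tree

# Exclude the OSD from data distribution and let the rebalance run
ceph osd out <id>

# Stop the OSD only once Ceph reports that this is safe
ceph osd ok-to-stop osd.<id>
pveceph stop --service osd.<id>

# WARNING: removes the OSD and deletes all data on its disk
pveceph osd destroy <id> --cleanup
----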
[pve-devel] [PATCH docs 6/6] pvecm: remove node: mention Ceph and its steps for safe removal
as it has already been missed in the past or the proper procedure was not known. Signed-off-by: Alexander Zeidler --- pvecm.adoc | 47 +++ 1 file changed, 47 insertions(+) diff --git a/pvecm.adoc b/pvecm.adoc index 15dda4e..8026de4 100644 --- a/pvecm.adoc +++ b/pvecm.adoc @@ -320,6 +320,53 @@ replication automatically switches direction if a replicated VM is migrated, so by migrating a replicated VM from a node to be deleted, replication jobs will be set up to that node automatically. +If the node to be removed has been configured for +xref:chapter_pveceph[Ceph]: + +. Ensure that sufficient {pve} nodes with running OSDs (`up` and `in`) +continue to exist. ++ +NOTE: By default, Ceph pools have a `size/min_size` of `3/2` and a +full node as `failure domain` at the object balancer +xref:pve_ceph_device_classes[CRUSH]. So if less than `size` (`3`) +nodes with running OSDs are online, data redundancy will be degraded. +If less than `min_size` are online, pool I/O will be blocked and +affected guests may crash. + +. Ensure that sufficient xref:pve_ceph_monitors[monitors], +xref:pve_ceph_manager[managers] and, if using CephFS, +xref:pveceph_fs_mds[metadata servers] remain available. + +. To maintain data redundancy, each destruction of an OSD, especially +the last one on a node, will trigger a data rebalance. Therefore, +ensure that the OSDs on the remaining nodes have sufficient free space +left. + +. To remove Ceph from the node to be deleted, start by +xref:pve_ceph_osd_destroy[destroying] its OSDs, one after the other. + +. Once the xref:pve_ceph_mon_and_ts[CEPH status] is `HEALTH_OK` again, +proceed by: + +[arabic] +.. destroying its xref:pveceph_fs_mds[metadata server] via web +interface at __Ceph -> CephFS__ or by running: ++ + +# pveceph mds destroy + + +.. xref:pveceph_destroy_mon[destroying its monitor] + +.. xref:pveceph_destroy_mgr[destroying its manager] + +. Finally, remove the now empty bucket ({pve} node to be removed) from +the CRUSH hierarchy by running: ++ + +# ceph osd crush remove + + In the following example, we will remove the node hp4 from the cluster. Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes` -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
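Taken together, the Ceph-related part of the node removal above maps roughly to the following commands (a sketch with placeholders; it assumes the usual `pveceph mon destroy` and `pveceph mgr destroy` subcommands behind the referenced monitor/manager sections, and each OSD destruction should finish rebalancing before continuing):

----
# Destroy the node's OSDs one after the other (see "Destroy OSDs")
pveceph osd destroy <id> --cleanup

# Once the cluster reports HEALTH_OK again, remove the remaining services
pveceph mds destroy <name>
pveceph mon destroy <monid>
pveceph mgr destroy <id>

# Finally, drop the now empty bucket from the CRUSH hierarchy
ceph osd crush remove <node>
----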
Re: [pve-devel] [PATCH v3 pve-common 11/12] introduce PVE::Filesystem
On Fri Jan 31, 2025 at 1:44 PM CET, Fiona Ebner wrote: > Am 09.01.25 um 15:48 schrieb Max Carrara: > > + > > +sub fs_canonicalize : prototype($) { > > +my ($path) = @_; > > + > > +croak "\$path is undef" if !defined($path); > > + > > +my $canonicalized_path = Cwd::abs_path($path); > > + > > +croak "failed to canonicalize path: $!" if > > !defined($canonicalized_path); > > + > > +return $canonicalized_path; > > +} > > + > > Missing the 1; at the end of module Woops, thanks for spotting this! Will correct that. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH docs 5/6] ceph: maintenance: revise and expand section "Replace OSDs"
Remove redundant information that is already described in section “Destroy OSDs” and link to it. Mention and link to the troubleshooting section, as replacing the OSD may not fix the underyling problem. Mention that the replacement disk should be of the same type and size and comply with the recommendations. Mention how to acknowledge warnings of crashed OSDs. Signed-off-by: Alexander Zeidler --- pveceph.adoc | 45 + 1 file changed, 13 insertions(+), 32 deletions(-) diff --git a/pveceph.adoc b/pveceph.adoc index 754c401..010f48c 100644 --- a/pveceph.adoc +++ b/pveceph.adoc @@ -1032,43 +1032,24 @@ Ceph Maintenance Replace OSDs -One of the most common maintenance tasks in Ceph is to replace the disk of an -OSD. If a disk is already in a failed state, then you can go ahead and run -through the steps in xref:pve_ceph_osd_destroy[Destroy OSDs]. Ceph will recreate -those copies on the remaining OSDs if possible. This rebalancing will start as -soon as an OSD failure is detected or an OSD was actively stopped. +With the following steps you can replace the disk of an OSD, which is +one of the most common maintenance tasks in Ceph. If there is a +problem with an OSD while its disk still seems to be healthy, read the +xref:pve_ceph_mon_and_ts[troubleshooting] section first. -NOTE: With the default size/min_size (3/2) of a pool, recovery only starts when -`size + 1` nodes are available. The reason for this is that the Ceph object -balancer xref:pve_ceph_device_classes[CRUSH] defaults to a full node as -`failure domain'. +. If the disk failed, get a +xref:pve_ceph_recommendation_disk[recommended] replacement disk of the +same type and size. -To replace a functioning disk from the GUI, go through the steps in -xref:pve_ceph_osd_destroy[Destroy OSDs]. The only addition is to wait until -the cluster shows 'HEALTH_OK' before stopping the OSD to destroy it. +. xref:pve_ceph_osd_destroy[Destroy] the OSD in question. -On the command line, use the following commands: +. Detach the old disk from the server and attach the new one. - -ceph osd out osd. - - -You can check with the command below if the OSD can be safely removed. - - -ceph osd safe-to-destroy osd. - - -Once the above check tells you that it is safe to remove the OSD, you can -continue with the following commands: - - -systemctl stop ceph-osd@.service -pveceph osd destroy - +. xref:pve_ceph_osd_create[Create] the OSD again. -Replace the old disk with the new one and use the same procedure as described -in xref:pve_ceph_osd_create[Create OSDs]. +. After automatic rebalancing, the cluster status should switch back +to `HEALTH_OK`. Any still listed crashes can be acknowledged by +running, for example, `ceph crash archive-all`. Trim/Discard -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
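On the command line, the replacement flow above looks roughly like this (a sketch with placeholders; the destroy step is detailed in "Destroy OSDs", and `pveceph osd create` is the command from the "Create OSDs" section):

----
# Destroy the faulty OSD first (see "Destroy OSDs")
pveceph osd destroy <id> --cleanup

# After swapping in the replacement disk, create the OSD again
pveceph osd create /dev/sd[X]

# Wait until rebalancing finishes and the status is HEALTH_OK again,
# then acknowledge any remaining crash reports
watch ceph --status
ceph crash archive-all
----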
Re: [pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path
Am 03.02.25 um 15:21 schrieb Max Carrara: > On Fri Jan 31, 2025 at 1:23 PM CET, Fiona Ebner wrote: >> Am 09.01.25 um 15:48 schrieb Max Carrara: >> >> Why this kind of behavior with absolute paths? Seems surprising to me. >> Wouldn't failing the call be better? > > I decided to stick with Rust's behaviour [pathbuf] here and in most > other places, simply in order to keep the behaviour consistent. > > I'm not *really* a fan of it either, but I think it's better than > failing the call -- IMO having to wrap path_join / path_push into an > eval-block would be quite awkward for what's a rather "small" operation > :s > > Perhaps I should provide some additional context here as well so as to > make my "design philosophy" more clear: The reason why I modeled > PVE::Path (mostly) after Rust's std::path::{Path, PathBuf} is because: > > 1. Most (newer) developers will be much more likely to be familiar > with Rust instead of Perl, so if they should see this library, > they'll have a much easier time using it (or at least that's what > I *hope* would happen). Sort of as in: "Oh, this is just like the > path stuff in Rust, except for a few Perl-esque exceptions!" > > 2. I wanted the library to be essentially exception-free, except where > obvious invariants aren't upheld by the caller (e.g. passing undef). > Unless you match the error message with a regex, there's no > structurally typed way to differ between errors in Perl, so I'd > like the "types" of errors to be mostly the same wherever possible. > > 3. I didn't want to reinvent the wheel and instead stick to what other > popular libraries do as much as possible, while also avoiding any > additional abstractions (e.g. introducing an object for paths). > > With all that being said, since that behaviour is surprising to you, I > have no quarrels changing it! :P After all I'd like to have something > that doesn't cause any friction when using it. > > [pathbuf]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push > You don't need to change it if there was good rationale behind it, which you provided :) A short comment in the function documentation regarding the inspiration would still be nice. Puts the responsibility on the caller to not accidentally use an absolute path though. Maybe we can at least warn instead? I suspect almost all calls with an absolute path in the middle will be accidental. >> >>> + >>> +my $joined = path_join("foo/bar", "/etc", "resolv.conf"); >>> +# /etc/resolv.conf >>> + >>> +my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts"); >>> +# /etc/hosts >>> + >>> +Throws an exception if any of the passed C<@paths> is C. >>> + >>> +=cut >>> + >>> +sub path_join : prototype(@) { >>> +my (@paths) = @_; >>> + >>> +if (!scalar(@paths)) { >>> + return ''; >>> +} >>> + >>> +croak "one of the provided paths is undef" if any { !defined($_) } >>> @paths; >>> + >> >> I think the rest could be written more efficiently like (untested): >> >> my $resulting_path = shift @paths; >> for my $path (@paths) { >> if ($path =~ m#^/#) { >> $resulting_path = $path; >> } else { >> $resulting_path = path_push($resulting_path, $path); >> } >> } > > Note that I'm doing a reverse iteration below; I think doing it either > way should be fine, because I doubt we'll ever need to join that many > path components where it's actually going to make a difference ;P > I also meant "efficiently" in terms of lines/readability ;) Shorter, less loops and avoids index arithmetic. 
>> >>> +# Find the last occurrence of a root directory and start conjoining the >>> +# components from there onwards >>> +my $index = scalar(@paths) - 1; >>> +while ($index > 0) { >>> + last if $paths[$index] =~ m#^/#; >>> + $index--; >>> +} >>> + >>> +@paths = @paths[$index .. (scalar(@paths) - 1)]; >>> + >>> +my $resulting_path = shift @paths; >>> + >>> +for my $path (@paths) { >>> + $resulting_path = path_push($resulting_path, $path); >>> +} >>> + >>> +return $resulting_path; >>> +} >>> + >>> +=head3 path_normalize($path) >>> + >>> +Wrapper for L>. Performs a logical cleanup of the >>> given >>> +C<$path>. >>> + >>> +This removes unnecessary components of a path that can be safely >>> +removed, such as C<.>, trailing C or repeated occurrences of C. >>> + >>> +For example, C and C will both become >>> +C. >>> + >>> +B This will I remove components referencing the parent >>> directory, >>> +i.e. C<..>. For example, C and C will >>> therefore >>> +remain as they are. However, the parent directory of C is C, so >>> +C will be normalized to C. >>> + >>> +Throws an exception if C<$path> is C or the call to C >>> failed. >>> + >>> +=cut >>> + >>> +sub path_normalize : prototype($) { >>> +my ($path) = @_; >>> + >>> +croak "\$path is undef" if !defined($path); >>> + >>> +my $cleaned_path = eval { >>> + File::Spec->can
[pve-devel] applied: [PATCH common] tools: explain reason for the explicit PerlIO load
Am 02.12.24 um 17:03 schrieb Filip Schauer: > Explain the reason for the explicit `use PerlIO::scalar;` statement > introduced in c4945bf ("tools: load PerlIO explicitly to avoid odd > failures") > > Signed-off-by: Filip Schauer > --- > src/PVE/Tools.pm | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/src/PVE/Tools.pm b/src/PVE/Tools.pm > index 0325f53..57eb86c 100644 > --- a/src/PVE/Tools.pm > +++ b/src/PVE/Tools.pm > @@ -290,12 +290,12 @@ sub file_set_contents { > } else { > # Encode wide characters with print before passing them to syswrite > my $unencoded_data = $data; > - # Without this we get some "Can't locate PerlIO.pm in @INC" errors > _sometimes_, and the > - # odd thing about it is that they can be "fixed" by calling > file_set_contents in the > - # parent methode/code before the method, from another module, is > called. > - # Anyway, loading PerlIO here should be fine as the in-memory > variable writing is in > - # fact backed by the PerlIO based "scalar" module. This comment can > be removed once the > - # odd behavior is really understood. > + # Preload PerlIO::scalar at compile time to prevent runtime loading > issues when > + # file_set_contents is called with PVE::LXC::Setup::protected_call. > Normally, > + # PerlIO::scalar is loaded implicitly during the execution of > + # `open(my $data_fh, '>', \$data)`. However, this fails if it is > executed within a > + # chroot environment where the necessary PerlIO.pm module file is > inaccessible. > + # Preloading the module ensures it is available regardless of the > execution context. > use PerlIO::scalar; > open(my $data_fh, '>', \$data) or die "failed to open in-memory > variable - $!\n"; > print $data_fh $unencoded_data; Nice detective work! Applied, thanks! ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH pve-docs v2 0/2] PVE docs: Made small edits.
Split a long sentence in the section on Swap on ZFS and fixed some typos. Alexander Abraham (2): Section on ZFS and swap corrected. PVE docs: Made small edits. local-zfs.adoc | 37 ++--- 1 file changed, 2 insertions(+), 35 deletions(-) -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH pve-docs v2 1/2] Section on ZFS and swap corrected.
--- local-zfs.adoc | 36 +--- 1 file changed, 1 insertion(+), 35 deletions(-) diff --git a/local-zfs.adoc b/local-zfs.adoc index c64fb27..bcd02f7 100644 --- a/local-zfs.adoc +++ b/local-zfs.adoc @@ -627,41 +627,7 @@ You *must reboot* to activate these changes. SWAP on ZFS ~~~ -Swap-space created on a zvol may generate some troubles, like blocking the -server or generating a high IO load, often seen when starting a Backup -to an external Storage. - -We strongly recommend to use enough memory, so that you normally do not -run into low memory situations. Should you need or want to add swap, it is -preferred to create a partition on a physical disk and use it as a swap device. -You can leave some space free for this purpose in the advanced options of the -installer. Additionally, you can lower the -``swappiness'' value. A good value for servers is 10: - - -# sysctl -w vm.swappiness=10 - - -To make the swappiness persistent, open `/etc/sysctl.conf` with -an editor of your choice and add the following line: - - -vm.swappiness = 10 - - -.Linux kernel `swappiness` parameter values -[width="100%",cols="https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#using-a-zvol-for-a-swap-device-on-linux[here], warns against using a ZFS volume for a swap partition. [[zfs_encryption]] Encrypted ZFS Datasets -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH pve-docs v2 2/2] PVE docs: Made small edits.
A long sentence in the section about Swap on ZFS was split into two sentences and some typos were corrected. Signed-off-by: Alexander Abraham --- local-zfs.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/local-zfs.adoc b/local-zfs.adoc index bcd02f7..b662233 100644 --- a/local-zfs.adoc +++ b/local-zfs.adoc @@ -627,7 +627,8 @@ You *must reboot* to activate these changes. SWAP on ZFS ~~~ -It is strongly recommended to not use a ZFS volume for a swap partition because this could lead to deadlocks freezing thge affected system and other unpredictable behavior. The OpenZFS documentation, which can be viewed https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#using-a-zvol-for-a-swap-device-on-linux[here], warns against using a ZFS volume for a swap partition. +It is strongly recommended to not use a ZFS volume for a swap partition because this could lead to deadlocks. These deadlocks could cause the affected system to freeze. +The OpenZFS documentation, which can be viewed https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#using-a-zvol-for-a-swap-device-on-linux[here], warns about using a ZFS volume for a swap partition. [[zfs_encryption]] Encrypted ZFS Datasets -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
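Not part of the patch, but possibly useful next to the new wording: a quick way to check whether an existing system currently swaps to a ZFS volume, using standard util-linux and OpenZFS commands (zvols show up as /dev/zd* devices):

----
# Show active swap devices; a /dev/zd* entry means swap is on a zvol
swapon --show

# List ZFS volumes to cross-check which zvol that is
zfs list -t volume
----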
[pve-devel] [PATCH docs 2/6] ceph: correct heading capitalization
Signed-off-by: Alexander Zeidler --- pveceph.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pveceph.adoc b/pveceph.adoc index 93c2f8d..90bb975 100644 --- a/pveceph.adoc +++ b/pveceph.adoc @@ -768,7 +768,7 @@ Nautilus: PG merging and autotuning]. [[pve_ceph_device_classes]] -Ceph CRUSH & device classes +Ceph CRUSH & Device Classes --- [thumbnail="screenshot/gui-ceph-config.png"] @@ -1021,7 +1021,7 @@ After these steps, the CephFS should be completely removed and if you have other CephFS instances, the stopped metadata servers can be started again to act as standbys. -Ceph maintenance +Ceph Maintenance [[pve_ceph_osd_replace]] @@ -1089,7 +1089,7 @@ are executed. [[pveceph_shutdown]] -Shutdown {pve} + Ceph HCI cluster +Shutdown {pve} + Ceph HCI Cluster ~ To shut down the whole {pve} + Ceph cluster, first stop all Ceph clients. These -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH manager] api: nodes: add more return descriptions for node status
On 16.12.24 at 17:03, Dominik Csapak wrote: > it's not all fields, but many useful ones > > Signed-off-by: Dominik Csapak For the record, this was already applied: https://git.proxmox.com/?p=pve-manager.git;a=commit;h=80cc262f255d409a06c4b64a5a87330407dc2b46 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
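For reference, the node status endpoint whose return schema that patch documents can be queried directly; a minimal example (the node name is a placeholder):

----
pvesh get /nodes/<nodename>/status
----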
Re: [pve-devel] [PATCH access-control] api: role: remove role references from acl rules on role deletion
Am 04.12.24 um 16:11 schrieb Daniel Kral: > Let the API endpoint `DELETE /access/roles/{roleid}` or command > `pveum role delete ` remove any ACL rules in the user > configuration, which reference the removed role. > > Before this change, the removal of a role has caused the role to remain > in existing ACL rules, which referenced the removed role. Therefore, on > each parse of the user configuration, a warning was be displayed: > > user config - ignore invalid acl role '' > Might be good to note that the next modification of the configuration would drop the unknown role (even if a role with the same name is re-added right away). > Signed-off-by: Daniel Kral Just a minor nit below, otherwise: Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner What would be really nice is to have some tests for various add/modify/delete sequences touching user.cfg :) I don't think current tests cover that yet. > --- > src/PVE/API2/Role.pm | 2 +- > src/PVE/AccessControl.pm | 23 +++ > 2 files changed, 24 insertions(+), 1 deletion(-) > > diff --git a/src/PVE/API2/Role.pm b/src/PVE/API2/Role.pm > index a924018..6e75a3c 100644 > --- a/src/PVE/API2/Role.pm > +++ b/src/PVE/API2/Role.pm > @@ -212,7 +212,7 @@ __PACKAGE__->register_method ({ > > delete ($usercfg->{roles}->{$role}); > > - # fixme: delete role from acl? > + PVE::AccessControl::delete_role_acl($role, $usercfg); > > cfs_write_file("user.cfg", $usercfg); > }, "delete role failed"); > diff --git a/src/PVE/AccessControl.pm b/src/PVE/AccessControl.pm > index 47f2d38..4bbbe80 100644 > --- a/src/PVE/AccessControl.pm > +++ b/src/PVE/AccessControl.pm > @@ -1022,6 +1022,29 @@ sub delete_group_acl { > iterate_acl_tree("/", $usercfg->{acl_root}, $code); > } > > +sub delete_role_acl_for_each { Nit: could be a private "my sub" > +my ($role, $acl_subjects) = @_; > + > +for my $subject (sort keys %$acl_subjects) { > + delete ($acl_subjects->{$subject}->{$role}) > + if $acl_subjects->{$subject}->{$role}; > +} > +} > + > +sub delete_role_acl { > +my ($role, $usercfg) = @_; > + > +my $code = sub { > + my ($path, $acl_node) = @_; > + > + delete_role_acl_for_each($role, $acl_node->{groups}); > + delete_role_acl_for_each($role, $acl_node->{users}); > + delete_role_acl_for_each($role, $acl_node->{tokens}); > +}; > + > +iterate_acl_tree("/", $usercfg->{acl_root}, $code); > +} > + > sub delete_pool_acl { > my ($pool, $usercfg) = @_; > ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path
On Mon Feb 3, 2025 at 4:42 PM CET, Fiona Ebner wrote: > Am 03.02.25 um 15:21 schrieb Max Carrara: > > On Fri Jan 31, 2025 at 1:23 PM CET, Fiona Ebner wrote: > >> Am 09.01.25 um 15:48 schrieb Max Carrara: > >> > >> Why this kind of behavior with absolute paths? Seems surprising to me. > >> Wouldn't failing the call be better? > > > > I decided to stick with Rust's behaviour [pathbuf] here and in most > > other places, simply in order to keep the behaviour consistent. > > > > I'm not *really* a fan of it either, but I think it's better than > > failing the call -- IMO having to wrap path_join / path_push into an > > eval-block would be quite awkward for what's a rather "small" operation > > :s > > > > Perhaps I should provide some additional context here as well so as to > > make my "design philosophy" more clear: The reason why I modeled > > PVE::Path (mostly) after Rust's std::path::{Path, PathBuf} is because: > > > > 1. Most (newer) developers will be much more likely to be familiar > > with Rust instead of Perl, so if they should see this library, > > they'll have a much easier time using it (or at least that's what > > I *hope* would happen). Sort of as in: "Oh, this is just like the > > path stuff in Rust, except for a few Perl-esque exceptions!" > > > > 2. I wanted the library to be essentially exception-free, except where > > obvious invariants aren't upheld by the caller (e.g. passing undef). > > Unless you match the error message with a regex, there's no > > structurally typed way to differ between errors in Perl, so I'd > > like the "types" of errors to be mostly the same wherever possible. > > > > 3. I didn't want to reinvent the wheel and instead stick to what other > > popular libraries do as much as possible, while also avoiding any > > additional abstractions (e.g. introducing an object for paths). > > > > With all that being said, since that behaviour is surprising to you, I > > have no quarrels changing it! :P After all I'd like to have something > > that doesn't cause any friction when using it. > > > > [pathbuf]: > > https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push > > > > You don't need to change it if there was good rationale behind it, which > you provided :) A short comment in the function documentation regarding > the inspiration would still be nice. Puts the responsibility on the > caller to not accidentally use an absolute path though. Maybe we can at > least warn instead? I suspect almost all calls with an absolute path in > the middle will be accidental. Sure, that sounds fine by me! :) > > >> > >>> + > >>> +my $joined = path_join("foo/bar", "/etc", "resolv.conf"); > >>> +# /etc/resolv.conf > >>> + > >>> +my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts"); > >>> +# /etc/hosts > >>> + > >>> +Throws an exception if any of the passed C<@paths> is C. 
> >>> + > >>> +=cut > >>> + > >>> +sub path_join : prototype(@) { > >>> +my (@paths) = @_; > >>> + > >>> +if (!scalar(@paths)) { > >>> + return ''; > >>> +} > >>> + > >>> +croak "one of the provided paths is undef" if any { !defined($_) } > >>> @paths; > >>> + > >> > >> I think the rest could be written more efficiently like (untested): > >> > >> my $resulting_path = shift @paths; > >> for my $path (@paths) { > >> if ($path =~ m#^/#) { > >> $resulting_path = $path; > >> } else { > >> $resulting_path = path_push($resulting_path, $path); > >> } > >> } > > > > Note that I'm doing a reverse iteration below; I think doing it either > > way should be fine, because I doubt we'll ever need to join that many > > path components where it's actually going to make a difference ;P > > > > I also meant "efficiently" in terms of lines/readability ;) Shorter, > less loops and avoids index arithmetic. Oooh, that makes sense then! Sure, I'll give it a try :) > > >> > >>> +# Find the last occurrence of a root directory and start conjoining > >>> the > >>> +# components from there onwards > >>> +my $index = scalar(@paths) - 1; > >>> +while ($index > 0) { > >>> + last if $paths[$index] =~ m#^/#; > >>> + $index--; > >>> +} > >>> + > >>> +@paths = @paths[$index .. (scalar(@paths) - 1)]; > >>> + > >>> +my $resulting_path = shift @paths; > >>> + > >>> +for my $path (@paths) { > >>> + $resulting_path = path_push($resulting_path, $path); > >>> +} > >>> + > >>> +return $resulting_path; > >>> +} > >>> + > >>> +=head3 path_normalize($path) > >>> + > >>> +Wrapper for L>. Performs a logical cleanup of > >>> the given > >>> +C<$path>. > >>> + > >>> +This removes unnecessary components of a path that can be safely > >>> +removed, such as C<.>, trailing C or repeated occurrences of C. > >>> + > >>> +For example, C and C will both become > >>> +C. > >>> + > >>> +B This will I remove components referencing the parent > >>> directory, > >>> +i.e. C<..>. For example, C and C will > >>>
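To make the suggested simplification concrete, here is a standalone sketch of path_join that combines the shorter loop with a warning for absolute components after the first one — path_push is replaced by a trivial stand-in, so this illustrates the idea rather than the actual PVE::Path code:

#!/usr/bin/perl
use strict;
use warnings;
use Carp qw(croak carp);
use List::Util qw(any);

# Trivial stand-in for PVE::Path::path_push, only for this sketch.
my $path_push = sub {
    my ($base, $component) = @_;
    return $component if $base eq '';
    $base =~ s#/+$##;
    $component =~ s#^/+##;
    return "$base/$component";
};

sub path_join {
    my (@paths) = @_;

    return '' if !scalar(@paths);

    croak "one of the provided paths is undef" if any { !defined($_) } @paths;

    my $resulting_path = shift @paths;

    for my $path (@paths) {
        if ($path =~ m#^/#) {
            # PathBuf::push semantics: an absolute component replaces everything
            # joined so far -- warn, because this is usually accidental.
            carp "absolute path '$path' replaces previously joined components";
            $resulting_path = $path;
        } else {
            $resulting_path = $path_push->($resulting_path, $path);
        }
    }

    return $resulting_path;
}

print path_join("foo", "bar", "baz.txt"), "\n";           # foo/bar/baz.txt
print path_join("foo/bar", "/etc", "resolv.conf"), "\n";  # /etc/resolv.conf (plus a warning)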
Re: [pve-devel] [PATCH pve-docs v2 1/2] Section on ZFS and swap corrected.
The subject prefix is correct now. But it seems like most of the other feedback given for v1 still applies [0]. Also, there is no need to have two separate patches if you just fix up the changes you just introduced in the second one. Then the patches can be squashed together into a single one. [0]: https://lore.proxmox.com/pve-devel/71eb4140-1e06-43ba-bbe4-76a39d5e9...@proxmox.com/ Am 03.02.25 um 12:14 schrieb Alexander Abraham: > --- > local-zfs.adoc | 36 +--- > 1 file changed, 1 insertion(+), 35 deletions(-) > > diff --git a/local-zfs.adoc b/local-zfs.adoc > index c64fb27..bcd02f7 100644 > --- a/local-zfs.adoc > +++ b/local-zfs.adoc > @@ -627,41 +627,7 @@ You *must reboot* to activate these changes. > SWAP on ZFS > ~~~ > > -Swap-space created on a zvol may generate some troubles, like blocking the > -server or generating a high IO load, often seen when starting a Backup > -to an external Storage. > - > -We strongly recommend to use enough memory, so that you normally do not > -run into low memory situations. Should you need or want to add swap, it is > -preferred to create a partition on a physical disk and use it as a swap > device. > -You can leave some space free for this purpose in the advanced options of the > -installer. Additionally, you can lower the > -``swappiness'' value. A good value for servers is 10: > - > - > -# sysctl -w vm.swappiness=10 > - > - > -To make the swappiness persistent, open `/etc/sysctl.conf` with > -an editor of your choice and add the following line: > - > - > -vm.swappiness = 10 > - > - > -.Linux kernel `swappiness` parameter values > -[width="100%",cols=" -|=== > -| Value | Strategy > -| vm.swappiness = 0 | The kernel will swap only to avoid > -an 'out of memory' condition > -| vm.swappiness = 1 | Minimum amount of swapping without > -disabling it entirely. > -| vm.swappiness = 10 | This value is sometimes recommended to > -improve performance when sufficient memory exists in a system. > -| vm.swappiness = 60 | The default value. > -| vm.swappiness = 100 | The kernel will swap aggressively. > -|=== > +It is strongly recommended to not use a ZFS volume for a swap partition > because this could lead to deadlocks freezing thge affected system and other > unpredictable behavior. The OpenZFS documentation, which can be viewed > https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#using-a-zvol-for-a-swap-device-on-linux[here], > warns against using a ZFS volume for a swap partition. > > [[zfs_encryption]] > Encrypted ZFS Datasets ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH pve-docs v2 1/2] Section on ZFS and swap corrected.
IIRC, there have been some considerations in the past to move the "SWAP on ZFS" section to the chapter "Host System Administration". If doing so: 1. It could be renamed to "SWAP" and, with the current layout, placed between "Disk Health Monitoring" and "Logical Volume Manager (LVM)"? 2. Update the first paragraph accordingly, so that, for example, it includes your new sentences and maybe starts with something like "If using ZFS, [it is strongly recommended ...]". On Mon Feb 3, 2025 at 12:14 PM CET, Alexander Abraham wrote: > --- > local-zfs.adoc | 36 +--- > 1 file changed, 1 insertion(+), 35 deletions(-) > > diff --git a/local-zfs.adoc b/local-zfs.adoc > index c64fb27..bcd02f7 100644 > --- a/local-zfs.adoc > +++ b/local-zfs.adoc > @@ -627,41 +627,7 @@ You *must reboot* to activate these changes. > SWAP on ZFS > ~~~ > > -Swap-space created on a zvol may generate some troubles, like blocking the > -server or generating a high IO load, often seen when starting a Backup > -to an external Storage. > - > -We strongly recommend to use enough memory, so that you normally do not > -run into low memory situations. Should you need or want to add swap, it is > -preferred to create a partition on a physical disk and use it as a swap > device. > -You can leave some space free for this purpose in the advanced options of the > -installer. Additionally, you can lower the > -``swappiness'' value. A good value for servers is 10: > - > - > -# sysctl -w vm.swappiness=10 > - > - > -To make the swappiness persistent, open `/etc/sysctl.conf` with > -an editor of your choice and add the following line: > - > - > -vm.swappiness = 10 > - > - > -.Linux kernel `swappiness` parameter values > -[width="100%",cols=" -|=== > -| Value | Strategy > -| vm.swappiness = 0 | The kernel will swap only to avoid > -an 'out of memory' condition > -| vm.swappiness = 1 | Minimum amount of swapping without > -disabling it entirely. > -| vm.swappiness = 10 | This value is sometimes recommended to > -improve performance when sufficient memory exists in a system. > -| vm.swappiness = 60 | The default value. > -| vm.swappiness = 100 | The kernel will swap aggressively. > -|=== > +It is strongly recommended to not use a ZFS volume for a swap partition > because this could lead to deadlocks freezing thge affected system and other > unpredictable behavior. The OpenZFS documentation, which can be viewed > https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#using-a-zvol-for-a-swap-device-on-linux[here], > warns against using a ZFS volume for a swap partition. > > [[zfs_encryption]] > Encrypted ZFS Datasets ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server v3 2/6] fix #4225: qemuserver: introduce sub eject_nonrequired_isos
Am 03.02.25 um 11:15 schrieb Daniel Herzig: > Fiona Ebner writes: > >> Am 31.01.25 um 14:58 schrieb Daniel Herzig: >>> Fiona Ebner writes: >>> Am 31.01.25 um 10:36 schrieb Fiona Ebner: > Am 30.01.25 um 12:31 schrieb Daniel Herzig: >> + >> +$drive->{essential} = 1 if !defined($drive->{essential}); > > This should rather be done when parsing the drive. Or I guess to avoid (potentially) writing it out for every drive, it should only be a local variable here and not set in the drive hash. >>> >>> Thanks, I'll double check on this for v4. But I'd assume the hash to be >>> scoped to >>> `config_to_command` here, or am I missing something? >>> >> >> But you don't need the modified version in config_to_command() later, >> or? And if yes, that can just check the same way again, i.e. using >> default value if not set. If we want to explicitly set the value, that >> should happen during parsing the drive string. Most of the time it's >> surprising to implicitly modify a passed-in value. Better to either >> avoid that or if really necessary, explicitly mention it in the function >> description. > > No, I do not need the modified version of the VM-config later. Yes, but you manipulate the drive hash, which would then get written out modified. Maybe you are lucky and there is no writer, but I'm telling you how to avoid that by considering the default on access. E.g. see how this is done for the 'bios' option, we do not modify the setting but apply the default when accessing it. > > What I'm trying to achieve is a more 'forgiving' behaviour in the case > of accidently server-side-deleted iso file/unavailable server (for whatever > reason) > attached to a VM. So this is actually aiming at `qm start`, which > implicitly calls `config_to_command` -- without modifying the existing > VM config at all. > > If the parameter 'essential' is set to '0', `config_to_command` would -- > in case of file unavailability of the iso file -- generate a kvm startup > command that contains "-drive 'if=none,media=cdrom,[...]" instead of > "-drive 'file=$SOME_PATH_TO_ISO,[..]' when we at this point already know > that $SOME_PATH_TO_ISO is unavailable/non-existent. Yes, I understand that. > > The VM-config itself is not changed, as an eg nfs-server might come back > at a point in time later and the user might want to do something with the iso > stored there. This is problematic for live migration, see my initial reply. Either we need to drop the CD from the config or we need another option "missing" that gets set if it is missing during start-up (and cleared if it is there during startup). Then the target of migration can also know about it when starting its own instance of the VM. We could also do this via a special section in the configuration, see the following series[0] since this would be internal-only information. > > If the parameter 'essential' is unset, or set to '1', the die would > happen before `qm start` leads to an invocation of kvm, as it cannot be > expected to lead to a successful action, if $SOME_PATH_TO_ISO is not > reachable. This would be the exact behaviour as we have it now, just not > letting kvm run into this situation, but detect and exit earlier, while > `config_to_commands` iterates over all volumes via `foreach_volume` > anyway before producing the 'final' kvm command. > Right, you also modify the config itself. But you cannot rely on nobody else later writing the config. Just because it doesn't happen right now in the tested code paths, it doesn't mean it never will. 
(There can be safe-guards [1] if we really want to ensure this). But I'd either simply have the function return which devices should not be added later, or explicitly write out the modified config. Because doing config modifications implicitly is error prone and future changes can easily activate bugs lurking in that implicit behavior. [0]: https://lore.proxmox.com/pve-devel/20250127112923.31703-10-f.eb...@proxmox.com/ [1]: https://lore.proxmox.com/pve-devel/20250124105351.43428-3-f.eb...@proxmox.com/ ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
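As a side note, the "apply the default on access" pattern referred to above could look roughly like this — the helper name and the default value are assumptions made for the example, not qemu-server's actual code:

#!/usr/bin/perl
use strict;
use warnings;

# Read the option together with its default instead of writing the default back
# into the drive hash, so the passed-in config is never modified implicitly.
sub drive_is_essential {
    my ($drive) = @_;
    return $drive->{essential} // 1;    # assumed default: essential
}

my $drive = { file => 'local:iso/example.iso', media => 'cdrom' };    # made-up drive hash

if (drive_is_essential($drive)) {
    print "ISO unavailable: refuse to start before building the kvm command\n";
} else {
    print "ISO unavailable: start with an ejected CD-ROM drive instead\n";
}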
Re: [pve-devel] [PATCH qemu-server v3 2/6] fix #4225: qemuserver: introduce sub eject_nonrequired_isos
Am 03.02.25 um 14:09 schrieb Fiona Ebner: > This is problematic for live migration, see my initial reply. Either we > need to drop the CD from the config or we need another option "missing" > that gets set if it is missing during start-up (and cleared if it is > there during startup). Then the target of migration can also know about To clarify, here I mean "cleared if the CD is there during a startup that is not for an incoming migration". > it when starting its own instance of the VM. We could also do this via a > special section in the configuration, see the following series[0] since > this would be internal-only information. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
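Purely as an illustration of that rule (set a marker when the ISO is missing at start-up, keep it on an incoming migration, clear it only on a regular start with the ISO present) — all names here are hypothetical and not an existing qemu-server interface:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper maintaining a per-drive "missing" marker, e.g. stored in an
# internal-only section of the VM configuration.
sub update_missing_marker {
    my ($internal, $drive_key, $iso_available, $incoming_migration) = @_;

    if (!$iso_available) {
        $internal->{"$drive_key-missing"} = 1;       # migration target can see this
    } elsif (!$incoming_migration) {
        delete $internal->{"$drive_key-missing"};    # clear only on a regular start
    }
    # incoming migration with the ISO present: leave the marker untouched,
    # so the target starts with the same (ejected) device state as the source
}

my $internal = {};
update_missing_marker($internal, 'ide2', 0, 0);    # regular start, ISO gone  -> marker set
update_missing_marker($internal, 'ide2', 1, 1);    # incoming migration       -> marker kept
update_missing_marker($internal, 'ide2', 1, 0);    # regular start, ISO back  -> marker cleared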
Re: [pve-devel] [PATCH pve-storage] qcow2: resize: add preallocation support
Am 19.12.24 um 17:18 schrieb Alexandre Derumier via pve-devel: > Seem that we totally forgot to add it, it's available since 2017 > https://www.mail-archive.com/qemu-devel@nongnu.org/msg436979.html Nit: it's better to link to the commit rather than the mailing list for things that are already applied. Missing your Signed-off-by. Hmm, I wanted to suggest to query the image to see what kind of preallocation it was created with and then use that setting to stay consistent. But that information doesn't seem to get recorded (on an image-wide level) AFAICS. It might be surprising that changes to the storage configuration setting will also apply to already existing images and we should document the behavior for resize in the description of the 'preallocation' setting. Seems like the "block_resize" QMP command does not have the setting at all, so if we add it here, the behavior would still be inconsistent in that regard :/ But oh well, could still be added on top later if we can get that feature in upstream. But should also be documented, that it doesn't apply for live resize. > --- > src/PVE/Storage/Plugin.pm | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm > index 6a6a804..d0ce3ae 100644 > --- a/src/PVE/Storage/Plugin.pm > +++ b/src/PVE/Storage/Plugin.pm > @@ -1166,7 +1166,9 @@ sub volume_resize { > > my $format = ($class->parse_volname($volname))[6]; > > -my $cmd = ['/usr/bin/qemu-img', 'resize', '-f', $format, $path , $size]; > +my $prealloc_opt = preallocation_cmd_option($scfg, $format); > + > +my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', > $format, $path , $size]; > > run_command($cmd, timeout => 10); > > -- > 2.39.5 > > ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH pve-storage] qcow2: resize: add preallocation support
Original message:
From: Fiona Ebner
To: Proxmox VE development discussion
Cc: Alexandre Derumier
Subject: Re: [pve-devel] [PATCH pve-storage] qcow2: resize: add preallocation support
Date: 03/02/2025 15:39:41

Am 19.12.24 um 17:18 schrieb Alexandre Derumier via pve-devel:
> Seem that we totally forgot to add it, it's available since 2017
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg436979.html

Nit: it's better to link to the commit rather than the mailing list
for things that are already applied.

>>Missing your Signed-off-by.

oops, sorry

>>Hmm, I wanted to suggest to query the image to see what kind of
>>preallocation it was created with and then use that setting to stay
>>consistent. But that information doesn't seem to get recorded (on an
>>image-wide level) AFAICS.

For full preallocation, I think we can simply check the current qcow2
usage vs. the configured size. For qcow2 metadata, I really don't know
of any way to do it.

>> It might be surprising that changes to the storage configuration
>>setting will also apply to already existing images

Personally, I was more surprised that this never worked on resize
before ^_^. It doesn't shock me that it respects the currently
assigned option at the moment of the resize.

>>and we should document the behavior for resize in the description of
>>the 'preallocation' setting.

But yes, it should be documented. I'll write a patch for pve-docs.

>>Seems like the "block_resize" QMP command does not have the setting
>>at all, so if we add it here, the behavior would still be inconsistent
>>in that regard :/ But oh well, could still be added on top later if we
>>can get that feature in upstream. But should also be documented, that
>>it doesn't apply for live resize.

Yes, indeed, it doesn't exist for a live running image. (I think I have
seen discussion on the qemu mailing list about it, but it requires some
kind of block job if I remember correctly.) There is a preallocate filter
https://qemu.googlesource.com/qemu/+/refs/tags/v8.0.3/block/preallocate.c
but it's a little different: it preallocates live (allocating in chunks
of 1 MB, for example, when a 4k write reaches EOF).

Thanks for the review!

> ---
>  src/PVE/Storage/Plugin.pm | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index 6a6a804..d0ce3ae 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -1166,7 +1166,9 @@ sub volume_resize {
>
>      my $format = ($class->parse_volname($volname))[6];
>
> -    my $cmd = ['/usr/bin/qemu-img', 'resize', '-f', $format, $path , $size];
> +    my $prealloc_opt = preallocation_cmd_option($scfg, $format);
> +
> +    my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', $format, $path , $size];
>
>      run_command($cmd, timeout => 10);
>
> --
> 2.39.5

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
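For what it's worth, the "check current usage vs. configured size" idea mentioned above for detecting full preallocation could be sketched roughly like this — a rough heuristic only, with an arbitrary threshold and an example image path, not something that exists in pve-storage:

#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Rough heuristic: treat an image as fully preallocated when its on-disk
# footprint is close to its virtual size (threshold picked arbitrarily).
sub looks_fully_preallocated {
    my ($path) = @_;

    my $json = `/usr/bin/qemu-img info --output=json '$path'`;
    die "qemu-img info failed\n" if $? != 0;

    my $info = decode_json($json);
    my ($virtual, $actual) = ($info->{'virtual-size'}, $info->{'actual-size'});
    return 0 if !$virtual || !defined($actual);

    return ($actual / $virtual) > 0.95;
}

my $image = '/var/lib/vz/images/100/vm-100-disk-0.qcow2';    # example path
if (looks_fully_preallocated($image)) {
    print "probably created with preallocation=full\n";
} else {
    print "probably sparse or metadata-only preallocation\n";
}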