On Wed Jan 8, 2025 at 3:05 PM CET, Wolfgang Bumiller wrote: > On Fri, Dec 20, 2024 at 07:51:56PM +0100, Max Carrara wrote: > > The PVE::Path module concerns itself with file / directory path > > operations, like getting the parent directory of a path, extracting > > the file name of a path, splitting a path into its individual > > components, joining path components together, comparing paths, and so > > on. > > > > This module is added here in order to address the shortcomings of the > > Perl core modules (such as lacking a lot the aforementioned > > functionalities) without relying on some kind of object-based > > abstraction. > > > > PVE::Path strictly only contains functions that manipulate file paths > > without meddling with the filesystem. The reasoning here is to be able > > to just import any function when its necessary without having to > > change the surrounding code in order to adapt to some kind of > > abstraction. > > > > The other motivation for this module is that path manipulation > > operations have been getting more and more common recently, especially > > in regards to storage. > > > > Signed-off-by: Max Carrara <m.carr...@proxmox.com> > > --- > > Changes v1 --> v2: > > * Improve some wording in the docstring of path_components > > * Simplify some logic in path_parent and remove an unnecessary sanity > > check > > * Actually treat "foo" as "./foo" in path_parent as mentioned in the > > docstring -- This means that path_parent("foo") now returns "." > > instead of "". > > * Adapt the path_with_file_* functions to the above accordingly, so > > that e.g. path_with_file_name("foo", "bar") returns "bar" instead of > > "./bar". > > * Improve the "boolean" behaviour of path_is_absolute and > > path_is_absolute and return 1 when true, but use an empty return > > when false. > > - An empty return means "undef" in scalar context and an empty list > > in list context, so those functions will always return something > > that's correctly truthy or falsy for Perl, regardless of context > > > > src/Makefile | 1 + > > src/PVE/Path.pm | 987 ++++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 988 insertions(+) > > create mode 100644 src/PVE/Path.pm > > > > diff --git a/src/Makefile b/src/Makefile > > index 2d8bdc4..25bc490 100644 > > --- a/src/Makefile > > +++ b/src/Makefile > > @@ -23,6 +23,7 @@ LIB_SOURCES = \ > > LDAP.pm \ > > Network.pm \ > > OTP.pm \ > > + Path.pm \ > > PBSClient.pm \ > > PTY.pm \ > > ProcFSTools.pm \ > > diff --git a/src/PVE/Path.pm b/src/PVE/Path.pm > > new file mode 100644 > > index 0000000..221e662 > > --- /dev/null > > +++ b/src/PVE/Path.pm > > @@ -0,0 +1,987 @@ > > +=head1 NAME > > + > > +C<PVE::Path> - Utilities related to handling file and directory paths > > + > > +=head1 DESCRIPTION > > + > > +This module provides functions concerned with file and directory path > > +manipulation. > > + > > +None of the functions provided alter the filesystem in any way. > > + > > +The reason for this module's existence is to address a couple shortcomings: > > + > > +=over > > + > > +=item 1. The Perl core modules lack most of what is required for > > manipulating > > +paths, for example getting the parent directory of a path, extracting the > > +prefix of a file name (the "stem"), extracting the suffixes of a file name > > (the > > +"endings" or "extensions"), checking whether two paths are the same, and > > so on. > > + > > +=item 2. If the Perl core modules provide something in that regard, it's > > usually > > +provided in a not very ergonomic manner (L<C<File::Basename>>). > > + > > +=item 3. Additionally, the path utilities of the core modules are scattered > > +across multiple modules, making them hard to discover. > > + > > +=item 4. Third-party libraries on CPAN mostly provide objects representing > > +paths. Using any of these would require fundamental changes on how file > > paths > > +are handled throughout our code, for almost no benefit. > > + > > +=back > > + > > +C<L<PVE::Path>> instead does without objects and strictly provides > > functions > > +for path manipulation only. Any operation that is needed can simply be > > +performed ad hoc by importing the corresponding function and doesn't > > require > > +the surrounding code to conform to an abstraction like a path object. > > + > > +Additionally, some of the core modules' functionality is re-exported or > > +re-implemented for ergonomic or logical purposes. The goal is to provide > > +functions that don't come with any surprises and just behave like one > > assumes > > +they would. > > + > > +This module takes inspiration from Rust's C<std::path> and Python's > > C<pathlib>, > > +which are more modern path manipulation libraries. > > + > > +=head1 LIMITATIONS > > + > > +This module is limited to manipulating Unix-like / Linux file paths. > > + > > +=cut > > + > > +package PVE::Path; > > + > > +use strict; > > +use warnings; > > + > > +use Carp qw(carp croak confess); > > +use File::Spec (); > > +use List::Util qw(any zip_shortest zip_longest); > > + > > +use Exporter qw(import); > > + > > +our @EXPORT_OK = qw( > > + path_is_absolute > > + path_is_relative > > + > > + path_components > > + path_join > > + > > + path_normalize > > + > > + path_parent > > + path_push > > + path_pop > > + > > + path_file_name > > + path_with_file_name > > + > > + path_file_prefix > > + path_with_file_prefix > > + > > + path_file_suffixes > > + path_with_file_suffixes > > + > > + path_file_suffix > > + path_with_file_suffix > > + > > + path_file_parts > > + > > + path_starts_with > > + path_ends_with > > + path_equals > > +); > > + > > +=head2 FUNCTIONS > > + > > +=cut > > + > > +=head3 path_is_absolute($path) > > + > > +Returns C<1> if C<$path> is absolute (starts with a C</>). > > + > > +Throws an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_is_absolute : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + if ($path =~ m#^/#) { > > + return 1; > > + } > > + > > + return; > > +} > > + > > +=head3 path_is_relative($path) > > + > > +Returns C<1> if C<$path> is relative (doesn't start with a C</>). > > + > > +The opposite of C<L<< path_is_absolute()|/"path_is_absolute($path)" >>>. > > + > > +Throws an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_is_relative : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + if ($path !~ m#^/#) { > > + return 1; > > + } > > + > > + return; > > +} > > + > > +=head3 path_components($path) > > + > > +Returns a list of the given C<$path>'s individual components. > > + > > +In scalar context, returns a reference to a list. > > + > > +The C<$path> is normalized a little during the parse: > > + > > +=over > > + > > +=item Repeated occurrences of C</> are removed, so C<foo/bar> and > > C<foo//bar> > > +both have C<foo> and C<bar> as components. > > + > > +=item Trailing slashes C</> are removed. > > + > > +=item Occurrences of C<.> are normalized away, except the first C<.> at > > +beginning of a path. This means that C<foo/bar>, C<foo/./bar>, > > C<foo/bar/.>, > > +C<foo/././bar/./.>, etc. all have C<foo> and C<bar> as components, while > > +C<./foo/bar>, C<./././foo/bar>, C<./foo/./bar/.> have C<.>, C<foo> and > > C<bar> > > +as components. > > + > > +=back > > + > > +No other normalization is performed to account for the possibility of > > symlinks > > +existing. This means that C<foo/baz> and C<foo/bar/../baz> are distinct > > (because > > +C<bar> could be a symlink and thus C<foo> isn't its parent). > > + > > +Throws an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_components : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + my $is_abs = path_is_absolute($path); > > + my $has_cur_dir = $path =~ m#^\.$|^\./#; > > + > > + my @components = split('/', $path); > > + my @normalized_components = (); > > + > > + for my $component (@components) { > > + next if $component eq '' || $component eq '.'; > > + > > + push(@normalized_components, $component); > > + } > > + > > + unshift(@normalized_components, '/') if $is_abs; > > ^ This case should probably also be explicitly mentioned in the docs. > (`path_file_name()` also relies on it)
Good catch! Will add that. > > > + unshift(@normalized_components, '.') if $has_cur_dir; > > + > > + return @normalized_components if wantarray; > > + return \@normalized_components; > > +} > > + > > + > > +=head3 path_join(@paths) > > + > > +Joins multiple paths together. All kinds of paths are supported. > > + > > +Does not perform any C<L<< normalization|/"path_normalize($path)" >>>. > > + > > +Throws an exception if any of the passed C<@paths> is C<undef>. > > This should probably point to the `path_push` documentation mentioning > the absolute path special case. Another good catch; I agree. > > > + > > +=cut > > + > > +sub path_join : prototype(@) { > > + my (@paths) = @_; > > + > > + if (!scalar(@paths)) { > > + return ''; > > + } > > + > > + croak "one of the provided paths is undef" if any { !defined($_) } > > @paths; > > + > > + # Find the last occurrence of a root directory and start conjoining the > > + # components from there onwards > > + my $index = scalar(@paths) - 1; > > + while ($index > 0) { > > + last if $paths[$index] =~ m#^/#; > > + $index--; > > + } > > + > > + @paths = @paths[$index .. (scalar(@paths) - 1)]; > > + > > + my $resulting_path = shift @paths; > > + > > + for my $path (@paths) { > > + $resulting_path = path_push($resulting_path, $path); > > + } > > + > > + return $resulting_path; > > +} > > + > > +=head3 path_normalize($path) > > + > > +Wrapper for L<C<File::Spec/canonpath>>. Performs a logical cleanup of the > > given > > +C<$path>. > > + > > +This removes unnecessary components of a path that can be safely > > +removed, such as C<.>, trailing C</> or repeated occurrences of C</>. > > + > > +For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become > > +C<foo/bar/baz>. > > + > > +B<Note:> This will I<not> remove components referencing the parent > > directory, > > +i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will > > therefore > > +remain as they are. However, the parent directory of C</> is C</>, so > > +C</../../foo> will be normalized to C</foo>. > > + > > +Throws an exception if C<$path> is C<undef> or the call to C<canonpath> > > failed. > > + > > +=cut > > + > > +sub path_normalize : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + my $cleaned_path = eval { > > + File::Spec->canonpath($path); > > + }; > > + > > + croak "failed to clean up path: $@" if $@; > > + > > + return $cleaned_path; > > +} > > + > > +=head3 path_parent($path) > > + > > +Returns the given C<$path> without its final component, if there is one. > > + > > +Trailing and repeated occurrences of C</> and C<.> are normalized on the > > fly > > +when needed. This means that e.g. C<foo////bar///.//> becomes C<foo>, but > > +C<foo/.//bar//./baz> becomes C<foo/.//bar>. > > + > > +This function's behaviour is almost identical to Rust's > > +L<< > > Path::parent|https://doc.rust-lang.org/std/path/struct.Path.html#method.parent > > >>, > > +with a few adaptations made wherever Perl treats things differently: > > + > > +=over > > + > > +=item * C</foo/bar> becomes C</foo>, C<foo/bar> becomes C<foo>. > > + > > +=item * C</foo> becomes C</>. > > + > > +=item * C<foo/bar/..> becomes C<foo/bar>. > > + > > +=item * C<foo/../bar> becomes C<foo/..>. > > + > > +=item * C<foo> is interpreted as C<./foo> and becomes C<.>. This is > > because Perl's > > +C<L<File::Spec/canonpath>> interprets C<./foo> and C<foo> as the same > > thing. > > + > > +=item * C</> and an I<empty string> result in C<undef> being returned. > > + > > +=item * C<.> results in an empty string. > > ^ Like in `path_components()`, this should mention symlinks, since the > parent of `/foo/..` is not necessarily `/foo` in the file system, but > will be here. I agree with this as well; will add a note to the docs. > > > + > > +=back > > + > > +Will raise an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_parent : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + if ($path eq '') { > > + return; > > + } > > + > > + # A limit of -1 retains empty components at the end > > + my @components = split('/', $path, -1); > > ^ Meh, if we didn't mind normalizing all multi-slash occurrances we could > just split on `m@/+@` :S We could, but I'd rather keep it as it is; one can always use path_normalize to get rid of those. > > > + > > + # Trim off needless extra components until actual final component is > > encountered, e.g. > > + # foo////bar////baz//// -> foo////bar////baz > > + # /// -> / > > + # ././//.///./ -> . > > + while (scalar(@components) > 1 && ($components[-1] eq '' || > > $components[-1] eq '.')) { > > + pop(@components); > > + } > > + > > + my $final_component = pop(@components); > > + > > + # We had a root dir with needless extra components, e.g. "//" or > > "////" or "//.///./" etc. > > + return if $final_component eq ''; > > + > > + # We had a current dir reference with needless extra components, e.g. > > + # "././" or ".///////" or "./././//./././//" etc. > > + return '' if $final_component eq '.'; > > + > > + # We had some other kind of single component like "foo", "bar" or "..", > > + # and because File::Spec->canonpath treats "foo" and "./foo" the same, > > + # return a single current dir reference > > + return '.' if !scalar(@components); > > + > > + # Trim off needless extra components until actual parent component is > > encountered, like above > > + while (scalar(@components) > 1 && ($components[-1] eq '' || > > $components[-1] eq '.')) { > > + pop(@components); > > + } > > + > > + # Handle lone root dir (@components with only one empty string) > > + if (scalar(@components) == 1 && $components[0] eq '') { > > + return '/'; > > + } > > + > > + return join('/', @components); > > +} > > + > > +=head3 path_push($path, $other) > > + > > +Extends C<$path> with C<$other>, returning a new path. > > + > > +If C<$other> is absolute, it will be returned instead. > > I was never a fan of this. But I guess we cannot change that in rust, so > for the sake of consistency... > > > + > > +Throws an exception if any of the arguments is C<undef>. > > + > > +=cut > > + > > +sub path_push : prototype($$) { > > + my ($path, $other) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + croak "\$other is undef" if !defined($other); > > + > > + return $other if path_is_absolute($other); > > + return $path if $other eq ''; > > (^ Could swap the above 2 checks) ACK > > > + > > + my $need_sep = $path ne '' && $path !~ m#/$#; > > + > > + $path .= "/" if $need_sep; > > + $path .= $other; > > + > > + return $path; > > +} > > + > > +=head3 path_pop($path) > > + > > +Alias for C<L<< path_parent()|/"path_parent($path)" >>>. > > + > > +=cut > > + > > +sub path_pop : prototype($) { > > + my ($path) = @_; > > + return path_parent($path); > > +} > > + > > +=head3 path_file_name($path) > > + > > +Returns the last component of the given C<$path>, if it is a legal file > > name, > > +or C<undef> otherwise. > > + > > +If C<$path> is an empty string, C</>, C<.> or ends with a C<..> component, > > +there is no valid file name. > > + > > +B<Note:> This does not check whether the given C<$path> actually points to > > a > > +file or a directory etc. > > + > > +Throws an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_file_name : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + my @components = path_components($path); > > + > > + if (!scalar(@components)) { > > + return; > > + } > > + > > + if ( > > + scalar(@components) == 1 > > + && ($components[0] eq '/' || $components[0] eq '.') > > + ) { > > + return; > > + } > > + > > + if ($components[-1] eq '..') { > > + return; > > + } > > + > > + return $components[-1]; > > +} > > + > > +=head3 path_with_file_name($path, $file_name) > > + > > +Returns C<$path> with C<$file_name> as the new last component. > > + > > +This is essentially like calling C<L<< path_parent()|/"path_parent($path)" > > >>> > > +and using C<L<< path_push()|/"path_push($path, $other)" >>> to append the > > new > > +file name, but handles a few extra cases: > > + > > +=over > > + > > +=item * If C<$path> is C</>, appends C<$file_name>. > > + > > +=item * If C<$path> is an empty string, appends C<$file_name>. > > + > > +=item * If C<$path> ends with a parent directory reference (C<..>), > > replaces it > > +with C<$file_name>. > > + > > +=back > > + > > +Throws an exception if any of the arguments is C<undef> or if C<$file_name> > > +contains a path separator (C</>). > > + > > +=cut > > + > > +sub path_with_file_name : prototype($$) { > > + my ($path, $file_name) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + croak "\$file_name is undef" if !defined($file_name); > > + croak "\$file_name contains a path separator: $file_name" if > > $file_name =~ m|/|; > > + > > + my $parent = path_parent($path); > > + > > + # undef means that $path was either "" or "/", so we can just append > > to it > > + return ($path . $file_name) if !defined($parent); > > + > > + # Because the parent of "foo" is ".", return $file_name to stay > > consistent. > > + # Otherwise, we'd end up with a current path ref prepended > > ("./$file_name") > > + if ($parent eq '.' && $path !~ m|/|) { > > + return $file_name; > > + } > > + > > + return path_push($parent, $file_name); > > +} > > + > > +my sub _path_file_prefix : prototype($) { > > + my ($file_name) = @_; > > + > > + confess "\$file_name is undef" if !defined($file_name); > > + > > + $file_name =~ s|^(\.*[^\.]*)||; > > + my $prefix = $1; > > + > > + # sanity check > > + confess "\$prefix not matched" if !defined($prefix); > > ^ For the above 4 lines you should be able to just test the `=~` > directly: > > confess "..." if $file_name !~ s|^(\.*[^\.]*)||; > > (note =~ -> !~) Ah, another good catch! Will incorporate this in v3 also. > > Would it make sense to instead *match* those parts instead of s// and > return the extension as a 3rd value? I think `path_with_file_prefix()` > could use this instead of building and joining a suffix list? > > No hard feelings, though. Hmm, I'll give it a shot. > > > + > > + return ($prefix, $file_name); > > +} > > + > > +=head3 path_file_prefix($path) > > + > > +Returns the prefix of the file name of the given C<$path>. If the C<$path> > > does > > +not have a valid file name and thus no prefix, C<undef> is returned > > instead. > > + > > +The prefix of a file name is the part before any extensions (suffixes). > > + > > + my $prefix = path_file_prefix("/etc/resolv.conf"); > > + # resolv > > + > > + my $prefix = path_file_prefix("/tmp/archive.tar.zst"); > > + # archive > > + > > +Throws an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_file_prefix : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + my $file_name = path_file_name($path); > > + return undef if !defined($file_name); > > + > > + my ($prefix, undef) = _path_file_prefix($file_name); > > + return $prefix; > > +} > > + > > +=head3 path_with_file_prefix($path, $prefix) > > + > > +Returns C<$path> with a new C<$prefix>. This is similar to > > +C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>, > > +except that the file prefix is replaced or appended. > > + > > +If C<$path> does not have a file name or if C<$prefix> is an empty string, > > +C<undef> is returned. > > + > > + my $new_path = path_with_file_prefix("/tmp/archive.tar.zst", "backup"); > > + # /tmp/backup.tar.zst > > + > > + my $new_path = path_with_file_prefix("/etc/pve", "ceph"); > > + # /etc/ceph > > + > > +Throws an exception if any of the arguments is C<undef>, or if C<$prefix> > > +contains a path separator (C</>), ends with C<.>, or is an empty string. > > + > > +=cut > > + > > +sub path_with_file_prefix : prototype($$) { > > + my ($path, $prefix) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + croak "\$prefix is undef" if !defined($prefix); > > + croak "\$prefix contains a path separator" if $prefix =~ m|/|; > > + croak "\$prefix ends with a dot" if $prefix =~ m|\.$|; > > + > > + return undef if $prefix eq ''; > > + return undef if !defined(path_file_name($path)); > > + > > + return $path if $prefix eq ''; > > + > > + my $parent = path_parent($path); > > + > > + # sanity check -- should not happen because we checked for file name, > > + # and the existence of a file name implies there's a parent > > + confess "parent of \$path is undef" if !defined($parent); > > + > > + my @suffixes = path_file_suffixes($path); > > + > > + my $file_name = join(".", $prefix, @suffixes); > > + > > + # Because the parent of "foo" is ".", return $file_name to stay > > consistent. > > + # Otherwise, we'd end up with a current path ref prepended > > ("./$file_name") > > + # (Done also in path_with_file_name) > > + if ($parent eq '.' && $path !~ m|/|) { > > + return $file_name; > > + } > > + > > + return path_push($parent, $file_name); > > +} > > + > > +my sub _path_file_suffixes : prototype($) { > > + my ($file_name_no_prefix) = @_; > > + > > + confess "\$file_name_no_prefix is undef" if > > !defined($file_name_no_prefix); > > + > > + # Suffixes are extracted "manually" because join()ing the result of > > split() > > + # results in a different file name than the original. Let's say you > > have a > > + # file named "foo.bar.". The correct suffixes would be ("bar", ""). > > + # With split, you get the following: > > + # split(/\./, ".bar.") --> ("", "bar") --> join()ed to > > "foo..bar" > > + # split(/\./, ".bar.", -1) --> ("", "bar", "") --> join()ed to > > "foo..bar." > > + my @suffixes = (); > > + while ($file_name_no_prefix =~ s|^(\.[^\.]*)||) { > > + my $suffix = $1; > > + $suffix =~ s|^\.||; > > + push(@suffixes, $suffix); > > + } > > + > > + return @suffixes; > > +} > > + > > +=head3 path_file_suffixes($path) > > + > > +Returns the suffixes of the C<$path>'s file name as a list. If the > > C<$path> does > > +not have a valid file name, an empty list is returned instead. > > + > > +In scalar context, returns a reference to a list. > > (^ Isn't this a bit awkward?) Well, do you think it is? I've been using that kind of "return style" here and there and have been liking it, but if it's weird to others, I can adapt it. :P > > > + > > +The suffixes of a path are essentially the file name's extensions, the > > parts > > +that come after the L<< prefix|/"path_file_prefix($path)" >>. > > + > > + my @suffixes = path_file_suffixes("/etc/resolv.conf"); > > + # ("conf") > > + > > + my $suffixes = path_file_prefix("/tmp/archive.tar.zst"); > > + # ["tar", "zst"] > > + > > +Throws an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_file_suffixes : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + my $file_name = path_file_name($path); > > + if (!defined($file_name)) { > > + return wantarray ? () : []; > > + } > > + > > + (undef, $file_name) = _path_file_prefix($file_name); > > + > > + my @suffixes = _path_file_suffixes($file_name); > > + > > + return wantarray ? @suffixes : \@suffixes; > > +} > > + > > +=head3 path_with_file_suffixes($path, @suffixes) > > + > > +Returns C<$path> with new C<@suffixes>. This is similar to > > +C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>, > > +except that the suffixes of the file name are replaced. > > + > > +If the C<$path> does not have a file name, C<undef> is returned. > > + > > + my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "pxar", > > "gz"); > > + # /tmp/archive.pxar.gz > > + > > + my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "gz"); > > + # /tmp/archive.gz > > + > > +If the file name has no suffixes, the C<@suffixes> are appended instead: > > + > > + my $new_path = path_with_file_suffixes("/etc/resolv", "conf"); > > + # /etc/resolv.conf > > + > > + my $new_path = path_with_file_suffixes("/etc/resolv", "conf", "zst"); > > + # /etc/resolv.conf.zst > > + > > +If there are no C<@suffixes> provided, the file name's suffixes will > > +be removed (if there are any): > > + > > + my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst"); > > + # /tmp/archive > > + > > +Note that an empty string is still a valid suffix (an "empty" file ending): > > + > > + my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "", "", > > "", "zst"); > > + # /tmp/archive....zst > > + > > +Throws an exception if C<$path> or any of the C<@suffixes> is C<undef>, or > > +if any suffix contains a path separator (C</>) or a C<.>. > > + > > +=cut > > + > > +sub path_with_file_suffixes : prototype($@) { > > + my ($path, @suffixes) = @_; > > I am questioning a bit the sanity of having "suffixes" throughout this > module instead of simply an "extension" that covers them all and can be > split if needed... > > Do we have/anticipate particular use cases where this is more > convenient? To be really honest, it was 50/50 on the "extension" vs "suffixes" decision. I then opted for suffixes instead, because they seemed to be less ambiguous than an "extension". Let me elaborate. Let's say we have a file called "foo.tar.gz" -- what would its extension be? Some might say that ".tar.gz" is its extension, while some others might say ".gz"; both have different reasonings but aren't necessarily more or less correct than the other. In our case, both you and I would say that ".tar.gz" is the extension, but... Rust's std::path::Path::extension [1] would return "gz" for the above, while C++'s std::filesystem::path::extension [2] returns ".gz" (but they don't have such a case in their docs!). Java has java.nio.file.Path, but conveniently doesn't care about extensions. Kotlin fixest his by adding an "extension" property to that class, which returns "gz" [3]. Python's pathlib is the only one I've seen go the "suffix route" and provides the pathlib.PurePath.suffix and .suffixes methods [4]. This type of approach also came up on the (Rust) tracking issue for Path::file_prefix [5]. When I was looking up all that, I decided to go for the prefix + suffix(es) route, as that just seemed to be the most unambiguous. Convenience wasn't really a factor here, because there's the "path_with_file_* family" of functions that should handle most of the replacement cases. (Besides, I found it quite nice that I could join() on the prefix + suffixes (e.g. join(".", "foo", "tar", "gz")) that I first extracted with path_file_prefix() and path_file_suffixes(); I like that there's an "inverse" operation.) *But,* I wouldn't be opposed to adding a function that just returns "tar.gz" for the above case. Perhaps with a different name though :P [1] https://doc.rust-lang.org/std/path/struct.Path.html#method.extension [2] https://en.cppreference.com/w/cpp/filesystem/path/extension [3] https://github.com/JetBrains/kotlin/blob/rrr/2.1.0/core-docs/libraries/stdlib/jdk7/src/kotlin/io/path/PathUtils.kt#L46 [4] https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.suffix [5] https://github.com/rust-lang/rust/issues/86319 > > > + > > + croak "\$path is undef" if !defined($path); > > + croak "one of the provided suffixes is undef" > > + if any { !defined($_) } @suffixes; > > + croak "one of the provided suffixes contains a path separator" > > + if any { $_ =~ m|/| } @suffixes; > > + croak "one of the provided suffixes contains a dot" > > + if any { $_ =~ m|\.| } @suffixes; > > + > > + return undef if !defined(path_file_name($path)); > > + > > + my $parent = path_parent($path); > > + > > + # sanity check -- should not happen because we checked for file name, > > + # and the existence of a file name implies there's a parent > > + confess "parent of \$path is undef" if !defined($parent); > > + > > + # Don't modify $path if there are no suffixes to be removed > > + my @existing_suffixes = path_file_suffixes($path); > > + return $path if !scalar(@suffixes) && !scalar(@existing_suffixes); > > + > > + my $prefix = path_file_prefix($path); > > + > > + # sanity check > > + confess "\$prefix is undef" if !defined($prefix); > > + > > + my $file_name = join(".", $prefix, @suffixes); > > + > > + # Because the parent of "foo" is ".", return $file_name to stay > > consistent. > > + # Otherwise, we'd end up with a current path ref prepended > > ("./$file_name") > > + # (Done also in path_with_file_name) > > + if ($parent eq '.' && $path !~ m|/|) { > > + return $file_name; > > + } > > + > > + return path_push($parent, $file_name); > > +} > > + > > +=head3 path_file_suffix($path) > > + > > +Returns the suffix of the C<$path>'s file name. If the C<$path> does not > > have a > > +valid file name or if the file name has no suffix, C<undef> is returned > > +instead. > > + > > +The suffix of a file name is essentially its extension, e.g. > > +C</etc/resolv.conf> has the suffix C<conf>. If there are multiple suffixes, > > +only the last will be returned; e.g. C</tmp/archive.tar.gz> has the suffix > > C<gz>. > > + > > +B<Note:> Files like e.g. C</tmp/foo.> have an empty string as suffix. > > + > > +For getting all suffixes of a path, see C<L<< > > path_file_suffixes()|/"path_file_suffixes($path)" >>>. > > + > > +Throws an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_file_suffix : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + my $file_name = path_file_name($path); > > + return undef if !defined($file_name); > > + > > + (undef, $file_name) = _path_file_prefix($file_name); > > + > > + my @suffixes = _path_file_suffixes($file_name); > > + > > + return pop(@suffixes); > > +} > > + > > +=head3 path_with_file_suffix($path, $suffix) > > + > > +Returns C<$path> with a new C<$suffix>. This is similar to > > +C<L<< path_with_file_suffixes()|/"path_with_file_suffixes($path, > > @suffixes)" >>>, > > +except that only the last suffix of the file name is replaced. > > + > > +If the C<$path> does not have a file name, C<undef> is returned. > > + > > + my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", "gz"); > > + # /tmp/archive.tar.gz > > + > > +If the file name has no suffixes, the C<$suffix> is appended instead: > > + > > + my $new_path = path_with_file_suffix("/etc/resolv", "conf"); > > + # /etc/resolv.conf > > + > > +If C<$suffix> is C<undef>, the file name's (last) suffix will be removed > > (if > > +there is one): > > + > > + my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", undef); > > + # /tmp/archive.tar > > + > > + my $new_path = path_with_file_suffix("/etc/resolv", undef); > > + # /etc/resolv > > + > > +Note that an empty string is still a valid suffix (an "empty" file ending): > > + > > + my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", ""); > > + # /tmp/archive.tar. > > + > > + my $new_path = path_with_file_suffix("/etc/resolv", ""); > > + # /etc/resolv. > > + > > +Throws an exception if any of the arguments is C<undef>, or if C<$suffix> > > +contains a path separator (C</>) or a C<.>. > > + > > +=cut > > + > > +sub path_with_file_suffix : prototype($$) { > > + my ($path, $suffix) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + if (defined($suffix)) { > > + croak "\$suffix contains a path separator" if $suffix =~ m|/|; > > + croak "\$suffix contains a dot" if $suffix =~ m|\.|; > > + } > > + > > + return undef if !defined(path_file_name($path)); > > + > > + my $parent = path_parent($path); > > + > > + # sanity check -- should not happen because we checked for file name, > > + # and the existence of a file name implies there's a parent > > + confess "parent of \$path is undef" if !defined($parent); > > + > > + my @suffixes = path_file_suffixes($path); > > + > > + # Don't modify $path if there is no suffix to be removed > > + return $path if !scalar(@suffixes) && !defined($suffix); > > + > > + pop(@suffixes); > > + push(@suffixes, $suffix) if defined($suffix); > > + > > + my $prefix = path_file_prefix($path); > > + > > + # sanity check > > + confess "\$prefix is undef" if !defined($prefix); > > + > > + my $file_name = join(".", $prefix, @suffixes); > > + > > + # Because the parent of "foo" is ".", return $file_name to stay > > consistent. > > + # Otherwise, we'd end up with a current path ref prepended > > ("./$file_name") > > + # (Done also in path_with_file_name) > > + if ($parent eq '.' && $path !~ m|/|) { > > + return $file_name; > > + } > > + > > + return path_push($parent, $file_name); > > +} > > + > > +=head3 path_file_parts($path) > > + > > +Returns the parts that constitute the file name (prefix and suffixes) of a > > +C<$path> as a list. If the C<$path> does not have a valid file name, an > > empty > > +list is returned instead. > > + > > +In scalar context, returns a reference to a list. > > + > > +These parts are split in such a way that allows them to be C<join>ed > > together, > > +resulting in the original file name of the given C<$path> again. > > + > > + my $file_parts = path_file_parts("/etc/pve/firewall/cluster.fw"); > > + # ("cluster", "fw") > > + my $file_name = join(".", $file_parts->@*); > > + > > + my @file_parts = path_file_parts("/tmp/archive.tar.gz"); > > + # ("archive", "tar", "gz") > > + my $file_name = join(".", @file_parts); > > + > > +Throws an exception if C<$path> is C<undef>. > > + > > +=cut > > + > > +sub path_file_parts : prototype($) { > > + my ($path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + > > + my $file_name = path_file_name($path); > > + if (!defined($file_name)) { > > + return wantarray ? () : []; > > + } > > + > > + my $prefix; > > + ($prefix, $file_name) = _path_file_prefix($file_name); > > + > > + my @suffixes = _path_file_suffixes($file_name); > > + > > + my @file_parts = ($prefix, @suffixes); > > + > > + return wantarray ? @file_parts : \@file_parts; > > +} > > + > > +=head3 path_starts_with($path, $other_path) > > + > > +Checks whether a C<$path> starts with the components of C<$other_path>. > > + > > + my $starts_with = path_starts_with("/etc/pve/firewall/cluster.fw", > > "/etc/pve"); > > + # 1 > > + > > +Throws an exception if any of the arguments is C<undef>. > > + > > +=cut > > + > > +sub path_starts_with : prototype($$) { > > + my ($path, $other_path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + croak "\$other_path if undef" if !defined($other_path); > > + > > + # Nothing starts with nothing > > + return 1 if ($path eq '' && $other_path eq ''); > > + > > + # Nothing cannot start with something > > + # Something cannot start with nothing > > + return if ($path eq '' || $other_path eq ''); > > + > > + my @components = path_components($path); > > + my @other_components = path_components($other_path); > > + > > + my @pairs = zip_shortest(\@components, \@other_components); > > + > > + # for my ($comp, $other_comp) (@pairs) is experimental > > + for my $pair (@pairs) { > > + my ($comp, $other_comp) = $pair->@*; > > + > > + if ($comp ne $other_comp) { > > + return; > > + } > > + } > > + > > + return 1; > > +} > > + > > +=head3 path_ends_with($path, $other_path) > > + > > +Checks whether a C<$path> ends with the components of C<$other_path>. > > + > > + my $ends_with = path_ends_with("/etc/pve/firewall/cluster.fw", > > "firewall/cluster.fw"); > > + # 1 > > + > > +Throws an exception if any of the arguments is C<undef>. > > + > > +=cut > > + > > +sub path_ends_with : prototype($$) { > > + my ($path, $other_path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + croak "\$other_path if undef" if !defined($other_path); > > + > > + # Nothing ends with nothing > > + return 1 if ($path eq '' && $other_path eq ''); > > + > > + # Nothing cannot end with something > > + # Something cannot end with nothing > > + return if ($path eq '' || $other_path eq ''); > > + > > + my @components_rev = reverse(path_components($path)); > > + my @other_components_rev = reverse(path_components($other_path)); > > + > > + my @pairs_rev = zip_shortest(\@components_rev, \@other_components_rev); > > + > > + # for my ($comp, $other_comp) (@pairs_rev) is experimental > > + for my $pair (@pairs_rev) { > > + my ($comp, $other_comp) = $pair->@*; > > + > > + if ($comp ne $other_comp) { > > + return; > > + } > > + } > > + > > + return 1; > > +} > > + > > +=head3 path_equals($path, $other_path) > > + > > +Checks whether C<$path> equals C<$other_path>. The paths are compared > > +by their components, meaning that it's not necessary to > > +L<< normalize|/"path_normalize($path)" >> them beforehand. > > + > > +=cut > > + > > +sub path_equals : prototype($$) { > > + my ($path, $other_path) = @_; > > + > > + croak "\$path is undef" if !defined($path); > > + croak "\$other_path if undef" if !defined($other_path); > > + > > + # Nothing is nothing > > + return 1 if ($path eq '' && $other_path eq ''); > > + > > + # Nothing is not something > > + # Something is not nothing > > + return if ($path eq '' || $other_path eq ''); > > + > > + my @components = path_components($path); > > + my @other_components = path_components($other_path); > > + > > + return if scalar(@components) != scalar(@other_components); > > + > > + my @pairs = zip_longest(\@components, \@other_components); > > + > > + # for my ($comp, $other_comp) (@pairs_rev) is experimental > > + for my $pair (@pairs) { > > + my ($comp, $other_comp) = $pair->@*; > > + > > + return if !defined($comp) || !defined($other_comp); > > + > > + if ($comp ne $other_comp) { > > + return; > > + } > > + } > > + > > + return 1; > > +} > > + > > +1; > > -- > > 2.39.5 _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel