Re: [pve-devel] [PATCH v2 pve-common 01/12] introduce PVE::Path

Max Carrara Thu, 09 Jan 2025 01:56:40 -0800

On Wed Jan 8, 2025 at 3:05 PM CET, Wolfgang Bumiller wrote:
> On Fri, Dec 20, 2024 at 07:51:56PM +0100, Max Carrara wrote:
> > The PVE::Path module concerns itself with file / directory path
> > operations, like getting the parent directory of a path, extracting
> > the file name of a path, splitting a path into its individual
> > components, joining path components together, comparing paths, and so
> > on.
> > 
> > This module is added here in order to address the shortcomings of the
> > Perl core modules (such as lacking a lot the aforementioned
> > functionalities) without relying on some kind of object-based
> > abstraction.
> > 
> > PVE::Path strictly only contains functions that manipulate file paths
> > without meddling with the filesystem. The reasoning here is to be able
> > to just import any function when its necessary without having to
> > change the surrounding code in order to adapt to some kind of
> > abstraction.
> > 
> > The other motivation for this module is that path manipulation
> > operations have been getting more and more common recently, especially
> > in regards to storage.
> > 
> > Signed-off-by: Max Carrara <m.carr...@proxmox.com>
> > ---
> > Changes v1 --> v2:
> >   * Improve some wording in the docstring of path_components
> >   * Simplify some logic in path_parent and remove an unnecessary sanity
> >     check
> >   * Actually treat "foo" as "./foo" in path_parent as mentioned in the
> >     docstring -- This means that path_parent("foo") now returns "."
> >     instead of "".
> >   * Adapt the path_with_file_* functions to the above accordingly, so
> >     that e.g. path_with_file_name("foo", "bar") returns "bar" instead of
> >     "./bar".
> >   * Improve the "boolean" behaviour of path_is_absolute and
> >     path_is_absolute and return 1 when true, but use an empty return
> >     when false.
> >     - An empty return means "undef" in scalar context and an empty list
> >       in list context, so those functions will always return something
> >       that's correctly truthy or falsy for Perl, regardless of context
> > 
> >  src/Makefile    |   1 +
> >  src/PVE/Path.pm | 987 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 988 insertions(+)
> >  create mode 100644 src/PVE/Path.pm
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index 2d8bdc4..25bc490 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> > @@ -23,6 +23,7 @@ LIB_SOURCES = \
> >     LDAP.pm \
> >     Network.pm \
> >     OTP.pm \
> > +   Path.pm \
> >     PBSClient.pm \
> >     PTY.pm \
> >     ProcFSTools.pm \
> > diff --git a/src/PVE/Path.pm b/src/PVE/Path.pm
> > new file mode 100644
> > index 0000000..221e662
> > --- /dev/null
> > +++ b/src/PVE/Path.pm
> > @@ -0,0 +1,987 @@
> > +=head1 NAME
> > +
> > +C<PVE::Path> - Utilities related to handling file and directory paths
> > +
> > +=head1 DESCRIPTION
> > +
> > +This module provides functions concerned with file and directory path
> > +manipulation.
> > +
> > +None of the functions provided alter the filesystem in any way.
> > +
> > +The reason for this module's existence is to address a couple shortcomings:
> > +
> > +=over
> > +
> > +=item 1. The Perl core modules lack most of what is required for 
> > manipulating
> > +paths, for example getting the parent directory of a path, extracting the
> > +prefix of a file name (the "stem"), extracting the suffixes of a file name 
> > (the
> > +"endings" or "extensions"), checking whether two paths are the same, and 
> > so on.
> > +
> > +=item 2. If the Perl core modules provide something in that regard, it's 
> > usually
> > +provided in a not very ergonomic manner (L<C<File::Basename>>).
> > +
> > +=item 3. Additionally, the path utilities of the core modules are scattered
> > +across multiple modules, making them hard to discover.
> > +
> > +=item 4. Third-party libraries on CPAN mostly provide objects representing
> > +paths. Using any of these would require fundamental changes on how file 
> > paths
> > +are handled throughout our code, for almost no benefit.
> > +
> > +=back
> > +
> > +C<L<PVE::Path>> instead does without objects and strictly provides 
> > functions
> > +for path manipulation only. Any operation that is needed can simply be
> > +performed ad hoc by importing the corresponding function and doesn't 
> > require
> > +the surrounding code to conform to an abstraction like a path object.
> > +
> > +Additionally, some of the core modules' functionality is re-exported or
> > +re-implemented for ergonomic or logical purposes. The goal is to provide
> > +functions that don't come with any surprises and just behave like one 
> > assumes
> > +they would.
> > +
> > +This module takes inspiration from Rust's C<std::path> and Python's 
> > C<pathlib>,
> > +which are more modern path manipulation libraries.
> > +
> > +=head1 LIMITATIONS
> > +
> > +This module is limited to manipulating Unix-like / Linux file paths.
> > +
> > +=cut
> > +
> > +package PVE::Path;
> > +
> > +use strict;
> > +use warnings;
> > +
> > +use Carp qw(carp croak confess);
> > +use File::Spec ();
> > +use List::Util qw(any zip_shortest zip_longest);
> > +
> > +use Exporter qw(import);
> > +
> > +our @EXPORT_OK = qw(
> > +    path_is_absolute
> > +    path_is_relative
> > +
> > +    path_components
> > +    path_join
> > +
> > +    path_normalize
> > +
> > +    path_parent
> > +    path_push
> > +    path_pop
> > +
> > +    path_file_name
> > +    path_with_file_name
> > +
> > +    path_file_prefix
> > +    path_with_file_prefix
> > +
> > +    path_file_suffixes
> > +    path_with_file_suffixes
> > +
> > +    path_file_suffix
> > +    path_with_file_suffix
> > +
> > +    path_file_parts
> > +
> > +    path_starts_with
> > +    path_ends_with
> > +    path_equals
> > +);
> > +
> > +=head2 FUNCTIONS
> > +
> > +=cut
> > +
> > +=head3 path_is_absolute($path)
> > +
> > +Returns C<1> if C<$path> is absolute (starts with a C</>).
> > +
> > +Throws an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_is_absolute : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    if ($path =~ m#^/#) {
> > +   return 1;
> > +    }
> > +
> > +    return;
> > +}
> > +
> > +=head3 path_is_relative($path)
> > +
> > +Returns C<1> if C<$path> is relative (doesn't start with a C</>).
> > +
> > +The opposite of C<L<< path_is_absolute()|/"path_is_absolute($path)" >>>.
> > +
> > +Throws an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_is_relative : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    if ($path !~ m#^/#) {
> > +   return 1;
> > +    }
> > +
> > +    return;
> > +}
> > +
> > +=head3 path_components($path)
> > +
> > +Returns a list of the given C<$path>'s individual components.
> > +
> > +In scalar context, returns a reference to a list.
> > +
> > +The C<$path> is normalized a little during the parse:
> > +
> > +=over
> > +
> > +=item Repeated occurrences of C</> are removed, so C<foo/bar> and 
> > C<foo//bar>
> > +both have C<foo> and C<bar> as components.
> > +
> > +=item Trailing slashes C</> are removed.
> > +
> > +=item Occurrences of C<.> are normalized away, except the first C<.> at
> > +beginning of a path. This means that C<foo/bar>, C<foo/./bar>, 
> > C<foo/bar/.>,
> > +C<foo/././bar/./.>, etc. all have C<foo> and C<bar> as components, while
> > +C<./foo/bar>, C<./././foo/bar>, C<./foo/./bar/.> have C<.>, C<foo> and 
> > C<bar>
> > +as components.
> > +
> > +=back
> > +
> > +No other normalization is performed to account for the possibility of 
> > symlinks
> > +existing. This means that C<foo/baz> and C<foo/bar/../baz> are distinct 
> > (because
> > +C<bar> could be a symlink and thus C<foo> isn't its parent).
> > +
> > +Throws an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_components : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    my $is_abs = path_is_absolute($path);
> > +    my $has_cur_dir = $path =~ m#^\.$|^\./#;
> > +
> > +    my @components = split('/', $path);
> > +    my @normalized_components = ();
> > +
> > +    for my $component (@components) {
> > +   next if $component eq '' || $component eq '.';
> > +
> > +   push(@normalized_components, $component);
> > +    }
> > +
> > +    unshift(@normalized_components, '/') if $is_abs;
>
> ^ This case should probably also be explicitly mentioned in the docs.
> (`path_file_name()` also relies on it)


Good catch! Will add that.

>
> > +    unshift(@normalized_components, '.') if $has_cur_dir;
> > +
> > +    return @normalized_components if wantarray;
> > +    return \@normalized_components;
> > +}
> > +
> > +
> > +=head3 path_join(@paths)
> > +
> > +Joins multiple paths together. All kinds of paths are supported.
> > +
> > +Does not perform any C<L<< normalization|/"path_normalize($path)" >>>.
> > +
> > +Throws an exception if any of the passed C<@paths> is C<undef>.
>
> This should probably point to the `path_push` documentation mentioning
> the absolute path special case.

Another good catch; I agree.

>
> > +
> > +=cut
> > +
> > +sub path_join : prototype(@) {
> > +    my (@paths) = @_;
> > +
> > +    if (!scalar(@paths)) {
> > +   return '';
> > +    }
> > +
> > +    croak "one of the provided paths is undef" if any { !defined($_) } 
> > @paths;
> > +
> > +    # Find the last occurrence of a root directory and start conjoining the
> > +    # components from there onwards
> > +    my $index = scalar(@paths) - 1;
> > +    while ($index > 0) {
> > +   last if $paths[$index] =~ m#^/#;
> > +   $index--;
> > +    }
> > +
> > +    @paths = @paths[$index .. (scalar(@paths) - 1)];
> > +
> > +    my $resulting_path = shift @paths;
> > +
> > +    for my $path (@paths) {
> > +   $resulting_path = path_push($resulting_path, $path);
> > +    }
> > +
> > +    return $resulting_path;
> > +}
> > +
> > +=head3 path_normalize($path)
> > +
> > +Wrapper for L<C<File::Spec/canonpath>>. Performs a logical cleanup of the 
> > given
> > +C<$path>.
> > +
> > +This removes unnecessary components of a path that can be safely
> > +removed, such as C<.>, trailing C</> or repeated occurrences of C</>.
> > +
> > +For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become
> > +C<foo/bar/baz>.
> > +
> > +B<Note:> This will I<not> remove components referencing the parent 
> > directory,
> > +i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will 
> > therefore
> > +remain as they are. However, the parent directory of C</> is C</>, so
> > +C</../../foo> will be normalized to C</foo>.
> > +
> > +Throws an exception if C<$path> is C<undef> or the call to C<canonpath> 
> > failed.
> > +
> > +=cut
> > +
> > +sub path_normalize : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    my $cleaned_path = eval {
> > +   File::Spec->canonpath($path);
> > +    };
> > +
> > +    croak "failed to clean up path: $@" if $@;
> > +
> > +    return $cleaned_path;
> > +}
> > +
> > +=head3 path_parent($path)
> > +
> > +Returns the given C<$path> without its final component, if there is one.
> > +
> > +Trailing and repeated occurrences of C</> and C<.> are normalized on the 
> > fly
> > +when needed. This means that e.g. C<foo////bar///.//> becomes C<foo>, but
> > +C<foo/.//bar//./baz> becomes C<foo/.//bar>.
> > +
> > +This function's behaviour is almost identical to Rust's
> > +L<< 
> > Path::parent|https://doc.rust-lang.org/std/path/struct.Path.html#method.parent
> >  >>,
> > +with a few adaptations made wherever Perl treats things differently:
> > +
> > +=over
> > +
> > +=item * C</foo/bar> becomes C</foo>, C<foo/bar> becomes C<foo>.
> > +
> > +=item * C</foo> becomes C</>.
> > +
> > +=item * C<foo/bar/..> becomes C<foo/bar>.
> > +
> > +=item * C<foo/../bar> becomes C<foo/..>.
> > +
> > +=item * C<foo> is interpreted as C<./foo> and becomes C<.>. This is 
> > because Perl's
> > +C<L<File::Spec/canonpath>> interprets C<./foo> and C<foo> as the same 
> > thing.
> > +
> > +=item * C</> and an I<empty string> result in C<undef> being returned.
> > +
> > +=item * C<.> results in an empty string.
>
> ^ Like in `path_components()`, this should mention symlinks, since the
> parent of `/foo/..` is not necessarily `/foo` in the file system, but
> will be here.

I agree with this as well; will add a note to the docs.

>
> > +
> > +=back
> > +
> > +Will raise an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_parent : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    if ($path eq '') {
> > +   return;
> > +    }
> > +
> > +    # A limit of -1 retains empty components at the end
> > +    my @components = split('/', $path, -1);
>
> ^ Meh, if we didn't mind normalizing all multi-slash occurrances we could 
> just split on `m@/+@` :S

We could, but I'd rather keep it as it is; one can always use
path_normalize to get rid of those.

>
> > +
> > +    # Trim off needless extra components until actual final component is 
> > encountered, e.g.
> > +    # foo////bar////baz//// -> foo////bar////baz
> > +    # /// -> /
> > +    # ././//.///./ -> .
> > +    while (scalar(@components) > 1 && ($components[-1] eq '' || 
> > $components[-1] eq '.')) {
> > +   pop(@components);
> > +    }
> > +
> > +    my $final_component = pop(@components);
> > +
> > +    # We had a root dir with needless extra components, e.g. "//" or 
> > "////" or "//.///./" etc.
> > +    return if $final_component eq '';
> > +
> > +    # We had a current dir reference with needless extra components, e.g.
> > +    # "././" or ".///////" or "./././//./././//" etc.
> > +    return '' if $final_component eq '.';
> > +
> > +    # We had some other kind of single component like "foo", "bar" or "..",
> > +    # and because File::Spec->canonpath treats "foo" and "./foo" the same,
> > +    # return a single current dir reference
> > +    return '.' if !scalar(@components);
> > +
> > +    # Trim off needless extra components until actual parent component is 
> > encountered, like above
> > +    while (scalar(@components) > 1 && ($components[-1] eq '' || 
> > $components[-1] eq '.')) {
> > +   pop(@components);
> > +    }
> > +
> > +    # Handle lone root dir (@components with only one empty string)
> > +    if (scalar(@components) == 1 && $components[0] eq '') {
> > +   return '/';
> > +    }
> > +
> > +    return join('/', @components);
> > +}
> > +
> > +=head3 path_push($path, $other)
> > +
> > +Extends C<$path> with C<$other>, returning a new path.
> > +
> > +If C<$other> is absolute, it will be returned instead.
>
> I was never a fan of this. But I guess we cannot change that in rust, so
> for the sake of consistency...
>
> > +
> > +Throws an exception if any of the arguments is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_push : prototype($$) {
> > +    my ($path, $other) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +    croak "\$other is undef" if !defined($other);
> > +
> > +    return $other if path_is_absolute($other);
> > +    return $path if $other eq '';
>
> (^ Could swap the above 2 checks)

ACK

>
> > +
> > +    my $need_sep = $path ne '' && $path !~ m#/$#;
> > +
> > +    $path .= "/" if $need_sep;
> > +    $path .= $other;
> > +
> > +    return $path;
> > +}
> > +
> > +=head3 path_pop($path)
> > +
> > +Alias for C<L<< path_parent()|/"path_parent($path)" >>>.
> > +
> > +=cut
> > +
> > +sub path_pop : prototype($) {
> > +    my ($path) = @_;
> > +    return path_parent($path);
> > +}
> > +
> > +=head3 path_file_name($path)
> > +
> > +Returns the last component of the given C<$path>, if it is a legal file 
> > name,
> > +or C<undef> otherwise.
> > +
> > +If C<$path> is an empty string, C</>, C<.> or ends with a C<..> component,
> > +there is no valid file name.
> > +
> > +B<Note:> This does not check whether the given C<$path> actually points to 
> > a
> > +file or a directory etc.
> > +
> > +Throws an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_file_name : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    my @components = path_components($path);
> > +
> > +    if (!scalar(@components)) {
> > +   return;
> > +    }
> > +
> > +    if (
> > +   scalar(@components) == 1
> > +   && ($components[0] eq '/' || $components[0] eq '.')
> > +    ) {
> > +   return;
> > +    }
> > +
> > +    if ($components[-1] eq '..') {
> > +   return;
> > +    }
> > +
> > +    return $components[-1];
> > +}
> > +
> > +=head3 path_with_file_name($path, $file_name)
> > +
> > +Returns C<$path> with C<$file_name> as the new last component.
> > +
> > +This is essentially like calling C<L<< path_parent()|/"path_parent($path)" 
> > >>>
> > +and using C<L<< path_push()|/"path_push($path, $other)" >>> to append the 
> > new
> > +file name, but handles a few extra cases:
> > +
> > +=over
> > +
> > +=item * If C<$path> is C</>, appends C<$file_name>.
> > +
> > +=item * If C<$path> is an empty string, appends C<$file_name>.
> > +
> > +=item * If C<$path> ends with a parent directory reference (C<..>), 
> > replaces it
> > +with C<$file_name>.
> > +
> > +=back
> > +
> > +Throws an exception if any of the arguments is C<undef> or if C<$file_name>
> > +contains a path separator (C</>).
> > +
> > +=cut
> > +
> > +sub path_with_file_name : prototype($$) {
> > +    my ($path, $file_name) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +    croak "\$file_name is undef" if !defined($file_name);
> > +    croak "\$file_name contains a path separator: $file_name" if 
> > $file_name =~ m|/|;
> > +
> > +    my $parent = path_parent($path);
> > +
> > +    # undef means that $path was either "" or "/", so we can just append 
> > to it
> > +    return ($path . $file_name) if !defined($parent);
> > +
> > +    # Because the parent of "foo" is ".", return $file_name to stay 
> > consistent.
> > +    # Otherwise, we'd end up with a current path ref prepended 
> > ("./$file_name")
> > +    if ($parent eq '.' && $path !~ m|/|) {
> > +   return $file_name;
> > +    }
> > +
> > +    return path_push($parent, $file_name);
> > +}
> > +
> > +my sub _path_file_prefix : prototype($) {
> > +    my ($file_name) = @_;
> > +
> > +    confess "\$file_name is undef" if !defined($file_name);
> > +
> > +    $file_name =~ s|^(\.*[^\.]*)||;
> > +    my $prefix = $1;
> > +
> > +    # sanity check
> > +    confess "\$prefix not matched" if !defined($prefix);
>
> ^ For the above 4 lines you should be able to just test the `=~`
> directly:
>
>     confess "..." if $file_name !~ s|^(\.*[^\.]*)||;
>
> (note =~ -> !~)

Ah, another good catch! Will incorporate this in v3 also.

>
> Would it make sense to instead *match* those parts instead of s// and
> return the extension as a 3rd value? I think `path_with_file_prefix()`
> could use this instead of building and joining a suffix list?
>
> No hard feelings, though.

Hmm, I'll give it a shot.

>
> > +
> > +    return ($prefix, $file_name);
> > +}
> > +
> > +=head3 path_file_prefix($path)
> > +
> > +Returns the prefix of the file name of the given C<$path>. If the C<$path> 
> > does
> > +not have a valid file name and thus no prefix, C<undef> is returned 
> > instead.
> > +
> > +The prefix of a file name is the part before any extensions (suffixes).
> > +
> > +    my $prefix = path_file_prefix("/etc/resolv.conf");
> > +    # resolv
> > +
> > +    my $prefix = path_file_prefix("/tmp/archive.tar.zst");
> > +    # archive
> > +
> > +Throws an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_file_prefix : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    my $file_name = path_file_name($path);
> > +    return undef if !defined($file_name);
> > +
> > +    my ($prefix, undef) = _path_file_prefix($file_name);
> > +    return $prefix;
> > +}
> > +
> > +=head3 path_with_file_prefix($path, $prefix)
> > +
> > +Returns C<$path> with a new C<$prefix>. This is similar to
> > +C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>,
> > +except that the file prefix is replaced or appended.
> > +
> > +If C<$path> does not have a file name or if C<$prefix> is an empty string,
> > +C<undef> is returned.
> > +
> > +    my $new_path = path_with_file_prefix("/tmp/archive.tar.zst", "backup");
> > +    # /tmp/backup.tar.zst
> > +
> > +    my $new_path = path_with_file_prefix("/etc/pve", "ceph");
> > +    # /etc/ceph
> > +
> > +Throws an exception if any of the arguments is C<undef>, or if C<$prefix>
> > +contains a path separator (C</>), ends with C<.>, or is an empty string.
> > +
> > +=cut
> > +
> > +sub path_with_file_prefix : prototype($$) {
> > +    my ($path, $prefix) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +    croak "\$prefix is undef" if !defined($prefix);
> > +    croak "\$prefix contains a path separator" if $prefix =~ m|/|;
> > +    croak "\$prefix ends with a dot" if $prefix =~ m|\.$|;
> > +
> > +    return undef if $prefix eq '';
> > +    return undef if !defined(path_file_name($path));
> > +
> > +    return $path if $prefix eq '';
> > +
> > +    my $parent = path_parent($path);
> > +
> > +    # sanity check -- should not happen because we checked for file name,
> > +    # and the existence of a file name implies there's a parent
> > +    confess "parent of \$path is undef" if !defined($parent);
> > +
> > +    my @suffixes = path_file_suffixes($path);
> > +
> > +    my $file_name = join(".", $prefix, @suffixes);
> > +
> > +    # Because the parent of "foo" is ".", return $file_name to stay 
> > consistent.
> > +    # Otherwise, we'd end up with a current path ref prepended 
> > ("./$file_name")
> > +    # (Done also in path_with_file_name)
> > +    if ($parent eq '.' && $path !~ m|/|) {
> > +   return $file_name;
> > +    }
> > +
> > +    return path_push($parent, $file_name);
> > +}
> > +
> > +my sub _path_file_suffixes : prototype($) {
> > +    my ($file_name_no_prefix) = @_;
> > +
> > +    confess "\$file_name_no_prefix is undef" if 
> > !defined($file_name_no_prefix);
> > +
> > +    # Suffixes are extracted "manually" because join()ing the result of 
> > split()
> > +    # results in a different file name than the original. Let's say you 
> > have a
> > +    # file named "foo.bar.". The correct suffixes would be ("bar", "").
> > +    # With split, you get the following:
> > +    #     split(/\./, ".bar.")     --> ("", "bar")     --> join()ed to 
> > "foo..bar"
> > +    #     split(/\./, ".bar.", -1) --> ("", "bar", "") --> join()ed to 
> > "foo..bar."
> > +    my @suffixes = ();
> > +    while ($file_name_no_prefix =~ s|^(\.[^\.]*)||) {
> > +   my $suffix = $1;
> > +   $suffix =~ s|^\.||;
> > +   push(@suffixes, $suffix);
> > +    }
> > +
> > +    return @suffixes;
> > +}
> > +
> > +=head3 path_file_suffixes($path)
> > +
> > +Returns the suffixes of the C<$path>'s file name as a list. If the 
> > C<$path> does
> > +not have a valid file name, an empty list is returned instead.
> > +
> > +In scalar context, returns a reference to a list.
>
> (^ Isn't this a bit awkward?)

Well, do you think it is? I've been using that kind of "return style"
here and there and have been liking it, but if it's weird to others, I
can adapt it. :P

>
> > +
> > +The suffixes of a path are essentially the file name's extensions, the 
> > parts
> > +that come after the L<< prefix|/"path_file_prefix($path)" >>.
> > +
> > +    my @suffixes = path_file_suffixes("/etc/resolv.conf");
> > +    # ("conf")
> > +
> > +    my $suffixes = path_file_prefix("/tmp/archive.tar.zst");
> > +    # ["tar", "zst"]
> > +
> > +Throws an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_file_suffixes : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    my $file_name = path_file_name($path);
> > +    if (!defined($file_name)) {
> > +   return wantarray ? () : [];
> > +    }
> > +
> > +    (undef, $file_name) = _path_file_prefix($file_name);
> > +
> > +    my @suffixes = _path_file_suffixes($file_name);
> > +
> > +    return wantarray ? @suffixes : \@suffixes;
> > +}
> > +
> > +=head3 path_with_file_suffixes($path, @suffixes)
> > +
> > +Returns C<$path> with new C<@suffixes>. This is similar to
> > +C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>,
> > +except that the suffixes of the file name are replaced.
> > +
> > +If the C<$path> does not have a file name, C<undef> is returned.
> > +
> > +    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "pxar", 
> > "gz");
> > +    # /tmp/archive.pxar.gz
> > +
> > +    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "gz");
> > +    # /tmp/archive.gz
> > +
> > +If the file name has no suffixes, the C<@suffixes> are appended instead:
> > +
> > +    my $new_path = path_with_file_suffixes("/etc/resolv", "conf");
> > +    # /etc/resolv.conf
> > +
> > +    my $new_path = path_with_file_suffixes("/etc/resolv", "conf", "zst");
> > +    # /etc/resolv.conf.zst
> > +
> > +If there are no C<@suffixes> provided, the file name's suffixes will
> > +be removed (if there are any):
> > +
> > +    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst");
> > +    # /tmp/archive
> > +
> > +Note that an empty string is still a valid suffix (an "empty" file ending):
> > +
> > +    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "", "", 
> > "", "zst");
> > +    # /tmp/archive....zst
> > +
> > +Throws an exception if C<$path> or any of the C<@suffixes> is C<undef>, or
> > +if any suffix contains a path separator (C</>) or a C<.>.
> > +
> > +=cut
> > +
> > +sub path_with_file_suffixes : prototype($@) {
> > +    my ($path, @suffixes) = @_;
>
> I am questioning a bit the sanity of having "suffixes" throughout this
> module instead of simply an "extension" that covers them all and can be
> split if needed...
>
> Do we have/anticipate particular use cases where this is more
> convenient?

To be really honest, it was 50/50 on the "extension" vs "suffixes"
decision. I then opted for suffixes instead, because they seemed to be
less ambiguous than an "extension". Let me elaborate.

Let's say we have a file called "foo.tar.gz" -- what would its extension
be? Some might say that ".tar.gz" is its extension, while some others
might say ".gz"; both have different reasonings but aren't necessarily
more or less correct than the other.

In our case, both you and I would say that ".tar.gz" is the extension,
but...

Rust's std::path::Path::extension [1] would return "gz" for the above,
while C++'s std::filesystem::path::extension [2] returns ".gz" (but they
don't have such a case in their docs!). Java has java.nio.file.Path, but
conveniently doesn't care about extensions. Kotlin fixest his by adding
an "extension" property to that class, which returns "gz" [3].

Python's pathlib is the only one I've seen go the "suffix route" and
provides the pathlib.PurePath.suffix and .suffixes methods [4]. This
type of approach also came up on the (Rust) tracking issue for
Path::file_prefix [5].

When I was looking up all that, I decided to go for the prefix +
suffix(es) route, as that just seemed to be the most unambiguous.
Convenience wasn't really a factor here, because there's the
"path_with_file_* family" of functions that should handle most of the
replacement cases.

(Besides, I found it quite nice that I could join() on the prefix +
suffixes (e.g. join(".", "foo", "tar", "gz")) that I first extracted
with path_file_prefix() and path_file_suffixes(); I like that there's an
"inverse" operation.)

*But,* I wouldn't be opposed to adding a function that just returns
"tar.gz" for the above case. Perhaps with a different name though :P

[1] https://doc.rust-lang.org/std/path/struct.Path.html#method.extension
[2] https://en.cppreference.com/w/cpp/filesystem/path/extension
[3] 
https://github.com/JetBrains/kotlin/blob/rrr/2.1.0/core-docs/libraries/stdlib/jdk7/src/kotlin/io/path/PathUtils.kt#L46
[4] https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.suffix
[5] https://github.com/rust-lang/rust/issues/86319

>
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +    croak "one of the provided suffixes is undef"
> > +   if any { !defined($_) } @suffixes;
> > +    croak "one of the provided suffixes contains a path separator"
> > +   if any { $_ =~ m|/| } @suffixes;
> > +    croak "one of the provided suffixes contains a dot"
> > +   if any { $_ =~ m|\.| } @suffixes;
> > +
> > +    return undef if !defined(path_file_name($path));
> > +
> > +    my $parent = path_parent($path);
> > +
> > +    # sanity check -- should not happen because we checked for file name,
> > +    # and the existence of a file name implies there's a parent
> > +    confess "parent of \$path is undef" if !defined($parent);
> > +
> > +    # Don't modify $path if there are no suffixes to be removed
> > +    my @existing_suffixes = path_file_suffixes($path);
> > +    return $path if !scalar(@suffixes) && !scalar(@existing_suffixes);
> > +
> > +    my $prefix = path_file_prefix($path);
> > +
> > +    # sanity check
> > +    confess "\$prefix is undef" if !defined($prefix);
> > +
> > +    my $file_name = join(".", $prefix, @suffixes);
> > +
> > +    # Because the parent of "foo" is ".", return $file_name to stay 
> > consistent.
> > +    # Otherwise, we'd end up with a current path ref prepended 
> > ("./$file_name")
> > +    # (Done also in path_with_file_name)
> > +    if ($parent eq '.' && $path !~ m|/|) {
> > +   return $file_name;
> > +    }
> > +
> > +    return path_push($parent, $file_name);
> > +}
> > +
> > +=head3 path_file_suffix($path)
> > +
> > +Returns the suffix of the C<$path>'s file name. If the C<$path> does not 
> > have a
> > +valid file name or if the file name has no suffix, C<undef> is returned
> > +instead.
> > +
> > +The suffix of a file name is essentially its extension, e.g.
> > +C</etc/resolv.conf> has the suffix C<conf>. If there are multiple suffixes,
> > +only the last will be returned; e.g. C</tmp/archive.tar.gz> has the suffix 
> > C<gz>.
> > +
> > +B<Note:> Files like e.g. C</tmp/foo.> have an empty string as suffix.
> > +
> > +For getting all suffixes of a path, see C<L<< 
> > path_file_suffixes()|/"path_file_suffixes($path)" >>>.
> > +
> > +Throws an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_file_suffix : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    my $file_name = path_file_name($path);
> > +    return undef if !defined($file_name);
> > +
> > +    (undef, $file_name) = _path_file_prefix($file_name);
> > +
> > +    my @suffixes = _path_file_suffixes($file_name);
> > +
> > +    return pop(@suffixes);
> > +}
> > +
> > +=head3 path_with_file_suffix($path, $suffix)
> > +
> > +Returns C<$path> with a new C<$suffix>. This is similar to
> > +C<L<< path_with_file_suffixes()|/"path_with_file_suffixes($path, 
> > @suffixes)" >>>,
> > +except that only the last suffix of the file name is replaced.
> > +
> > +If the C<$path> does not have a file name, C<undef> is returned.
> > +
> > +    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", "gz");
> > +    # /tmp/archive.tar.gz
> > +
> > +If the file name has no suffixes, the C<$suffix> is appended instead:
> > +
> > +    my $new_path = path_with_file_suffix("/etc/resolv", "conf");
> > +    # /etc/resolv.conf
> > +
> > +If C<$suffix> is C<undef>, the file name's (last) suffix will be removed 
> > (if
> > +there is one):
> > +
> > +    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", undef);
> > +    # /tmp/archive.tar
> > +
> > +    my $new_path = path_with_file_suffix("/etc/resolv", undef);
> > +    # /etc/resolv
> > +
> > +Note that an empty string is still a valid suffix (an "empty" file ending):
> > +
> > +    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", "");
> > +    # /tmp/archive.tar.
> > +
> > +    my $new_path = path_with_file_suffix("/etc/resolv", "");
> > +    # /etc/resolv.
> > +
> > +Throws an exception if any of the arguments is C<undef>, or if C<$suffix>
> > +contains a path separator (C</>) or a C<.>.
> > +
> > +=cut
> > +
> > +sub path_with_file_suffix : prototype($$) {
> > +    my ($path, $suffix) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    if (defined($suffix)) {
> > +   croak "\$suffix contains a path separator" if $suffix =~ m|/|;
> > +   croak "\$suffix contains a dot" if $suffix =~ m|\.|;
> > +    }
> > +
> > +    return undef if !defined(path_file_name($path));
> > +
> > +    my $parent = path_parent($path);
> > +
> > +    # sanity check -- should not happen because we checked for file name,
> > +    # and the existence of a file name implies there's a parent
> > +    confess "parent of \$path is undef" if !defined($parent);
> > +
> > +    my @suffixes = path_file_suffixes($path);
> > +
> > +    # Don't modify $path if there is no suffix to be removed
> > +    return $path if !scalar(@suffixes) && !defined($suffix);
> > +
> > +    pop(@suffixes);
> > +    push(@suffixes, $suffix) if defined($suffix);
> > +
> > +    my $prefix = path_file_prefix($path);
> > +
> > +    # sanity check
> > +    confess "\$prefix is undef" if !defined($prefix);
> > +
> > +    my $file_name = join(".", $prefix, @suffixes);
> > +
> > +    # Because the parent of "foo" is ".", return $file_name to stay 
> > consistent.
> > +    # Otherwise, we'd end up with a current path ref prepended 
> > ("./$file_name")
> > +    # (Done also in path_with_file_name)
> > +    if ($parent eq '.' && $path !~ m|/|) {
> > +   return $file_name;
> > +    }
> > +
> > +    return path_push($parent, $file_name);
> > +}
> > +
> > +=head3 path_file_parts($path)
> > +
> > +Returns the parts that constitute the file name (prefix and suffixes) of a
> > +C<$path> as a list. If the C<$path> does not have a valid file name, an 
> > empty
> > +list is returned instead.
> > +
> > +In scalar context, returns a reference to a list.
> > +
> > +These parts are split in such a way that allows them to be C<join>ed 
> > together,
> > +resulting in the original file name of the given C<$path> again.
> > +
> > +    my $file_parts = path_file_parts("/etc/pve/firewall/cluster.fw");
> > +    # ("cluster", "fw")
> > +    my $file_name = join(".", $file_parts->@*);
> > +
> > +    my @file_parts = path_file_parts("/tmp/archive.tar.gz");
> > +    # ("archive", "tar", "gz")
> > +    my $file_name = join(".", @file_parts);
> > +
> > +Throws an exception if C<$path> is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_file_parts : prototype($) {
> > +    my ($path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +
> > +    my $file_name = path_file_name($path);
> > +    if (!defined($file_name)) {
> > +   return wantarray ? () : [];
> > +    }
> > +
> > +    my $prefix;
> > +    ($prefix, $file_name) = _path_file_prefix($file_name);
> > +
> > +    my @suffixes = _path_file_suffixes($file_name);
> > +
> > +    my @file_parts = ($prefix, @suffixes);
> > +
> > +    return wantarray ? @file_parts : \@file_parts;
> > +}
> > +
> > +=head3 path_starts_with($path, $other_path)
> > +
> > +Checks whether a C<$path> starts with the components of C<$other_path>.
> > +
> > +    my $starts_with = path_starts_with("/etc/pve/firewall/cluster.fw", 
> > "/etc/pve");
> > +    # 1
> > +
> > +Throws an exception if any of the arguments is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_starts_with : prototype($$) {
> > +    my ($path, $other_path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +    croak "\$other_path if undef" if !defined($other_path);
> > +
> > +    # Nothing starts with nothing
> > +    return 1 if ($path eq '' && $other_path eq '');
> > +
> > +    # Nothing cannot start with something
> > +    # Something cannot start with nothing
> > +    return if ($path eq '' || $other_path eq '');
> > +
> > +    my @components = path_components($path);
> > +    my @other_components = path_components($other_path);
> > +
> > +    my @pairs = zip_shortest(\@components, \@other_components);
> > +
> > +    # for my ($comp, $other_comp) (@pairs) is experimental
> > +    for my $pair (@pairs) {
> > +   my ($comp, $other_comp) = $pair->@*;
> > +
> > +   if ($comp ne $other_comp) {
> > +       return;
> > +   }
> > +    }
> > +
> > +    return 1;
> > +}
> > +
> > +=head3 path_ends_with($path, $other_path)
> > +
> > +Checks whether a C<$path> ends with the components of C<$other_path>.
> > +
> > +    my $ends_with = path_ends_with("/etc/pve/firewall/cluster.fw", 
> > "firewall/cluster.fw");
> > +    # 1
> > +
> > +Throws an exception if any of the arguments is C<undef>.
> > +
> > +=cut
> > +
> > +sub path_ends_with : prototype($$) {
> > +    my ($path, $other_path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +    croak "\$other_path if undef" if !defined($other_path);
> > +
> > +    # Nothing ends with nothing
> > +    return 1 if ($path eq '' && $other_path eq '');
> > +
> > +    # Nothing cannot end with something
> > +    # Something cannot end with nothing
> > +    return if ($path eq '' || $other_path eq '');
> > +
> > +    my @components_rev = reverse(path_components($path));
> > +    my @other_components_rev = reverse(path_components($other_path));
> > +
> > +    my @pairs_rev = zip_shortest(\@components_rev, \@other_components_rev);
> > +
> > +    # for my ($comp, $other_comp) (@pairs_rev) is experimental
> > +    for my $pair (@pairs_rev) {
> > +   my ($comp, $other_comp) = $pair->@*;
> > +
> > +   if ($comp ne $other_comp) {
> > +       return;
> > +   }
> > +    }
> > +
> > +    return 1;
> > +}
> > +
> > +=head3 path_equals($path, $other_path)
> > +
> > +Checks whether C<$path> equals C<$other_path>. The paths are compared
> > +by their components, meaning that it's not necessary to
> > +L<< normalize|/"path_normalize($path)" >> them beforehand.
> > +
> > +=cut
> > +
> > +sub path_equals : prototype($$) {
> > +    my ($path, $other_path) = @_;
> > +
> > +    croak "\$path is undef" if !defined($path);
> > +    croak "\$other_path if undef" if !defined($other_path);
> > +
> > +    # Nothing is nothing
> > +    return 1 if ($path eq '' && $other_path eq '');
> > +
> > +    # Nothing is not something
> > +    # Something is not nothing
> > +    return if ($path eq '' || $other_path eq '');
> > +
> > +    my @components = path_components($path);
> > +    my @other_components = path_components($other_path);
> > +
> > +    return if scalar(@components) != scalar(@other_components);
> > +
> > +    my @pairs = zip_longest(\@components, \@other_components);
> > +
> > +    # for my ($comp, $other_comp) (@pairs_rev) is experimental
> > +    for my $pair (@pairs) {
> > +   my ($comp, $other_comp) = $pair->@*;
> > +
> > +   return if !defined($comp) || !defined($other_comp);
> > +
> > +   if ($comp ne $other_comp) {
> > +       return;
> > +   }
> > +    }
> > +
> > +    return 1;
> > +}
> > +
> > +1;
> > -- 
> > 2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Re: [pve-devel] [PATCH v2 pve-common 01/12] introduce PVE::Path

Reply via email to