On Mon Feb 3, 2025 at 4:42 PM CET, Fiona Ebner wrote: > Am 03.02.25 um 15:21 schrieb Max Carrara: > > On Fri Jan 31, 2025 at 1:23 PM CET, Fiona Ebner wrote: > >> Am 09.01.25 um 15:48 schrieb Max Carrara: > >> > >> Why this kind of behavior with absolute paths? Seems surprising to me. > >> Wouldn't failing the call be better? > > > > I decided to stick with Rust's behaviour [pathbuf] here and in most > > other places, simply in order to keep the behaviour consistent. > > > > I'm not *really* a fan of it either, but I think it's better than > > failing the call -- IMO having to wrap path_join / path_push into an > > eval-block would be quite awkward for what's a rather "small" operation > > :s > > > > Perhaps I should provide some additional context here as well so as to > > make my "design philosophy" more clear: The reason why I modeled > > PVE::Path (mostly) after Rust's std::path::{Path, PathBuf} is because: > > > > 1. Most (newer) developers will be much more likely to be familiar > > with Rust instead of Perl, so if they should see this library, > > they'll have a much easier time using it (or at least that's what > > I *hope* would happen). Sort of as in: "Oh, this is just like the > > path stuff in Rust, except for a few Perl-esque exceptions!" > > > > 2. I wanted the library to be essentially exception-free, except where > > obvious invariants aren't upheld by the caller (e.g. passing undef). > > Unless you match the error message with a regex, there's no > > structurally typed way to differ between errors in Perl, so I'd > > like the "types" of errors to be mostly the same wherever possible. > > > > 3. I didn't want to reinvent the wheel and instead stick to what other > > popular libraries do as much as possible, while also avoiding any > > additional abstractions (e.g. introducing an object for paths). > > > > With all that being said, since that behaviour is surprising to you, I > > have no quarrels changing it! :P After all I'd like to have something > > that doesn't cause any friction when using it. > > > > [pathbuf]: > > https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push > > > > You don't need to change it if there was good rationale behind it, which > you provided :) A short comment in the function documentation regarding > the inspiration would still be nice. Puts the responsibility on the > caller to not accidentally use an absolute path though. Maybe we can at > least warn instead? I suspect almost all calls with an absolute path in > the middle will be accidental.
Sure, that sounds fine by me! :) > > >> > >>> + > >>> + my $joined = path_join("foo/bar", "/etc", "resolv.conf"); > >>> + # /etc/resolv.conf > >>> + > >>> + my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts"); > >>> + # /etc/hosts > >>> + > >>> +Throws an exception if any of the passed C<@paths> is C<undef>. > >>> + > >>> +=cut > >>> + > >>> +sub path_join : prototype(@) { > >>> + my (@paths) = @_; > >>> + > >>> + if (!scalar(@paths)) { > >>> + return ''; > >>> + } > >>> + > >>> + croak "one of the provided paths is undef" if any { !defined($_) } > >>> @paths; > >>> + > >> > >> I think the rest could be written more efficiently like (untested): > >> > >> my $resulting_path = shift @paths; > >> for my $path (@paths) { > >> if ($path =~ m#^/#) { > >> $resulting_path = $path; > >> } else { > >> $resulting_path = path_push($resulting_path, $path); > >> } > >> } > > > > Note that I'm doing a reverse iteration below; I think doing it either > > way should be fine, because I doubt we'll ever need to join that many > > path components where it's actually going to make a difference ;P > > > > I also meant "efficiently" in terms of lines/readability ;) Shorter, > less loops and avoids index arithmetic. Oooh, that makes sense then! Sure, I'll give it a try :) > > >> > >>> + # Find the last occurrence of a root directory and start conjoining > >>> the > >>> + # components from there onwards > >>> + my $index = scalar(@paths) - 1; > >>> + while ($index > 0) { > >>> + last if $paths[$index] =~ m#^/#; > >>> + $index--; > >>> + } > >>> + > >>> + @paths = @paths[$index .. (scalar(@paths) - 1)]; > >>> + > >>> + my $resulting_path = shift @paths; > >>> + > >>> + for my $path (@paths) { > >>> + $resulting_path = path_push($resulting_path, $path); > >>> + } > >>> + > >>> + return $resulting_path; > >>> +} > >>> + > >>> +=head3 path_normalize($path) > >>> + > >>> +Wrapper for L<C<File::Spec/canonpath>>. Performs a logical cleanup of > >>> the given > >>> +C<$path>. > >>> + > >>> +This removes unnecessary components of a path that can be safely > >>> +removed, such as C<.>, trailing C</> or repeated occurrences of C</>. > >>> + > >>> +For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become > >>> +C<foo/bar/baz>. > >>> + > >>> +B<Note:> This will I<not> remove components referencing the parent > >>> directory, > >>> +i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will > >>> therefore > >>> +remain as they are. However, the parent directory of C</> is C</>, so > >>> +C</../../foo> will be normalized to C</foo>. > >>> + > >>> +Throws an exception if C<$path> is C<undef> or the call to C<canonpath> > >>> failed. > >>> + > >>> +=cut > >>> + > >>> +sub path_normalize : prototype($) { > >>> + my ($path) = @_; > >>> + > >>> + croak "\$path is undef" if !defined($path); > >>> + > >>> + my $cleaned_path = eval { > >>> + File::Spec->canonpath($path); > >>> + }; > >>> + > >> > >> Style nit: blank lines between eval and using $@ are better avoided > >> IMHO. I'd also have the eval expression be a single line, but no strong > >> feelings. > > > > Could you perhaps elaborate why, please? > > > > Because new code gets added more easily in between. I've seen that > happen before. Ah, okay! Sure, then I'll change it. At first I feared that there was some (--> yet another) obscure Perl edge case I didn't know about. :P > > ---snip 8<--- > > >>> +Throws an exception if any of the arguments is C<undef>. > >>> + > >>> +=cut > >>> + > >>> +sub path_push : prototype($$) { > >>> + my ($path, $other) = @_; > >>> + > >>> + croak "\$path is undef" if !defined($path); > >>> + croak "\$other is undef" if !defined($other); > >>> + > >>> + return $path if $other eq ''; > >>> + return $other if path_is_absolute($other); > >>> + > >>> + my $need_sep = $path ne '' && $path !~ m#/$#; > >>> + > >>> + $path .= "/" if $need_sep; > >>> + $path .= $other; > >>> + > >>> + return $path; > >>> +} > >>> + > >>> +=head3 path_pop($path) > >>> + > >>> +Alias for C<L<< path_parent()|/"path_parent($path)" >>>. > >>> + > >> > >> Why have this alias? The name suggests it would behave like an inverse > >> to path_push(), but it does not, because of the normalization of > >> path_parent(). So I'd rather not have it or have it be a non-normalizing > >> version (but maybe not worth it) to avoid surprises. > > > > Could you please elaborate what you mean regarding the normalization of > > path_parent()? > > I just mean push "." followed by pop is not the same as you start with. > > > > > path_parent() (or path_pop()) will only normalize as much as it needs to > > in order to remove the last path component and get to the one that > > precedes that: > > > > my $p = "foo/././bar/././baz/./."; > > > > $p = PVE::Path::path_parent($p): # or path_pop() > > # $p is now "foo/././bar" > > > > path_push() does actually behave like an inverse if you include the > > redundant stuff: > > > > $p = PVE::Path::path_push($p, "././baz/./.") > > # $p is now "foo/././bar/././baz/./." again > > > > It's not a complete inverse, that's true (as in, it's not bijective ;P), > > because you can't replicate the example above with a path like > > "foo///bar///baz" since you can't push "///baz" onto "foo///bar" > > ("///baz" is treated as an absolute path). > > > > I think a non-normalizing version wouldn't really be that nice to use; > > one would have to constantly remember to call path_normalize() > > beforehand to actually get the desired result (i.e. getting the parent > > directory). > > Agreed. > > > > > I can just remove the alias if you prefer; it's not equivalent to Rust's > > PathBuf::pop() [pop] and if it's a source of confusion, we can probably > > do without. > > > > [pop]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.pop > > > > I see where the inspiration comes from then :) I get why Rust has both, > because one is for Path and the other for PathBuf. And since they are > essentially the same operation, fine by me to keep this alias. > > >> > >>> +=cut > >>> + > >>> +sub path_pop : prototype($) { > >>> + my ($path) = @_; > >>> + return path_parent($path); > >>> +} > >>> + > >>> +=head3 path_file_name($path) > >>> + > >>> +Returns the last component of the given C<$path>, if it is a legal file > >>> name, > >>> +or C<undef> otherwise. > >>> + > >>> +If C<$path> is an empty string, C</>, C<.> or ends with a C<..> > >>> component, > >>> +there is no valid file name. > >>> + > >>> +B<Note:> This does not check whether the given C<$path> actually points > >>> to a > >>> +file or a directory etc. > >>> + > >>> +Throws an exception if C<$path> is C<undef>. > >>> + > >>> +=cut > >>> + > >>> +sub path_file_name : prototype($) { > >>> + my ($path) = @_; > >>> + > >>> + croak "\$path is undef" if !defined($path); > >>> + > >>> + my @components = path_components($path); > >>> + > >>> + if (!scalar(@components)) { > >>> + return; > >>> + } > >>> + > >>> + if ( > >>> + scalar(@components) == 1 > >>> + && ($components[0] eq '/' || $components[0] eq '.') > >>> + ) { > >> > >> Style nit: this conditional fits well into 100 characters, so can be one > >> line instead of four > > > > Fair point, but I prefer to have it a little more readable here, and > > it's still in line with our style guide. :P > > Personally, I find this version "more noisy" to read, because of the > split, but no hard feelings :) > > ---snip 8<--- > > >>> +=cut > >>> + > >>> +sub path_file_prefix : prototype($) { > >>> + my ($path) = @_; > >>> + > >>> + croak "\$path is undef" if !defined($path); > >>> + > >>> + my $file_name = path_file_name($path); > >>> + return undef if !defined($file_name); > >>> + > >>> + my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name); > >>> + return $prefix; > >>> +} > >>> + > >>> +=head3 path_with_file_prefix($path, $prefix) > >>> + > >> > >> Hmm, looking at this and path_with_file_suffix{,es}(), would it maybe be > >> nicer to have a single > >> path_with_file_name_from_parts($path, $prefix, @suffixes) > >> function instead of these? Would seem more natural/straightforward to > >> me. The implementations are rather involved IMHO compared to how useful > >> the functions are. It's very easy to get the behavior for the (I suspect > >> rather uncommon) case of where you want to replace only prefix or > >> suffix(es) without already knowing the other. Just need to call > >> path_file_parts() first. Or did you take inspiration from somewhere else > >> for these? > > > > I'll have to disagree here precisely because having to call > > path_file_parts() first is IMO tedious from the caller's perspective. In > > many cases I rarely care about all of the individual parts; instead I > > just e.g. want to rename a file while keeping the suffix(es), or I want to > > trim off the last suffix of something like "archive.tar.gz" because I'm > > decompressing it to "archive.tar" or similar. > > > > In other words, in a lot of cases I just want to perform one individual > > operation, and the path_file_* and path_with_file_* functions represent > > all those individual operations as concisely as possible. The burden of > > "figuring out what to do" shouldn't be given to the caller in such > > cases. > > > > Regarding inspiration: These functions were inspired after having read a > > related Rust GitHub issue [issue] where a lot of this was discussed as > > well. > > > > That issue also suggests having some kind of iterator for file parts, > > that other methods of std::path::Path can be based on; this is actually > > what led me to adding path_file_parts() below. You have access to the > > individual parts there and can just join() them to the file name. That > > would be very similar to the path_with_file_name_from_parts() function > > you suggested. :P > > > > [issue]: https://github.com/rust-lang/rust/issues/86319 > > > > Okay, thank you for elaborating! Maybe such calls are more common than I > think. You're welcome! Also thanks for the thorough review, it's much appreciated :) I'll see if I can send out a v4 soon enough. _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel