10 thumbs up ;-) But this really demonstrates how badly we need this function - I bet any number of those points may or may not be covered by any number of implementations in the wild.
It would be so nice to have this done "right", once and for all.

On Sat, Apr 1, 2017 at 2:42 PM, Fleshgrinder <p...@fleshgrinder.com> wrote:

> On 4/1/2017 2:01 PM, Anatol Belski wrote:
> > 1. optionally - yes, otherwise it should do platform default
> > 2. no, this kind of operation is pure parsing, no I/O related
> > checks needed
> > 3. irrelevant, but can be defined
> >
> > Other points yet I'd care about
> > - result should be correct for target platform disregarding actual
> > platform, e.g. a Linux path on Windows, or a Windows path on Mac, etc.
> > - validation, particularly for reserved words and chars, also other
> > platform aspects
> > - encodings have to be respected, or UTF-8 only, to define
> > - probably should be compatible with PHP stream wrapper namespaces
> >
> >
> > Thanks
> >
> > Anatol
> >
>
> 1. How do you envision that? If the path is `/a/b/../c` where only `/a`
> exists right now? It's unresolvable: assuming that `../` points to `/a`
> is wrong if `b/` is a symbolic link that points to `/x/y`.
>
> 2. Here I agree, casing cannot be decided without hitting the
> filesystem. Some are case-sensitive, some insensitive, and others
> configurable.
>
> 3. Does not matter for Windows itself, it is case-insensitive.
>
> (I continue the numbering for the points you raised.)
>
> 4. How would we go about normalizing a Windows path to POSIX? `C:\a` is
> not necessarily the same as `/a`, or should it produce `C:/a`?
>
> 5. 👍
>
> 6. I vote for UTF-8 only. We already have locale dependent filesystem
> functions, which also makes them kind of weird to use, especially in
> libraries. Another very important aspect to take care of on this point
> is normalization forms. Filesystems generally store stuff as is, which
> means that we can create two files with the same name, at least by the
> looks of it, which are actually different ones. Think of `ä` which can
> also be `ä`. It is generally most advisable to stick to NFC, because
> that is also how users usually produce those chars.
>
> 7. 👍 just forward I'd say.
>
> 8. Collapse multiple separators (e.g. `a//b` ~> `a/b`).
>
> 9. Resolve self-references, unless they are leading (e.g. `a/./b` ~>
> `a/b` but `./a/b` stays `./a/b`).
>
> 10. Trim separators from the end (e.g. `a/` ~> `a`).
>
> --
> Richard "Fleshgrinder" Fussenegger
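
For what it's worth, points 8-10 in the quoted list are purely lexical, so they can be sketched without touching the filesystem at all. Below is a minimal illustration in Python, not a proposed PHP API: the name `normalize_lexical` is made up, `/` is assumed to be the only separator, input is assumed to be already-decoded Unicode text, and NFC is applied as suggested in point 6. Parent references (`..`) are deliberately left alone, since point 1 argues they cannot be resolved safely when symlinks may be involved.

```python
import unicodedata


def normalize_lexical(path: str, sep: str = "/") -> str:
    """Hypothetical helper: lexical cleanup only, no filesystem access."""
    # Point 6: normalize to NFC so that visually identical names
    # (precomposed vs. base letter + combining mark) compare equal.
    path = unicodedata.normalize("NFC", path)

    is_absolute = path.startswith(sep)
    kept = []
    for index, part in enumerate(path.split(sep)):
        if part == "":
            # Points 8 and 10: empty segments come from repeated or
            # trailing separators (and from a leading one); drop them.
            continue
        if part == "." and index > 0:
            # Point 9: drop self-references unless they are leading.
            continue
        # Note: ".." is kept as is (see point 1 about symbolic links).
        kept.append(part)

    result = sep.join(kept)
    # Restore the single leading separator of an absolute path.
    return sep + result if is_absolute else result


print(normalize_lexical("a//b"))       # a/b
print(normalize_lexical("a/./b"))      # a/b
print(normalize_lexical("./a/b"))      # ./a/b
print(normalize_lexical("a/"))         # a
print(normalize_lexical("/a/b/../c"))  # /a/b/../c
```

Whether a leading `//` should also be collapsed, and how drive letters or stream wrapper prefixes fit in, is exactly the kind of thing the real function would have to define; the sketch simply ignores those cases.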