On 4/1/2017 6:15 PM, Anatol Belski wrote:
> Basically, it is the same as your points 8., 9. and 10. - it deals
> with the given path itself, so no symlinks, etc. In the snippet
> /a/b/../c it's parsed as follows:
> 
> - parse up to /a/b/../
> - scroll back to /a
> - append the remainder so it becomes /a/c
> 
> Similarly, /a/./b would become /a/b, and so on. It is string
> traversal only. dirname() uses this approach. In general one can
> say normalization is path simplification, with no drive access like
> realpath() does. For example, it lets you know whether the path
> itself is correct before it comes to an actual file operation, and
> avoids the I/O otherwise.
> 

Your strategy works in these examples, but the example I gave was
different. Imagine that we have `/a/b/../c` which we would normalize to
`/a/c`. However, the `b` component is actually a symbolic link to `x/y`.
Hence, the real version of the path is `/a/x/c` and not `/a/c` as we
would have normalized it to.
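
To make the difference concrete, here is a minimal sketch. The function
`normalize_lexically()` is a hypothetical helper written only for this
mail, not an existing PHP function, and it handles absolute POSIX-style
paths only. The demo assumes a platform where `symlink()` works and a
writable temporary directory. It builds the layout from my example and
compares the string-only result with what `realpath()` reports.

```php
<?php
// Hypothetical helper: purely lexical normalization of an absolute
// POSIX-style path. It never touches the file system.
function normalize_lexically(string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;           // skip empty and current-directory segments
        }
        if ($segment === '..') {
            array_pop($stack);  // drop the previous component, stop at root
            continue;
        }
        $stack[] = $segment;
    }
    return '/' . implode('/', $stack);
}

// Scratch layout: a/x/y and a/x/c are directories, a/b is a link to x/y.
$base = realpath(sys_get_temp_dir()) . '/normalize_demo_' . uniqid();
mkdir("$base/a/x/y", 0777, true);
mkdir("$base/a/x/c");
symlink('x/y', "$base/a/b");    // relative target, resolved against $base/a

$lexical = normalize_lexically("$base/a/b/../c"); // string work only
$real    = realpath("$base/a/b/../c");            // follows the link first

// Clean up the scratch layout again.
unlink("$base/a/b");
rmdir("$base/a/x/c");
rmdir("$base/a/x/y");
rmdir("$base/a/x");
rmdir("$base/a");
rmdir($base);
```

Here `$lexical` ends in `/a/c`, a path that does not even exist in this
layout, while `$real` ends in `/a/x/c`.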

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> As mentioned in an earlier post, it might make sense to have flags to
> control the behavior. Maybe a signature like
> 
> string canonicalize_path(string $path, int $flags = 0);
> 
> The function OFC knows the current platform. Flags like
> PATH_TARGET_WINDOWS | PATH_UNIXIFY would control the path separator
> behaviors. Generally, regarding paths without a drive letter - on
> Windows I'd strongly advise not to use them in configs, etc.
> because of multiple root issues mentioned already. But in principle,
> say one has same FS structure on different platforms and just wants
> to mirror it, that would be ok with flags like PATH_TARGET_LINUX |
> PATH_STRIP_DRIVE as Linux implies forward slashes. Or, e.g., in
> the reverse case - generating a path on Linux that is to be used on
> Windows, flags might contain only PATH_TARGET_WINDOWS which would
> produce backslashes as system default. Maybe that's too much or
> unrelated, and only platform targets should be provided, dunno, just
> a mind game for now.
> 

I hope you notice how this function is exploding in complexity. I beg
for classes, with clear responsibilities and small methods that do one
thing.
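
As a rough sketch of what I mean: the class `Path` and all of its method
names below are hypothetical, nothing like this exists in PHP today.
Each flag from the proposed signature becomes one small method on an
immutable value object, and callers chain only the steps they want.

```php
<?php
// Hypothetical API sketch; none of these names exist in PHP.
final class Path
{
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    // Replace every separator (forward or backward slash) with one kind.
    public function withSeparator(string $separator): self
    {
        return new self(strtr($this->path, ['/' => $separator, '\\' => $separator]));
    }

    // Drop a leading Windows drive letter such as "C:".
    public function withoutDrive(): self
    {
        return new self(preg_replace('/^[A-Za-z]:/', '', $this->path) ?? $this->path);
    }

    // Drop trailing separators, but keep a lone root separator.
    public function withoutTrailingSeparator(): self
    {
        $trimmed = rtrim($this->path, '/\\');
        return new self($trimmed === '' ? substr($this->path, 0, 1) : $trimmed);
    }

    public function __toString(): string
    {
        return $this->path;
    }
}

// Roughly what PATH_TARGET_LINUX | PATH_STRIP_DRIVE was meant to do.
$unixified = (string) (new Path('C:\\www\\app\\'))
    ->withoutDrive()
    ->withSeparator('/')
    ->withoutTrailingSeparator();
```

With this input `$unixified` is `/www/app`. Every step can be read,
tested and documented on its own, which a bit mask cannot offer.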

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> These last 3 points, as well as above one, are canonicalization. Of
> course, in the imaginary function, it could be decoupled like
> PATH_NO_CANONIC if it's not wanted, or PATH_CANONICALIZE_ONLY to omit
> other conversions. It's only about having sensible behaviors. E.g.
> possible other flags could be PATH_STRIP_TRAILING_SLASH,
> PATH_ALLOW_RELATIVE and other fine things. But by default, the
> function should do the default thing for the target platform, based
> on the current platform. Thus, producing NFD for Mac and NFC
> otherwise, backslash for Windows and forward slash otherwise, and
> other things that will for sure pop up. As mentioned earlier, this
> still requires some re-implementations of the platform APIs, even if
> we talk about slashes only - for ASCII paths I'm not sure we even can
> differentiate the UTF-8 encoding forms without involving yet another
> library, so this might be tricky. Simply exposing the part of
> realpath() processing might solve several things for one given
> platform, that's for sure. The initial case Rasmus reported was about
> crossplatform handling, but the topic is indeed slightly bigger than
> just path separators, so IMO the convenient way would be to care
> about a crossplatform approach. I have no info on how relevant such
> crossplatform path issues really are, so it might be another story to
> investigate before one starts any implementation. At least, grouping
> some cases and thoughts, maybe as an RFC, could be good to track the
> topic.
> 

I agree mostly:

- We should not call it canonicalization (I used the word too), but
rather normalization. The former is used in other languages and means
realpath there. This could be confusing.
- Leaving the stripping of the trailing separator to the user means that
other users never know what they get, which is bad. The normalization
should always use one strategy here.

-- 
Richard "Fleshgrinder" Fussenegger