On 4/1/2017 6:15 PM, Anatol Belski wrote:
> Basically, it is the same as your points 8., 9. and 10. - it deals
> with the given path itself, so no symlinks, etc. In the snippet
> /a/b/../c it's parsed like follows
>
> - parse up to /a/b/../
> - scroll back to /a
> - append the remain so it becomes /a/c
>
> Similar process is with /a/./b would become /a/b and others. It is
> string traversing only. What is done with dirname() uses this
> approach. In general one can say - normalization is a path
> simplification, no drive access like realpath() does. For example, it
> lets to know the path itself would be correct before it comes to
> actual file operation, and not bother with I/O otherwise.
>
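
A minimal sketch of the string-only traversal described in the quote above, assuming forward-slash separators only. The name `normalize_path_sketch` is hypothetical and not an existing PHP function, and the handling of trailing slashes and of a leading `..` is my assumption, not something the quote specifies:

```php
<?php
// Hypothetical helper, not part of PHP: purely lexical normalization.
// It never touches the file system, so nothing on disk is consulted.
function normalize_path_sketch(string $path): string
{
    $absolute = $path !== '' && $path[0] === '/';
    $stack = [];

    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            // Skip empty segments (from "//" or a trailing "/") and ".".
            continue;
        }
        if ($segment === '..') {
            if ($stack !== [] && end($stack) !== '..') {
                // "Scroll back" one component.
                array_pop($stack);
            } elseif (!$absolute) {
                // Assumption: a relative path may keep a leading "..".
                $stack[] = '..';
            }
            // Assumption: ".." directly below the root is dropped.
            continue;
        }
        $stack[] = $segment;
    }

    $result = ($absolute ? '/' : '') . implode('/', $stack);
    return $result === '' ? '.' : $result;
}

echo normalize_path_sketch('/a/b/../c'), "\n"; // /a/c
echo normalize_path_sketch('/a/./b'), "\n";    // /a/b
```

The result depends only on the characters of the input string, which is exactly what makes it cheap and what separates it from realpath().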

Your strategy works in these examples, but the example I gave was
different. Imagine that we have `/a/b/../c`, which we would normalize to
`/a/c`. However, the `b` component is actually a symbolic link to `x/y`.
Hence, the real version of the path is `/a/x/c` and not `/a/c` as we
would have normalized it to.

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> As mentioned in an earlier post, it might make sense to have flags to
> control the behavior. Maybe a signature like
>
> string canonicalize_path(string $path, int $flags = 0);
>
> The function OFC knows the current platform. Flags like
> PATH_TARGET_WINDOWS | PATH_UNIXIFY would control the path separator
> behaviors. Generally, regarding path without drive letter - on
> Windows I'd strongly advise to not to use it in configs, etc.
> because of multiple root issues mentioned already. But in principle,
> say one has same FS structure on different platforms and just wants
> to mirror it, that would be ok with flags like PATH_TARGET_LINUX |
> PATH_STRIP_DRIVE as Linux implies forward slashes. Or otherwise, fe
> the reverse case - generating a path on Linux that is to be used on
> Windows, flags might contain only PATH_TARGET_WINDOWS which would
> produce backslashes as system default. Maybe that's too much or
> unrelated, and only platform targets should be provided, dunno, just
> a mind game for now.
>

I hope you notice how this function is exploding in complexity. I beg
for classes with clear responsibilities and small methods that do one
thing.

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> These last 3 points, as well as above one, are canonicalization. Of
> course, in the imaginary function, it could be decoupled like
> PATH_NO_CANONIC if it's not wanted, or PATH_CANONICALIZE_ONLY to omit
> other conversions. It's only about to have the behaviors sensible. Fe
> possible other flags could be PATH_STRIP_TRAILING_SLASH,
> PATH_ALLOW_RELATIVE and other fine things. But by default, the
> function should do the default thing for the target platform, based
> on the current platform. Thus, producing NFD for Mac and NFC
> otherwise, backslash for Windows and forward slash otherwise, other
> thing that will for sure pop up. As mentioned earlier, still this
> requires some re-implementations of the platform APIs, even we'd talk
> about slashes only - for ASCII paths I'm not sure we even can
> differentiate the UTF-8 encoding forms without involving yet another
> library, so this might be tricky. Simply exposing the part of
> realpath() processing might solve several things for one given
> platform, that's for sure. The initial case Rasmus reported was about
> crossplatform handling, but the topic is indeed slightly bigger than
> just path separators, so IMO the convenient way were to care about a
> crossplatform approach. I've no info, how badly such crossplatform
> path issues are indeed relevant, so it might be another story to
> investigate before one starts any implementation. At least, grouping
> some cases and thought, maybe as an RFC, could be good to track the
> topic.
>

I mostly agree:

- We should not call it canonicalization (I used the word too), but
  rather normalization. The former is used in other languages and means
  realpath there. This could be confusing.
- Leaving the stripping of the trailing separator to the user means that
  other users never know what they get, which is bad. The normalization
  should always use one strategy here.

-- 
Richard "Fleshgrinder" Fussenegger