Re: [Dorset] Paths relative to a current working directory whose path contains symbolic links

Ralph Corderoy Mon, 09 Nov 2020 07:01:16 -0800

Hi Patrick,

Keith's already answered, but here's some background.


> Further suppose that, in that path, 'a' is a symbolic link:

Symbolic links were an easy hack by Berkeley but they aren't orthogonal
to the system they modified and messed up many existing things.  They're
still messed up decades later,
e.g. https://en.wikipedia.org/wiki/Symlink_race.

One thing to consider is how /bin/pwd traditionally worked.

- stat(2) the current directory, ‘.’, to learn its inode number.
- chdir(2) to the parent directory.
- readdir(2) the new current directory and look for an entry with the
  inode number of the directory we've just ascended from.

Keep repeating those stages as you work up the directory tree.  Stop
when you reach the root directory which is where ‘.’ and ‘..’ are the
same inode number.

    $ ls -1di / /. /..
    2 /
    2 /.
    2 /..
    $

If in the directory /usr/share/doc/bash then this would give

    bash doc share usr

There's only one possible answer this way as it ignores how you arrived
at the starting directory and instead learns of its sole canonical place
in the filesystem.  /bin/pwd still exists today and although it now
works more efficiently thanks to an extra kernel interface, it still
gives this filesystem answer, as does a POSIX ‘pwd -P’, whether built
into the shell for efficiency or not.
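
As a rough illustration, the stages above might be sketched in Python like this. It assumes a Unix-like system, and the name traditional_pwd is made up for this example; it is not how any particular /bin/pwd is implemented. It compares the device number as well as the inode number so that mount points are matched correctly.

```python
import os

def traditional_pwd():
    """Return the CWD's filesystem path by climbing '..' and matching inodes."""
    start = os.open(".", os.O_RDONLY)   # so we can return here afterwards
    names = []
    try:
        while True:
            here = os.stat(".")
            parent = os.stat("..")
            # At the root, '.' and '..' are the same directory.
            if (here.st_dev, here.st_ino) == (parent.st_dev, parent.st_ino):
                break
            os.chdir("..")
            found = None
            with os.scandir(".") as entries:
                for entry in entries:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue        # entry vanished or is unreadable
                    if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                        found = entry.name
                        break
            if found is None:
                raise FileNotFoundError("no entry for the directory we left")
            names.append(found)         # collected leaf-first: bash doc share usr
    finally:
        os.fchdir(start)                # restore the original CWD
        os.close(start)
    return "/" + "/".join(reversed(names))
```

On an ordinary system the result should agree with os.getcwd(), which asks the kernel for the same filesystem answer.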

/bin/pwd was quite expensive on old systems which used the above
traditional method, especially if all the directories from here to ‘/’
weren't in RAM thus causing disk-head seeks.  A shell's built-in pwd
could keep track of the CWD's path in the filesystem and just print the
string when asked.  That shell has to decide how to handle the ambiguity
created by symlinks: does changing into a subdirectory which is a
symbolic link tack the symlink's name onto the CWD string or is
readlink(2) used, possibly repeatedly, and the true filesystem location
maintained?
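
The two choices can be sketched as a pair of small Python functions. The names cd_logical and cd_physical are invented for this illustration and do not come from any shell's source; os.path.realpath stands in for the repeated readlink(2) calls.

```python
import os

def cd_logical(cwd, target):
    """Tack the name onto the remembered string; '..' just drops a component."""
    return os.path.normpath(os.path.join(cwd, target))

def cd_physical(cwd, target):
    """Resolve every symlink so the string is the true filesystem location."""
    return os.path.realpath(os.path.join(cwd, target))
```

With the first, a later ‘cd ..’ undoes the textual step.  With the second, it ascends to the parent of wherever the symlink pointed.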

> In Python, it seems as though the way to eliminate the discrepancy 
> while maintaining cross-platform compatibility is to add a check to 
> see whether the program is running under a shell that sets $PWD (see 
> bash(1)), or on a system that has pwd(1), and, if it is, then resolve 
> any relative paths relative to $PWD or the output of pwd, rather than 
> relative to os.getcwd().

If I cd into /a/symlink/b and pass ../c to a program then I may mean the
textual /a/symlink/c, or the filesystem /x/y/c because I know where
symlink has led me.  The program can't know which.  It shouldn't try and
guess but instead just pass the argument to open(2), etc.  If that bites
the user, e.g. by trampling the wrong file, then the user will have
learnt symlinks are a bad idea and alternatives should be sought where
possible.  :-)
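
A small Python sketch shows the two meanings diverging. The layout is an assumption made up for the demonstration: under a temporary directory, a/link is a symlink to x/y/b, and both a/c and x/y/c exist with different contents. The function name open_dotdot_c is likewise hypothetical.

```python
import os
import tempfile

def open_dotdot_c():
    """Open '../c' from inside a symlinked directory, two different ways."""
    root = os.path.realpath(tempfile.mkdtemp())
    os.makedirs(os.path.join(root, "x", "y", "b"))
    os.mkdir(os.path.join(root, "a"))
    os.symlink(os.path.join(root, "x", "y", "b"),
               os.path.join(root, "a", "link"))
    with open(os.path.join(root, "a", "c"), "w") as f:
        f.write("textual")
    with open(os.path.join(root, "x", "y", "c"), "w") as f:
        f.write("filesystem")

    old = os.getcwd()
    logical_cwd = os.path.join(root, "a", "link")   # what $PWD would hold
    os.chdir(logical_cwd)
    try:
        # Passed straight to open(2): the kernel follows the real '..'.
        with open("../c") as f:
            via_kernel = f.read()
        # Resolved textually against the $PWD-style string instead.
        guessed = os.path.normpath(os.path.join(logical_cwd, "../c"))
        with open(guessed) as f:
            via_text = f.read()
    finally:
        os.chdir(old)
    return via_kernel, via_text
```

Here open_dotdot_c() returns ("filesystem", "textual"): the same argument names two different files, and only the user knows which was meant.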

There's a good paper by Rob Pike presented at Usenix in 2000 which
accepts symlinks are here to stay and suggests a solution based on the
work in Plan 9.

    A deeper question is whether the shell should even be trying to make
    pwd and cd do a better job.  If it does, then the getwd library call
    and every program that uses it will behave differently from the
    shell, a situation that is sure to confuse.

        ― https://9p.io/sys/doc/lexnames.html

-- 
Cheers, Ralph.

-- 
  Next meeting: Online, Jitsi, Tuesday, 2020-12-01 20:00
  Check to whom you are replying
  Meetings, mailing list, IRC, ...  http://dorset.lug.org.uk
  New thread, don't hijack:  mailto:dorset@mailman.lug.org.uk
