Re: How does opendir/readdir process files

Brandon McCaig Tue, 13 Jan 2015 19:41:42 -0800

Harry:

(Expect typographical errors in such a long post...)

tl;dr? RTFM.

On Fri, Jan 09, 2015 at 08:26:55PM -0500, Harry Putnam wrote:
> I'm taxing peoples patience I suppose but being considerably thick of
> skull I cannot just look at this and see what it does.
> 
> > my @files = map { $_->[0] }
> >                    sort { $a cmp $b }
> >                    map {[$_, (stat("$dir/$_"))[9] ] }
> >                    grep { ! /^\./ && -f "$dir/$_" } readdir($dh);
> 
> my @files = map { $_->[0] }  # What happened there?
> 
>    sort { $a cmp $b } # I now this sorts stuff, but not sure how or
>                       # what kind of sort
> 
> perldoc -f cmp  Tells you only to find the meaning in perlop
> 
> perldoc perlop - /cmp
> 
> Looking at every appearance of `cmp'... I still do not know what the
> heck it does.  I did find out that it returns 1 or 0.  Seems kind of
> thin to help me understand what it does.
> 
>  map {[$_, (stat("$dir/$_"))[9] ] } ## I see a stat is done and
>                                     # modtime extracted what else
>                                     # happens here?

You're better off looking at a statement like this in reverse, or
at least in the "order of operations".

You obviously need to understand Perl syntax to understand that,
so make sure you go through `perldoc perlsyn' as a starting
point, and I'm guessing `perldoc perlfunc' may be needed as well.
You will need to understand context too. Really I recommend any
Perl newbie work their way through the core perldocs until they
can read code without being confused anymore.

In the case of this:

    my @files = map { $_->[0] }
                       sort { $a cmp $b }
                       map {[$_, (stat("$dir/$_"))[9] ] }
                       grep { ! /^\./ && -f "$dir/$_" } readdir($dh);

An experienced Perl programmer would easily see these as a chain
of core function calls where arguments for each function occur
after its name (so the result of subsequent function calls is fed
into earlier ones when reading the statement left-to-right).

It takes advantage of being able to call Perl's built-in list
operators without parentheses, and of being able to pass a block
directly to grep, map, and sort. The perldocs explain the details
of how all of that works. If you don't know yet then you'll
probably have to read the docs and experiment to get a handle on it.
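To see the right-to-left flow on something smaller, here's a
made-up chain (the numbers are just for illustration, not from
Dermot's code). 1..6 is fed to grep, and grep's result is fed to
map. The second version is the same call chain with the
parentheses written out, and the third spells out the intermediate
list. Note that there is no comma after a block argument.

```perl
use strict;
use warnings;

# Read right to left: 1..6 feeds grep, and grep's result feeds map.
my @terse = map { $_ * 2 } grep { $_ % 2 } 1 .. 6;

# The same call chain with every argument list in parentheses.
my @parens = map({ $_ * 2 } grep({ $_ % 2 } (1 .. 6)));

# And with the intermediate list spelled out.
my @odds  = grep { $_ % 2 } 1 .. 6;   # (1, 3, 5)
my @steps = map { $_ * 2 } @odds;     # (2, 6, 10)

print "@terse\n";   # prints "2 6 10"
```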

                   grep { ! /^\./ && -f "$dir/$_" } readdir($dh);

Initially this is a call to readdir(). It is in list context so
readdir is going to read all remaining entries from the directory
handle $dh. Those will be passed into grep(). Grep is one of
those special built-in functions that is sort of like an operator
too (you'll see in a minute that we're using a bunch of these).
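Here's a small sketch of readdir in both contexts. It assumes a
throwaway directory made with the core File::Temp module, and the
file names alpha and beta are hypothetical. In scalar context
readdir returns one entry per call, in list context it returns
everything that's left, and rewinddir starts over.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory with two empty files, just for the demo.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(alpha beta)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

opendir(my $dh, $dir) or die "Cannot open $dir: $!";

# Scalar context: one entry per call (undef when none remain).
my $first = readdir($dh);

# List context: every *remaining* entry at once.
my @rest = readdir($dh);

# The handle is now exhausted, so another scalar read gives undef.
my $none = readdir($dh);

# rewinddir resets the handle so it can be read from the start again.
rewinddir($dh);
my @all = readdir($dh);
closedir($dh);

print "first: $first, rest: @rest\n";
print "all: @all\n";
```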

    { ! /^\./ && -f "$dir/$_" }

This is a lexical block that is being passed to grep. The first
argument to grep is effectively code: a set of Perl statements,
much like the body of a function. Passing code around like this is
a "functional" style, and a function that takes code as an
argument, like grep, is called a "higher-order function". grep
uses the code to filter the list that follows down to the matching
elements. The default variable, $_, is aliased to each element of
the list before the code block is executed. It could be written a
couple of other ways:

    sub my_filter { return ! /^\./ && -f "$dir/$_"; }

    grep { my_filter() } readdir($dh);

Or...

    my $filter = sub { ! /^\./ && -f "$dir/$_" };

    grep { $filter->() } readdir($dh);

Or...

    grep ! /^\./ && -f "$dir/$_", readdir($dh);

Which one is best is mostly a matter of opinion. The first one is
really quite understandable, but requires quite a bit more code
and also pollutes the package namespace with a subroutine.
There's nothing wrong with that, but it may not be desirable. The
second one is practically the same, except that it avoids naming
the function and instead calls an anonymous function through a
reference stored in a lexical variable (I also left out the
explicit return keyword and semi-colon, which are both optional in
this case, though some would prefer they were always used). Note
that grep can't take a code reference in place of the block:
grep(\&my_filter, ...) or grep(sub { ... }, ...) treats the
reference itself as the expression, and since a reference is
always true, every entry would pass. The sub has to be called from
the block (or expression). The last one uses an expression as the
code argument (so no explicit block or function is defined), but
it still means the same thing.

I often prefer the last form, but depending on complexity it can
become overwhelming or even impossible. The original block form is
often preferred. I tend to slide from expression, to block, to
named subroutine depending on complexity.
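This sketch checks that the working forms agree. To keep it
self-contained it filters a made-up list of names instead of a real
directory, so only the dot-file test is used (no -f). The sub
names not_hidden and $not_hidden are hypothetical. The last grep
shows the trap: handing grep a code reference doesn't call it.

```perl
use strict;
use warnings;

# Made-up directory entries; no filesystem access in this sketch.
my @entries = ('.', '..', '.bashrc', 'notes.txt', 'todo');

sub not_hidden { return $_ !~ /^\./; }   # hypothetical named sub
my $not_hidden = sub { $_ !~ /^\./ };    # hypothetical anonymous sub

my @by_name = grep { not_hidden() } @entries;      # call named sub from a block
my @by_ref  = grep { $not_hidden->() } @entries;   # call code ref from a block
my @by_expr = grep !/^\./, @entries;               # plain expression form

# Pitfall: the code reference itself is the expression, and a
# reference is always true, so nothing gets filtered out.
my @oops = grep \&not_hidden, @entries;

print "@by_name\n";          # prints "notes.txt todo"
print scalar(@oops), "\n";   # prints "5"
```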

The code being passed to grep tests that the entry doesn't begin
with a dot (i.e., it's not a hidden file in UNIX; this also drops
the "." and ".." entries that readdir returns) and that it is a
regular file. A more explicit way to say this could be:

    sub my_filter {
        my $is_dot_file = $_ =~ /^\./;
        my $is_reg_file = -f "$dir/$_";

        return ! $is_dot_file && $is_reg_file;
    }
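To try the explicit my_filter against a real directory, here's a
sketch using a scratch directory from File::Temp with a regular
file, a hidden file, and a subdirectory (names made up for the
example). Only the regular, non-hidden file should survive.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with a regular file, a hidden file and a
# subdirectory (all hypothetical names, created just for this demo).
my $dir = tempdir(CLEANUP => 1);
for my $name ('report.txt', '.hidden') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}
mkdir "$dir/subdir" or die "Cannot mkdir $dir/subdir: $!";

# Same logic as my_filter above; $_ is the entry grep is looking at.
sub my_filter {
    my $is_dot_file = $_ =~ /^\./;
    my $is_reg_file = -f "$dir/$_";

    return ! $is_dot_file && $is_reg_file;
}

opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @kept = grep { my_filter() } readdir($dh);
closedir($dh);

print "@kept\n";   # prints "report.txt"
```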

So the grep will reduce the list of files from readdir($dh) down
to regular files that aren't hidden. Next up we have:

                       map {[$_, (stat("$dir/$_"))[9] ] }

This is another special built-in function, map(), that is sort of
like an operator too. Again it is being called without explicit
parentheses around its arguments and is being passed code as a
block, which again aliases $_ to each element of the list. Again we
have list context, and this time the list is the filtered one that
grep returned. While grep is used to filter a list, map is used to
transform it. Another way to say it would be:

    sub my_filter {
        my $is_dot_file = $_ =~ /^\./;
        my $is_reg_file = -f "$dir/$_";

        return ! $is_dot_file && $is_reg_file;
    }

    sub my_mapper {
        my @stats = stat("$dir/$_");
        my $mtime = $stats[9];
        my $tuple = [$_, $mtime];

        return $tuple;
    }

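    my @files = readdir($dh);

    # grep and map take a BLOCK or an EXPR, not a code reference,
    # so the named subs are called from inside a block.
    my @reg_files = grep { my_filter() } @files;

    map { my_mapper() } @reg_files;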

The code block passed to map returns a tuple (a pair),
represented with an anonymous array, containing the name of the
file in its first element and its last modified timestamp (mtime)
in its second element. Element 9 of the list returned by stat() is
the mtime. So for each regular file, map produces a reference to
one of these arrays, and the list of these tuples (array
references) is what map returns.
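A sketch of the tuple-building map, assuming two scratch files
whose mtimes are set with utime to known values (the names and
timestamps are arbitrary), so you can see that element 9 of stat()
comes back as the mtime.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with two files whose mtimes we set explicitly
# (hypothetical names and timestamps, chosen only for the demo).
my $dir = tempdir(CLEANUP => 1);
my %mtime_for = (old => 1_000_000_000, new => 1_400_000_000);
for my $name (keys %mtime_for) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
    utime $mtime_for{$name}, $mtime_for{$name}, "$dir/$name"
        or die "Cannot set times on $dir/$name: $!";
}

# Same shape as my_mapper: one [name, mtime] pair per file.
my @tuples = map { [$_, (stat("$dir/$_"))[9]] } sort keys %mtime_for;

for my $tuple (@tuples) {
    print "$tuple->[0] => $tuple->[1]\n";
}
```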

                       sort { $a cmp $b }

Next up we have sort(). Just like grep and map, it is being called
without parentheses, is accepting code as a block, and is
accepting the list of previously filtered and mapped files to
work on. Sort uses the code block to compare two elements of the
list at a time, to determine the order in which they should be
sorted. The two elements are aliased to the package variables $a
and $b within the block (so avoid declaring your own variables
with those names, such as `my $a', because that would hide the
ones sort sets).

If the block returns a negative number then it indicates that $a
comes before $b. If the block returns a positive number then it
indicates that $a comes after $b (in other words, $b comes before
$a). If it returns zero then it indicates that $a and $b are
considered equal for the purposes of the sort. Two core operators
that do exactly this are <=> and cmp: each returns -1, 0, or 1
(not just 1 or 0). <=> compares numbers and cmp compares strings.
So this sort is sorting stringwise in ascending order (strings
that are considered "less" come before strings that are considered
"more"). cmp compares the two strings character by character by
each character's ordinal value (unless `use locale' is in effect),
so "B" sorts before "a", and "10" sorts before "9".
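A few comparisons to make the return values concrete; the values
are arbitrary. Swapping $a and $b reverses the order.

```perl
use strict;
use warnings;

# Both operators return -1, 0 or 1.
my @string_results  = ('a' cmp 'b', 'b' cmp 'b', 'c' cmp 'b');   # (-1, 0, 1)
my @numeric_results = (9 <=> 10, 10 <=> 10, 11 <=> 10);          # (-1, 0, 1)
my $upper_first     = 'B' cmp 'a';   # -1: 'B' (66) is below 'a' (97)

# Stringwise vs numeric ordering of the same values.
my @values   = (10, 9, 100, 2);
my @by_str   = sort { $a cmp $b } @values;   # (10, 100, 2, 9)
my @by_num   = sort { $a <=> $b } @values;   # (2, 9, 10, 100)
my @desc_num = sort { $b <=> $a } @values;   # (100, 10, 9, 2)

print "@by_str\n@by_num\n@desc_num\n";
```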

That leaves us with this:

    sub my_filter {
        my $is_dot_file = $_ =~ /^\./;
        my $is_reg_file = -f "$dir/$_";

        return ! $is_dot_file && $is_reg_file;
    }

    sub my_mapper {
        my @stats = stat("$dir/$_");
        my $mtime = $stats[9];
        my $tuple = [$_, $mtime];

        return $tuple;
    }

    sub compare_files {
        return $a cmp $b;
    }

    my @files = readdir($dh);
    my @reg_files = grep { my_filter() } @files;
    my @file_tuples = map { my_mapper() } @reg_files;

    # SUBNAME form: the sub's name goes where the block would, no comma.
    sort compare_files @file_tuples;

That won't quite work as we intended because we'd be comparing
the array references of our tuples themselves (effectively their
memory addresses), not anything about the files. Dermot probably
meant to compare the mtimes within the tuples. We can fix that
easily enough:

    sub compare_mtimes {
        my $lhs = $a->[1];
        my $rhs = $b->[1];

        return $lhs <=> $rhs;
    }

    sort compare_mtimes @file_tuples;
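To see why the cmp version doesn't do anything useful with tuples,
here's a sketch with hypothetical [name, mtime] tuples (no files
involved). It prints what an array reference stringifies to, then
sorts with compare_mtimes using sort's SUBNAME form (the sub's
name where the block would go, with no comma).

```perl
use strict;
use warnings;

# Hypothetical [name, mtime] tuples, as my_mapper would produce them.
my @file_tuples = (['c.txt', 300], ['a.txt', 100], ['b.txt', 200]);

# An array reference stringifies to something like "ARRAY(0x55d0c8a1e2b8)",
# so `$a cmp $b` on tuples orders them by memory address, not by mtime.
print "$file_tuples[0]\n";

sub compare_mtimes {
    my $lhs = $a->[1];
    my $rhs = $b->[1];

    return $lhs <=> $rhs;
}

# SUBNAME form: no comma between the sub's name and the list.
my @sorted_tuples = sort compare_mtimes @file_tuples;

print join(' ', map { $_->[0] } @sorted_tuples), "\n";   # prints "a.txt b.txt c.txt"
```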

Finally we have:

    my @files = map { $_->[0] }

Here we have another invocation of map(), which again is getting
the results from the previous function call chain of grep, map,
and sort. We've already gone over what map does, so we should be
able to figure out what this is doing already: it takes our sorted
list of regular file tuples and maps them back to a list of file
names.
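This map, sort, map sandwich is a well-known idiom, usually called
the Schwartzian Transform. The point of it is that the expensive
key (here the stat) is computed once per file rather than once per
comparison. A sketch with a made-up mtime_of() standing in for
stat(), counting how often it's called:

```perl
use strict;
use warnings;

# Stand-in for stat(): hypothetical mtimes keyed by file name.
my %fake_mtime = (
    'b.log' => 1_420_000_000,
    'a.log' => 1_410_000_000,
    'c.log' => 1_415_000_000,
);
my $calls = 0;
sub mtime_of { $calls++; return $fake_mtime{ $_[0] }; }   # hypothetical

my @files = map  { $_->[0] }                 # 3. unwrap: keep only the name
            sort { $a->[1] <=> $b->[1] }     # 2. sort on the cached key
            map  { [$_, mtime_of($_)] }      # 1. wrap: pair name with key
            sort keys %fake_mtime;

print "@files\n";   # prints "a.log c.log b.log"
print "$calls\n";   # prints "3": one key lookup per file
```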

    sub my_filter {
        my $is_dot_file = $_ =~ /^\./;
        my $is_reg_file = -f "$dir/$_";

        return ! $is_dot_file && $is_reg_file;
    }

    sub my_mapper {
        my @stats = stat("$dir/$_");
        my $mtime = $stats[9];
        my $tuple = [$_, $mtime];

        return $tuple;
    }

    sub compare_mtimes {
        my $lhs = $a->[1];
        my $rhs = $b->[1];

        return $lhs <=> $rhs;
    }

    sub extract_filename {
        my $filename = $_->[0];

        return $filename;
    }

    my @files = readdir($dh);
    my @reg_files = grep { my_filter() } @files;
    my @file_tuples = map { my_mapper() } @reg_files;
    my @sorted_tuples = sort compare_mtimes @file_tuples;
    my @sorted_files = map { extract_filename() } @sorted_tuples;

And there you have it. A very verbose way to write this (with the
sort now comparing mtimes):

    my @files = map $_->[0],
            sort { $a->[1] <=> $b->[1] }
            map [$_, (stat("$dir/$_"))[9]],
            grep !/^\./ && -f "$dir/$_",
            readdir($dh);

Or put another way, this is a very terse way to write that.
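Putting it all together, here's a runnable sketch that uses the
terse form on a scratch directory from File::Temp. The file names
and mtimes are made up and set with utime; the hidden file and the
subdirectory are there to show the grep dropping them.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory: three regular files with known mtimes,
# plus a hidden file and a subdirectory that the grep should skip.
my $dir = tempdir(CLEANUP => 1);
my %mtime_for = ('first.txt' => 1_000, 'second.txt' => 2_000, 'third.txt' => 3_000);
for my $name (keys(%mtime_for), '.hidden') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}
for my $name (keys %mtime_for) {
    utime $mtime_for{$name}, $mtime_for{$name}, "$dir/$name"
        or die "Cannot set times on $dir/$name: $!";
}
mkdir "$dir/subdir" or die "Cannot mkdir $dir/subdir: $!";

opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @files = map $_->[0],
        sort { $a->[1] <=> $b->[1] }
        map [$_, (stat("$dir/$_"))[9]],
        grep !/^\./ && -f "$dir/$_",
        readdir($dh);
closedir($dh);

print "$_\n" for @files;   # oldest first
```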

I hope this wall of text helps. For the most part, you just need
to spend some quality time with The Relevant Manuals (TFM).

Regards,

-- 
Brandon McCaig <bamcc...@gmail.com> <bamcc...@castopulence.org>
Castopulence Software <https://www.castopulence.org/>
Blog <http://www.bambams.ca/>
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }.
q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.};
tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'
