On Oct 18, Bill Akins said:

>I need to read in every file in a dir into a new file. I can do it like
>below if I know the file names, but how can I specify *.CSV? Actually there
>will be only the data files I need to read in the dir so *.* would work as
>well. Perhaps I should also mention this is running on Win 2K.
To get the entries from a directory, you can use the opendir(), readdir(), and closedir() functions:

  opendir DIR, "c:/files" or die "can't read c:/files: $!";
  @entries = readdir DIR;
  closedir DIR;

That will put all the entries (files AND directories and other things) in the array. Note that it only puts the NAME of the entry (not the path), so you'll have "foo" and "bar", not "c:/files/foo" and "c:/files/bar". If you want those, you can add them like so:

  @entries = map "c:/files/$_", readdir DIR;

You'll probably want to skip all the directories (like . and ..), so you can add a grep() statement to filter them out:

  @entries = grep !-d, map "c:/files/$_", readdir DIR;

The -d file-test checks a variable (in this case, $_) to see if it is a directory, and we use ! to take the opposite. This means, in English:

  get the entries from the directory...
  prepend the path to each of them...
  accept only non-directories...
  and store them in @entries

Notice how it reads backwards.

If you want, there's a shorter mechanism that may provide what you need. While you can use a regex to make sure the filename ends in .CSV:

  @entries = grep !-d && /\.CSV\z/, map "c:/files/$_", readdir DIR;

you might find it easier just to use the glob() function:

  @entries = grep !-d, glob("c:/files/*.CSV");

That ONE line takes the place of the opendir(), readdir(), and closedir() lines. It also takes care of prepending the path name to the files (so no map() is required). The grep() is still there because I'm paranoid, and you might have a directory named "foo.CSV". (I know you probably don't, but like I said, I'm paranoid.)

To read more about these functions, check the 'perlfunc' documentation:

  perldoc perlfunc
  perldoc -f opendir
  perldoc -f readdir
  perldoc -f closedir
  perldoc -f glob

or go online to http://www.perl.com, http://www.perldoc.org/, or http://www.perldoc.com/.

Now, for your file concatenation...
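To see the glob()-plus-grep() idea end to end, here's a self-contained sketch. The c:/files path from the example above is just a stand-in, so this version builds its own throwaway directory with File::Temp (my addition, not part of the original code) and even creates a directory named "foo.CSV" to show why the paranoid -d filter earns its keep:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway directory with two real .CSV files...
my $dir = tempdir(CLEANUP => 1);
for my $name ("0912.CSV", "0913.CSV") {
    open my $fh, ">", "$dir/$name" or die "can't create $dir/$name: $!";
    print $fh "header\ndata\n";
    close $fh;
}
# ...and a directory impostor that also matches *.CSV.
mkdir "$dir/foo.CSV" or die "can't mkdir $dir/foo.CSV: $!";

# glob() returns full paths already; grep !-d drops the impostor.
my @entries = grep !-d, glob "$dir/*.CSV";
print "$_\n" for @entries;
```

Without the grep(), @entries would have three elements here, and the later concatenation loop would choke trying to read a directory.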
>open(F1,"C:/Temp/0912.CSV") || die "Couldn't open F1.\n"; #Open File 1

I'm glad you have error checking, but it's a good idea to include the $! variable in the die() message -- it explains what went wrong. See my opendir() example above.

>open(F2,"C:/Temp/0913.CSV") || die "Couldn't open F2.\n"; #Open File 2
>open(F3,"C:/Temp/0914.CSV") || die "Couldn't open F3.\n"; #Open File 3
>open(F4,"C:/Temp/0916.CSV") || die "Couldn't open F4.\n"; #Open File 4
># Yada yada ya...
>open(STAT,">C:/Temp/Stats.CSV") || die "Couldn't open STAT.\n"; #Create
>Stats file
>while(<F1>) {
>  print STAT; #Print F1 -> Stats
>}

[snip]

Now you get to learn about some of the magic that Perl has to offer! This trick combines the @ARGV array with the use of an empty <> operator. When you use <>, and @ARGV has values in it, Perl uses those values as the names of files to read from, one by one. So let's prepare for the magic:

  @ARGV = grep !-d, glob "c:/files/*.CSV";

Now we can read from the files and print each one in sequence to some output file:

  open OUT, "> c:/files/all.txt"
    or die "can't create c:/files/all.txt: $!";
  while (<>) { print OUT }
  close OUT;

That was wonderfully simple!

>I would also like to strip the first row from every file EXCEPT the first,
>or possibly from all and insert header row into Row 1 of resulting stats.csv
>file. Any suggestions on that?

This requires some more effort, but can be explained without causing too bad a headache. ;)

The eof() function tells us when we've reached the end of a file; there is a special use of it -- eof, with no parentheses -- that checks the last file we read from. In addition to this function, there is a built-in variable, $., which holds the line number of the most recently read line from the most recently read-from filehandle. The problem with $. is that it doesn't get reset if you open another file... with the same filehandle...
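Here's the @ARGV-plus-<> concatenation as a complete, runnable sketch. As before, the c:/files path is only the example's stand-in, so this uses a File::Temp directory (my addition) seeded with two small files:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Seed a temporary directory with two tiny .CSV files.
my $dir = tempdir(CLEANUP => 1);
for my $pair (["0912.CSV", "a,1\n"], ["0913.CSV", "b,2\n"]) {
    open my $fh, ">", "$dir/$pair->[0]"
        or die "can't create $dir/$pair->[0]: $!";
    print $fh $pair->[1];
    close $fh;
}

# Load @ARGV, then let the empty <> read each named file in turn.
@ARGV = grep !-d, glob "$dir/*.CSV";
open OUT, ">", "$dir/all.txt" or die "can't create $dir/all.txt: $!";
while (<>) { print OUT }
close OUT;
```

Note that all.txt is created AFTER the glob(), and its extension doesn't match *.CSV anyway -- otherwise the output file could end up reading itself.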
without closing the first one:

  open FOO, "file_with_2_lines";
  while (<FOO>) { print "$.: $_" }
  open FOO, "file_with_4_lines";
  while (<FOO>) { print "$.: $_" }

This will give output like:

  1: A first line
  2: A second line
  3: B first line
  4: B second line
  5: B third line
  6: B fourth line

If we had a close(FOO) before the second open() call, the numbers would have started at 1 again for file_with_4_lines.

WHAT DOES THIS MEAN FOR YOU? Well, the magical <> reads from a filehandle called ARGV, but ARGV doesn't get closed between files. That means that doing

  while (<>) { print "$.: $_" }

would start with 1: and keep going up -- it would never reset to 1: again, unless you manually closed the ARGV filehandle. But how can we close ARGV at the right time? That's right -- eof.

  while (<>) {
    print "$.: $_";
    close ARGV if eof;
  }

Now the numbering looks right.

Your problem is related to this. Let's say we want to skip the first line of every file:

  while (<>) {
    if ($. != 1) { print OUT }
    close ARGV if eof;
  }

Now our output has skipped the first line of every file. Hrm... but what if we REALLY wanted to keep the first line of the first file? Here's where more about @ARGV comes into play. When you start reading from <>, Perl removes the first element of @ARGV and stores it in $ARGV (the name of the file you're reading from). This means that if we save the first filename to another variable, we can compare them in the loop:

  ($first) = @ARGV = grep !-d, glob "c:/files/*.CSV";
  while (<>) {
    if ($ARGV eq $first or $. != 1) { print OUT }
    close ARGV if eof;
  }

The logic is:

  if we're dealing with the first file...
  or the line number is not 1...
  print the line

which sounds like what you wanted.

I hope this explanation has been thorough and understandable.
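Putting the whole header-handling recipe together, here's a runnable sketch of the final loop. The file names and contents are invented for illustration, and File::Temp stands in for the real C:/Temp directory:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Two input files, each starting with the same header row.
my $dir = tempdir(CLEANUP => 1);
my %files = (
    "0912.CSV" => "date,value\na,1\n",
    "0913.CSV" => "date,value\nb,2\n",
);
while (my ($name, $body) = each %files) {
    open my $fh, ">", "$dir/$name" or die "can't create $dir/$name: $!";
    print $fh $body;
    close $fh;
}

# Keep line 1 of the FIRST file (the header); skip line 1 of the rest.
my ($first) = @ARGV = grep !-d, glob "$dir/*.CSV";
open OUT, ">", "$dir/Stats.CSV" or die "can't create $dir/Stats.CSV: $!";
while (<>) {
    print OUT if $ARGV eq $first or $. != 1;
    close ARGV if eof;   # reset $. between files
}
close OUT;
```

The resulting Stats.CSV has one header row followed by the data rows from every input, which is exactly the "strip the first row from every file EXCEPT the first" behavior asked for.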
:)

--
Jeff "japhy" Pinyan     [EMAIL PROTECTED]     http://www.pobox.com/~japhy/
RPI Acacia brother #734     http://www.perlmonks.org/     http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]