> -----Original Message-----
> From: Dermot Paikkos [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, November 14, 2001 6:22 AM
> To: [EMAIL PROTECTED]
> Subject: REGEX or Parse a text file
> 
> 
> Hi Perlgurus,
> I am having trouble getting the data I want out of a text file. The file
> has this structure:
> 
> File Name: m:\a\a084099.jpg
>   Width x Height: 2480 x 2062
>   Number of Colours: True Colour (24 bits)
>   Dots per inch: 300 x 300
>   Image size (inches): 8.27 x 6.87
>   Raw size: 15341280  Actual size: 1203289   (Compression ratio: 12.8:1)
> 
> There are a thousand or so entries, each separated by a blank line.
> I want to get just the file name and the last line. My trouble is that I
> can't manage to parse it correctly. The best I have been able to get is:
> m:\a\a084100.jpg  Uncompressed Size: 15341280  Actual size: 1203289
> m:\a\a084100.jpg  Uncompressed Size: 15341280  Actual size: 1203289
> m:\a\a084100.jpg  Uncompressed Size: 15341280  Actual size: 1203289
> m:\a\a084100.jpg  Uncompressed Size: 15341280  Actual size: 1203289
> m:\a\a084100.jpg  Uncompressed Size: 15341280  Actual size: 1203289
> m:\a\a084100.jpg  Uncompressed Size: 15006480  Actual size: 1251205
> m:\a\a084100.jpg  Uncompressed Size: 15006480  Actual size: 1251205
> 
> The lines get repeated 7 times before getting the next entry. I used:
> 
> while (<REPORT>) {
>     if (/File/gc) {            # I tried with or without gc but it made no difference.
>         $name = $';
>         chomp($name);
>         ($file = $name) =~ s/ Name: //;
>     }
>     if (/Raw/) {
>         $size = $';
>         chomp($size);
>         ($foo = $size) =~ s/\((Compression...*)//;
>         ($bar = $foo) =~ s/size/Uncompressed Size/;
>     }
>     }
> 
>     print OUTPUT "$file $bar \n";

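Your print statement sits inside the while loop, so it runs once for
every line read rather than once per entry. Each entry spans seven lines
(six of data plus the blank separator), which is why every result shows
up seven times.

Since the entries are separated by blank lines, you can have Perl read
each entry "paragraph" as a single string by setting $/ to "".

Then just use a regex to grab the items you need.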

Give this a try:

  $/ = "";              # read "paragraphs"
  while (<REPORT>) {
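    s/\s+$//;           # trim trailing whitespace from the paragraph
    # capture the file name, the raw size and the actual size
    my ($file, $raw, $actual) = /File Name: (.*?)\n.*Raw.*?(\d+).*Actual.*?(\d+)/s;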
    ...
  }
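
For reference, here is one way the whole thing might look as a complete
script. Treat it as a sketch rather than the definitive answer: the
parse_report helper and the in-memory sample are my own invention for
illustration, and in real use you would open your report file instead.
The second sample entry reuses the name and sizes from the output you
quoted; its middle lines are just placeholders copied from the first
entry. The output format mimics what your original code was printing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data in the format shown in the question. The middle lines of
# the second entry are placeholders copied from the first entry.
my $report = <<'END';
File Name: m:\a\a084099.jpg
  Width x Height: 2480 x 2062
  Number of Colours: True Colour (24 bits)
  Dots per inch: 300 x 300
  Image size (inches): 8.27 x 6.87
  Raw size: 15341280  Actual size: 1203289   (Compression ratio: 12.8:1)

File Name: m:\a\a084100.jpg
  Width x Height: 2480 x 2062
  Number of Colours: True Colour (24 bits)
  Dots per inch: 300 x 300
  Image size (inches): 8.27 x 6.87
  Raw size: 15006480  Actual size: 1251205   (Compression ratio: 12.0:1)
END

# Hypothetical helper: reads blank-line-separated entries from a
# filehandle and returns one summary line per entry.
sub parse_report {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one read returns one whole entry
    my @rows;
    while (my $entry = <$fh>) {
        # Skip any paragraph that does not look like an entry.
        my ($file, $raw, $actual) =
            $entry =~ /File Name: (.*?)\n.*Raw size:\s*(\d+).*Actual size:\s*(\d+)/s
            or next;
        push @rows, "$file  Uncompressed Size: $raw  Actual size: $actual";
    }
    return @rows;
}

# Read from the in-memory string here; substitute your own file, e.g.
#   open my $fh, '<', 'report.txt' or die ...;
open my $fh, '<', \$report or die "Cannot open in-memory report: $!";
my @rows = parse_report($fh);
close $fh;

print "$_\n" for @rows;
```

Using "local $/" inside the sub keeps paragraph mode from leaking into
any other file reads in the program, and the "or next" means a stray
paragraph that is not an entry is skipped instead of printing undefined
values.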
