RE: how to parse blank-line-separated records

Bob Showalter Mon, 15 Sep 2003 07:00:41 -0700

David T-G wrote:
> Hi, all --
> 
> I'm wrestling with a data file containing owners and contact info and it
> suddenly occurred to me that I could probably change my record separator
> from \n to \n\n (a blank line) and grab the whole record that way.


Yes, or set $/ = '' to get "paragraph" mode. That's a little more flexible,
as Perl will treat any run of consecutive blank lines as a single record
separator. (With $/ = "\n\n", any extra blank lines end up at the start of
the next record.)
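
As a minimal sketch of paragraph mode (the sample data and variable names
here are made up for illustration, not taken from the poster's file), the
following reads records from an in-memory filehandle so it can be run as-is:

```perl
use strict;
use warnings;

# Hypothetical sample data: three records separated by one or more empty lines.
my $data = "rec1 line1\nrec1 line2\n\nrec2 line1\n\n\n\nrec3 line1\nrec3 line2\n";

# Open the string as a file so the example needs no external input.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @records;
{
    local $/ = '';                 # paragraph mode, restored at end of block
    while (my $record = <$fh>) {
        chomp $record;             # in paragraph mode, removes all trailing newlines
        push @records, $record;
    }
}
close $fh;

printf "%d records\n", scalar @records;
```

Using local keeps the change to $/ confined to the block, so other reads in
the same program still see the default line-at-a-time behaviour.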

> Assuming I figure out how to do that, then how do I match the pieces?
> 
> The file looks a lot like
> 
>   header stuff
>     code         unit
>                  owner                   home_phone  work_phone      
>                  addr city, st  zip
> 
> where any of the phone numbers or the addresses might be missing, but we
> can count on the column positions for formatting (and thus parsing).
> 
> So I probably go through a
> 
>   while (<>)
> 
> loop and it sucks in each record for me, but then how do I match to get
> the various pieces -- around the newlines?

If the paragraphs have a definite fixed format, unpack() is usually the
easiest way to grab the data. Use 'x' in your template to skip over bytes,
and 'A' to extract a sequence of bytes. So, if you want to skip 20 chars,
then grab 8 chars, then skip 32 chars, then grab 15 chars, you use:

   my @fields = unpack('x20 A8 x32 A15', $record);

(Use lowercase 'a' instead of 'A' if you want to preserve trailing blanks on
the fields you extract.)
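
Here is a small runnable sketch of that template. The record is invented and
built with sprintf purely so the column positions are known; the field
contents and the 75-character total width are assumptions for illustration,
not the real file layout:

```perl
use strict;
use warnings;

# Hypothetical one-line record: 20 filler chars, an 8-char field,
# 32 filler chars, then a 15-char field (75 chars in total).
my $record = sprintf '%-20s%-8s%-32s%-15s', 'header', 'ABC123', 'ignored', 'Jane Doe';

my @fields_A = unpack 'x20 A8 x32 A15', $record;   # 'A' strips trailing blanks
my @fields_a = unpack 'x20 a8 x32 a15', $record;   # 'a' keeps them

print "A: [$fields_A[0]] [$fields_A[1]]\n";
print "a: [$fields_a[0]] [$fields_a[1]]\n";

# A record whose trailing field is missing is shorter than the template expects.
# Padding it to the full width first keeps 'x' from running past the end of
# the string, which unpack treats as a fatal error.
my $short        = sprintf '%-20s%-8s', 'header', 'XYZ789';
my @short_fields = unpack 'x20 A8 x32 A15', sprintf('%-75s', $short);

print "short: [$short_fields[0]] [$short_fields[1]]\n";
```

Note that in a multi-line record the newlines count as characters too, so
the skip counts either have to include them or the record can be split into
lines first and each line unpacked separately.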

Otherwise, you can construct regexes that grab what you're looking for.
Depending on your regex, you might need to use the /s and/or /m modifiers to
change the way ^, $, and . match within the multi-line string.
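
To illustrate what those modifiers change, here is a rough sketch against a
made-up two-line record (the layout and the names are hypothetical, chosen
only to show the matching behaviour):

```perl
use strict;
use warnings;

# Hypothetical two-line record; the layout is illustrative only.
my $record = "A101  Unit 7\nJane Doe  555-1234\n";

# Without /m, ^ matches only at the very start of the string.
my @starts_plain = $record =~ /^(\w+)/g;

# With /m, ^ also matches just after each embedded newline.
my @starts_m = $record =~ /^(\w+)/mg;

# Without /s, . does not match a newline, so .* stops at the end of line 1.
my ($first_line) = $record =~ /^(.*)/;

# With /s, . matches newlines as well, so .* runs to the end of the record.
my ($everything) = $record =~ /^(.*)/s;

print "plain: @starts_plain\n";
print "with /m: @starts_m\n";
print "first line: $first_line\n";
```

The two modifiers are independent, so a pattern can use /m to anchor on a
particular line and /s to let a wildcard span several lines in the same match.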

> 
> Yes, sample code would be welcome :-)  So would pointers to where this
> has been done before; I'm just not finding it as I read the code examples
> from _Programming_ (2e) this morning :-(
