On Aug 13, 2005, at 8:22 AM, Offer Kaye wrote:

I have a text file with columns, where the columns may not be aligned,
and not all lines may have data in all columns:

header1     header2     header3    header4
------------------------------------------------------------
l1dat1        l1dat2        l1dat3      l1dat4
l2dat1                                        l2dat4
l3veryveryveryverylongdat1 l3dat2

As you can see, line 1 has all data, line 2 is missing columns 2 and 3,
and line 3 is a mess :)

Any thoughts on parsing such a "table"?


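If you assume that each field is at least 10 characters wide and that fields are separated by at least one space, then a pattern like this can match a single field (ten characters of any kind, then the rest of any word that runs past the tenth character, then one whitespace separator):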

  /(.{10}\S*)\s/

For example:

echo '
header1     header2     header3    header4
------------------------------------------------------------
l1dat1        l1dat2        l1dat3      l1dat4
l2dat1                                        l2dat4
l3veryveryveryverylongdat1 l3dat2
' |
sed -e 's/$/                                                 /' |
perl -lne '$re=qr/^(.{10}\S*)\s(.{10}\S*)\s(.{10}\S*)\s(.{10,})/;
@F = $_ =~ $re ; print join("--", @F);'

The sed is there to pad short lines with trailing spaces, so that the pattern can still match the columns that are missing at the end of a line. You can then trim the individual fields to remove leading/trailing spaces.

Does that get you close to what you were trying to do?

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software.  Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent


