On Aug 13, 2005, at 8:22 AM, Offer Kaye wrote:
I have a text file with columns, where the columns may not be aligned,
and not all lines may have data in all columns:
header1        header2        header3        header4
------------------------------------------------------------
l1dat1         l1dat2         l1dat3         l1dat4
l2dat1                                       l2dat4
l3veryveryveryverylongdat1 l3dat2
As you can see, line 1 has all data, line 2 is missing columns 2 and 3,
and line 3 is a mess :)
Any thoughts on parsing such a "table"?
If you assume that each field is at least 10 characters wide (padding
included) and that fields are separated by at least one space, then a
pattern like this may help define a single field:
/(.{10}\S*)\s/
For example:
echo '
header1        header2        header3        header4
------------------------------------------------------------
l1dat1         l1dat2         l1dat3         l1dat4
l2dat1                                       l2dat4
l3veryveryveryverylongdat1 l3dat2
' |
sed -e 's/$/                                        /' |   # append 40 spaces to every line
perl -lne '$re=qr/^(.{10}\S*)\s(.{10}\S*)\s(.{10}\S*)\s(.{10,})/;
@F = $_ =~ $re ; print join("--", @F);'
The sed is there to pad every line with trailing spaces, so that short
lines still contain enough characters for all four fields to match.
You can then trim leading and trailing spaces from the individual
fields.
Does that get you close to what you were trying to do?
Regards,
- Robert