Re: Problems matching or parsing with delimiters in text

Chris Devers Mon, 28 Mar 2005 08:39:59 -0800

On Mon, 28 Mar 2005, KEVIN ZEMBOWER wrote:

> I'm trying to read in text lines from a file that look like this:
> "B-B01","Eng","Binder for Complete Set of Population Reports",13,0
> "C-CD01","Eng","The Condoms CD-ROM",12,1
> "F-J41a","Fre",,13,1
> "F-J41a","SPA",,13,1
> "M-FC01","Eng","Africa Flip Charts- Planning Your Family (E,F, 
> Swahili)(12""x9"")",7,1
> "M-FC01","Fre","Africa Flip Charts- Planning Your Family (E,F, 
> Swahili)(12""x9"")",7,1
> 
> The first two lines are typical of most of the file. The second two 
> have a blank third field and the last two show embedded commas and 
> escaped double quotes in the third field. This is an output of another 
> program, but I can filter it and make substitutions if that makes 
> anything easier.
> 
> I'm trying to parse it with these statements:
>
> while (<>) { # While there are more records in the inventory export file 
> called on the command line
>    ++$ln; #increment the line number count
>    my ($partno, $language, $title, $cost, $available) = 
> m["(.*)","(.*)","?(.*?)"?,(.*),(.*)$];
>    print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n" if 
> $debug;
>    next if $debug;
>    createlangversion($partno, $language, $title, $cost, $available);
> } #while there are more lines in the import data file


Don't hand-roll a regex for this. For simple delimited data, split() is 
the usual tool:

    while (<>) {
        chomp;  # strip the newline so it doesn't end up in $available
        $ln++;  # postfix increment is more common & so readable
        my ($partno, $language, $title, $cost, $available) =
            split /,/, $_;
        if ($debug) {   # the condition needs parentheses in Perl
            print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n";
            next;
        }
        createlangversion($partno, $language, $title, $cost, $available);
    }

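This is easier to read and maintain than hand-matching the line with a 
regex. It does leave the surrounding double quotes on each field, 
though, so you would still have to strip those yourself.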

Note though that the comma-separated values (CSV) format you're using is 
infamous for being deceptively simple. If one of the fields in your file 
itself has an embedded comma, as the third field does in the last two 
records of your sample, then parsing it immediately gets much harder to 
do. For example, take this record:

    "C-CD02","Eng","The Condoms CD-ROM, Second Edition",12,1

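Then everything falls apart: split() returns six fields instead of five, 
the title is cut off at the embedded comma, and $cost and $available end 
up holding the wrong values.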
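
To see the failure concretely, here is a minimal sketch that runs a 
plain split() over that one record. It uses only core Perl; the variable 
names ($line, @fields) are just for illustration.

```perl
use strict;
use warnings;

# The example record from above, with a comma inside the quoted title.
my $line = '"C-CD02","Eng","The Condoms CD-ROM, Second Edition",12,1';

# Naive approach: split on every comma, whether or not it is quoted.
my @fields = split /,/, $line;

printf "%d fields\n", scalar @fields;   # prints "6 fields", not 5
print "[$_]\n" for @fields;
# The title comes back in two pieces, quotes still attached:
#   ["The Condoms CD-ROM]
#   [ Second Edition"]
```

Assigning that list to five variables silently drops the sixth element, 
so nothing warns you that the record was misread.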

You could try to fix this by writing code to detect these situations 
(quoted commas, doubled quotes, empty fields), but it's really annoying 
to get right. You're *much* better off turning to a module to do the 
work for you. Two popular ones for this are DBD::CSV, which allows you 
to write DBI code that treats your CSV data file as if it were a table 
in a database, and Text::CSV (or, if you can run it, the optimised 
Text::CSV_XS, which is written in C rather than Perl and so is much 
faster). For information about these, see:

    <http://search.cpan.org/dist/DBD-CSV/lib/DBD/CSV.pm>

    <http://search.cpan.org/~alancitt/Text-CSV-0.01/CSV.pm>
    <http://search.cpan.org/~jwied/Text-CSV_XS/CSV_XS.pm>
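
As a rough sketch of the module approach, something like the following 
should handle the sample records. It assumes a reasonably current 
Text::CSV from CPAN (the new({ binary => 1 }), parse(), fields() and 
error_input() calls); older releases may differ. It also assumes the 
line break inside the "Africa Flip Charts" title is just mail wrapping, 
so that record is shown here on one line. The @lines and @records arrays 
are only for illustration; in your script you would parse each line read 
from <> and call createlangversion() instead.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 lets fields contain non-ASCII or otherwise unusual bytes.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Sample records from the original question (wrapped line rejoined).
my @lines = (
    '"B-B01","Eng","Binder for Complete Set of Population Reports",13,0',
    '"F-J41a","Fre",,13,1',
    '"M-FC01","Fre","Africa Flip Charts- Planning Your Family (E,F, Swahili)(12""x9"")",7,1',
);

my @records;
for my $line (@lines) {
    if ($csv->parse($line)) {
        # fields() returns the values with quotes removed and "" unescaped.
        my ($partno, $language, $title, $cost, $available) = $csv->fields;
        push @records, [$partno, $language, $title, $cost, $available];
        print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n";
    }
    else {
        warn "Could not parse line: " . $csv->error_input . "\n";
    }
}
```

The blank third field comes back as an empty string, and the embedded 
commas and doubled quotes in the last title are handled by the module 
rather than by your own code.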

Good luck...


-- 
Chris Devers      [EMAIL PROTECTED]
http://devers.homeip.net:8080/blog/

np: 'Missed Me'
     by The Dresden Dolls
     from 'A Is For Accident'
