On 9/17/07, Jonathan Lang <[EMAIL PROTECTED]> wrote:
snip
> Most of the replies have suggested using 'split( /\|/, $line )'.
> However, this ignores a potentially important aspect of common CSV
> file formats - well, important to me, anyway - which is the
> interaction between quotes, field delimiters, and newlines:
snip

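To make the quoted point concrete, here is a minimal sketch of what a
plain split does to a quoted field that contains a pipe.  The sample
line is borrowed from the data at the end of this message; the
variable names are only for illustration.

```perl
use strict;
use warnings;

# Sample line borrowed from the __DATA__ section of the script below.
my $line = qq{"Harry|Sally"|Sleepless\n};
chomp $line;

# A plain split on the pipe has no notion of quoting, so the quoted
# field is cut in two at the embedded pipe.
my @naive = split /\|/, $line;

print scalar(@naive), " pieces\n";
print "[$_]\n" for @naive;
```

The line holds two fields, but split hands back three pieces, and the
outer quotes are still attached.  Even so, split is what most replies
reached for.
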
This is because most of the time you see pipe-delimited files they
aren't really full-blown CSV-type files; they are usually just fields
delimited by pipes (i.e. neither pipes nor end-of-line characters are
allowed in fields).  If you have some pseudo-CSV-like pipe-delimited
file you can (after beating the person upstream who decided
to use a custom file format instead of XML or CSV) do something like
the following.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my @recs; #holds completed records
my @fields; #holds the record being built
my @leftovers; #holds unprocessed field pieces
while (<DATA>) {
        #the data to process are the leftover pieces
        #of the last line and the current line split on
        #pipe (but keep the pipes, they might be part of
        #a string)
        my @data = (@leftovers, split /(\|)/);

        #while there are still pieces to process
        while (@data) {
                #if the current piece does not
                #start with a quote we can treat
                #it normally
                unless ($data[0] =~ /^\s*"/) {
                        #remove this field from the
                        #unprocessed pieces
                        my $field = shift @data;
                        #skip it if it is a pipe
                        next if $field =~ /^\|$/;
                        #remove the \n if this is
                        #the last piece
                        chomp $field if @data == 0;
                        #shove the field onto the
                        #record being built
                        push @fields, $field;
                        #and start again with the next piece
                        next;
                }
                #Fields that start with a quote require special
                #handling.  These fields are not complete until
                #they have an even number of quotes
                my $i      = 0;
                my $quotes = 0;
                while ($i <= $#data and ($quotes == 0 or $quotes % 2)) {
                        $quotes += $data[$i++] =~ y/"//;
                }
                #if the number of quotes is odd
                #then all of these pieces go at the start
                #of the next line
                last if $quotes % 2;
                #if the number of quotes is even then
                #join all of the pieces that make it even
                #and remove them from the unprocessed
                #pieces
                my $field = join '', splice @data, 0, $i;
                #remove the outer quotes
                $field =~ s/\s*"(.*)"\s*/$1/s;
                #turn the quoted quotes into normal quotes
                $field =~ s/""/"/gs;
                #add this field to the record that is
                #being built
                push @fields, $field;
                #and start again with the next piece
                next;
        }
        unless (@leftovers = @data) {
                #if there are no leftovers then
                #the record is finished
                push @recs, [@fields];
                @fields = ();
        }
}

print Dumper \@recs;

__DATA__
"Harry|Sally"|Sleepless
Jack|"Jill
""Walker"""

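The quote-counting rule in the comments above (a quoted field is not
complete until it holds an even number of quotes) can also be applied
to whole lines first.  The following is only a sketch of that variant,
not a replacement for the script: split_record and parse_pipe_records
are names made up for this example, and it assumes quotes appear only
around quoted fields or doubled inside them.

```perl
use strict;
use warnings;

# Hypothetical helper: split one complete logical record into fields.
sub split_record {
    my ($record) = @_;
    chomp $record;
    my @fields;
    my $pending;
    # the -1 limit keeps trailing empty fields
    for my $piece (split /\|/, $record, -1) {
        # glue pieces back together while a quoted field is still open
        $pending = defined $pending ? "$pending|$piece" : $piece;
        next if ($pending =~ tr/"//) % 2;
        # strip the outer quotes and turn doubled quotes into single ones
        if ($pending =~ /^\s*"(.*)"\s*$/s) {
            ($pending = $1) =~ s/""/"/g;
        }
        push @fields, $pending;
        undef $pending;
    }
    return @fields;
}

# Hypothetical helper: group physical lines into logical records.
sub parse_pipe_records {
    my @recs;
    my $buf = '';
    for my $line (@_) {
        $buf .= $line;
        # an odd number of quotes means a quoted field is still open,
        # so the record continues on the next line
        next if ($buf =~ tr/"//) % 2;
        push @recs, [ split_record($buf) ];
        $buf = '';
    }
    return @recs;
}

# The three sample lines from the __DATA__ section above.
my @recs = parse_pipe_records(
    qq{"Harry|Sally"|Sleepless\n},
    qq{Jack|"Jill\n},
    qq{""Walker"""\n},
);
print scalar(@recs), " records\n";
```

With the sample data this gives two records; the last field of the
second one keeps its embedded newline and its inner quotes.
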

