On 9/17/07, Jonathan Lang <[EMAIL PROTECTED]> wrote:
snip
> Most of the replies have suggested using 'split( /\|/, $line )'.
> However, this ignores a potentially important aspect of common CSV
> file formats - well, important to me, anyway - which is the
> interaction between quotes, field delimiters, and newlines:
snip
This is because most of the pipe-delimited files you see aren't really
full-blown CSV-type files; they are usually just fields delimited by
pipes (i.e. neither pipes nor end-of-line characters are allowed in
fields). If you have some pseudo-CSV pipe-delimited file you can
(after beating the person upstream who decided to use a custom file
format instead of XML or CSV) do something like the following.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my @recs;      #holds completed records
my @fields;    #holds the record being built
my @leftovers; #holds unprocessed field pieces

while (<DATA>) {
    #the data to process are the leftover pieces
    #of the last line and the current line split on
    #pipe (but keep the pipes, they might be part of
    #a string)
    my @data = (@leftovers, split /(\|)/);

    #while there are still pieces to process
    while (@data) {
        #if the current piece does not
        #start with a quote we can treat
        #it normally
        unless ($data[0] =~ /^\s*"/) {
            #remove this field from the
            #unprocessed pieces
            my $field = shift @data;
            #skip it if it is a pipe
            next if $field =~ /^\|$/;
            #remove the \n if this is
            #the last piece
            chomp $field if @data == 0;
            #shove the field onto the
            #record being built
            push @fields, $field;
            #and start again with the next piece
            next;
        }

        #Fields that start with a quote require special
        #handling. These fields are not complete until
        #they have an even number of quotes
        my $i      = 0;
        my $quotes = 0;
        while ($i <= $#data and ($quotes == 0 or $quotes % 2)) {
            $quotes += $data[$i++] =~ y/"//;
        }

        #if the number of quotes is not even
        #then all of these pieces go at the start
        #of the next line
        last if $quotes % 2;

        #if the number of quotes is even then
        #join all of the pieces that make it even
        #and remove them from the unprocessed
        #pieces
        my $field = join '', splice @data, 0, $i;
        #remove the outer quotes
        $field =~ s/\s*"(.*)"\s*/$1/s;
        #turn the quoted quotes into normal quotes
        $field =~ s/""/"/gs;
        #add this field to the record that is
        #being built
        push @fields, $field;
        #and start again with the next piece
        next;
    }

    unless (@leftovers = @data) {
        #if there are no leftovers then
        #the record is finished; store a copy,
        #since @fields is reused for the next record
        push @recs, [@fields];
        @fields = ();
    }
}

print Dumper \@recs;
__DATA__
"Harry|Sally"|Sleepless
Jack|"Jill
""Walker"""
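In case the two idioms the script leans on are unfamiliar, here is a minimal sketch of them in isolation: split with capturing parentheses (which hands back the pipes as well as the text between them) and counting quotes with tr/"// to decide whether a quoted field is complete. The helper names quote_count and pieces_in_quoted_field are made up for this illustration; they are not part of the script above, and this is only meant to show the idea, not to be a full parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

#capturing parentheses make split return the pipes too
my @pieces = split /(\|)/, '"Harry|Sally"|Sleepless';
print join(' ', map { "[$_]" } @pieces), "\n";
#prints: ["Harry] [|] [Sally"] [|] [Sleepless]

#tr/"// with an empty replacement list counts the quotes and
#leaves the string alone (y/// is another spelling of tr///)
sub quote_count {
    my ($piece) = @_;
    return $piece =~ tr/"//;
}

#hypothetical helper: given pieces whose first element starts
#with a quote, return how many leading pieces make up one
#complete quoted field, or 0 if the quotes are still unbalanced
#(meaning the field continues on the next line)
sub pieces_in_quoted_field {
    my @data = @_;
    my ($i, $quotes) = (0, 0);
    while ($i <= $#data and ($quotes == 0 or $quotes % 2)) {
        $quotes += quote_count($data[$i++]);
    }
    return $quotes % 2 ? 0 : $i;
}

print pieces_in_quoted_field(@pieces), "\n"; #prints 3
print pieces_in_quoted_field('"Jill'), "\n"; #prints 0
```

The first call consumes three pieces ("Harry, the pipe, and Sally") because that is where the quote count first becomes even; the second returns 0 because a lone opening quote means the rest of the field is still to come.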