Hi, Joshua, :)

On Thu, 16 Jan 2003, Scott, Joshua wrote:
> I've got a CSV file which I need to process.  The format is as follows.
>
> "Smith, John J",1/1/2002,1/15/2002,"Orlando, FL",Florida
> "Doe, John L",1/1/2002,1/15/2002,Los Angeles, California
>
> I've tried splitting it using:  @row = split(",",$data);
> The problem is with the fields that contain the commas between the
> quotes.  It's splitting the fields at each of these fields as well
> and I'd like to know how to avoid that.

The suggestions for using a module tailored for this purpose are the
way to go.  However, as a learning exercise, here's what I came up with
to satisfy your requirements:

#!/usr/bin/perl
use strict;
use warnings;

# Split CSV lines, which may have commas embedded in quoted strings.

my @lines = (
    q("Smith, John J",1/1/2002,1/15/2002,"Orlando, FL",Florida),
    q("Doe, John L",1/1/2002,1/15/2002,Los Angeles, California)
);
my @fields;
my $qs  = q("');
my $sep = ",";

# use re 'debug';    # Uncomment to watch the regex engine at work (very noisy).

foreach (@lines) {
    # Simple split for strings that don't contain quotes.
    if ( index( $_, q(") ) == -1 and index( $_, q(') ) == -1 ) {
        push @fields, [ split( ',', $_ ) ];
        next;    # Already handled; don't run the regex on this line too.
    }

    # Regex for others.
    print "$_\n";
    my @matches;
    while (
        /                   # EITHER:
            $sep?           # Optionally the separator character.
            ([$qs])         # A quote character.
            ([^$qs]+?)      # Followed by a bunch of non-quote chars.
            \1              # And ending with the same quote char.
        |                   # OR:
            $sep?           # Optionally the separator character.
            ([^$sep]+?)     # Followed by a bunch of non-separator chars.
            (?:$sep|$)      # Then the end of the string or the separator char.
        /gx
      )
    {
        # Throw away $1 - only used to bracket embedded quotes.
        # Test with defined() so that a field of "0" isn't lost.
        my $field = defined $2 ? $2 : $3;
        print "field = $field\n";
        push( @matches, $field );
    }
    push @fields, \@matches if @matches;
}

print "@{$_}\n" foreach @fields;

Hope that is enlightening.  Note that it skips empty fields on lines
that contain quotes and doesn't understand escaped quotes, so for real
data stick with a module.  I'm sure there are better ways of doing it,
but I'm hardly an "expert" myself!

---Jason
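
P.S.  For comparison, here is a minimal sketch of the module route,
assuming Text::ParseWords (which ships with Perl) is acceptable for
your data.  The helper name split_csv_line is just something I made up
for the example.  One caveat: parse_line() also treats single quotes
and backslashes as special, so a name like O'Brien will trip it up; a
dedicated CSV module such as Text::CSV from CPAN is the safer choice
for anything serious.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper (the name is made up for this example): split one
# CSV line into fields, honouring commas inside quoted strings.
sub split_csv_line {
    my ($line) = @_;

    # Arguments: delimiter pattern, keep-quotes flag (0 = strip the
    # quotes from the returned fields), and the line to parse.
    return parse_line( ',', 0, $line );
}

my @lines = (
    q("Smith, John J",1/1/2002,1/15/2002,"Orlando, FL",Florida),
    q("Doe, John L",1/1/2002,1/15/2002,Los Angeles, California),
);

foreach my $line (@lines) {
    my @row = split_csv_line($line);

    # Bracket each field so embedded commas and spaces are visible.
    print join( ' ', map( "[$_]", @row ) ), "\n";
}
```

Unlike my regex, this keeps empty fields, so the column positions stay
lined up from row to row.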