Re: csv multi-line to single-line

Mr. Shawn H. Corey Wed, 24 Sep 2008 05:18:54 -0700

On Wed, 2008-09-24 at 03:17 -0700, John W. Krahn wrote:
> [EMAIL PROTECTED] wrote:
> > Hi,
> 
> Hello,
> 
> > We receive a text file with the following entries.
> > 
> > "000001","item1","apple one","apple two","apple three"
> > "000002","item2","body one","body two","body three"
> > "000003","item2","body one","body two","body three"
> > "000004","item2","body one","body two","body three"
> > "000005","item1","orange one","orange two","orange three"
> > "000006","item2","body one","body two","body three"
> > "000007","item2","body one","body two","body three"
> > "000008","item2","body one","body two","body three"
> > "000009","item2","body one","body two","body three"
> > "000010","item2","body one","body two","body three"
> > 
> > How do I use perl to convert the above to the following?  I'm a novice
> > perl user.
> > 
> > "apple three","body one","body two","body three"
> > "apple three","body one","body two","body three"
> > "apple three","body one","body two","body three"
> > "orange three","body one","body two","body three"
> > "orange three","body one","body two","body three"
> > "orange three","body one","body two","body three"
> > "orange three","body one","body two","body three"
> > "orange three","body one","body two","body three"
> 
> $ echo '"000001","item1","apple one","apple two","apple three"
> "000002","item2","body one","body two","body three"
> "000003","item2","body one","body two","body three"
> "000004","item2","body one","body two","body three"
> "000005","item1","orange one","orange two","orange three"
> "000006","item2","body one","body two","body three"
> "000007","item2","body one","body two","body three"
> "000008","item2","body one","body two","body three"
> "000009","item2","body one","body two","body three"
> "000010","item2","body one","body two","body three"' | \
> perl -lne'
>      my @data = split /,/;
>      if ( $data[ 1 ] eq q/"item1"/ ) {
>          $field1 = $data[ 4 ];
>          }
>      elsif ( $data[ 1 ] eq q/"item2"/ ) {
>          print join q/,/, $field1, @data[ 2, 3, 4 ];
>          }
> '
> "apple three","body one","body two","body three"
> "apple three","body one","body two","body three"
> "apple three","body one","body two","body three"
> "orange three","body one","body two","body three"
> "orange three","body one","body two","body three"
> "orange three","body one","body two","body three"
> "orange three","body one","body two","body three"
> "orange three","body one","body two","body three"



You have two problems here: 1) parsing the CSV file and 2) rearranging
the data.

There is no formal standard for CSV files.  The closest thing is RFC
4180, an informational document that registers the text/csv MIME type
and describes the most common format:
http://tools.ietf.org/html/rfc4180

Data can be categorized by how it has to be parsed.  The simplest is
context-free data.  An example is tab-separated values (TSV).  These can
be parsed using only regular expressions.
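
A minimal sketch of the TSV case, assuming fields never contain a
literal tab or newline.  The sub name parse_tsv_line is only for
illustration:

```perl
use strict;
use warnings;

# Split one TSV record into its fields.
# Assumption: no field contains a literal tab or newline.
sub parse_tsv_line {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    return split /\t/, $line, -1;
}

my @fields = parse_tsv_line("000001\titem1\tapple one\t\n");
print scalar(@fields), " fields: ", join( '|', @fields ), "\n";
```

One pattern is enough because a tab always means "next field"; there
is no quoting context to keep track of.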

The next most complex type is bounded-recursive contexts.  An example
is CSV.  These require a finite-state automaton (FSA).  FSAs are also
called state machines.

The most complex are unbounded-recursive contexts.  An example is the
algebra expressions you learnt in high school.  These require an FSA
with a push-down stack.
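
To show why a stack is needed, here is a small sketch that checks
whether the brackets in an expression nest correctly.  It does not
evaluate anything; the sub name brackets_balanced and the choice of
bracket pairs are assumptions made for the example:

```perl
use strict;
use warnings;

# Return true if every bracket in $expr is closed in the right order.
sub brackets_balanced {
    my ($expr) = @_;
    my %closer_for = ( '(' => ')', '[' => ']' );
    my %is_closer  = map { $_ => 1 } values %closer_for;
    my @stack;    # the push-down stack

    for my $ch ( split //, $expr ) {
        if ( exists $closer_for{$ch} ) {
            # Remember which closing bracket we now expect.
            push @stack, $closer_for{$ch};
        }
        elsif ( $is_closer{$ch} ) {
            my $expected = pop @stack;
            return 0 unless defined $expected && $expected eq $ch;
        }
        # Every other character is ignored by this check.
    }
    return @stack == 0 ? 1 : 0;    # nothing may be left open
}

for my $expr ( '(a + [b * (c - d)]) / e', '(a + [b * c)]' ) {
    printf "%-26s %s\n", $expr,
        brackets_balanced($expr) ? 'balanced' : 'unbalanced';
}
```

The first expression is reported as balanced and the second as
unbalanced.  The nesting can go arbitrarily deep, so a fixed number of
states cannot track it; the stack can.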

To create a state machine for CSV:

1.  Identify all the contexts.

2.  Identify all the symbols in each context.

3.  Identify all the transitions from one context to another.

4.  Identify the start and end states.

5.  Create the code to implement the state machine.
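
The steps above can be sketched roughly as follows.  This is one
possible state machine, not the only one: it assumes RFC 4180-style
input (comma separator, double-quoted fields, a quote inside a quoted
field written as two quotes) and one record per call with the line
ending already removed.  The state names and the sub name
parse_csv_record are made up for the example:

```perl
use strict;
use warnings;

# Step 1: the contexts (states).
use constant {
    UNQUOTED   => 0,    # in an unquoted field, or at the start of a field
    QUOTED     => 1,    # inside a double-quoted field
    QUOTE_SEEN => 2,    # just saw a quote while inside a quoted field
};

sub parse_csv_record {
    my ($record) = @_;
    my @fields;
    my $field = '';
    my $state = UNQUOTED;    # step 4: the start state

    # Steps 2 and 3: the symbols in each context and the transitions.
    for my $ch ( split //, $record ) {
        if ( $state == UNQUOTED ) {
            if ( $ch eq ',' ) { push @fields, $field; $field = ''; }
            elsif ( $ch eq '"' && $field eq '' ) { $state = QUOTED; }
            else { $field .= $ch; }
        }
        elsif ( $state == QUOTED ) {
            if ( $ch eq '"' ) { $state = QUOTE_SEEN; }
            else              { $field .= $ch; }    # commas are data here
        }
        else {    # QUOTE_SEEN
            if ( $ch eq '"' ) {    # doubled quote: a literal quote
                $field .= '"';
                $state = QUOTED;
            }
            elsif ( $ch eq ',' ) {    # the quote closed the field
                push @fields, $field;
                $field = '';
                $state = UNQUOTED;
            }
            else {
                die "unexpected character '$ch' after closing quote\n";
            }
        }
    }

    # Step 4: QUOTED is not a valid end state.
    die "unterminated quoted field\n" if $state == QUOTED;
    push @fields, $field;
    return @fields;
}

my @f = parse_csv_record('"000001","item1","apple, one","say ""hi"""');
print join( '|', @f ), "\n";    # prints: 000001|item1|apple, one|say "hi"
```

Unlike a plain split on commas, this removes the surrounding quotes
and copes with a comma inside a quoted field.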

Or download an appropriate module from CPAN http://search.cpan.org/ that
does all this for you.  And no, I'm not going to recommend one because
CSV is not standardized.  You'll have to decide which one fits your
needs.


As for your second problem, I think John has answered it.

-- 
Just my 0.00000002 million dollars worth,
  Shawn

Linux is obsolete.
-- Andrew Tanenbaum


