John and Rob, Thanks a lot for the pointers. Very helpfull. --- On Thu, 1/27/11, Rob Dixon <rob.di...@gmx.com> wrote:
> From: Rob Dixon <rob.di...@gmx.com> > Subject: Re: parse data from a report file > To: beginners@perl.org > Cc: "loan tran" <loan...@yahoo.com> > Date: Thursday, January 27, 2011, 11:33 AM > On 27/01/2011 06:58, loan tran > wrote: > > Hi All, > > > > I'm trying to parse data from a report file and I'm > having trouble producing desired results. > > > > Here is a data example from the report: > > > > > PONumber Line > InvoicedQty UnitCost Amount > Curr Extended Amount > > Fr Date Company > Department Account ItemNum > ItemDescription > > --------- ------------------------- ------ > --------------------- ------------------ > > 1023112-0000 > > 1 1.0000 102.3419 102.34 > USD 102.34 > > 03A 10/13/10 213 31000 - > 10810- 138328 > ARMBD ART LN > > Vendor: 1288 ALIMED > > Buyer:A02 VALERIE BAGALA > > > > 1026244-0000 > > 1 1.0000 284.2525 284.25 > USD 284.25 > > 03A 10/29/10 213 31000 - > 10810- 279784 > BAGS CRUSHER > > Vendor: 1338 SHARPE LINES > INC > Buyer:A02 VALERIE BAGALA > > > > 1024877-0000 > 1 > 4.0000 140.4800 561.92 > USD 561.92 > > 03A 10/26/10 213 31000 - > 10810- 235228 > SYR 1ML AMBER W/ TIP CAP > > Vendor: 2472 BAXA CORP > > Buyer:A02 VALERIE BAGALA > > > > 1000066-0000 > 1 .9000 241.6845 > 217.52 USD 217.52 > > 03A 05/19/10 213 41000 - 10810- > 145155 NDL JAMSHIDI 11GA DISP STR > > Vendor: 2686 CARDINAL HEALTH > 200 INC > Buyer:A04 DEAN SCHUMACHER > > > > --------- ------------------------- ------ > --------------------- ------------------ > > > > > > A complete record in the report expands to more than 1 > line. > > Each record begin with a line starting with 4 to 7 > digits and -0000 > > each record ends with a line contains the word > Vendor:' > > > > I need to extract some elements from each record and > combine the extracted data in one new record (1 line per > record) > > so that later I can bcp the new data into a > table/database. > > > > Here is my code: > > > > #!/usr/bin/perl -w > > use strict; > > > > my $PO_file = > "/home/sybase/scripts125/pl/test/simple_SH135.dat"; > > open(IN,"$PO_file") || die "Fail open $PO_file"; > > > > my ($line, @part1, @part2, @part3, $rec_part1, > $rec_part2, $rec_part3, $complete_record); > > > > print "Need to extract the following data\n"; > > print > "PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|Description|VendorID\n"; > > > > while ($line=<IN>) { > > chomp $line; > > $line =~ s/^\s+//go; > > $line =~ s/\s+/ /go; > > $line =~ s/,//go; > > > > # Part1 > > # Data Example: > 1023112-0000 > 1 1.0000 102.3419 102.34 > USD 102.34 > > > ########################################################################Part1: > > if ($line =~ /-0000/){ > > #if ($line =~ /^\d{7}-0000/){ > > my @part1 = > split(/\s+/,$line); > > my $PONumber = > $part1[0]; > > my $Quantity = > $part1[2]; > > my $UnitCost = > $part1[3]; > > my $ExtAmt = > $part1[6]; > > $rec_part1 = join > "|",($PONumber, $Quantity, $UnitCost, $ExtAmt); > > }# end part 1 > > > > > > ### Part2: > > #Data Example: 03A 10/13/10 > 213 31000 - > 10810- 138328 > ARMBD ART LN > > > ###################################################################### > > if ($line =~ /(\d{2}\/\d{2}\/\d{2})/){ > > #if ($line =~ > /^.{5}(\d{2}\/\d{2}\/\d{2})/){ #Data Eg: 03A > 10/13/10 213 31000 #why not work? > > my @part2 = > split(/\s+/,$line,8); # last group has multiple words > is descriptions > > my $PurchFr = $part2[0] ; > > my $Date = $part2[1] ; > > my $Company = $part2[2] ; > > my $Dept = $part2[3] ; > > my $Acct = $part2[5] ; > > my $ItemNumber = $part2[6] > ; > > my $Desc = $part2[7]; > > #$rec_part2 = join > '|',($part2[1],$part2[2] ,$part2[6],$part2[7]); > > $rec_part2 = join > '|',($Date,$Company ,$ItemNumber,$Desc); > > #print "rec_part2: > $rec_part2 \n"; > > }# end Part2 > > > > > > ##Part3: VendorID > > # Data Example: Vendor: > 1288 ALIMED > Buyer:A02 VALERIE BAGALA > > > ###################################################################### > > if ($line =~ /^Vendor:/){ > > my @part3 = split(/\s+/,$line); > > my $VendorID = $part3[1]; > > $rec_part3 = $VendorID; > > } #end part3 > > > > $complete_record = > "$rec_part1".'|'."$rec_part2".'|'."$rec_part3"; > > print "$complete_record\n"; > > > > }#end while > > > > My questions: > > > > I expect my program to produce 4 records, like: > > > > > PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|Description|VendorID > > > 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMBD > ART LN|1288 > > > 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS > CRUSHER|1338 > > > 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR > 1ML AMBER W/ TIP CAP|2472 > > > 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL > JAMSHIDI 11GA DISP STR|2686 > > > > However it produces unwanted results, like below. Any > pointers would be greatly appreciated. > > > > > 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD > ART LN| > > > 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD > ART LN|1288 > > > 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD > ART LN|1288 > > > 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD > ART LN|1288 > > > 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD > ART LN|1288 > > > 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD > ART LN|1288 > > > 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMBD > ART LN|1288 > > > 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS > CRUSHER|1288 > > > 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS > CRUSHER|1338 > > > 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS > CRUSHER|1338 > > > 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS > CRUSHER|1338 > > > 1024877-0000|4.0000|140.4800|561.92|10/29/10|213|279784|BAGS > CRUSHER|1338 > > > 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR > 1ML AMBER W/ TIP CAP|1338 > > > 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR > 1ML AMBER W/ TIP CAP|2472 > > > 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR > 1ML AMBER W/ TIP CAP|2472 > > > 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR > 1ML AMBER W/ TIP CAP|2472 > > > 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR > 1ML AMBER W/ TIP CAP|2472 > > > 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR > 1ML AMBER W/ TIP CAP|2472 > > > 1000066-0000|.9000|241.6845|217.52|10/26/10|213|235228|SYR > 1ML AMBER W/ TIP CAP|2472 > > > 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL > JAMSHIDI 11GA DISP STR|2472 > > > 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL > JAMSHIDI 11GA DISP STR|2686 > > > 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL > JAMSHIDI 11GA DISP STR|2686 > > My thoughts are of a much simpler solution. I hope the > program below helps. > > Rob > > use strict; > use warnings; > > my $po_file = > '/home/sybase/scripts125/pl/test/simple_SH135.dat'; > > while (<$in>) { > > chomp; > next unless /-0000/; > > my @record = (split)[0, 2, 3, 6]; > > $_ = <DATA>; > push @record, (split)[1, 2, 6, 7]; > > $_ = <DATA>; > die "No 'Vendor' found in record" unless > /Vendor:/; > push @record, (split)[1]; > > print join('|', @record), "\n"; > } > > **OUTPUT** > > 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD|1288 > 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS|1338 > 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR|2472 > 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL|2686 > > -- > To unsubscribe, e-mail: beginners-unsubscr...@perl.org > For additional commands, e-mail: beginners-h...@perl.org > http://learn.perl.org/ > > > -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/