Hi All, I'm trying to parse data from a report file and I'm having trouble producing desired results.
Here is a data example from the report: PONumber Line InvoicedQty UnitCost Amount Curr Extended Amount Fr Date Company Department Account ItemNum ItemDescription --------- ------------------------- ------ --------------------- ------------------ 1023112-0000 1 1.0000 102.3419 102.34 USD 102.34 03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA 1026244-0000 1 1.0000 284.2525 284.25 USD 284.25 03A 10/29/10 213 31000 - 10810- 279784 BAGS CRUSHER Vendor: 1338 SHARPE LINES INC Buyer:A02 VALERIE BAGALA 1024877-0000 1 4.0000 140.4800 561.92 USD 561.92 03A 10/26/10 213 31000 - 10810- 235228 SYR 1ML AMBER W/ TIP CAP Vendor: 2472 BAXA CORP Buyer:A02 VALERIE BAGALA 1000066-0000 1 .9000 241.6845 217.52 USD 217.52 03A 05/19/10 213 41000 - 10810- 145155 NDL JAMSHIDI 11GA DISP STR Vendor: 2686 CARDINAL HEALTH 200 INC Buyer:A04 DEAN SCHUMACHER --------- ------------------------- ------ --------------------- ------------------ A complete record in the report expands to more than 1 line. Each record begin with a line starting with 4 to 7 digits and -0000 each record ends with a line contains the word Vendor:' I need to extract some elements from each record and combine the extracted data in one new record (1 line per record) so that later I can bcp the new data into a table/database. Here is my code: #!/usr/bin/perl -w use strict; my $PO_file = "/home/sybase/scripts125/pl/test/simple_SH135.dat"; open(IN,"$PO_file") || die "Fail open $PO_file"; my ($line, @part1, @part2, @part3, $rec_part1, $rec_part2, $rec_part3, $complete_record); print "Need to extract the following data\n"; print "PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|Description|VendorID\n"; while ($line=<IN>) { chomp $line; $line =~ s/^\s+//go; $line =~ s/\s+/ /go; $line =~ s/,//go; # Part1 # Data Example: 1023112-0000 1 1.0000 102.3419 102.34 USD 102.34 ########################################################################Part1: if ($line =~ /-0000/){ #if ($line =~ /^\d{7}-0000/){ my @part1 = split(/\s+/,$line); my $PONumber = $part1[0]; my $Quantity = $part1[2]; my $UnitCost = $part1[3]; my $ExtAmt = $part1[6]; $rec_part1 = join "|",($PONumber, $Quantity, $UnitCost, $ExtAmt); }# end part 1 ### Part2: #Data Example: 03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN ###################################################################### if ($line =~ /(\d{2}\/\d{2}\/\d{2})/){ #if ($line =~ /^.{5}(\d{2}\/\d{2}\/\d{2})/){ #Data Eg: 03A 10/13/10 213 31000 #why not work? my @part2 = split(/\s+/,$line,8); # last group has multiple words is descriptions my $PurchFr = $part2[0] ; my $Date = $part2[1] ; my $Company = $part2[2] ; my $Dept = $part2[3] ; my $Acct = $part2[5] ; my $ItemNumber = $part2[6] ; my $Desc = $part2[7]; #$rec_part2 = join '|',($part2[1],$part2[2] ,$part2[6],$part2[7]); $rec_part2 = join '|',($Date,$Company ,$ItemNumber,$Desc); #print "rec_part2: $rec_part2 \n"; }# end Part2 ##Part3: VendorID # Data Example: Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA ###################################################################### if ($line =~ /^Vendor:/){ my @part3 = split(/\s+/,$line); my $VendorID = $part3[1]; $rec_part3 = $VendorID; } #end part3 $complete_record = "$rec_part1".'|'."$rec_part2".'|'."$rec_part3"; print "$complete_record\n"; }#end while My questions: I expect my program to produce 4 records, like: PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|Description|VendorID 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMBD ART LN|1288 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686 However it produces unwanted results, like below. Any pointers would be greatly appreciated. 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN| 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMBD ART LN|1288 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1288 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338 1024877-0000|4.0000|140.4800|561.92|10/29/10|213|279784|BAGS CRUSHER|1338 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|1338 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472 1000066-0000|.9000|241.6845|217.52|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2472 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686 -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/