On 27/01/2011 06:58, loan tran wrote:
Hi All,
I'm trying to parse data from a report file and I'm having trouble producing
desired results.
Here is a data example from the report:
PONumber Line InvoicedQty UnitCost Amount Curr Extended Amount
Fr Date Company Department Account ItemNum ItemDescription
--------- ------------------------- ------ ---------------------
------------------
1023112-0000 1 1.0000 102.3419 102.34 USD 102.34
03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN
Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA
1026244-0000 1 1.0000 284.2525 284.25 USD 284.25
03A 10/29/10 213 31000 - 10810- 279784 BAGS CRUSHER
Vendor: 1338 SHARPE LINES INC Buyer:A02 VALERIE BAGALA
1024877-0000 1 4.0000 140.4800 561.92 USD 561.92
03A 10/26/10 213 31000 - 10810- 235228 SYR 1ML AMBER W/ TIP CAP
Vendor: 2472 BAXA CORP Buyer:A02 VALERIE BAGALA
1000066-0000 1 .9000 241.6845 217.52 USD 217.52
03A 05/19/10 213 41000 - 10810- 145155 NDL JAMSHIDI 11GA DISP STR
Vendor: 2686 CARDINAL HEALTH 200 INC Buyer:A04 DEAN SCHUMACHER
--------- ------------------------- ------ ---------------------
------------------
A complete record in the report expands to more than 1 line.
Each record begin with a line starting with 4 to 7 digits and -0000
each record ends with a line contains the word Vendor:'
I need to extract some elements from each record and combine the extracted data
in one new record (1 line per record)
so that later I can bcp the new data into a table/database.
Here is my code:
#!/usr/bin/perl -w
use strict;
my $PO_file = "/home/sybase/scripts125/pl/test/simple_SH135.dat";
open(IN,"$PO_file") || die "Fail open $PO_file";
my ($line, @part1, @part2, @part3, $rec_part1, $rec_part2, $rec_part3,
$complete_record);
print "Need to extract the following data\n";
print
"PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|Description|VendorID\n";
while ($line=<IN>) {
chomp $line;
$line =~ s/^\s+//go;
$line =~ s/\s+/ /go;
$line =~ s/,//go;
# Part1
# Data Example: 1023112-0000 1 1.0000 102.3419 102.34 USD
102.34
########################################################################Part1:
if ($line =~ /-0000/){
#if ($line =~ /^\d{7}-0000/){
my @part1 = split(/\s+/,$line);
my $PONumber = $part1[0];
my $Quantity = $part1[2];
my $UnitCost = $part1[3];
my $ExtAmt = $part1[6];
$rec_part1 = join "|",($PONumber, $Quantity, $UnitCost, $ExtAmt);
}# end part 1
### Part2:
#Data Example: 03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART
LN
######################################################################
if ($line =~ /(\d{2}\/\d{2}\/\d{2})/){
#if ($line =~ /^.{5}(\d{2}\/\d{2}\/\d{2})/){ #Data Eg: 03A 10/13/10 213
31000 #why not work?
my @part2 = split(/\s+/,$line,8); # last group has multiple words is
descriptions
my $PurchFr = $part2[0] ;
my $Date = $part2[1] ;
my $Company = $part2[2] ;
my $Dept = $part2[3] ;
my $Acct = $part2[5] ;
my $ItemNumber = $part2[6] ;
my $Desc = $part2[7];
#$rec_part2 = join '|',($part2[1],$part2[2] ,$part2[6],$part2[7]);
$rec_part2 = join '|',($Date,$Company ,$ItemNumber,$Desc);
#print "rec_part2: $rec_part2 \n";
}# end Part2
##Part3: VendorID
# Data Example: Vendor: 1288 ALIMED Buyer:A02 VALERIE
BAGALA
######################################################################
if ($line =~ /^Vendor:/){
my @part3 = split(/\s+/,$line);
my $VendorID = $part3[1];
$rec_part3 = $VendorID;
} #end part3
$complete_record = "$rec_part1".'|'."$rec_part2".'|'."$rec_part3";
print "$complete_record\n";
}#end while
My questions:
I expect my program to produce 4 records, like:
PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|Description|VendorID
1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMBD ART LN|1288
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP
CAP|2472
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP
STR|2686
However it produces unwanted results, like below. Any pointers would be greatly
appreciated.
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD ART LN|1288
1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMBD ART LN|1288
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1288
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
1024877-0000|4.0000|140.4800|561.92|10/29/10|213|279784|BAGS CRUSHER|1338
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP
CAP|1338
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP
CAP|2472
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP
CAP|2472
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP
CAP|2472
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP
CAP|2472
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP
CAP|2472
1000066-0000|.9000|241.6845|217.52|10/26/10|213|235228|SYR 1ML AMBER W/ TIP
CAP|2472
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP
STR|2472
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP
STR|2686
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP
STR|2686
My thoughts are of a much simpler solution. I hope the program below helps.
Rob
use strict;
use warnings;
my $po_file = '/home/sybase/scripts125/pl/test/simple_SH135.dat';
while (<$in>) {
chomp;
next unless /-0000/;
my @record = (split)[0, 2, 3, 6];
$_ = <DATA>;
push @record, (split)[1, 2, 6, 7];
$_ = <DATA>;
die "No 'Vendor' found in record" unless /Vendor:/;
push @record, (split)[1];
print join('|', @record), "\n";
}
**OUTPUT**
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMBD|1288
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS|1338
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR|2472
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL|2686
--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/