Thanks for the information.  That was much more than I expected.

You're right about the T line.  That was a typo.  The T is in the first
position of the last line of each order block.

As far as your follow-up question on the B lines, "only line with a B in the
beginning in set?", I'm not sure I understand.  If you mean that there
will only be 1 line per order (set of lines A-T) with a B in the first
position, you are correct.

Also, as far as your assumption, "The way I do it assumes that the first and
only the first line of each set begins with an A (and falsely puts that A at
the end of the previous record, but that doesn't matter for the aim here, does
it?)", I'm not sure what you mean by this either.  However, it sounds like you
have it correct.  Only the line that begins an order block will ever start
with an A in the first position.

Finally, the last assumption, that "The push assumes that there are always
exactly 5 records between B and email and that this is the only line with a
B in record (and comes before the lines with ADV_)".  I think that this is
correct.  An example line is "B,W29116,test,test,[EMAIL PROTECTED],".  The
positions are 0,1,2,3,4, so the e-mail address is the fifth field, and it will
ALWAYS be the fifth.  Also, the B line will ALWAYS come before the ADV_ lines.
This appears to be correct, judging by the fact that the script outputs e-mail
addresses.
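
To make the position counting concrete, here is a minimal sketch (it is not
part of the script under discussion) that splits one B line from the sample
data and picks out index 4.  The variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample B line copied from the order data in this message.
my $line = 'B,W29116,test,test,[EMAIL PROTECTED],test,,test,GA,~US,11111';

# Split on commas; the -1 limit keeps any trailing empty fields.
my @fields = split /,/, $line, -1;

# Index 0 holds "B", so the address sits at index 4 (the fifth field).
my $email = $fields[4];
print "$email\n";
```

Counting from 0, the fifth field is index 4, which matches the
$fields[$b_index+4] lookup in the script.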

I tested the script, and I was able to output e-mail addresses.  However,
using the data that I posted, it does not quite output exactly what I need.
Based on this sample of order.csv and the script that you sent me (I added
the line "print @email" to view the output):

  for (my $i=0; $i<=$#fields; $i++){
     if ($fields[$i] eq "B") {$b_index=$i; next;}
     elsif ($fields[$i] =~ /^ADV_.*/) {push @email, $fields[$b_index+4]; last;}
print @email;

A,W29073,Thu Apr 05 15:25:08 2001
B,W29073,Scott,S,[EMAIL PROTECTED],249 Tah Ave,,Sth San Francisco,CA,~US,55555-5555
P,W29073,
X,W29073,Company Name,A,Department Name,San Francisco 00),Purchase Order Number,254
S,W29073,UPS Next Day Air,Scott S,2 Tah Ave,,Sth San Francisco,CA,~US,55555-5555
I,W29073,AVHQ_101090lfbl,6.000,$28.50,$171.00,,,,1.00,,2,0
I,W29073,AVHQ_101090xlfbl,4.000,$28.50,$114.00,,,,1.00,,3,0
T,W29073,$285.00,,,,$53.09,$338.09,,10.00,
A,W29101,Wed Apr 11 07:43:33 2001
B,W29101,harold,m,[EMAIL PROTECTED],10 wind ridge parkway,,Atlanta,GA,~US,55555
P,W29101,
X,W29101,Company Name,,Department Name,,Purchase Order Number,10252
S,W29101,UPS Regular Ground,harold m,10 wind ridge parkway,,Atlanta,GA,~US,55555
I,W29101,ADV_Carb-Natxxl,1.000,$16.50,$16.50,,,,1.50,,4
T,W29101,$17.50,,7.000,$1.23,$9.28,$28.01,,1.50,
A,W29116,Thu Apr 12 11:42:21 2001
B,W29116,test,test,[EMAIL PROTECTED],test,,test,GA,~US,11111
P,W29116,Credit,Offline,Visa,4444444444444444,04/04,,,,
X,W29116,Company Name,,Department Name,,Purchase Order Number,
S,W29116,UPS Regular Ground,test test,test,,test,GA,~US,11111
I,W29116,ADV_1601,1.000,$14.00,$14.00,,,,1.50,,3
T,W29116,$14.00,,7.000,$0.98,$9.94,$24.92,,1.50,

I would expect to see:

[EMAIL PROTECTED]@test.com

However, I see:

[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@masnc.n
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@masnc
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@mas
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@m
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]
@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@masnc.ne
[EMAIL PROTECTED]@masnc.net

What is going wrong?  Am I trying to view the output incorrectly?

Thanks for any additional direction.

Andrew



-----Original Message-----
From: wolf blaum [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 22, 2004 3:28 PM
To: Hughes, Andrew; Perl Beginners Mailing List
Subject: Re: complex data file parsing


Hi,
> I know that each block always starts with an A in the first position of
> the first line and ends with a T in the last position of the last line.

Isn't it a T in the first position of the last row of the set?

> I know that the second line starts with a B, and the data in the 5th space
> on this line is the e-mail address, which is what I ultimately want.
> However,...

Only line with a B in the beginning in the set?

> I am trying to get a list of email addresses for people who have ordered
> products that begin with ADV.  These can appear in multiple I lines.
> Therefore you can never predict how many lines make up 1 order block.

What about:

#! /usr/bin/perl
use strict;
use warnings;
my @email;

open (FH, "<complex.txt") or die "$!";

local $/ = "\nA,"; # make \nA, the record separator

while(<FH>){       # read the next record
  my @fields = split ",|\n", $_;           # split at , or \n
  my $b_index;                             # reset (undef) for every new record
  for (my $i=0; $i<=$#fields; $i++){
     if ($fields[$i] eq "B") {$b_index=$i; next;}   # remember where the B line starts
     elsif ($fields[$i] =~ /^ADV_.*/) {push @email, $fields[$b_index+4]; last;}   # email is 4 fields after the B
  }
}

Works on the sample you provided.

$/ (see perlvar) is the input record separator, usually \n.
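
As a small illustration of what changing $/ does, here is a sketch that reads
from an in-memory string instead of a file.  The two short order blocks are
made up for the example and are not taken from the real data.

```perl
use strict;
use warnings;

# Two tiny made-up order blocks, for illustration only.
my $data = "A,1,date\nB,1,x\nT,1,end\nA,2,date\nB,2,y\nT,2,end\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "\nA,";          # records now end at "\nA," instead of "\n"
    while (my $record = <$fh>) {
        chomp $record;          # chomp strips the current $/ if it is present
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records\n";
```

Each read returns a whole order block rather than a single line.  The first
record keeps its leading "A," while later records lose theirs, because the
separator is consumed along with the previous record.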

If T really were the last character in the last row of the set, you could use
"T\n" as $/.
The way I do it assumes that the first, and only the first, line of each set
begins with an A (and falsely puts that A at the end of the previous record,
but that doesn't matter for the aim here, does it?).


The push assumes that there are always exactly 5 records between B and email
and that this is the only line with a B in the record (and that it comes
before the lines with ADV_).

Lots of assumptions.

I'm sure there are better ways to do that; it might be a start, though.
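
One possible alternative, sketched here rather than offered as a finished
solution, reads the file line by line and keys on the first field of each
line.  It assumes that every order line sits on a single physical line in the
real file, and the variable names ($current, $wanted) are made up for the
example.  The data is a subset of the sample lines from this thread.

```perl
use strict;
use warnings;

# A few lines taken from the sample order data in this thread.
my $data = <<'END';
A,W29073,Thu Apr 05 15:25:08 2001
B,W29073,Scott,S,[EMAIL PROTECTED],249 Tah Ave,,Sth San Francisco,CA,~US,55555-5555
I,W29073,AVHQ_101090lfbl,6.000,$28.50,$171.00,,,,1.00,,2,0
T,W29073,$285.00,,,,$53.09,$338.09,,10.00,
A,W29116,Thu Apr 12 11:42:21 2001
B,W29116,test,test,[EMAIL PROTECTED],test,,test,GA,~US,11111
I,W29116,ADV_1601,1.000,$14.00,$14.00,,,,1.50,,3
T,W29116,$14.00,,7.000,$0.98,$9.94,$24.92,,1.50,
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my (@email, $current, $wanted);
while (my $line = <$fh>) {
    chomp $line;
    my @f = split /,/, $line, -1;
    if    ($f[0] eq 'A') { ($current, $wanted) = (undef, 0) }  # new order block
    elsif ($f[0] eq 'B') { $current = $f[4] }                  # remember the address
    elsif ($f[0] eq 'I' && $f[2] =~ /^ADV_/) { $wanted = 1 }   # an ADV product was ordered
    elsif ($f[0] eq 'T') {                                     # end of the order block
        push @email, $current if $wanted && defined $current;
    }
}
close $fh;

# Print once, after the loop, one address per line.
print "$_\n" for @email;
```

This version does not depend on the B line coming before the ADV_ lines, and
it adds each address at most once per order, even when several I lines match.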

> "Online ordering is now available. Visit http://insidersadvantage.com for
> details."

Uh, given your question, I'd better not, eh?

Good luck, Wolf


