Gary Stainburn wrote:
> 
> Hi folks.
> 
> I'm trying to locate the UK postcode located somewhere inside an address,
> extract it, and place it in a specific position.  The data file I'm
> processing is generated by a COBOL file and is fixed-length format text.
> 
> I've almost got it, but as you can see from the output, it's not quite right.
> The split's happening after the first letter not before it.
> 
> The postcode is of the format XX99 9XX where the space is optional and the
> first 'XX' and the '99' may be single character. The 9XX is always 1 digit
> followed by 2 letters.
> 
> e.g. WF10 5QQ, M5 5QQ,
> TT$  pcoderun <slexport.txt 2>&1 >slexport.gps|head
> in='                    LS8 5QP','' out='                    L','S8 5QP'
> in='                    LS9 8HE','' out='                    L','S9 8HE'
> in='                    LS28 6QW','' out='                    L','S28 6QW'
> in='                    LS28 7UH','' out='                    L','S28 7UH'
> in='CO DURHAM           DL1 2BL','' out='CO DURHAM           D','L1 2BL'
> in='                    LS11 7NW','' out='                    L','S11 7NW'
> in='                    BS2 0EQ','' out='                    B','S2 0EQ'
> in='LEEDS,              LS12 6BN','' out='LEEDS,              L','S12 6BN'
> in='                    LS11 0DS.','' out='                    L','S11 0DS'
> in='PRESTON,            PR5 8AT.','' out='PRESTON,            P','R5 8AT'
> TT$cat pcoderun
> #!/usr/bin/perl -w
> 
> my $template="A40A30A30A30A30A10A9"; # matches COBOL file descriptor
> while(<STDIN>) {
>   my ($head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest)=unpack($template,$_);
>   if ($addr4) {
>     splitit(\$addr4,\$pcode);
>   } elsif ($addr3) {
>     splitit(\$addr3,\$pcode);
>   } elsif ($addr2) {
>     splitit(\$addr2,\$pcode);
>   }
>   print pack($template,$head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest),"\n";
> }
> 
> sub splitit {
>   my ($line,$pcode)=@_;
>   if ($$line=~/^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/) {
>     print STDERR "in='$$line','$$pcode' out='$1','$2'\n";
>     $$line=$1;
>     $$pcode=$2;
>   }
> }


Along with the other wonderful suggestions, you could write it like
this:

my $template = 'A40A30A30A30A30A10A9'; # matches COBOL file descriptor
while ( <STDIN> ) {
  my ( $head, $addr1, $addr2, $addr3, $addr4, $pcode, $rest ) =
    unpack $template, $_;
  for ( $addr4, $addr3, $addr2, $addr1 ) {
    # [.,]? also accepts the trailing punctuation seen in the sample data
    if ( s/\s*([A-Z]{1,2}\d\d?\s?\d[A-Z]{2})[.,]?\s*$// ) {
      $pcode = $1;
      last;
    }
  }
  print pack( $template, $head, $addr1, $addr2, $addr3, $addr4, $pcode, $rest ),
    "\n";
}
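
As for why the original split lands after the first letter: the leading
(.*) is greedy, so it keeps as much of the line as it can, and \D{1,2} is
happy with a single non-digit. The regex engine therefore gives up only one
letter to the postcode. Anchoring the pattern at the end of the string and
dropping the leading (.*) avoids that. Below is a small sketch comparing the
two; the sub names split_greedy and split_anchored are made up for
illustration, and the optional [.,]? is an assumption added to cope with
sample lines such as 'LS11 0DS.'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the question: the greedy (.*) takes as much as it can,
# and \D{1,2} is satisfied by one non-digit, so the first letter of the
# postcode stays with the address.
sub split_greedy {
    my ($line) = @_;
    return $line =~ /^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/
        ? ( $1, $2 )
        : ( $line, '' );
}

# End-anchored pattern with no leading (.*). The optional [.,]? is an
# assumption for lines that end in punctuation.
sub split_anchored {
    my ($line) = @_;
    return $line =~ s/\s*([A-Z]{1,2}\d\d?\s?\d[A-Z]{2})[.,]?\s*$//
        ? ( $line, $1 )
        : ( $line, '' );
}

for my $in ( 'CO DURHAM           DL1 2BL', '                    LS11 0DS.' ) {
    printf "in='%s'\n", $in;
    printf "  greedy:   '%s','%s'\n", split_greedy($in);
    printf "  anchored: '%s','%s'\n", split_anchored($in);
}
```

Both subs return the remaining address text first and the postcode second,
so the two results can be compared side by side.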


:-)

John
-- 
use Perl;
program
fulfillment
