On Wednesday 18 December 2002 4:22 pm, Rob Dixon wrote:
> Gary
>
> Unfortunately \D will match anything which is not a digit, including white
> space and punctuation as well as alphabetic characters. Also, the initial
> .* will happily consume the first letter in your postcode, allowing \D{1,2}
> to succeed with a single letter. Try using this regex:
>
>     /^(.*)\b([A-Z]{1,2}\d{1,2}\s*\d[A-Z]{2})\b/
>
> \b is a zero-width match on a word boundary: the point between a word
> character (a letter, digit or underscore) and a non-word character or the
> start or end of the string.
>
> HTH,
>
> Rob

Thanks for that Rob, worked a treat.
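A minimal Perl sketch of Rob's point, comparing Gary's original pattern with Rob's replacement on two address lines from Gary's log. The helper name split_with is hypothetical. With Gary's pattern, \D{1,2} is satisfied by a single letter, so the greedy (.*) keeps the postcode's first letter; with Rob's, \b only matches at the start of the word, so the capture begins at the first letter.

```perl
use strict;
use warnings;

# Gary's original pattern and Rob's replacement, copied from the thread.
my $gary_re = qr/^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/;
my $rob_re  = qr/^(.*)\b([A-Z]{1,2}\d{1,2}\s*\d[A-Z]{2})\b/;

# Hypothetical helper: return the (prefix, postcode) captures, or () on no match.
sub split_with {
    my ($re, $line) = @_;
    return $line =~ $re ? ($1, $2) : ();
}

# Two address lines from Gary's log (padding rebuilt with the x operator).
for my $line ((' ' x 20) . 'LS8 5QP', 'PRESTON,' . (' ' x 12) . 'PR5 8AT.') {
    my ($g1, $g2) = split_with($gary_re, $line);
    my ($r1, $r2) = split_with($rob_re,  $line);
    print "old: '$g1','$g2'\n";
    print "new: '$r1','$r2'\n";
}
```

Run as-is, the old pattern reproduces the 'L','S8 5QP' and 'P','R5 8AT' splits from the log, while the new one captures 'LS8 5QP' and 'PR5 8AT' and leaves only the padding (and town name) in the prefix.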

>
>
> ----- Original Message -----
> From: "Gary Stainburn" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Wednesday, December 18, 2002 3:37 PM
> Subject: postcode regex problem - again.
>
> > Hi folks.
> >
> > I'm trying to find the UK postcode somewhere inside an address, extract
> > it, and place it in a specific position. The data file I'm processing is
> > generated by a COBOL program and is fixed-length format text.
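To illustrate how the fixed-length records round-trip through the script, here is a small sketch using the same unpack/pack template as pcoderun. The record contents are made up. Perl's "A" format pads fields with spaces on pack and strips trailing whitespace on unpack, so moving the postcode from an address field into the postcode field leaves the record at 179 columns.

```perl
use strict;
use warnings;

# Template from Gary's script: seven space-padded fields,
# 40 + 4*30 + 10 + 9 = 179 columns in all.
my $template = "A40A30A30A30A30A10A9";

# A made-up input record: postcode at the end of the second address
# line, postcode field empty.
my $record = pack($template, 'ACME LTD', '1 HIGH STREET',
                  'LEEDS,' . (' ' x 14) . 'LS12 6BN', '', '', '', 'X');

# "A" on unpack strips trailing whitespace, so empty fields come back as ''.
my ($head, $addr1, $addr2, $addr3, $addr4, $pcode, $rest) =
    unpack($template, $record);

# Move the postcode into its own field (hard-coded here, no regex) and
# re-pack; pack pads every field back out to its width.
($addr2, $pcode) = ('LEEDS,', 'LS12 6BN');
my $out = pack($template, $head, $addr1, $addr2, $addr3, $addr4, $pcode, $rest);

printf "%d %d\n", length($record), length($out);
print substr($out, 160, 10), "|\n";   # postcode field starts at offset 160
```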
> >
> > I've almost got it, but as you can see from the output, it's not quite
> > right. The split's happening after the first letter, not before it.
> >
> > The postcode is of the format XX99 9XX where the space is optional and
> > the first 'XX' and the '99' may each be a single character. The 9XX is always 1
> > digit followed by 2 letters.
> >
> > e.g. WF10 5QQ, M5 5QQ.
> >
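The format described above translates fairly directly into a pattern. This sketch checks whole strings against it (anchored with ^ and $). 'WF105QQ' and 'LS8-5QP' are hypothetical inputs; the pattern encodes only the rule as stated in the post, so real UK outward codes that end in a letter, such as W1A, are not accepted.

```perl
use strict;
use warnings;

# Format as described in the post: one or two letters, one or two digits,
# an optional space, then one digit and two letters.
my $postcode = qr/^[A-Z]{1,2}\d{1,2}\s?\d[A-Z]{2}$/;

# 'W1A 0AX' has a letter after the district digit, which this rule
# does not cover.
for my $pc ('WF10 5QQ', 'M5 5QQ', 'WF105QQ', 'LS8-5QP', 'W1A 0AX') {
    printf "%-9s %s\n", $pc, ($pc =~ $postcode ? 'ok' : 'no match');
}
```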
> > TT$  pcoderun <slexport.txt 2>&1 >slexport.gps|head
> > in='                    LS8 5QP','' out='                    L','S8 5QP'
> > in='                    LS9 8HE','' out='                    L','S9 8HE'
> > in='                    LS28 6QW','' out='                    L','S28 6QW'
> > in='                    LS28 7UH','' out='                    L','S28 7UH'
> > in='CO DURHAM           DL1 2BL','' out='CO DURHAM           D','L1 2BL'
> > in='                    LS11 7NW','' out='                    L','S11 7NW'
> > in='                    BS2 0EQ','' out='                    B','S2 0EQ'
> > in='LEEDS,              LS12 6BN','' out='LEEDS,              L','S12 6BN'
> > in='                    LS11 0DS.','' out='                    L','S11 0DS'
> > in='PRESTON,            PR5 8AT.','' out='PRESTON,            P','R5 8AT'
> > TT$cat pcoderun
> > #!/usr/bin/perl -w
> >
> > my $template="A40A30A30A30A30A10A9"; # matches COBOL file descriptor
> > while(<STDIN>) {
> >   my ($head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest)=unpack($template,$_);
> >   if ($addr4) {
> >     splitit(\$addr4,\$pcode);
> >   } elsif ($addr3) {
> >     splitit(\$addr3,\$pcode);
> >   } elsif ($addr2) {
> >     splitit(\$addr2,\$pcode);
> >   }
> >   print pack($template,$head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest),"\n";
> > }
> >
> > sub splitit {
> >   my ($line,$pcode)=@_;
> >   if ($$line=~/^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/) {
> >     print STDERR "in='$$line','$$pcode' out='$1','$2'\n";
> >     $$line=$1;
> >     $$pcode=$2;
> >   }
> > }
> > TT$
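A sketch of splitit with Rob's pattern dropped in. It is otherwise Gary's routine, updating both scalar references in place, except that it returns true or false so a caller can tell whether a postcode was found; that return value is an addition for illustration, not something from the thread. The test address is the LEEDS line from Gary's log.

```perl
use strict;
use warnings;

# Gary's splitit() with Rob's regex swapped in. $line and $pcode are
# references to the address-line and postcode fields, updated in place.
sub splitit {
    my ($line, $pcode) = @_;
    # \b keeps (.*) from eating the postcode's first letter, and [A-Z]
    # (unlike \D) will not match spaces or punctuation.
    if ($$line =~ /^(.*)\b([A-Z]{1,2}\d{1,2}\s*\d[A-Z]{2})\b/) {
        print STDERR "in='$$line','$$pcode' out='$1','$2'\n";
        $$line  = $1;
        $$pcode = $2;
        return 1;
    }
    return 0;    # no postcode found; both fields left untouched
}

# Address line from Gary's log (padding rebuilt with the x operator).
my $addr  = 'LEEDS,' . (' ' x 14) . 'LS12 6BN';
my $pcode = '';
splitit(\$addr, \$pcode) or warn "no postcode found\n";
print "'$addr','$pcode'\n";
```

The trailing padding left in the prefix is harmless in pcoderun, since pack's A30 re-pads the field anyway.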
> >
> > --
> > Gary Stainburn
> >
> > This email does not contain private or confidential material as it
> > may be snooped on by interested government parties for unknown
> > and undisclosed purposes - Regulation of Investigatory Powers Act, 2000
> >
> >
> > --
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]

-- 
Gary Stainburn
 