Gary
Unfortunately \D will match anything that is not a digit, including
whitespace and punctuation as well as alphabetic characters. Also, the
initial .* is greedy and will happily consume the first letter of your
postcode, leaving \D{1,2} to succeed with a single letter. Try using this
regex:
/^(.*)\b([A-Z]{1,2}\d{1,2}\s*\d[A-Z]{2})\b/
\b is a zero-width match on a word boundary, that is, the position between
a word character (any letter, digit, or underscore) and a non-word
character or the start or end of the string.
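
To make the difference concrete, here is a minimal sketch that runs both patterns over a couple of sample lines. The helper name split_with and the sample addresses are only illustrative, not part of your script, and the spacing in the samples is a guess since the real records are fixed-width.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Original pattern: \D also matches spaces and punctuation, and the greedy
# .* swallows the first letter of a two-letter postcode area.
my $loose  = qr/^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/;

# Suggested pattern: letters only, anchored on word boundaries.
my $strict = qr/^(.*)\b([A-Z]{1,2}\d{1,2}\s*\d[A-Z]{2})\b/;

# Hypothetical helper: returns (text before postcode, postcode),
# or an empty list when the pattern does not match.
sub split_with {
    my ($re, $line) = @_;
    return $line =~ $re ? ($1, $2) : ();
}

for my $addr (' LS8 5QP', 'PRESTON, PR5 8AT.') {
    my @old = split_with($loose,  $addr);
    my @new = split_with($strict, $addr);
    print "in='$addr'\n";
    print "  loose : '$old[0]','$old[1]'\n";   # e.g. ' L','S8 5QP'
    print "  strict: '$new[0]','$new[1]'\n";   # e.g. ' ','LS8 5QP'
}
```

With the word boundary in place the .* can no longer stop in the middle of "LS" or "PR", so the whole postcode ends up in $2. Note that [A-Z] only matches upper-case letters, which assumes the COBOL export is upper-case as in your sample output.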
HTH,
Rob
----- Original Message -----
From: "Gary Stainburn" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, December 18, 2002 3:37 PM
Subject: postcode regex problem - again.
> Hi folks.
>
> I'm trying to locate the UK postcode located somewhere inside an address,
> extract it, and place it in a specific position. The data file I'm
> processing is generated by a COBOL file and is fixed-length format text.
>
> I've almost got it, but as you can see from the output, it's not quite
> right.
> The split's happening after the first letter not before it.
>
> The postcode is of the format XX99 9XX where the space is optional and the
> first 'XX' and the '99' may be single character. The 9XX is always 1 digit
> followed by 2 letters.
>
> e.g. WF10 5QQ, M5 5QQ,
> TT$ pcoderun <slexport.txt 2>&1 >slexport.gps|head
> in=' LS8 5QP','' out=' L','S8 5QP'
> in=' LS9 8HE','' out=' L','S9 8HE'
> in=' LS28 6QW','' out=' L','S28 6QW'
> in=' LS28 7UH','' out=' L','S28 7UH'
> in='CO DURHAM DL1 2BL','' out='CO DURHAM D','L1 2BL'
> in=' LS11 7NW','' out=' L','S11 7NW'
> in=' BS2 0EQ','' out=' B','S2 0EQ'
> in='LEEDS, LS12 6BN','' out='LEEDS, L','S12 6BN'
> in=' LS11 0DS.','' out=' L','S11 0DS'
> in='PRESTON, PR5 8AT.','' out='PRESTON, P','R5 8AT'
> TT$cat pcoderun
> #!/usr/bin/perl -w
>
> my $template="A40A30A30A30A30A10A9"; # matches COBOL file descriptor
> while(<STDIN>) {
> my ($head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest)=unpack($template,$_);
> if ($addr4) {
> splitit(\$addr4,\$pcode);
> } elsif ($addr3) {
> splitit(\$addr3,\$pcode);
> } elsif ($addr2) {
> splitit(\$addr2,\$pcode);
> }
> print pack($template,$head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest),"\n";
> }
>
> sub splitit {
> my ($line,$pcode)=@_;
> if ($$line=~/^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/) {
> print STDERR "in='$$line','$$pcode' out='$1','$2'\n";
> $$line=$1;
> $$pcode=$2;
> }
> }
> TT$
>
> --
> Gary Stainburn
>
> This email does not contain private or confidential material as it
> may be snooped on by interested government parties for unknown
> and undisclosed purposes - Regulation of Investigatory Powers Act, 2000
>
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>