On Wednesday 18 December 2002 4:22 pm, Rob Dixon wrote:
> Gary
>
> Unfortunately \D will match anything which is not a digit, including white
> space and punctuation as well as alphabetic characters. Also, the initial
> .* will happily consume the first letter in your postcode, allowing \D{1,2}
> to succeed with a single letter. Try using this regex:
>
>     /^(.*)\b([A-Z]{1,2}\d{1,2}\s*\d[A-Z]{2})\b/
>
> \b is a zero-width match on a word boundary, where a word character is any
> letter, digit, or underscore.
>
> HTH,
>
> Rob

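To see the difference between the two patterns side by side, here is a minimal sketch that runs both against a few of the sample addresses from the original post. Only the two regexes come from the thread; the helper name split_postcode, the variable names, and the three leading spaces on the first sample are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the original script: \D also matches spaces and punctuation,
# and the greedy .* is free to swallow the first letter of the postcode.
my $original_re = qr/^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/;

# Rob's suggested pattern: letters only, anchored on word boundaries.
my $fixed_re = qr/^(.*)\b([A-Z]{1,2}\d{1,2}\s*\d[A-Z]{2})\b/;

# Hypothetical helper: returns (address, postcode) in list context,
# or an empty list if the pattern does not match.
sub split_postcode {
    my ($re, $line) = @_;
    return $line =~ $re ? ($1, $2) : ();
}

# Sample inputs taken from the post (leading padding is assumed).
for my $line ('   LS8 5QP', 'CO DURHAM DL1 2BL', 'PRESTON, PR5 8AT.') {
    my @old = split_postcode($original_re, $line);
    my @new = split_postcode($fixed_re,    $line);
    print "in='$line'\n";
    print "  original: '$old[0]','$old[1]'\n";
    print "  fixed:    '$new[0]','$new[1]'\n";
}
```

With the original pattern the first sample splits as '   L','S8 5QP', which is the symptom reported in the post; with the word-boundary pattern it splits as '   ','LS8 5QP'.
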
Thanks for that Rob, worked a treat.

> ----- Original Message -----
> From: "Gary Stainburn" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Wednesday, December 18, 2002 3:37 PM
> Subject: postcode regex problem - again.
>
> > Hi folks.
> >
> > I'm trying to locate the UK postcode located somewhere inside an address,
> > extract it, and place it in a specific position. The data file I'm
> > processing is generated by a COBOL file and is fixed-length format text.
> >
> > I've almost got it, but as you can see from the output, it's not quite
> > right. The split's happening after the first letter, not before it.
> >
> > The postcode is of the format XX99 9XX where the space is optional and
> > the first 'XX' and the '99' may each be a single character. The 9XX is
> > always 1 digit followed by 2 letters.
> >
> > e.g. WF10 5QQ, M5 5QQ,
> >
> > TT$ pcoderun <slexport.txt 2>&1 >slexport.gps|head
> > in=' LS8 5QP','' out=' L','S8 5QP'
> > in=' LS9 8HE','' out=' L','S9 8HE'
> > in=' LS28 6QW','' out=' L','S28 6QW'
> > in=' LS28 7UH','' out=' L','S28 7UH'
> > in='CO DURHAM DL1 2BL','' out='CO DURHAM D','L1 2BL'
> > in=' LS11 7NW','' out=' L','S11 7NW'
> > in=' BS2 0EQ','' out=' B','S2 0EQ'
> > in='LEEDS, LS12 6BN','' out='LEEDS, L','S12 6BN'
> > in=' LS11 0DS.','' out=' L','S11 0DS'
> > in='PRESTON, PR5 8AT.','' out='PRESTON, P','R5 8AT'
> > TT$cat pcoderun
> > #!/usr/bin/perl -w
> >
> > my $template="A40A30A30A30A30A10A9"; # matches COBOL file descriptor
> > while(<STDIN>) {
> >   my ($head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest)=unpack($template,$_);
> >   if ($addr4) {
> >     splitit(\$addr4,\$pcode);
> >   } elsif ($addr3) {
> >     splitit(\$addr3,\$pcode);
> >   } elsif ($addr2) {
> >     splitit(\$addr2,\$pcode);
> >   }
> >   print pack($template,$head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest),"\n";
> > }
> >
> > sub splitit {
> >   my ($line,$pcode)=@_;
> >   if ($$line=~/^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/) {
> >     print STDERR "in='$$line','$$pcode' out='$1','$2'\n";
> >     $$line=$1;
> >     $$pcode=$2;
> >   }
> > }
> > TT$
> >
> > --
> > Gary Stainburn
> >
> > This email does not contain private or confidential material as it
> > may be snooped on by interested government parties for unknown
> > and undisclosed purposes - Regulation of Investigatory Powers Act, 2000
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

--
Gary Stainburn
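
For anyone reusing the approach on fixed-length records, the sketch below shows the unpack / split / pack round trip with the word-boundary regex from Rob's reply dropped into splitit. It is only an illustration under stated assumptions: the two-field template "A20A9" and the fix_record helper are hypothetical stand-ins for the real seven-field layout, and the sample record is built in the script rather than read from the COBOL export.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cut-down layout: one 20-character address line plus a
# 9-character postcode field (the real file uses A40A30A30A30A30A10A9).
my $template = "A20A9";

# Move a postcode found in $$line into $$pcode, using the regex
# suggested in the thread.
sub splitit {
    my ($line, $pcode) = @_;
    if ($$line =~ /^(.*)\b([A-Z]{1,2}\d{1,2}\s*\d[A-Z]{2})\b/) {
        ($$line, $$pcode) = ($1, $2);
    }
}

# Hypothetical helper: unpack one record, split the postcode out,
# and pack the fields back to the same fixed width.
sub fix_record {
    my ($record) = @_;
    my ($addr, $pcode) = unpack($template, $record);  # "A" drops trailing spaces
    splitit(\$addr, \$pcode) if $addr;
    return pack($template, $addr, $pcode);            # "A" pads back with spaces
}

# Build a sample record with the postcode still inside the address field.
my $in  = pack($template, "LEEDS, LS12 6BN", "");
my $out = fix_record($in);
print "in ='$in'\nout='$out'\n";
```

Because the "A" format pads on pack and strips trailing spaces on unpack, the record length stays the same after the postcode has been moved into its own field.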