Hi Derek.

Derek Romeyn wrote:
> Using your idea I ended up with data like this. Which is odd because
> the database should only include 400 and 500 type errors.
> [snip]
>
> 404 24.54.175.153 - - [11/Mar/2003:07:48:37 -0800] "GET
> /e/t/invest/img/spacer.gif HTTP/1.1" 404 0 "https://
> 370 209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] "GET
> /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370
> 526 66.196.65.24 - - [11/Mar/2003:07:54:32 -0800] "GET
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/5.0 (Slur
> 178 167.127.163.141 - isklvjyy [11/Mar/2003:08:02:46 -0800] "GET /e/t/aaa
> HTTP/1.1" 500 178 "-" "Mozilla/4.0 (compatible
> 404 68.39.167.38 - - [11/Mar/2003:08:06:34 -0800] "GET /e/t/aaa/img/spacer.gif
> HTTP/1.1" 404 0 "https://us.etrade.com/e/
> 526 65.248.129.126 - - [11/Mar/2003:08:03:20 -0800] "GET
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/4.0 [en
> 526 65.248.129.126 - - [11/Mar/2003:08:03:20 -0800] "GET
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/4.0 [en
>
> The 404's were right but the rest took the second group of numbers
> instead of the needed first.
>
> This is how my code looked:
>
> my $code,$msg;
> foreach (@RAW_DATA) {
>     $code = $1 if m|HTTP.*\s+(\d{3})|g;

Here's your problem. You're searching for 'HTTP', followed by any
number of any character, followed by one or more whitespace characters
and three digits. Because '.*' is greedy it will eat up as much as it
can, so the captured digits will be the /last/ occurrence of three
digits following whitespace. That is why the 404 lines look right
(the byte count there is a single '0', so the status is also the last
three-digit group) while the others pick up the byte count. If you
change '.*' into '.*?' it will match as few characters as possible and
you'll get the three digits you want.

Also, do you need the /g modifier on this search? I don't think it can
make any difference in this context, since you match only once per
line. I'd recommend using /x though, so that you can lay the pattern
out a little more visibly.

>     ($timestamp, $msg) = split(/\t/);

I'm not clear from your data which fields you're extracting, but I
assume this split works as you haven't said otherwise.

>     if (!$code) {
>         print "NEXT\n";
>         next;
>     }

Surely you really want to 'next' if the initial match fails, rather
than testing $code afterwards?

>     print "$code\t$msg\n";
>     $code = 0;
> }
>
> I did manage to get a version of George's to work. Still interested
> in trying all variations though.

The following corrects all my points above. Use it if you like it.

HTH,

Rob

    foreach (@RAW_DATA) {

        # Skip the line straight away if no status code can be found.
        unless ( m| HTTP.*? \s+ (\d{3}) |x ) {
            print "NEXT\n";
            next;
        }

        # $code is scoped to the loop body, so it needs no resetting.
        my $code = $1;
        my ($timestamp, $msg) = split(/\t/);

        print "$code\t$msg\n";
    }
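
To see the greedy versus non-greedy difference in isolation, here is a minimal sketch run against one of the log lines quoted above. The helper names status_greedy and status_lazy are made up for illustration and are not part of anyone's code in this thread. It assumes the status code is the first three-digit group after whitespace following 'HTTP', which holds for the sample lines shown but may not hold for every log format.

```perl
use strict;
use warnings;

# One of the log lines from the quoted output, without the leading
# "370" column that the original script printed in front of it.
my $line = '209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] '
         . '"GET /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370';

# Hypothetical helper: the original greedy pattern.
sub status_greedy {
    my ($text) = @_;
    # '.*' runs to the end of the line and then backtracks, so the
    # LAST whitespace-plus-three-digits group is the one captured.
    return $text =~ m| HTTP .* \s+ (\d{3}) |x ? $1 : undef;
}

# Hypothetical helper: the same pattern with a non-greedy quantifier.
sub status_lazy {
    my ($text) = @_;
    # '.*?' grows one character at a time, so the FIRST
    # whitespace-plus-three-digits group is the one captured.
    return $text =~ m| HTTP .*? \s+ (\d{3}) |x ? $1 : undef;
}

print "greedy: ", status_greedy($line), "\n";   # prints "greedy: 370"
print "lazy:   ", status_lazy($line),   "\n";   # prints "lazy:   400"
```

The greedy version returns the byte count and the non-greedy one returns the status, matching the symptom described above. Both helpers return undef when nothing matches, so a caller can 'next' on an undefined result.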