Matt Simonsen wrote at Tue, 20 Aug 2002 00:09:10 +0200:

> I'm wondering what people would suggest as the best way to split this so
> it respects the "" and [] as fields yet doesn't kill performance?
>
> 1.2.3.4 - - [15/Aug/2002:06:43:39 -0700] "GET /usr/123 HTTP/1.0" 200
> 38586 "http://www.careercast.com/js.php" "Mozilla/4.0 (compatible; MSIE
> 5.5; Windows 98)" - www.careercast.com
>
> In other words I want
>
> hash[0] = 1.2.3.4
> hash[1] = -
> hash[2] = -
> hash[3] = 15/Aug/2002:06:43:39 -0700
> hash[4] = 200
> hash[...]
> hash[10] = www.careercast.com

Taking the whole string in $string,

    my @parts = $string =~ /([^\s"\[]+ | ".*?" | \[.*?\])/gx;
    print join "\n", @parts;

prints for me:

    1.2.3.4
    -
    -
    [15/Aug/2002:06:43:39 -0700]
    "GET /usr/123 HTTP/1.0"
    200
    38586
    "http://www.careercast.com/js.php"
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
    -
    www.careercast.com

This solution only works as long as there are no quotes inside [ ... ].
I haven't run a performance test, but I'm afraid this solution is slower
than John's, though it is a little more general.

One possible speed improvement is to add a lookahead:

    /((?=\S) [^\s"\[]+ | ".*?" | \[.*?\])/gx;

and another is to replace the .*? parts with negated character classes:

    /((?=\S) [^\s"\[]+ | "[^"]+" | \[[^\]]+\])/gx;

[untested]

Best Wishes,
Janek
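
For anyone who wants to try the [untested] pattern, here is a minimal sketch that runs it against the sample line from the question. The step that strips the surrounding "" and [] is an addition of mine, not part of the message above, and the name @fields is only illustrative. It assumes the log entry is on a single line, without the wrapping introduced by the mail client.

```perl
use strict;
use warnings;

# Sample line from the question, joined back onto one line (assumption:
# the real log has no line breaks inside an entry).
my $string = '1.2.3.4 - - [15/Aug/2002:06:43:39 -0700] "GET /usr/123 HTTP/1.0" 200 '
           . '38586 "http://www.careercast.com/js.php" '
           . '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" - www.careercast.com';

# The last pattern from the message: a bare word, a "quoted" field,
# or a [bracketed] field. /x lets the alternatives be spaced out.
my @parts = $string =~ /((?=\S) [^\s"\[]+ | "[^"]+" | \[[^\]]+\])/gx;

# Added step: drop the enclosing "" or [] so only the contents remain.
my @fields = map {
    my $f = $_;
    $f =~ s/\A"(.*)"\z/$1/s or $f =~ s/\A\[(.*)\]\z/$1/s;
    $f;
} @parts;

print "$_: $fields[$_]\n" for 0 .. $#fields;
```

On the sample line this gives eleven fields, with the timestamp at index 3 and www.careercast.com at index 10. Note that "[^"]+" requires at least one character between the quotes, so an empty "" field would not be matched by this version of the pattern.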