Matt Simonsen wrote at Tue, 20 Aug 2002 00:09:10 +0200:

> I'm wondering what people would suggest as the best way to split this so
> it respects the "" and [] as fields yet doesn't kill performance?
> 
> 
> 1.2.3.4 - - [15/Aug/2002:06:43:39 -0700] "GET /usr/123 HTTP/1.0" 200
> 38586 "http://www.careercast.com/js.php" "Mozilla/4.0 (compatible; MSIE
> 5.5; Windows 98)" - www.careercast.com
> 
> 
> 
> In other words I want 
> 
> hash[0] = 1.2.3.4
> hash[1] = -
> hash[2] = -
> hash[3] = 15/Aug/2002:06:43:39 -0700
> hash[4] = 200
> hash[...]
> hash[10] = www.careercast.com

Assuming the whole log line is in $string:

my @parts = $string =~ /([^\s"\[]+ | ".*?" | \[.*?\])/gx;
print join "\n", @parts;

prints:

1.2.3.4
-
-
[15/Aug/2002:06:43:39 -0700]
"GET /usr/123 HTTP/1.0"
200
38586
"http://www.careercast.com/js.php"
"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
-
www.careercast.com


This solution only works as long as there are no quotes inside [ ... ].
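
Note that the matched fields still carry their "" and [] delimiters, while
the wished-for hash[3] has none. A minimal sketch of stripping them in a
second pass follows; the name @fields is mine, and I assume the log line
arrives as a single line without the mail wrapping:

```perl
use strict;
use warnings;

# Sample line from the question, joined back onto one line (assumption).
my $string = '1.2.3.4 - - [15/Aug/2002:06:43:39 -0700] "GET /usr/123 HTTP/1.0" 200 '
           . '38586 "http://www.careercast.com/js.php" "Mozilla/4.0 (compatible; MSIE '
           . '5.5; Windows 98)" - www.careercast.com';

# Same match as above: bare words, "quoted" fields, [bracketed] fields.
my @parts = $string =~ /([^\s"\[]+ | ".*?" | \[.*?\])/gx;

# Work on a copy and strip one enclosing pair of "" or [] where present.
my @fields = @parts;
for my $field (@fields) {
    $field =~ s/^"(.*)"\z/$1/s or $field =~ s/^\[(.*)\]\z/$1/s;
}

print "$_: $fields[$_]\n" for 0 .. $#fields;
```

With this, $fields[3] holds the timestamp without brackets and $fields[10]
holds www.careercast.com.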

I haven't benchmarked it, but I suspect this solution is slower than
John's, though a little more general.
One possible speed improvement is to add a lookahead:

/((?=\S) [^\s"\[]+ | ".*?" | \[.*?\])/gx;

and to replace the .*? parts with negated character classes:

/((?=\S) [^\s"\[]+ | "[^"]*" | \[[^\]]*\])/gx;

[untested]
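
To check the speed guess rather than trust it, something like the
following could be run. It is only a sketch: the labels lazy, lookahead
and negated are mine, the negated classes use * so that an empty ""
still counts as a field, and the timings will depend on the Perl version
and on the real log data. It first confirms that all three patterns
return the same fields for the sample line, then compares them with
cmpthese from the core Benchmark module.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample line from the question, joined back onto one line (assumption).
my $string = '1.2.3.4 - - [15/Aug/2002:06:43:39 -0700] "GET /usr/123 HTTP/1.0" 200 '
           . '38586 "http://www.careercast.com/js.php" "Mozilla/4.0 (compatible; MSIE '
           . '5.5; Windows 98)" - www.careercast.com';

# The three patterns discussed above, precompiled with qr//.
my %variant = (
    lazy      => qr/([^\s"\[]+ | ".*?" | \[.*?\])/x,
    lookahead => qr/((?=\S) [^\s"\[]+ | ".*?" | \[.*?\])/x,
    negated   => qr/((?=\S) [^\s"\[]+ | "[^"]*" | \[[^\]]*\])/x,
);

# Sanity check: every variant should yield the same list of fields.
my %result;
for my $name (keys %variant) {
    my $re = $variant{$name};
    $result{$name} = join "\n", $string =~ /$re/g;
}
my $agree = $result{lazy} eq $result{lookahead}
         && $result{lazy} eq $result{negated};
print $agree ? "all variants agree\n" : "variants differ\n";

# Build one timing sub per variant and run each for at least 1 CPU second.
my %code;
for my $name (keys %variant) {
    my $re = $variant{$name};
    $code{$name} = sub { my @parts = $string =~ /$re/g; return scalar @parts };
}
cmpthese(-1, \%code);
```

A single sample line is not much of a test; feeding in a few thousand
real log lines would give more trustworthy numbers.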


Best Wishes,
Janek

