Ram, thank you for your reply. I think I should have been a bit more descriptive in what I am trying to do.
The log file is about 1.2 to 1.5 GB per 24 hours. Each record looks like this:

Mon 10 Mar 2003 12:07:55 PM EST (1047316075.181206) [EMAIL PROTECTED]:<16.0>:L4:S14 req=/script-root-tran/accounts/MyAccounts sid=1433628848 tid=295244861 mod=1 err=0 hit=0 fix=0 pri=8 qul=0 jah=0 tql=0 tet=24.009162 qut=0.000228 mut=0.000012 sct=0.000000 set=24.005668 gct=0.000000

This log is a Broadvision performance log. The reason for using split is maintainability; I would be open to using a regex instead, since these logs come from a production box and the script would be executed on that box.

Here is the gist of my code:

    while(<PERFLOG>) {
        chomp;
        if(/\breq=/o){
            &collect_stats;
        }
    }

    sub collect_stats() {
        my (@hash_keys, $element, $key, $value, $script_name, $script_time, $discard);

        @hash_keys = split;

        # ----- cut the unneeded elements out --------
        # "Mon 10 Mar 2003 12:07:55 PM EST"
        splice (@hash_keys, 0, 7);

        # get the exec time
        # get "(1047316075.181206)" which is
        # time since 00:00:00 UTC January 1, 1970 and
        # "[EMAIL PROTECTED]:<16.0>:L4:S14" which is
        # discarded.
        ($script_time, $discard) = splice(@hash_keys, 0, 2);
        $script_time =~ s/[()]//go;

        # compare the time stamps (numeric comparison)
        if($opt_t) {
            $start_time = $script_time if($start_time == 0);
            if (($start_time + $report_interval) < $script_time) {
                $start_time = 0;
                &output;
            }
        }

        # get the script/jsp name and
        # strip the 'req='
        $_ = splice(@hash_keys,0,1);
        s/req=//o;
        $script_name = $_;

        #============================================================
        # now, all that's left are the name/value pairs.
        # store every element in a hash. There will be a
        # hash for every script that is logged and a master
        # hash that stores each script hash. The key for each
        # script hash is the script name.
        #
        # These are the name/value pairs
        # SID= session id
        # TID= transaction id
        # MOD= IM state ( 1=normal, 2=overload, <=0 drain)
        # ERR= error ( 0=no error, 1=error)
        # HIT= request cache hit ( 0=miss, 1=hit)
        # FIX= fix up script hit ( 0=miss, 1=hit)
        # PRI= request priority ( range: 0-15, 0=highest, 15=lowest)
        # QUL= queue length of the priority
        # JAH= jobs ahead in queues
        # TQL= total jobs in all queues
        # TET= total execution time
        # QUT= queuing time
        # MUT= mutex waiting time
        # SCT= script compile time
        # SET= script execution time
        # GCT= garbage collection time
        foreach $element(@hash_keys) {
            ($key, $value) = split /=/, $element;
            $map_store{$script_name}{$key} += $value;
        }

        # increment the number of times this script
        # has been processed
        $map_store{$script_name}{"occurence"} += 1;
        $lines_processed++;
    }

-----Original Message-----
From: Ramprasad [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 15, 2003 1:29 AM
To: [EMAIL PROTECTED]
Subject: Re: performance of splice v. split

Serge Shakarian wrote:
> When parsing a large log file which one would be better for performance
> splice or split? Every line is in a standard format and I would be using
> splice/split to get 12 out of 20 space separated words in each line. Thanks
> in advance.
> Serge

splice is used on arrays and split on scalars, so if you are using splice you will be using split anyway to create an array.

Personally I think writing a regex (probably a longish one with 20 cols) will give you the best results, because you are picking only a few cols out of many, like this (take e.g. just 5 cols and picking up 3):

    while(<LOG>){
        my($a,$b,$c) = (/^(.*?) (.*?) .*? (.*?) .*$/);
        ....
    }

Just try it out and let me know
Ram

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
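
For comparison, here is a minimal sketch of the regex route Ram suggests, applied to the sample record quoted above. It is not taken from the original script: parse_record is a hypothetical helper name, and the pattern assumes every record carries the parenthesised epoch time followed later by req= and the name=value pairs. Whether it actually beats split plus splice on a 1.5 GB file is something only a measurement on real data (for instance with the core Benchmark module) can tell.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): extract the epoch
# timestamp, the request name and the name=value pairs from one record.
sub parse_record {
    my ($line) = @_;

    # Capture the epoch time in parentheses, skip everything up to req=,
    # then capture the request name and the remainder of the line.
    my ($script_time, $script_name, $rest) =
        $line =~ /\((\d+\.\d+)\)\s+.*?\breq=(\S+)\s*(.*)$/
        or return;

    # The remainder is a run of key=value tokens; collect them in one pass.
    my %pairs = $rest =~ /(\w+)=(\S+)/g;

    return ($script_time, $script_name, \%pairs);
}

# Sample record shortened from the one in the post.
my $sample = 'Mon 10 Mar 2003 12:07:55 PM EST (1047316075.181206) '
           . '[EMAIL PROTECTED]:<16.0>:L4:S14 '
           . 'req=/script-root-tran/accounts/MyAccounts '
           . 'sid=1433628848 tid=295244861 mod=1 err=0 tet=24.009162';

# Accumulate into the same kind of nested hash the original code uses.
my %map_store;
if (my ($time, $name, $pairs) = parse_record($sample)) {
    $map_store{$name}{$_} += $pairs->{$_} for keys %$pairs;
    $map_store{$name}{occurence}++;
    print "$name tet=$map_store{$name}{tet}\n";
}
```

Because the pattern skips the leading date and the discarded token without splitting them into array elements, the seven-element splice and the s/req=// step are no longer needed; the trade-off is a pattern that has to be kept in step with the log format.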