RE: performance of splice v. split

SHAKARIAN,SERGE (HP-NewJersey,ex1) Sat, 15 Mar 2003 14:44:39 -0800

Ram,

Thank you for your reply. I should have been more descriptive about what I
am trying to do.


The log file grows by about 1.2 - 1.5 GB per 24 hours. Each record is a
single line that looks like this (wrapped here):
Mon 10 Mar 2003 12:07:55 PM EST (1047316075.181206)
[EMAIL PROTECTED]:<16.0>:L4:S14
req=/script-root-tran/accounts/MyAccounts sid=1433628848 tid=295244861 mod=1
err=0 hit=0 fix=0 pri=8 qul=0 jah=0 tql=0 tet=24.009162 qut=0.000228
mut=0.000012 sct=0.000000 set=24.005668 gct=0.000000

This log is a BroadVision performance log. I used split for ease of
maintenance, but I would be open to a regex solution, since these logs come
from a production box and the script would be executed on that box. Here is
the gist of my code:

while(<PERFLOG>)
{
    chomp;
    if(/\breq=/o){ &collect_stats; }
}

sub collect_stats()
{
    my (@hash_keys, $element, $key, $value, $script_name, $script_time,
$discard);

    @hash_keys = split;
 
    # ----- cut the unneeded elements out --------
    # "Mon 10 Mar 2003 12:07:55 PM EST"        
    splice (@hash_keys, 0, 7);
    
    # get the exec time:
    # "(1047316075.181206)" is the time since
    # 00:00:00 UTC January 1, 1970, and
    # "[EMAIL PROTECTED]:<16.0>:L4:S14" is
    # discarded.
    ($script_time, $discard) = splice(@hash_keys, 0, 2);
    $script_time =~ tr/()//d;    # delete the surrounding parentheses
    
    # compare the time stamps
    if($opt_t)
    {
        $start_time = $script_time if ($start_time == 0);

        if (($start_time + $report_interval) < $script_time)
        {
            $start_time = 0;
            &output;
        }      
    }
    
    # get the script/jsp name and
    # strip the 'req=' prefix
    $script_name = shift @hash_keys;
    $script_name =~ s/^req=//;
    
    #============================================================
    # now, all that's left are the name/value pairs.
    # Store every element in a hash. There will be a
    # hash for every script that is logged and a master
    # hash that stores each script hash. The key for each
    # script hash is the script name.
    #
    # These are the name/value pairs
    # SID= session id
    # TID= transaction id
    # MOD= IM state ( 1=normal, 2=overload, <=0 drain)
    # ERR= error ( 0=no error, 1=error)
    # HIT= request cache hit ( 0=miss, 1=hit)
    # FIX= fix up script hit ( 0=miss, 1=hit)
    # PRI= request priority  ( range: 0-15, 0=highest, 15=lowest)
    # QUL= queue length of the priority
    # JAH= jobs ahead in queues
    # TQL= total jobs in all queues
    # TET= total execution time
    # QUT= queuing time
    # MUT= mutex waiting time
    # SCT= script compile time
    # SET= script execution time
    # GCT= garbage collection time
    foreach $element(@hash_keys)
    {   
        ($key, $value) = split /=/, $element;
        $map_store{$script_name}{$key} += $value;
    }
    # increment the number of times this script
    # has been processed
    $map_store{$script_name}{"occurence"} += 1;
    
    $lines_processed++;    
}
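
For comparison, here is a minimal sketch of the same field extraction done with one split and an array slice instead of repeated splice calls. The helper name parse_record is hypothetical and not part of the script above, and the token user@host stands in for the address that the list archive masked (the masked form contains a space, which would shift the field positions).

```perl
use strict;
use warnings;

# Sample record from this post, joined back onto one line.
# 'user@host' is a placeholder for the masked address (assumption).
my $line = 'Mon 10 Mar 2003 12:07:55 PM EST (1047316075.181206) '
         . 'user@host:<16.0>:L4:S14 '
         . 'req=/script-root-tran/accounts/MyAccounts sid=1433628848 '
         . 'tid=295244861 mod=1 err=0 hit=0 fix=0 pri=8 qul=0 jah=0 tql=0 '
         . 'tet=24.009162 qut=0.000228 mut=0.000012 sct=0.000000 '
         . 'set=24.005668 gct=0.000000';

# parse_record is a hypothetical helper used only for illustration.
sub parse_record {
    my ($record) = @_;
    my @fields = split ' ', $record;
    return if @fields < 10;    # not a complete record

    # Fields 0..6 are the readable date, 7 is the epoch time in parentheses,
    # 8 is the discarded tag, 9 is req=..., the rest are name=value pairs.
    my ($script_time, $script_name, @pairs) = @fields[7, 9 .. $#fields];
    $script_time =~ tr/()//d;
    $script_name =~ s/^req=//;

    # Turn "name=value" tokens into a hash of name => value.
    my %stats = map { split /=/, $_, 2 } @pairs;
    return ($script_time, $script_name, \%stats);
}

my ($time, $name, $stats) = parse_record($line);
printf "%s %s tet=%s\n", $time, $name, $stats->{tet};
```

The slice touches the array once and leaves it intact, so the field positions stay visible in one place if the log format changes.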

-----Original Message-----
From: Ramprasad [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 15, 2003 1:29 AM
To: [EMAIL PROTECTED]
Subject: Re: performance of splice v. split


Serge Shakarian wrote:
> 
> When parsing a large log file, which one would be better for performance,
> splice or split? Every line is in a standard format and I would be using
> splice/split to get 12 out of 20 space-separated words in each line.
> Thanks in advance.
> Serge
> 

splice works on arrays and split on scalars, so if you use splice you will
still need split to create the array first.
Personally I think writing a regex (probably a longish one with 20 cols)
will give you the best results, because you are picking only a few cols out
of many.

For example (taking just 5 cols and picking 3 of them):

while(<LOG>){
     # capture cols 1, 2 and 4; $a and $b are avoided since sort uses them
     my ($c1, $c2, $c4) = (/^(.*?) (.*?) .*? (.*?) .*$/);
....
....
....


}


Just try it out and let me know
Ram
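
One way to try it out is the core Benchmark module. The sketch below times a split-plus-slice version against a regex version on the sample record from this thread. The sub names via_split and via_regex are hypothetical, user@host is a placeholder for the masked address, and the numbers will depend on the machine and Perl version, so treat it as a starting point rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample record from this thread on one line; 'user@host' is a placeholder.
my $line = 'Mon 10 Mar 2003 12:07:55 PM EST (1047316075.181206) '
         . 'user@host:<16.0>:L4:S14 '
         . 'req=/script-root-tran/accounts/MyAccounts sid=1433628848 '
         . 'tid=295244861 mod=1 err=0 hit=0 fix=0 pri=8 qul=0 jah=0 tql=0 '
         . 'tet=24.009162 qut=0.000228 mut=0.000012 sct=0.000000 '
         . 'set=24.005668 gct=0.000000';

# Split the whole line, then pick the time and req fields by position.
sub via_split {
    my @f = split ' ', $line;
    return @f[7, 9];
}

# Skip seven date tokens, capture the time, skip the tag, capture req.
sub via_regex {
    my ($t, $req) = $line =~ /^(?:\S+\s+){7}(\S+)\s+\S+\s+(\S+)/;
    return ($t, $req);
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { split => \&via_split, regex => \&via_regex });
```

Both subs return the same two tokens, so the comparison measures only the extraction step and not the later name/value bookkeeping.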




-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
