Re: performance of splice v. split

John W. Krahn Sat, 15 Mar 2003 17:34:43 -0800

Serge Shakarian wrote:
> 
> thank you for your reply. I think I should have been a bit more descriptive
> in what I am trying to do.
> 
> The log file is about 1.2 - 1.5 GB per 24 hours. Each record looks like
> this:
> Mon 10 Mar 2003 12:07:55 PM EST (1047316075.181206)
> [EMAIL PROTECTED]:<16.0>:L4:S14
> req=/script-root-tran/accounts/MyAccounts sid=1433628848 tid=295244861 mod=1
> err=0 hit=0 fix=0 pri=8 qul=0 jah=0 tql=0 tet=24.009162 qut=0.000228
> mut=0.000012 sct=0.000000 set=24.005668 gct=0.000000
> 
> This log is a Broadvision performance log. The reason for using split is a
> maintenance issue and I would be open to using regex as a solution as these
> logs are from a production box and the script would be executed on the prod
> box. Here is the gist of my code:
> 
> while(<PERFLOG>)
> {
>     chomp;
>     if(/\breq=/o){ &collect_stats; }


First, calling a sub is going to be slower than writing the code inline
in the loop.  Second, calling a sub with the ampersand prefix is frowned
upon in Perl 5; written without parentheses, &collect_stats; also hands
the caller's current @_ to the sub.  Third, using global variables in a
sub is frowned upon; it is better to pass the variables explicitly.
Fourth, the /o option on the match operator only has an effect if there
is variable interpolation in the regular expression.  In other words, it
does nothing here.
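
As a minimal sketch of the second and third points (the subs show_args,
old_style, new_style and add_stat are hypothetical names made up for this
illustration, not part of the original script), the following shows how an
ampersand call without parentheses shares the caller's @_, and how state
can be passed in explicitly instead of read from globals:

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub show_args { return scalar @_ }

# &show_args; (ampersand, no parentheses) reuses the caller's @_.
sub old_style { return &show_args; }

# show_args() passes exactly what is written in the parentheses: nothing.
sub new_style { return show_args(); }

# Hypothetical accumulator that takes its state as arguments
# instead of reading and writing global variables.
sub add_stat {
    my ( $stats, $name, $value ) = @_;
    $stats->{$name} += $value;
    return $stats->{$name};
}

my %stats;
my $leaked = old_style( 'a', 'b', 'c' );    # show_args sees all three
my $clean  = new_style( 'a', 'b', 'c' );    # show_args sees none

add_stat( \%stats, 'tet', 24.009162 );
add_stat( \%stats, 'tet', 1 );

print "old_style saw $leaked args, new_style saw $clean args\n";
print "tet total: $stats{tet}\n";
```

Passing a hash reference keeps the sub usable on any stats hash, which
also makes it easier to test on its own.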


> }
> 
> [snip code]


I would probably write it like this:

while ( <PERFLOG> ) {
    next unless /\breq=(\S+)/;
    # get the script/jsp name
    my $script_name = $1;
    # get "(1047316075.181206)", which is the
    # time in seconds since 00:00:00 UTC January 1, 1970
    my ($script_time) = /\(([\d.]+)\)/;

    # compare the time stamps
    if ( $opt_t ) {
        $start_time ||= $script_time;

        if ( $start_time + $report_interval < $script_time ) {
            $start_time = 0;
            output();
        }
    }

    #============================================================
    # now, all that's left are the name/value pairs.
    # store every element in a hash. There will be a
    # hash for every script that is logged and a master
    # hash that stores each script hash. The key for each
    # script hash is the script name.
    #
    # These are the name/value pairs
    # SID= session id
    # TID= transaction id
    # MOD= IM state ( 1=normal, 2=overload, <=0 drain)
    # ERR= error ( 0=no error, 1=error)
    # HIT= request cache hit ( 0=miss, 1=hit)
    # FIX= fix up script hit ( 0=miss, 1=hit)
    # PRI= request priority  ( range: 0-15, 0=highest, 15=lowest)
    # QUL= queue length of the priority
    # JAH= jobs ahead in queues
    # TQL= total jobs in all queues
    # TET= total execution time
    # QUT= queuing time
    # MUT= mutex waiting time
    # SCT= script compile time
    # SET= script execution time
    # GCT= garbage collection time

    my %hash = map /^(\S+)=([\d.]+)$/, split;

    for my $key ( keys %hash ) {
        $map_store{$script_name}{$key} += $hash{$key};
    }

    # increment the number of times this script
    # has been processed
    $map_store{$script_name}{occurence}++;

    $lines_processed++;
}
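
To see what the match and map/split lines above extract, here is a small
self-contained sketch run against the sample record from the original
message. It assumes each record is a single line in the real log (it is
only wrapped in the e-mail), and it applies the same expressions to a
$record variable rather than to $_:

```perl
use strict;
use warnings;

# Sample record from the original message, joined into one line.
# ASSUMPTION: the real log holds each record on a single line.
my $record = 'Mon 10 Mar 2003 12:07:55 PM EST (1047316075.181206) '
           . '[EMAIL PROTECTED]:<16.0>:L4:S14 '
           . 'req=/script-root-tran/accounts/MyAccounts sid=1433628848 '
           . 'tid=295244861 mod=1 err=0 hit=0 fix=0 pri=8 qul=0 jah=0 '
           . 'tql=0 tet=24.009162 qut=0.000228 mut=0.000012 '
           . 'sct=0.000000 set=24.005668 gct=0.000000';

# A match in list context returns its captures.
my ($script_name) = $record =~ /\breq=(\S+)/;
my ($script_time) = $record =~ /\(([\d.]+)\)/;

# split ' ' breaks the record on whitespace. For each field the match
# returns (name, value) if it looks like name=number, or an empty list
# otherwise, so the date, the timestamp and req= are silently dropped.
my %hash = map /^(\S+)=([\d.]+)$/, split ' ', $record;

print "$script_name at $script_time\n";
print "$_=$hash{$_}\n" for sort keys %hash;
```

Because the value pattern is [\d.]+, the req= field (whose value is a
path) never reaches %hash, so only numeric fields get summed.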



John
-- 
use Perl;
program
fulfillment
