Re: fastest way to substitute

Paul Tremblay Fri, 26 Jul 2002 22:02:55 -0700

On Fri, Jul 26, 2002 at 02:02:54PM -0400, Jeff 'japhy' Pinyan wrote:

Jeff:


I benchmarked your method, and it was actually slower. I ran a test line
through each approach 10,000 times: the sequence of direct substitutions
took 38 wall-clock seconds, while your method took 60.

I included my test below, in case I made a mistake.

Thanks

Paul

##################################################

#!/usr/bin/perl #-w
use strict;
use Benchmark;
my $loopcount=10_000;
##my $loopcount = 1;

my $line = "\\tab Paul & Tom said \\ldblquote we are brothers \\rdblquote \\emdash 
which was the truth. \\e4 \\e5 \\e6 text text \\e7 text \\e8 text \\e9 text 
text text \\e10 text \\e11 text \\e12 text \\e13 text \\e14 text text \\e15  text 
\\e16 text \\e17 text \\e18 text \\e19 text \\e20 text e\\21 text \\e21 
text \\e22 text \\e23 text \\e24 text \\e25 text \\e26 text \\e27 text \\e28 
text  \\e29 text \\e30 ";

sub each_line{
        local $_ = $line;    # work on a copy; the caller's $_ is restored on return
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
        s/\\ldblquote /<rt_quote\/>/g;
        s/\\rdblquote /<lt_quote\/>/g;
        s/\\emdash /<em_dash\/>/g;
        s/\\rquote /<r_quote\/>/g;
        s/\\tab /<tab\/>/g;
        s/\\lquote /<l_quote\/>/g;
        s/\\e4 /<e4\/>/g;
        s/\\e5 /<e5\/>/g;
        s/\\e6 /<e6\/>/g;
        s/\\e7 /<e7\/>/g;
        s/\\e8 /<e8\/>/g;
        s/\\e9 /<e9\/>/g;
        s/\\e10 /<e10\/>/g;
        s/\\e11 /<e11\/>/g;
        s/\\e12 /<e12\/>/g;
        s/\\e13 /<e13\/>/g;
        s/\\e14 /<e14\/>/g;
        s/\\e15 /<e15\/>/g;
        s/\\e16 /<e16\/>/g;
        s/\\e17 /<e17\/>/g;
        s/\\e18 /<e18\/>/g;
        s/\\e19 /<e19\/>/g;
        s/\\e20 /<e20\/>/g;
        s/\\e21 /<e21\/>/g;
        s/\\e22 /<e22\/>/g;
        s/\\e23 /<e23\/>/g;
        s/\\e24 /<e24\/>/g;
        s/\\e25 /<e25\/>/g;
        s/\\e26 /<e26\/>/g;
        s/\\e27 /<e27\/>/g;
        s/\\e28 /<e28\/>/g;
        s/\\e29 /<e29\/>/g;
        s/\\e30 /<e30\/>/g;
        ##print $_;

}
sub hash_method{
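        my $line = $line;    # private copy of the test line
        # %rep and $rx are rebuilt on every call, so this setup is
        # timed in each of the 10,000 iterations. The &, < and >
        # entries cannot match a bare &, < or >: the pattern below
        # requires a backslash before the key and a space after it.
        my %rep = qw(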
                ldblquote   rt_quote
                rdblquote   lt_quote
                emdash      em_dash               
                rquote      r_quote
                tab         tab    
                lquote      l_quote
                &       &amp;
                <       &lt;
                >       &gt;
                e4      e4
                e5      e5
                e6      e6
                e7      e7
                e8      e8
                e9      e9
                e10     e10
                e11     e11
                e12     e12
                e13     e13
                e14     e14
                e15     e15
                e16     e16
                e17     e17
                e18     e18
                e19     e19
                e20     e20
                e21     e21
                e22     e22
                e23     e23
                e24     e24
                e25     e25
                e26     e26
                e27     e27
                e28     e28
                e29     e29
                e30     e30
        );

        my $rx = join "|", map quotemeta, keys %rep;

        $line =~ s[\\($rx) ][<$rep{$1}/>]go;
        ##print "$line\n";

}




#----------------------
# the main loop section

timethese $loopcount, {
    each_line => \&each_line,
    hash_method => \&hash_method,

};



# end of the world as I knew it [EMAIL PROTECTED] all rights reserved

###############################################
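A sketch of a fairer comparison, under some assumptions of its own (a shortened test line, only a few of the `\eN` codes, and a helper named `escape_entities` that is not part of the original script). In the script above, `hash_method` rebuilds `%rep` and `$rx` on every call, so that setup is timed in every iteration; its `&`, `<` and `>` entries also never fire, so the two subs do not produce the same output. The version below builds the table and the compiled pattern once, escapes the XML specials identically in both subs, checks that the outputs agree, and only then times them. How much of the 38 s vs. 60 s gap this accounts for has not been measured.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Shortened test line (illustrative only).
my $line = "\\tab Paul & Tom said \\ldblquote we are brothers \\rdblquote "
         . "\\emdash which was the truth. \\e4 text \\e5 text \\e6 <end> ";

# Control word => tag name, built once at file scope so neither the
# hash nor the alternation is rebuilt inside the timed code.
my %rep = (
    ldblquote => 'rt_quote',
    rdblquote => 'lt_quote',
    emdash    => 'em_dash',
    rquote    => 'r_quote',
    tab       => 'tab',
    lquote    => 'l_quote',
    map { ("e$_" => "e$_") } 4 .. 6,
);
my $rx = join '|', map quotemeta, keys %rep;
my $re = qr/\\($rx) /;    # compiled once

# Hypothetical helper shared by both methods. & is escaped first,
# and all escaping happens before any tags are inserted.
sub escape_entities {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# One s/// per control word.
sub each_line {
    my $s = escape_entities($line);
    $s =~ s/\\ldblquote /<rt_quote\/>/g;
    $s =~ s/\\rdblquote /<lt_quote\/>/g;
    $s =~ s/\\emdash /<em_dash\/>/g;
    $s =~ s/\\rquote /<r_quote\/>/g;
    $s =~ s/\\tab /<tab\/>/g;
    $s =~ s/\\lquote /<l_quote\/>/g;
    $s =~ s/\\e4 /<e4\/>/g;
    $s =~ s/\\e5 /<e5\/>/g;
    $s =~ s/\\e6 /<e6\/>/g;
    return $s;
}

# A single pass over the string using the precompiled alternation.
sub hash_method {
    my $s = escape_entities($line);
    $s =~ s/$re/<$rep{$1}\/>/g;
    return $s;
}

# Refuse to time two subs that do different work.
die "outputs differ\n" unless each_line() eq hash_method();

# A negative count means "run each sub for at least this many CPU seconds".
cmpthese(-1, {
    each_line   => \&each_line,
    hash_method => \&hash_method,
});
```

`cmpthese` prints a table of rates and relative percentages for the two subs, which is easier to read than the raw `timethese` output.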
> 
> On Jul 26, Paul Tremblay said:
> 
> >Is there a quicker way to substitute an item in a line than reading the
> >line in each time?
> >
> >I am writing a script to convert RTF to XML. One part of the script
> >involves simple substitution, like this:
> >
> >s/\\ldblquote /<rt_quote\/>/g;
> >s/\\rdblquote /<lt_quote\/>/g;
> >s/\\emdash /<em_dash\/>/g;
> >s/\\rquote /<r_quote\/>/g;
> >s/\\tab /<tab\/>/g;
> >s/\\lquote /<l_quote\/>/g;
> 
> It's best to come up with a hash of strings and replacements:
> 
>   my %rep = qw(
>     ldblquote rt_quote
>     rdblquote lt_quote
>     emdash    em_dash
>     rquote    r_quote
>     tab       tab
>     lquote    l_quote
>   );
> 
> Then create a regex:
> 
>   my $rx = join "|", map quotemeta, keys %rep;
> 
> Then use it in a larger regex:
> 
>   $source =~ s[\\($rx) ][<$rep{$1}/>]g;
> 
> Ta da!  ONLY one pass through the string.  You'll need to beef up the hash
> and the regex as needed, if not everything is '\\IN ' and not every
> replacement is '<OUT/>'.
> 
> -- 
> Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
> RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
> ** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
> <stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
> [  I'm looking for programming work.  If you like my work, let me know.  ]
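A variation on Jeff's idea, not what he posted: instead of joining every hash key into one large alternation, match any space-terminated control word with a generic pattern and look it up in the hash inside an `/e` replacement, putting unknown words such as `\par` back unchanged. The table contents and the `convert_line` name are illustrative, and the sketch assumes, as the original code does, that each control word of interest is lowercase letters, optional digits, and a single trailing space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative table: RTF control word => XML tag name.
my %rep = (
    ldblquote => 'rt_quote',
    rdblquote => 'lt_quote',
    emdash    => 'em_dash',
    rquote    => 'r_quote',
    tab       => 'tab',
    lquote    => 'l_quote',
    map { ("e$_" => "e$_") } 4 .. 30,
);

# Hypothetical helper: convert one line in a single tag-replacing pass.
sub convert_line {
    my ($s) = @_;
    # Escape XML specials first (& before < and >) so the tags
    # inserted below are not escaped.
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    # Match any control word followed by a space; replace it only if
    # it is in %rep, otherwise reinsert the original text.
    $s =~ s{\\([a-z]+\d*) }{
        exists $rep{$1} ? "<$rep{$1}/>" : "\\$1 "
    }ge;
    return $s;
}

print convert_line("\\tab a < b \\emdash \\e12 \\par done\n");
```

Because the pattern no longer grows with the table, supporting a new control word only means adding a hash entry; whether this is faster than the alternation on a given perl is something to benchmark rather than assume.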

-- 

************************
*Paul Tremblay         *
*[EMAIL PROTECTED]*
************************
