Re: xml problem

Randal L. Schwartz Wed, 20 Jun 2001 08:49:37 -0700
>>>>> "Chas" == Chas Owens <[EMAIL PROTECTED]> writes:

Chas> #replace anything not in lower ASCII, Damn Americans
Chas> for (my $i = 0; $i < length($file); $i++) {
Chas>         my $char = ord(substr($file, $i, 1));

Chas>         if ($char > 128) {
Chas>                 print "replacing ", chr($char), " with &#$char;\n";
Chas>                 substr($file, $i, 1) = "&#$char;";
Chas>         }
Chas> }

Notwithstanding my other comment, this code is also inefficient,
both tactically and strategically.

Take for example the string "\200abc" (assuming the test is
">= 128"; as written, "> 128" skips "\200" itself)...

After you replace "\200" with "&#128;", the next character examined
is the "#" that you've just inserted, since you do not bump along
the value of $i appropriately.  Lots of wasted character movements
there.  You really want the next loop to look at "a", not "#".
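
If you did want to keep the loop, bumping $i along might look
something like the sketch below.  The sub name escape_high_loop is
made up for illustration, and the ">= 128" test is my assumption
about what was intended, so that "\200" itself is caught.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): the same loop,
# but $i is advanced past the text that was just inserted.
sub escape_high_loop {
    my ($file) = @_;
    for (my $i = 0; $i < length($file); $i++) {
        my $char = ord(substr($file, $i, 1));
        if ($char >= 128) {    # assumption: ">= 128", not "> 128"
            my $entity = "&#$char;";
            substr($file, $i, 1) = $entity;
            # Skip the inserted text; the loop's own $i++ then lands
            # on the character that followed the original one.
            $i += length($entity) - 1;
        }
    }
    return $file;
}

print escape_high_loop("\200abc"), "\n";    # prints "&#128;abc"
```

That removes the rescanning, but every substr() assignment still
shuffles the tail of the string around.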

But more importantly, all those ord()s and substr()s are using
the wrong parts of Perl for basic string processing.  This will
execute much much faster:

  $file =~ s/([\200-\377])/"&#".ord($1).";"/ge;

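A regex match here is appropriate.  The "/g" replaces the outer loop,
and the "/e" evaluates the right side as a Perl expression to build
the replacement text.  Matching resumes after the inserted text, so
there is no overlap and nothing is rescanned.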
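
To see it run, here is a small self-contained sketch.  The wrapper
name escape_high_regex and the sample string are mine, purely for
illustration; the substitution is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical wrapper name; the s///ge line is the one from the post.
sub escape_high_regex {
    my ($file) = @_;
    # /g: replace every match; /e: evaluate the right side as Perl code.
    $file =~ s/([\200-\377])/"&#" . ord($1) . ";"/ge;
    return $file;
}

# "\351" is octal for 233, "\200" is octal for 128.
print escape_high_regex("caf\351 \200abc"), "\n";    # prints "caf&#233; &#128;abc"
```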

Whenever you think "change string", you should first think "regex
replacement", not anything else.  Rarely will something else be
better. (For "change character to character", think "transliterate".)
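
As a rough illustration of the "transliterate" case (the variable
names and sample text are assumptions of mine, not anything from the
thread), tr/// maps characters one-for-one and can also just count
them:

```perl
use strict;
use warnings;

my $text = "Hello, World";

# Copy $text, then map each lowercase ASCII letter to its uppercase twin.
(my $upper = $text) =~ tr/a-z/A-Z/;
print "$upper\n";    # prints "HELLO, WORLD"

# With an empty replacement list and no /d, tr/// changes nothing and
# simply returns how many characters matched.
my $high = ($text =~ tr/\200-\377//);
print "$high\n";     # prints "0"
```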

The moral of the story is that quick and dirty... may end up
being just dirty. :-)

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!