On Fri, Jul 26, 2002 at 02:02:54PM -0400, Jeff 'japhy' Pinyan wrote:
>
> It's best to come up with a hash of strings and replacements:
>
> my %rep = qw(
> ldblquote rt_quote
> rdblquote lt_quote
> emdash em_dash
> rquote r_quote
> tab tab
> lquote l_quote
> );
>
> Then create a regex:
>
> my $rx = join "|", map quotemeta, keys %rep;
>
> Then use it in a larger regex:
>
> $source =~ s[\\($rx) ][<$rep{$1}/>]g;
>
> Ta da! ONLY one pass through the string.

This looks really nice! I'll have to test it with a timer. I'd imagine
it would be much faster because it makes only one pass through the
string. On the other hand, doesn't perl have to recompile the regex each
time because $rx is a variable? After all, $rx might have changed, though
in my case it definitely wouldn't have.
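
On the recompilation question, here is a minimal sketch. As far as I
know, perl only recompiles an interpolated pattern when the interpolated
string has actually changed, and compiling the whole pattern once with
qr// sidesteps the question entirely. The shortened %rep and the
convert() sub are illustrative names, not part of the actual script.

```perl
use strict;
use warnings;

# Abbreviated version of the mapping from the quoted message.
my %rep = (
    emdash => 'em_dash',
    tab    => 'tab',
    lquote => 'l_quote',
    rquote => 'r_quote',
);

# Build the alternation once, then compile the full pattern once.
my $alternation = join '|', map { quotemeta } keys %rep;
my $rx = qr/\\($alternation) /;

# convert() is a hypothetical helper for illustration.
sub convert {
    my ($source) = @_;
    # $rx is already a compiled regex object, so it is reused as-is.
    $source =~ s{$rx}{<$rep{$1}/>}g;
    return $source;
}

print convert('one\emdash two\tab three'), "\n";
```

The standard Benchmark module (for example its cmpthese function) could
be used to compare this against the multi-pass version.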
> You'll need to beef up the hash
> and the regex as needed, if not everything is '\\IN ' and not every
> replacement is '<OUT/>'.

As a matter of fact, the expressions take only two forms:

\emdash Regular text
\'9oeRegular text

Some of the expressions (the ones for foreign characters) don't have a
space after the control word. So I think this should work?

$source =~ s[\\($rx)\s*][<$rep{$1}/>]g;
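
A rough sketch of one way to handle both forms, under some assumptions
of mine: as I read the RTF rules, a control word is ended by a single
space that belongs to it (further spaces are real text), while a \'hh
escape takes no delimiter at all. The %word and %hex hashes, the
convert() sub, and the e9 entry are hypothetical names for illustration.

```perl
use strict;
use warnings;

# Control words: may be followed by one delimiting space.
my %word = (
    emdash => 'em_dash',
    tab    => 'tab',
    lquote => 'l_quote',
);

# Hex escapes of the form \'hh: no delimiter follows.
my %hex = (
    'e9' => 'eacute',    # hypothetical entry
);

my $words = join '|', map { quotemeta } keys %word;
my $hexes = join '|', map { quotemeta } keys %hex;

# The lookahead stops a key from matching the front of a longer word;
# " ?" then consumes at most one delimiting space.
my $rx = qr/\\(?:($words)(?![A-Za-z]) ?|'($hexes))/;

# convert() is a hypothetical helper for illustration.
sub convert {
    my ($source) = @_;
    $source =~ s{$rx}{ '<' . (defined $1 ? $word{$1} : $hex{$2}) . '/>' }ge;
    return $source;
}

print convert(q{\emdash Regular text}), "\n";
print convert(q{\'e9Regular text}), "\n";
```

Using " ?" rather than "\s*" is deliberate here: "\s*" would also
swallow spaces and newlines that are part of the text.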

On another note, my script is 1100 lines long, and seems to work.
It seems like there is a need for converting RTF to XML, since the perl
converters available only convert to HTML.
I would like to release the script at some point, but when I get tips
off this site, I realize how much better an experienced perl programmer
could do things. It would be much more effective to work on this as part
of a team, but I've never done something like this before. I guess I'll
post feelers on other mailing lists.
(This really should be another thread!)
Thanks!
Paul
--
************************
*Paul Tremblay *
*[EMAIL PROTECTED]*
************************