On Fri, Jul 26, 2002 at 02:02:54PM -0400, Jeff 'japhy' Pinyan wrote:
>
> It's best to come up with a hash of strings and replacements:
>
>   my %rep = qw(
>     ldblquote   rt_quote
>     rdblquote   lt_quote
>     emdash      em_dash
>     rquote      r_quote
>     tab         tab
>     lquote      l_quote
>   );
>
> Then create a regex:
>
>   my $rx = join "|", map quotemeta, keys %rep;
>
> Then use it in a larger regex:
>
>   $source =~ s[\\($rx) ][<$rep{$1}/>]g;
>
> Ta da!  ONLY one pass through the string.
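For reference, here is the quoted snippet assembled into one small script. This is only a sketch: the hash is copied as quoted, but the sample $source string is made up for illustration and does not come from a real RTF file.

```perl
use strict;
use warnings;

# Control words and replacement names, copied from the quoted message.
my %rep = qw(
    ldblquote   rt_quote
    rdblquote   lt_quote
    emdash      em_dash
    rquote      r_quote
    tab         tab
    lquote      l_quote
);

# One alternation of all the keys; quotemeta escapes regex metacharacters.
my $rx = join "|", map quotemeta, keys %rep;

# Hypothetical sample input (single quotes keep the backslashes literal).
my $source = 'He said \ldblquote hello\rdblquote \emdash twice.';

# Single pass: every "\word " (with its trailing space) becomes "<name/>".
(my $result = $source) =~ s[\\($rx) ][<$rep{$1}/>]g;

print "$result\n";
```

Each control word is looked up in %rep at substitution time, so adding a new control word only means adding a hash entry.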

This looks really nice! I'll have to test it with a timer. I'd imagine
it would be much faster because you only make one pass through the
string.

On the other hand, doesn't perl have to recompile $rx each time
because it is a variable? After all, $rx might have changed--though in
my case, it definitely wouldn't have.

> You'll need to beef up the hash
> and the regex as needed, if not everything is '\\IN ' and not every
> replacement is '<OUT/>'.

As a matter of fact, the expressions take only two forms:

  \emdash Regular text
  \'9oeRegular text

Some of the expressions (the ones for foreign characters) don't have a
space after the control word. So I think:

  $source =~ s[\\($rx)(?:\s)*][<$rep{$1}/>]g;

should work?

On another note, my script is 1100 lines long, and it seems to work.
There seems to be a need for converting RTF to XML, since the Perl
converters available only convert to HTML. I would like to release the
script at some point, but when I get tips from this list, I realize how
much better an experienced Perl programmer could do things. It would be
much more effective to work on this as part of a team, but I've never
done something like that before. I guess I'll post feelers on other
mailing lists. (This really should be another thread!)

Thanks!

Paul

-- 
************************
*Paul Tremblay         *
*[EMAIL PROTECTED]*
************************
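A rough sketch of the optional-space variant, with the pattern compiled once via qr//. Several things here are assumptions rather than anything from the thread: the '\'e9' key and the 'e_acute' name are invented for illustration, the sample string is made up, and the pattern uses " ?" (at most one space) instead of (?:\s)* so that newlines and runs of whitespace are left alone. The keys are also sorted longest first, which only matters if one key is ever a prefix of another.

```perl
use strict;
use warnings;

my %rep = qw(
    emdash   em_dash
    tab      tab
);
# Hypothetical hex-escape entry: both the key and the name are invented.
$rep{"'e9"} = 'e_acute';

# Longest keys first, so a shorter key cannot match ahead of a longer one.
my $alt = join "|",
          map  { quotemeta($_) }
          sort { length($b) <=> length($a) } keys %rep;

# qr// compiles the pattern once; " ?" makes the delimiter space optional.
my $rx = qr/\\($alt) ?/;

# Hypothetical sample input covering both forms (with and without a space).
my $source = q{caf\'e9s and \emdash Regular text};

(my $result = $source) =~ s[$rx][<$rep{$1}/>]g;

print "$result\n";
```

As far as I know, perl does not recompile an interpolated pattern unless the interpolated string has actually changed, so the plain $rx version should not pay a compile cost on every match; qr// just makes the one-time compilation explicit. One caveat worth checking against the RTF spec: a space that follows a \'hh escape may be ordinary text rather than a delimiter, in which case the optional space should apply only to the alphabetic control words.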