I just finished the first version of a script that converts RTF to XML, and I'm wondering whether I went about writing it the wrong way.
My method is to read in one line at a time, split each line into tokens, and then process one token at a time. I used this line to split the text:

    @tokens = split(/({\\[^\s\n\}{]+)|(\\[^\s\n\\}]+)|(\\\\)|(})|(\\})/, $line);

Splitting the text of my 1.8 megabyte test file took 25 seconds; the entire script took 50 seconds. An earlier, unfinished version that relied on regular expressions rather than tokens took only 10 seconds to run, but I gave up on that approach because it seemed there would always be another exception requiring yet another regexp.

So why does splitting text into tokens take so long? Has anybody done something similar to what I am trying, and do you have any advice?

The good news is that, relatively speaking, Perl is very, very fast. I tried a similar script in Python using a lexer called Plex, and the same 1.8 megabyte file took 12 minutes to parse!

In case you are wondering why I'm seemingly obsessed with speed: I would like to make this script available to anyone. Right now the only free utility for converting RTF to XML is a Java program called Majix, which deletes your footnotes and only allows 9 user-defined styles. If my Perl script is too slow, it won't be very useful.

Thanks,

Paul Tremblay
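P.S. To give a clearer picture of what the script is doing, here is a stripped-down sketch of the main loop. The file name and the per-token branches are just placeholders; the real script does quite a bit more with each token.

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $rtf, '<', 'test.rtf' or die "Cannot open test.rtf: $!";

    while ( my $line = <$rtf> ) {

        # Split the line into group openers, control words, escaped
        # characters, and closing braces.  split() returns one value per
        # capture group (undef for the groups that did not match), so the
        # undef and empty fields are filtered out here.
        my @tokens = grep { defined $_ && length $_ }
            split( /({\\[^\s\n\}{]+)|(\\[^\s\n\\}]+)|(\\\\)|(})|(\\})/, $line );

        for my $token (@tokens) {
            if ( $token =~ /^\{\\/ ) {
                # start of a group, e.g. {\footnote
            }
            elsif ( $token eq '}' ) {
                # end of a group
            }
            elsif ( $token =~ /^\\/ ) {
                # a control word or escaped character, e.g. \par or \}
            }
            else {
                # plain text
            }
        }
    }

    close $rtf;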