mbstevens wrote:
> In such a case you may need to make the page
> into one string to search if you don't want to use some complex
> method of tracking state with variables as you move from
> string to string.

In general, stateful regexes are a very hard problem. I recall reading something last year about the new Perl implementation trying to address this sort of problem, but I may have been reading old docs, and it could have been done years ago.

Parsing the HTML would be the only sure way to accomplish it. Let something that already knows the hierarchy tell you when you're entering a URL, and you can skip past all of its recursive inclusions of strings with URLs with strings that have URLs, and so on...

Of course, that means reconstructing the HTML from the parse tree afterward...

--Blair
-- 
http://mail.python.org/mailman/listinfo/python-list
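
To make the parsing suggestion above a bit more concrete, here is a rough sketch using the standard library's html.parser. It assumes the goal is to wrap bare URLs in links while leaving alone anything already inside an <a> element (or a script/style block). The names Linkifier, linkify and URL_RE are made up for illustration, the URL pattern is deliberately loose, and the rebuilt HTML is not guaranteed to be a byte-for-byte round trip on malformed input.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately loose pattern for bare http(s) URLs.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Elements whose text content should never be touched.
SKIP_TAGS = {"a", "script", "style"}


class Linkifier(HTMLParser):
    """Re-emit the HTML it is fed, linking bare URLs found in text nodes
    unless the text sits inside one of SKIP_TAGS."""

    def __init__(self):
        # Keep entities as written so they can be emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # the parser tracks state so the regex need not

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags have no content, so the depth is unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            data = URL_RE.sub(
                lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)),
                data,
            )
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def linkify(html):
    """Return html with bare URLs outside SKIP_TAGS wrapped in <a> tags."""
    parser = Linkifier()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Usage would look something like this:

```python
page = '<p>See http://example.com/a and <a href="http://example.com/b">http://example.com/b</a>.</p>'
print(linkify(page))
```

Only the first URL gets wrapped; the one already inside the anchor is passed through untouched. Because text is split at entities, a URL containing &amp; would only be matched up to the entity, which is one of several reasons to treat this as a sketch rather than a finished tool.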