In <[EMAIL PROTECTED]>, Thomas Ploch wrote:

> This is what my regexes look like:
>
> import re
>
> class Tags:
>     def __init__(self, sourceText):
>         self.source = sourceText
>         self.curPos = 0
>         self.namePattern = "[A-Za-z_][A-Za-z0-9_.:-]*"
>         self.tagPattern = re.compile("<(?P<name>%s)(?P<attr>[^>]*)>"
>                                      % self.namePattern)
>         self.attrPattern = re.compile(
>             r"\s+(?P<attrName>%s)\s*=\s*(?P<value>\"[^\"]*\"|'[^']*')"
>             % self.namePattern)
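A quick check of what the quoted `tagPattern` does with comments: it has no notion of `<!-- ... -->`, so a commented-out tag is matched just like a live one. This is a sketch in Python 3 that copies the pattern from the quoted class. The sample HTML string and the `strip_comments` helper are hypothetical, and stripping comments with a regex is only a rough workaround (it would also remove comment-like text inside `<script>` blocks or attribute values).

```python
import re

# Patterns copied from the quoted Tags class, pulled out to module level.
NAME_PATTERN = "[A-Za-z_][A-Za-z0-9_.:-]*"
TAG_PATTERN = re.compile("<(?P<name>%s)(?P<attr>[^>]*)>" % NAME_PATTERN)

# Hypothetical input: one live link and one commented-out link.
html = '<a href="/live">ok</a> <!-- <a href="/old">old</a> -->'

# The pattern knows nothing about comments, so both <a> tags are found.
names = [m.group("name") for m in TAG_PATTERN.finditer(html)]
print(names)  # ['a', 'a']


def strip_comments(text):
    """Hypothetical helper: drop <!-- ... --> blocks before matching tags.

    Non-greedy, with DOTALL so comments spanning several lines are removed
    and two separate comments are not merged into one match.
    """
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


names_outside_comments = [
    m.group("name") for m in TAG_PATTERN.finditer(strip_comments(html))
]
print(names_outside_comments)  # ['a']
```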
Have you tested this with tags inside comments?

>>> You are probably right. For me it boils down to these problems:
>>> - Implementing a stack for large queues of documents which is faster
>>> than list.pop(index) (Is there a lib for this?)
>>
>> If you need a queue then use one: take a look at `collections.deque` or
>> the `Queue` module in the standard library.
>
> Which of the two would you recommend for handling large queues with fast
> response times?

`Queue.Queue` builds on `collections.deque` and is thread-safe.  Speed-wise
I don't think the choice makes a difference, as most of the time is spent on
I/O and parsing.  So if you make your spider multi-threaded to gain some
speed, go with `Queue.Queue`.

Ciao,
	Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
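For the queue question, a minimal sketch of both options, written for Python 3, where the `Queue` module mentioned above is named `queue`. `deque.popleft()` removes from the front in constant time, whereas `list.pop(0)` has to shift every remaining element. The URLs, the hypothetical `worker` function and its upper-casing "processing" step stand in for real fetching and parsing, and three threads is an arbitrary choice.

```python
import collections
import queue  # called Queue in Python 2
import threading

# Single-threaded: a deque gives O(1) appends and pops at both ends.
urls = collections.deque(["http://example.com/a", "http://example.com/b"])
urls.append("http://example.com/c")
first = urls.popleft()
print(first)  # http://example.com/a

# Multi-threaded: queue.Queue wraps a deque and adds locking around it.
def worker(q, results, lock):
    """Hypothetical spider worker: take items until it sees the None sentinel."""
    while True:
        item = q.get()
        if item is None:  # sentinel: this worker should stop
            q.task_done()
            break
        processed = item.upper()  # placeholder for fetching and parsing
        with lock:
            results.append(processed)
        q.task_done()


q = queue.Queue()
results = []
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(q, results, lock))
           for _ in range(3)]
for t in threads:
    t.start()
for item in ["a", "b", "c", "d"]:
    q.put(item)
for _ in threads:
    q.put(None)  # one sentinel per worker, queued after all real items
for t in threads:
    t.join()
print(sorted(results))  # ['A', 'B', 'C', 'D']
```

Because the queue is FIFO and the sentinels are added after all real items, every item is processed before the workers shut down.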