Re: Question concerning this list

Marc 'BlackJack' Rintsch Sun, 31 Dec 2006 02:35:59 -0800

In <[EMAIL PROTECTED]>, Thomas Ploch
wrote:

> Alright, my prof said '... to process documents written in structural
> markup languages using regular expressions is a no-no.' (Because of
> nested Elements? Can't remember) So I think he wants us to use regexes
> to learn them. He is pointing to HTMLParser though.


The problem is that much of the HTML in the wild is nominally written in a
structured markup language but is in many cases broken.  If you just search
for some words or patterns that appear somewhere in the documents, then
regular expressions are good enough.  If you want to actually *parse* HTML
"from the wild", better use the BeautifulSoup_ parser.

.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
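
To make the search-versus-parse distinction concrete, here is a rough sketch,
assuming Python 3 and only the standard library.  It swaps in `HTMLParser`
(the class the professor pointed to) instead of BeautifulSoup so it runs
without extra installs.  The names `BROKEN_HTML`, `find_words`,
`LinkCollector` and `collect_links` are made up for this illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: deliberately broken HTML (unclosed <p> and <b> tags).
BROKEN_HTML = (
    '<html><body><p>Hello <b>world'
    '<p>See <a href="http://example.com/">this</a></body>'
)


def find_words(text, word):
    """Return the start offset of every whole-word, case-insensitive match."""
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html):
    """Feed the markup to a LinkCollector and return the hrefs it found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

The regex only cares about the characters, so the broken nesting does not
matter to it.  The parser hands you tags and attributes, which is what you
want as soon as the structure of the document matters.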

> You are probably right. For me it boils down to these problems:
> - Implementing a stack for large queues of documents which is faster
> than list.pop(index) (Is there a lib for this?)

If you need a queue then use one:  take a look at `collections.deque` or
the `Queue` module in the standard library.
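
A minimal sketch of the `collections.deque` variant, assuming the documents
should be handled in first-in, first-out order.  The function name
`process_all` and the file names in the usage line are invented for the
example.

```python
from collections import deque


def process_all(documents):
    """Handle documents in FIFO order and return them in the order handled."""
    pending = deque(documents)
    handled = []
    while pending:
        # popleft() is O(1); list.pop(0) has to shift every remaining item.
        doc = pending.popleft()
        handled.append(doc)
    return handled


# Usage: the documents come back in the order they were queued.
order = process_all(["a.html", "b.html", "c.html"])
```

Note that in Python 3 the `Queue` module is named `queue`; it is the one to
reach for when several threads share the queue.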

Ciao,
        Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
