On Sun, 2006-09-17 at 13:14 +0200, patrickk wrote:
> I'm having large text fields where an editor can post xhtml-code.
> right now, I'm using rich text editing for that area (mostly fck, but
> also tinymce).
> I'm looking for suggestions on how to clean up the xhtml-code:
> - delete empty paragraphs
> - no text outside paragraphs/headlines/lists ...
> - headline must not start and end with empty spaces
> - and so on ...
> 
> I could use something like cleanup_callback with tinymce, but I've
> also come across BeautifulSoup.
> Just wanted to know if anyone has experience with that issue and may  
> point out some pros/cons (one thing that scares me is that  
> BeautifulSoup is around 70k).

Is your input guaranteed to be well-formed XHTML? If so, ElementTree
(http://effbot.org/zone/element-index.htm ) will be faster, particularly
cElementTree. It always feels very Pythonic when you program with it, so
it gets ease-of-use points.
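
As a rough illustration of the ElementTree route, here is a minimal sketch
that drops empty paragraphs and trims headline whitespace. It assumes the
input is a well-formed fragment without an XHTML namespace and without HTML
entities such as &nbsp; (the XML parser rejects those), and it uses the
xml.etree.ElementTree module from the standard library rather than the
separate effbot package. The clean_xhtml() and trim_heading() names and the
wrapping <div> are made up for this example.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def trim_heading(el):
    """Strip whitespace at the very start and very end of a headline."""
    if el.text:
        el.text = el.text.lstrip()
    if len(el):
        # Trailing text after the last child lives in that child's tail.
        last = el[-1]
        if last.tail:
            last.tail = last.tail.rstrip()
    elif el.text:
        el.text = el.text.rstrip()


def clean_xhtml(fragment):
    # Hypothetical wrapper element so a fragment with several
    # top-level elements still parses as one document.
    root = ET.fromstring("<div>%s</div>" % fragment)

    # Iterate over copies, because we remove children while walking.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag == "p" and is_empty:
                # Note: remove() also discards the element's tail text.
                parent.remove(child)

    for el in root.iter():
        if el.tag in HEADINGS:
            trim_heading(el)

    # Serialise the children only, leaving out the wrapper <div>.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )
```

Calling clean_xhtml("<h1>  Title </h1><p></p><p>Body</p>") should give back
the headline without the surrounding spaces and only the non-empty
paragraph. The "no text outside paragraphs" rule is not covered here.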

BeautifulSoup is a lifesaver when you need to process HTML that might
not be particularly well constructed, and I like its functionality in
that area. I haven't used it in very heavy multi-process environments, so
I must admit that the memory usage isn't something I've worried about too
much. Not sure how comfortable it is to write out something that
BeautifulSoup has parsed -- you'll need to write your own serialiser
(ElementTree has SimpleXMLWriter) -- but that shouldn't be a
showstopper.

For BeautifulSoup you are going to have to write a tree walker to
process the nodes. ElementTree-based code could be handled in the same
fashion, but the iterparse() method for processing as you parse is my
favourite way of working where I have to act on potentially all the
input.
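
To show what processing while parsing can look like, here is a small sketch
built on iterparse() that only reports problems rather than fixing them. It
makes the same assumptions as any ElementTree code (well-formed input, no
namespace, no undefined entities); the check_xhtml() name and the wording of
the messages are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def check_xhtml(document):
    """Return a list of problems found while the document is being parsed."""
    problems = []
    source = io.BytesIO(document.encode("utf-8"))

    # An "end" event fires once an element's text and children are complete.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "p":
            if len(elem) == 0 and not (elem.text or "").strip():
                problems.append("empty paragraph")
        elif elem.tag in HEADINGS:
            # itertext() joins the headline's own text with that of
            # any nested inline elements.
            text = "".join(elem.itertext())
            if text != text.strip():
                problems.append("untrimmed headline: %r" % text)
    return problems
```

Because each element is handled as soon as its closing tag is seen, there is
no separate tree-walking pass over the parsed document.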

Regards,
Malcolm



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users
-~----------~----~----~----~------~----~------~--~---
