On Sun, 2006-09-17 at 13:14 +0200, patrickk wrote:
> I'm having large text fields where an editor can post xhtml-code.
> right now, I'm using rich text editing for that area (mostly fck, but
> also tinymce).
> I'm looking for suggestions on how to cleanup the xhtml-code:
> - delete empty paragraphs
> - no text outside paragraphs/headlines/lists ...
> - headline must not start and end with empty spaces
> - and so on ...
>
> I could use something like cleanup_callback with tinymce, but I've
> also come across BeautifulSoup.
> Just wanted to know if anyone has experience with that issue and may
> point out some pros/cons (one thing that scares me is that
> BeautifulSoup is around 70k).

Is your input guaranteed to be well-formed XHTML? If so, ElementTree
(http://effbot.org/zone/element-index.htm) will be faster, particularly
cElementTree. It always feels very Pythonic when you program with it, so
it gets ease-of-use points.

BeautifulSoup is a lifesaver when you need to process HTML that might not
be particularly well constructed, and I like its functionality in that
area. I haven't used it in very heavy multi-process environments, so I
must admit that its memory usage isn't something I've worried about too
much. I'm not sure how comfortable it is to write out something that
BeautifulSoup has parsed. You may need to write your own serialiser
(ElementTree has SimpleXMLWriter), but that shouldn't be a showstopper.

For BeautifulSoup you are going to have to write a tree walker to process
the nodes. ElementTree-based code could be handled in the same fashion,
but the iterparse() method, which lets you process elements as they are
parsed, is my favourite way of working when I have to act on potentially
all of the input.

Regards,
Malcolm
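
To illustrate the tree-walker approach on ElementTree, here is a minimal
sketch covering two of the cleanup rules from the original question
(dropping empty paragraphs, trimming whitespace around headlines). It is
not from the thread. It assumes the ElementTree that ships in the modern
standard library as xml.etree.ElementTree, well-formed input, and a
fragment without an XHTML namespace declaration. The names clean_xhtml,
is_empty_paragraph and strip_headline are hypothetical helpers, and the
wrapping <div> is only there to give the fragment a single root.

```python
import xml.etree.ElementTree as ET

HEADLINES = {"h1", "h2", "h3", "h4", "h5", "h6"}


def is_empty_paragraph(elem):
    """True for a <p> with no child elements and only whitespace text."""
    return elem.tag == "p" and len(elem) == 0 and not (elem.text or "").strip()


def strip_headline(elem):
    """Trim leading and trailing whitespace of a headline's text content."""
    if len(elem) == 0:
        elem.text = (elem.text or "").strip()
    else:
        # With child elements, leading text is on the headline itself and
        # trailing text is the tail of the last child.
        elem.text = (elem.text or "").lstrip()
        last = elem[-1]
        last.tail = (last.tail or "").rstrip()


def clean_xhtml(fragment):
    """Parse a well-formed XHTML fragment, clean it, and serialise it again."""
    # Assumption: wrap the fragment so it parses as one document.
    root = ET.fromstring("<div>%s</div>" % fragment)
    # Snapshot the element list first so removals do not disturb the walk.
    for parent in list(root.iter()):
        for child in list(parent):
            if is_empty_paragraph(child):
                # Note: this also discards any tail text after the paragraph.
                parent.remove(child)
            elif child.tag in HEADLINES:
                strip_headline(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(clean_xhtml("<h2>  Title </h2><p>  </p><p>Body</p>"))
```

If the input cannot be trusted to be well-formed, ET.fromstring() will
raise a ParseError, which is the case where a more forgiving parser such
as BeautifulSoup is the better fit.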