Worked like a charm! Thanks!!

Michael Repucci
(M) 718-288-4554
(W) 212-938-5597
mich...@repucci.org
http://michael.repucci.org/

On Tue, Feb 24, 2009 at 12:06 PM, Horst Gutmann <ze...@zerokspot.com> wrote:
>
> Get the source release
> (<http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c9.tar.gz#md5=3864c01d9c719c8924c455714492295e>),
> extract it and run `python setup.py install`. At least I assume that
> this also works under Windows ;-)
>
> -- Horst
>
> On Tue, Feb 24, 2009 at 6:01 PM, Michael Repucci <mich...@repucci.org> wrote:
> > Hi again. So after much reading and consideration, I decided I'd like
> > to try html5lib. Unfortunately, it seems that the current version
> > html5lib-0.11.1.zip (http://code.google.com/p/html5lib/downloads/list)
> > uses setuptools, whose current version is
> > setuptools-0.6c9.win32-py2.5.exe (http://pypi.python.org/pypi/setuptools).
> > But I'm running Python 2.6. I haven't had much experience with adding
> > packages to Python, so if there's a workaround for this (aside from
> > using Python 2.5), could somebody please share. Thanks!
> >
> > On Feb 24, 10:09 am, Michael Repucci <mich...@repucci.org> wrote:
> >> Hi Everybody, I just wanted to give a HUGE THANKS to everyone
> >> participating in this discussion. It's exactly the kind of information
> >> I was hoping I could get from all of you; more than enough to keep a
> >> newbie like me occupied on the topic for quite some time. So thanks
> >> again for sharing your knowledge and experience. :)
> >>
> >> On Feb 24, 9:14 am, Brian Neal <bgn...@gmail.com> wrote:
> >>
> >> > On Feb 23, 10:51 pm, Jacob Kaplan-Moss <jacob.kaplanm...@gmail.com> wrote:
> >> >
> >> > > On Mon, Feb 23, 2009 at 7:49 PM, Brian Neal <bgn...@gmail.com> wrote:
> >> > > > Interesting, I've also come across this:
> >> > > >
> >> > > > http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html
> >> > > >
> >> > > > I've heard it is very fast as it is just a python binding to a
> >> > > > C-library...?
> >> > >
> >> > > Short version: don't use lxml.html.clean, either.
> >> > >
> >> > > Long version: yes, lxml is built on top of libxml2 so it is indeed
> >> > > *very* fast. Probably as much as an order of magnitude faster than
> >> > > html5lib.
> >> > >
> >> > > However, if you look at the source of lxml.html.clean
> >> > > (http://codespeak.net/lxml/api/lxml.html.clean-pysrc.html) you'll see
> >> > > it's implemented in terms of a blacklist. This is almost always a bad
> >> > > idea: you only have to miss *one thing* on your blacklist to make your
> >> > > site as insecure as if you'd not bothered escaping HTML at all. IOW,
> >> > > with a blacklist you'd be on constant defense. Remember how early spam
> >> > > protection systems just blocked spammers' email addresses? How'd that
> >> > > work out, anyway?
> >> > >
> >> > > Also... the FIXMEs in that code don't exactly inspire confidence.
> >> > >
> >> > > No knock against lxml here -- it's an incredible toolkit, and I use it
> >> > > all over the place for general XML and HTML parsing. But security is
> >> > > *hard* stuff; it's worth being paranoid about your tools.
> >> >
> >> > I did start to use lxml.html.clean on my project. I tested it (very
> >> > casually) and it seemed to work just fine. However, in one spot in my
> >> > code I needed finer control over what tags to allow. According to the
> >> > docs and the options, it looked to me like you could operate it with a
> >> > white list. However, this didn't work out in practice. The options you
> >> > give to the cleaner are confusing and seem to contradict each other. I
> >> > couldn't get it to do what I wanted. I asked about this on the mailing
> >> > list and it was conceded that the options didn't work together very
> >> > well. I also studied the source code a bit and came to the same
> >> > conclusion.
> >> >
> >> > I then turned to using Markdown and recalibrated my opinion on what to
> >> > allow as input from the user.
> >> >
> >> > Thanks for the link to html5lib though. I will keep that in my back
> >> > pocket.
> >> >
> >> > BN

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
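
For anyone following the whitelist-versus-blacklist point in the quoted discussion, here is a minimal sketch of the whitelist idea: only tags and attributes that are explicitly listed get through, and everything else is dropped or escaped. It is not how html5lib or lxml.html.clean work internally. It uses only the standard library's html.parser, is written for Python 3 rather than the Python 2.6 mentioned in the thread, and the names ALLOWED_TAGS, ALLOWED_ATTRS, ALLOWED_SCHEMES, WhitelistSanitizer and sanitize are all made up for illustration. Treat it as a way to see the principle, not as something to deploy as-is.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for illustration: anything not listed here is dropped.
ALLOWED_TAGS = {"a", "b", "em", "i", "p", "strong"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # unknown attribute (onclick, style, ...): drop it
            value = (value or "").strip()
            if name == "href" and not value.lower().startswith(ALLOWED_SCHEMES):
                continue  # link scheme not on the list (e.g. javascript:)
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always escaped, so it can never turn back into markup.
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = '<p onclick="x()">Hi <script>alert(1)</script><a href="javascript:evil()">link</a></p>'
    print(sanitize(dirty))
```

The design choice is the one argued for in the thread: a new tag or attribute the author never thought of is rejected by default, whereas a blacklist would let it through until someone adds it to the list. Note that this sketch keeps the text inside dropped tags (escaped), so the body of a script element shows up as plain text rather than disappearing.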