On Thu, Feb 7, 2008 at 7:11 PM, Shaun Laughey <[EMAIL PROTECTED]> wrote:
> Hi,
> I have used Beautiful Soup for parsing html.
> It works very nicely and I didn't see much of an issue with speed in
> parsing several hundred html files every hour or so.
> I also rolled my own using various regexes and stuff nicked from a
> Perl lib. It was awful and feature-incomplete. Beautiful Soup worked
> better.
>
> Shaun Laughey.

To clarify, I use BeautifulSoup for a small project that parses
frequently changing HTML on a number of websites (>1MB each), extracts
the content of specific tags, filters out certain strings from the
content, and serves it up in a consistent format. The input HTML comes
from the wild, and often contains odd tags, funny characters, and other
inconsistencies. It has so far worked near-perfectly for the last 9
months.

Speed is commonly reported as a problem with BS, which is why I
mentioned it, but when I profiled the code in an effort to speed it up
I discovered that 90%+ of the time taken was accounted for by network
latency in getting the data from the remote sites.

Alex
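For anyone curious, the extract-and-filter pattern described above might look roughly like the sketch below. The sample HTML, tag names, and the "PROMO" filter string are all hypothetical stand-ins, not Alex's actual code; the point is just that BeautifulSoup copes with messy markup (entities, unknown tags, a missing closing tag) without complaint.

```python
from bs4 import BeautifulSoup

# Hypothetical in-the-wild HTML: an entity, an unknown tag, and a
# missing </html> -- the sort of inconsistencies mentioned above.
html = """
<html><body>
<h1>Latest &amp; Greatest</h1>
<div class="story">
  <p>First item</p>
  <p>PROMO: Second item</p>
  <p>Third item</p>
</div>
<blink>legacy tag, parsed anyway</blink>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Extract the content of specific tags...
headline = soup.h1.get_text()
paragraphs = [p.get_text().strip() for p in soup.find_all("p")]

# ...and filter out certain strings from the content
# ("PROMO:" is a placeholder filter for illustration).
cleaned = [text for text in paragraphs if not text.startswith("PROMO:")]

print(headline)  # entity &amp; is decoded to a plain ampersand
print(cleaned)
```

This uses the modern `bs4` package; the 2008-era import was `from BeautifulSoup import BeautifulSoup`, but the API shape is much the same.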
_______________________________________________
python-uk mailing list
python-uk@python.org
http://mail.python.org/mailman/listinfo/python-uk