On Sun, 10 Feb 2008 02:09:12 -0200, Victor Lin <[EMAIL PROTECTED]> wrote:
> On Feb 10, 11:42 am, "Gabriel Genellina" <[EMAIL PROTECTED]>
> wrote:
>> On Sat, 09 Feb 2008 09:49:46 -0200, Victor Lin <[EMAIL PROTECTED]>
>> wrote:
>>
>> > I encounter a problem with pickle.
>> > I download a html from:
>>
>> >http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B...
>>
>> > and parse it with BeautifulSoup.
>> > This page is very huge.
>> > When I use pickle to dump it, a RuntimeError: maximum recursion depth
>> > exceeded occur.

Yes, I could reproduce the error. Worse, using cPickle instead of pickle,
Python just aborts: no exception trace, no error printed, no Application
Error popup. (This is with Python 2.5.1 on Windows XP.)

<code>
import urllib
import BeautifulSoup
import cPickle

doc = urllib.urlopen('http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B000NMKHW6/ref=sr_1_2?ie=UTF8&s=electronics&qid=1202541889&sr=1-2')
soup = BeautifulSoup.BeautifulSoup(doc)
#print len(cPickle.dumps(soup,-1))
</code>

That page has an insane SELECT containing 1000 OPTIONs. Removing some of
them makes cPickle happy:

<code>
div = soup.find("div", id="buyboxDivId")
select = div.find("select", attrs={"name": "quantity"})
for i in range(200):  # remove 200 options out of 1000
    select.contents[5].extract()
print len(cPickle.dumps(soup, -1))
</code>

I don't know whether this is an error in BeautifulSoup or in pickle. That
SELECT with many OPTIONs is big, but not recursive (and I think that
BeautifulSoup uses weak references to build its links); anyway, pickle is
supposed to handle recursion well. The longest chain of nested tags in the
page is 32 levels deep; in principle I would expect BeautifulSoup's objects
to have a similar nesting depth, so the "recursion limit exceeded" error is
surprising.

>> BeautifulSoup objects usually aren't pickleable, independently of your
>> recursion error.
> But I pickle and unpickle other soup objects successfully.
> Only this object seems too deep to pickle.

Yes, sorry, I was using an older version of BeautifulSoup.
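
Coming back to the "big, but not recursive" SELECT: here is a minimal sketch of one way a flat run of siblings could still exhaust the recursion limit. It assumes, and this is only an assumption about BeautifulSoup, that each node holds an ordinary (strong) reference to the next one. Node, next_sibling, make_chain and try_pickle are hypothetical names made up for the illustration, not BeautifulSoup's API. The sketch targets a current Python 3 rather than the 2.5.1 used above, and it goes through pickle._Pickler, the pure-Python pickler, so that the failure shows up as an exception.

```python
import io
import pickle
import sys


class Node(object):
    """Hypothetical stand-in for a parsed tag (not BeautifulSoup's class)."""

    def __init__(self, name):
        self.name = name
        self.next_sibling = None  # assumed strong reference to the next node


def make_chain(length):
    """Build `length` nodes linked one after another, with no nesting."""
    head = Node("option0")
    node = head
    for i in range(1, length):
        node.next_sibling = Node("option%d" % i)
        node = node.next_sibling
    return head


def try_pickle(obj):
    """Return True if obj pickles, False if the recursion limit is hit."""
    buf = io.BytesIO()
    try:
        # Pure-Python pickler: it recurses once per object it descends into.
        pickle._Pickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)
    except RuntimeError:  # RecursionError is a subclass of RuntimeError
        return False
    return True


short_ok = try_pickle(make_chain(20))
long_ok = try_pickle(make_chain(5 * sys.getrecursionlimit()))
print(short_ok, long_ok)
```

The pickler follows node -> next_sibling -> node and so on, one level deeper for every link, so the depth it reaches grows with the number of siblings even though nothing is nested. The memo that lets pickle cope with cycles does not help here, because no object is ever visited twice. If something like this is what happens with the page above, raising the limit with sys.setrecursionlimit before dumping might get around it, but that is untested against this page.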
--
Gabriel Genellina