On Sun, 10 Feb 2008 02:09:12 -0200, Victor Lin <[EMAIL PROTECTED]>
wrote:

> On Feb 10, 11:42 am, "Gabriel Genellina" <[EMAIL PROTECTED]>
> wrote:
>> On Sat, 09 Feb 2008 09:49:46 -0200, Victor Lin <[EMAIL PROTECTED]>
>> wrote:
>>
>> > I encountered a problem with pickle.
>> > I downloaded an HTML page from:
>>
>> >http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B...
>>
>> > and parsed it with BeautifulSoup.
>> > The page is huge.
>> > When I use pickle to dump it, a RuntimeError: maximum recursion depth
>> > exceeded occurs.

Yes, I could reproduce the error. Worse, using cPickle instead of pickle,
Python just aborts (no exception traceback, no error message, no
Application Error popup...). This is with Python 2.5.1 on Windows XP.

<code>
import urllib
import BeautifulSoup
import cPickle

doc = urllib.urlopen('http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B000NMKHW6/ref=sr_1_2?ie=UTF8&s=electronics&qid=1202541889&sr=1-2')
soup = BeautifulSoup.BeautifulSoup(doc)
# uncommenting the next line makes the interpreter abort silently
#print len(cPickle.dumps(soup, -1))
</code>

That page has an insane SELECT containing 1000 OPTIONs. Removing some of  
them makes cPickle happy:

<code>
div = soup.find("div", id="buyboxDivId")
select = div.find("select", attrs={"name": "quantity"})
for i in range(200):  # remove 200 options out of 1000
    select.contents[5].extract()
print len(cPickle.dumps(soup, -1))
</code>

I don't know whether this is an error in BeautifulSoup or in pickle. That
SELECT with its many OPTIONs is big, but not recursive (and I think
BeautifulSoup uses weak references to build its links); in any case,
pickle is supposed to handle recursion well. The longest chain of nested
tags has length 32; in principle I would expect BeautifulSoup's object
graph to have a similar nesting depth, so the "maximum recursion depth
exceeded" error is unexpected.
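For reference, this is roughly how that nesting can be measured (just a
quick sketch; the max_depth helper is my own name, it simply walks
Tag.contents in BeautifulSoup 3, reusing the `soup` built above):

<code>
import BeautifulSoup

def max_depth(tag):
    # length of the deepest chain of nested tags below `tag`
    deepest = 0
    for child in tag.contents:
        if isinstance(child, BeautifulSoup.Tag):
            deepest = max(deepest, 1 + max_depth(child))
    return deepest

print max_depth(soup)  # ~32 for this page
</code>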

>> BeautifulSoup objects usually aren't pickleable, independently of your
>> recursion error.
> But I pickle and unpickle other soup objects successfully.
> Only this object seems too deep to pickle.

Yes, sorry, I was using an older version of BeautifulSoup.
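By the way, if all you need is to persist the parsed page, a workaround
that sidesteps the deep recursion entirely (just a sketch, assuming the
rendered markup is all you want back) is to pickle str(soup) and rebuild
the tree when loading:

<code>
import cPickle
import BeautifulSoup

# store the rendered HTML instead of the soup's object graph
data = cPickle.dumps(str(soup), -1)

# ...later, re-parse to get a working soup again
soup2 = BeautifulSoup.BeautifulSoup(cPickle.loads(data))
</code>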

-- 
Gabriel Genellina
