On Feb 10, 11:42 am, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> On Sat, 09 Feb 2008 09:49:46 -0200, Victor Lin <[EMAIL PROTECTED]> wrote:
>
> > I encounter a problem with pickle.
> > I download an HTML page from:
> > http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B...
> >
> > and parse it with BeautifulSoup.
> > This page is very huge.
> > When I use pickle to dump it, a RuntimeError: maximum recursion depth
> > exceeded occurs.
>
> BeautifulSoup objects usually aren't pickleable, independently of your
> recursion error.

But I pickle and unpickle other soup objects successfully. Only this object seems too deep to pickle.

> py> import pickle
> py> import BeautifulSoup
> py> soup = BeautifulSoup.BeautifulSoup("<html><body>Hello, world!</html>")
> py> print pickle.dumps(soup)
> Traceback (most recent call last):
>   ...
> TypeError: 'NoneType' object is not callable
> py>
>
> Why do you want to pickle it? Store the downloaded page instead, and
> rebuild the BeautifulSoup object later when needed.
>
> --
> Gabriel Genellina
Because parsing HTML costs a lot of CPU time, I want to cache the soup object as a file. If I need the same page again, I can load it from the cache file, even the already-parsed soup object. My program's bottleneck is parsing HTML, so if I could parse each page once and unpickle it later, it would save a lot of time.
--
http://mail.python.org/mailman/listinfo/python-list
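One way to reconcile the two posts is a small on-disk cache: since the soup object itself may not pickle (and a very deep tree can blow the recursion limit), extract the data the program actually needs into plain Python structures and pickle *those*, keyed by URL, so each page is fetched and parsed at most once. The sketch below (in modern Python syntax) assumes hypothetical `fetch` and `parse` callables standing in for the program's real download and BeautifulSoup-extraction code; none of these names come from the original thread.

```python
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # hypothetical cache directory

def cached_parse(url, fetch, parse):
    """Return parse(fetch(url)), fetching and parsing each URL at most once.

    `fetch` downloads the page; `parse` turns the HTML into plain Python
    data (lists/dicts pulled out of the soup). Plain data pickles cheaply,
    without the deep recursion a whole BeautifulSoup tree can trigger.
    Both callables are stand-ins, not names from the original post.
    """
    # One cache file per URL, keyed by a hash of the URL string.
    name = hashlib.md5(url.encode("utf-8")).hexdigest() + ".pickle"
    path = os.path.join(CACHE_DIR, name)
    if os.path.exists(path):
        # Cache hit: skip both the download and the expensive parse.
        with open(path, "rb") as f:
            return pickle.load(f)
    data = parse(fetch(url))
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

With this shape the expensive BeautifulSoup pass runs once per page; subsequent lookups are a single `pickle.load`, which matches the "parse once, unpickle later" goal without trying to pickle the soup object itself.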