John Nagle wrote:
>   Are weak refs slower than strong refs?  I've been considering making the
> "parent" links in BeautifulSoup into weak refs, so the trees will release
> immediately when they're no longer needed.  In general, all links back
> towards the root of a tree should be weak refs; this breaks the loops
> that give reference counting trouble.
> 
>                 John Nagle

    I just finished converting BeautifulSoup to use weak back references.
All that's necessary is to make the "parent", "previous", "previousSibling",
and "tagStack" links into weak proxy references.  Those are the
"backlinks" of the BeautifulSoup tree; once they're weak links,
BeautifulSoup becomes loop-free, and the GC in debug mode reports
zero garbage.  This is useful because BeautifulSoup's backlinks otherwise
create huge amounts of collectable garbage, leading to frequent GC cycles.
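
    As a rough illustration of why the backlinks matter, here is a minimal
sketch.  "Node" and "freed_without_gc" are hypothetical names made up for
this example, not BeautifulSoup code.  It builds a two-node tree with the
cycle collector turned off and checks whether the root is released by
reference counting alone, which is CPython behavior.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node; not BeautifulSoup's actual Tag class."""

    def __init__(self, name, parent=None, weak=False):
        self.name = name
        self.children = []
        # The back link is either a strong reference or a weak proxy.
        if parent is not None and weak:
            self.parent = weakref.proxy(parent)
        else:
            self.parent = parent
        if parent is not None:
            parent.children.append(self)


def freed_without_gc(weak):
    """Build a small tree, drop it, and report whether the root was
    released by reference counting alone (cycle collector disabled)."""
    gc.disable()
    try:
        root = Node("root")
        Node("child", parent=root, weak=weak)
        probe = weakref.ref(root)   # lets us observe when root dies
        del root
        return probe() is None
    finally:
        gc.enable()
        gc.collect()                # clean up any cycle left behind


strong_result = freed_without_gc(weak=False)  # strong back link
weak_result = freed_without_gc(weak=True)     # weak proxy back link
print(strong_result, weak_result)
```

    With a strong back link the root survives the "del" because the child
still points at it; with a weak proxy the whole tree goes away at once.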

    The "weakref" module could use some support functions.  If you
try to create a weakref proxy from a weakref proxy, or from "None",
you get an exception, which is not usually what you want.
It's more useful to pass through "None" or an existing weakref proxy.
So I wrote the function below, which makes it much easier to
convert code to use weak proxy references.
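
    For reference, the helper described above can be sketched in complete
form as follows.  This is a sketch, not the exact text of the patch: it
tests for None with "is" rather than "==", so the check never calls the
object's own "__eq__", and it passes both proxy types to one isinstance
call.  The "Tag" class in the usage lines is a hypothetical stand-in.

```python
import weakref


def backref(p):
    """Return a weak proxy to p; pass None and existing proxies through."""
    if p is None:
        return None
    if isinstance(p, (weakref.ProxyType, weakref.CallableProxyType)):
        return p                    # already a weak proxy, reuse it
    return weakref.proxy(p)         # otherwise make a new weak proxy


class Tag:
    """Hypothetical stand-in for a tree node."""
    name = "b"


tag = Tag()
first = backref(tag)        # a new weak proxy
second = backref(first)     # passed through unchanged
nothing = backref(None)     # passed through unchanged
print(first.name, second is first, nothing is None)
```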

    "weakref.proxy()" probably should work that way.
Weakref proxies are supposed to be transparent, but they're not
quite transparent enough.
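
    A short sketch of where the transparency runs out, as observed on
CPython.  "Tag" and "raises_type_error" are hypothetical names for this
example only.

```python
import weakref


class Tag:
    """Hypothetical stand-in for a tree node."""
    name = "b"


def raises_type_error(obj):
    """Report whether weakref.proxy(obj) is rejected with TypeError."""
    try:
        weakref.proxy(obj)
    except TypeError:
        return True
    return False


tag = Tag()
proxy = weakref.proxy(tag)

forwarded_name = proxy.name                     # attribute access is forwarded
own_type = type(proxy) is weakref.ProxyType     # but the proxy has its own type
none_rejected = raises_type_error(None)         # None cannot be proxied
proxy_rejected = raises_type_error(proxy)       # nor can an existing proxy

del tag                                         # drop the only strong reference
try:
    proxy.name
    dead_proxy_raises = False
except ReferenceError:
    dead_proxy_raises = True                    # referent is gone
```

    The last case is the one to watch when converting code: a weak back
link is only usable while something else still holds the object.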

    Patches to BeautifulSoup are below.

                                John Nagle

59a60,80
 > import weakref  # Weak references for previous, parent, previousSibling
 > #
 > #    Weakref allocation control
 > #
 > #    The following links are always weak references, to avoid internal referenc
 > #    require extra garbage collection.
 > #            self.parent
 > #            self.previous
 > #            self.previousSibling
 > #    These are all "back links".
 > #
 > #    backref  --  create a weak reference as a back pointer
 > #
 > #    Generates a weakref proxy, but handles input of "none" or an existing weak
 > #
 > def backref(p) :
 >      if p == None :  # if none
 >              return(None)  # then none
 >      if isinstance(p,weakref.ProxyType) or isinstance(p,weakref.CallableProxyTyp
 >              return(p)
 >      return(weakref.proxy(p))  # otherwise a new weakref
60a82
 > #
79,80c101,102
<         self.parent = parent
<         self.previous = previous
---
 >         self.parent = backref(parent)
 >         self.previous = backref(previous)
85c107
<             self.previousSibling = self.parent.contents[-1]
---
 >             self.previousSibling = backref(self.parent.contents[-1])
127c149
<             self.nextSibling.previousSibling = self.previousSibling
---
 >             self.nextSibling.previousSibling = backref(self.previousSibling)
157c179
<         newChild.parent = self
---
 >         newChild.parent = backref(self)
161c183
<             newChild.previous = self
---
 >             newChild.previous = backref(self)
164c186
<             newChild.previousSibling = previousChild
---
 >             newChild.previousSibling = backref(previousChild)
166c188
<             newChild.previous = previousChild._lastRecursiveChild()
---
 >             newChild.previous = backref(previousChild._lastRecursiveChild())
190c212
<                 newChild.nextSibling.previousSibling = newChild
---
 >                 newChild.nextSibling.previousSibling = backref(newChild)
194c216
<             newChildsLastElement.next.previous = newChildsLastElement
---
 >             newChildsLastElement.next.previous = backref(newChildsLastElemen
1052c1074
<         self.tagStack.append(tag)
---
 >         self.tagStack.append(backref(tag))
1074c1096
<             self.previous = o
---
 >             self.previous = backref(o)
1167c1189
<         self.previous = tag
---
 >         self.previous = backref(tag)
-- 
http://mail.python.org/mailman/listinfo/python-list