[issue47145] Improve graphlib.TopologicalSort by removing the prepare step

Larry Hastings Sat, 02 Apr 2022 14:50:42 -0700


Larry Hastings <la...@hastings.org> added the comment:


I agree that the API should have as few surprises as possible.  AFAICT you 
haven't made any terminal pronouncements like "it's impossible to add this 
feature without too many unacceptable surprises", so I'll proceed assuming we 
can find an API (and semantics) that we're all happy with.

I've come around on the idea that forgetting "done" nodes is too surprising.  
The least surprising behavior is: once we've added a node to the graph, the 
graph remembers it.

Now I'll propose a second simple rule, which you've already touched on: you 
can't add a dependency after a node has been returned by get_ready().  
Attempting this always throws an exception; which exception specifically TBD.  
"Tough beans".

I think those two rules are easy enough to remember and to reason about.  It 
meaningfully expands the utility of the library with minimal surprise.  The 
library behaves identically for users of the existing API but permits new use 
cases too.  Adding this functionality would therefore mean fewer users would 
discover too late their use case isn't supported.


As far as my "long-lived graph objects will consume more and more memory over 
time" caveat, there's a better solution than "forget": graph.remove(*nodes).  
(My version already has a "remove" method, and I forgot that graphlib's 
doesn't.)  Allowing the user to remove a node from the graph gives them 
explicit control, and the semantics should be obvious and unsurprising; if 
you--the user--remove a node, and later you--the user--re-add that node to the 
graph, it behaves identically to any other node the graph has never seen 
before.  Removing a node intuitively removes all edges to that node.

Two notes on "remove" if we decide to go that route.  First, I'd ensure you can 
remove a node at any time. Nodes have three externally visible states wrt 
TopologicalSort:

1) added but not published by get_ready,
2) published by get_ready but not returned using done, and
3) done.  You should be able to remove a node in any of those three states.

Removing a node in 2) should be equivalent to calling done before calling 
remove; that is, if you're removing the node anyway, you don't need to call 
done.

Second, the current underlying implementation would make remove really slow.  
Nodes don't remember their predecessors, only their successors, so removing a 
node would be O(n).  If we expected remove to get a lot of use, we'd probably 
want to change how the graph is stored to speed up this operation.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue47145>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue47145] Improve graphlib.TopologicalSort by removing the prepare step

Reply via email to