On Mon, Dec 7, 2009 at 11:28 PM, geremy condra <debat...@gmail.com> wrote: > On Mon, Dec 7, 2009 at 6:28 PM, M.-A. Lemburg <m...@egenix.com> wrote: >> geremy condra wrote: >>> On Mon, Dec 7, 2009 at 2:51 PM, M.-A. Lemburg <m...@egenix.com> wrote: >>>> geremy condra wrote: >>>>> How interested are you in a C port of graphine? I haven't had >>>>> any specific requests for it, but if its something you need I >>>>> can shuffle it towards the top of the to do pile. >>>> >>>> There are two main reasons for a C implementation: >>>> >>>> 1. performance >>>> >>>> 2. memory footprint >>>> >>>> These are important when using graphs with lots of nodes or >>>> when using lots of graphs or operations on graphs in tight >>>> loops. >>>> >>>> However, to get such a new type into the core, you need to start >>>> a PEP process and hash out the API some more, esp. with respect >>>> to integration with the other core types, such as dictionaries >>>> and lists of tuples for both creation and as alternative >>>> representation. >>> >>> I'm happy to start the PEP process, but will need some >>> guidance, as I've never written a PEP before. >> >> Some pointers to get you started: >> >> PEPs in general: >> http://www.python.org/dev/peps/pep-0001/ >> http://www.python.org/dev/peps/pep-0009/ >> >> Adding modules to the standard lib: >> http://www.python.org/dev/peps/pep-0002/ >> >> (though, in reality, you'd probably only be patching the >> collections.py module, I guess) > > Thanks, I'll go over these a little later. > >>>> Some other comments (in no particular order): >>>> >>>> * the "name" default of using id(node) is not ideal, since >>>> id(node) will return the address of the object and that >>>> can be subject to reuse within the lifetime of the process; >>>> using a global (thread-safe) counter would be safer >>> >>> Good idea. I'm going to start a branch on the repo to >>> handle the changes you mention. >>> >>>> * Graph.__init__ should be able to take a list or set >>>> of nodes and edges as initializer >>> >>> The format of this will need to be thought all the way >>> through before being implemented. To date, we haven't >>> come up with anything completely satisfactory, but >>> AFAIK everybody involved is still open to suggestions >>> on this. >> >> I wasn't thinking of anything clever :-) ... >> >> g = Graph( >> [Node("a"), Node("b"), Node("c")], >> [Edge(Node("a"), Node("b"), "ab"), >> Edge(Node("a"), Node("c"), "ac"), >> Edge(Node("b"), Node("c"), "bc"), >> ]) >> >> The main motivation here is to get lists, sets and dicts >> play nice together. > > Generally, we've tried to discourage people from instantiating > nodes and edges directly, in favor of having them controlled > through the graph. Maybe something along the lines of: > > g = Graph(nodes=['a', 'b', 'c'], edges=[('a', 'b'), ('a', 'c'), ('b', 'c')]) > > ? > >>>> * Graph.__setitem__ could be mapped to .add_node() for >>>> convenience >>> >>> This may be a question of playing around with it. ATM I'm >>> not sold on the idea, but I'll implement it and see how it >>> turns out in practice. >> >> Thinking about it some more, I agree, it's not all that useful. >> >>>> * Graph.__length__ could be mapped to .size() for >>>> convenience >>> >>> We decided not to do this due to the ambiguity between >>> whether .size() or .order() was the intended operation, >>> and looking back I'm not sure that was entirely unjustified. >>> Do you see there being any confusion on that score? >> >> There is an ambiguity here, indeed. My thinking was that >> the edges define the graph and can be mapped to a dictionary, >> e.g. >> >> d = {"ab": ("a", "b"), >> "ac": ("a", "c"), >> "bc": ("b", "c")} >> >> len(d) == 3 >> len(g) == 3 >> >> A graph without edges is also what you typically call an empty >> graph, ie. >> >> if not g: >> print 'empty' >> >> http://mathworld.wolfram.com/EmptyGraph.html >> >> Still, perhaps it's better to just not go down this route. > > I'd rather avoid it if possible, since there are many > potential interpretations of this in different contexts. > Again, I'm open to to the idea, but not convinced. > >>>> * Graph.__iter__ could be mapped to an iterator using >>>> the fastest traversal method for the graph nodes >>>> (ie. order does not matter, it's only important that >>>> all nodes are found as fast as possible) >>> >>> Again, it seems ambiguous as to whether nodes or >>> edges are the intended target here, and while the >>> API can obviously dictate that, it seems a bit like >>> a case of "in the face of ambiguity, refuse the >>> temptation to guess" to me. >> >> Right, but sometimes "practicalty beats purity" ;-) We had >> the same situation for dictionaries and then decided that >> iteration over keys would be more natural than iterating over >> items (key, value) or values. >> >> It's also important to note that: >> >> for n in g: print n >> >> and >> >> n in g >> >> match up in terms of semantics. >> >> Since n in g already uses the "node in graph" approach, >> the for-loop should use the same logic. > > Beware, that doesn't just match nodes. > > g = Graph() > g.add_node('a') > g.add_node('b') > g.add_edge('a', 'b', 'ab') > 'ab' in g # returns true > >>>> * Graph could also benefit from some bulk update methods, >>>> e.g. to update weights on all edges or nodes by passing >>>> in a dictionary mapping names to attribute values >>> >>> Sounds good. Again, the format will need some careful >>> thought, but you're right that this would improve it >>> greatly. >> >> This is only an optimization, but could lead to some great >> performance improvements by avoiding function call overhead. > > Agreed. > >>>> * Graph could benefit from some bulk query methods, >>>> such as .get_node_attributes() and .add_edges(). >>> >>> I'm not sure how the first is different from the existing >>> .data, what did you have in mind? >> >> The second one was a typo. It should have been >> .get_edge_attributes(). >> >> In both cases I was thinking of being able to quickly >> extract a number of node/edge attributes, e.g. >> >>>>> g.get_edge_attributes('weight', 'color') >> {"ab": {"weight": 3, "color": "blue"}, >> "ac": {"weight": 2, "color": "green"}, >> "bc": {"weight": 1, "color": "red"}} >> >> Again, the idea is to reduce call overhead and later >> on be able to move these lookups to C. > > Entirely doable, but I'm not sure I see the use case. > Would you mind providing a more realistic example? > >>>> * Node.__getitem__ could be mapped to .data.__getitem__ >>>> (following the approach taken by ElementTree); same >>>> for Edge.__getitem__ >>> >>>> * Node.__setitem__ could be mapped to .data.__setitem__ >>>> (following the approach taken by ElementTree); same >>>> for Edge.__setitem__ >>> >>> .data is just a property that returns a dictionary of non >>> private members right now, so if you're wanting to just say >>> node.this = 'stuff', you can already do that. Or am I >>> misreading your intent? >> >> Right, but with attributes you are restricted to Python >> identifiers, e.g. keywords (e.g. "from", "pass") are not allowed. >> The above approach bypasses this restriction. > > Hmm. Sounds like a plausible use case to me, although I'm > not sure its one that should be encouraged. The bigger > question in my mind is whether all attribute lookups should > have to pay the extra lookup cost to support a somewhat > narrow need. I'll definitely have to talk with the other devs > about this one, and run a little bit of timing besides. > > Geremy Condra >
Just as a note, I've made some of the changes we've been talking about in a new branch over at graphine.org. Geremy Condra -- http://mail.python.org/mailman/listinfo/python-list