On Oct 27, 7:32 pm, Carl Banks <[EMAIL PROTECTED]> wrote: > I was wondering if anyone had any advice on this. > > This is not to study graph theory; I'm using the graph to represent a > problem domain. The graphs could be arbitrarily large, and could > easily have millions of nodes, and most nodes have a substantial > amount of data associated with them. Obviously I don't want a whole > such graph in memory at once, so libraries the just deal with in- > memory graphs are out. > > I know I could implement this with a relational DB, and I'd be fine > with a library that was built on top of one. But I'm hoping for > specialzed behavior useful for graphs. > > For example, suppose I have a set of nodes called A. It would be > useful to know if any of these nodes are connected by paths that > include no nodes in A. I could, of course, do that by reading from > the database and following the paths, but I clearly don't want to do > that. I would want instead to somehow cache the outside connectedness > relationship between different nodes of A. But then what happens if > something changes elsewhere. How is the cache for A notified, and can > it be updated efficiently using graph theorems, rather than > regenerated? > > It's very tricky; that's why I hope someone else has done it. > > I'm guessing no. > > Carl Banks
Are you just looking for a persistent graph? The usual options are 'shelve', 'sqllite', 'mmap', and 'ctypes.Structure'. Or did you need a special structure for the 'outside connectedness' properties? Storing that is the usual time-space tradeoff. (Actually, I think that term refers to something slightly distinct. This would be more properly termed read-time/write-time trade off, roughly.) Assuming you pick an RDB, you'd have a table of vertices and a table of edges. Did you want 'set A' to be a table and persist between runs of the program? If not, just keep a set of 2-tuples of nodes that satisfy that property. I think the set S you're after is: all x,y such that x is a vertex in A, y is a vertex in A, and there exists a P such that P is a path, P starts at x, P ends at y, and vertex v in P implies that v is x, v is y, or v is not in A. I don't know if you can cache any information about A that gives you any shortcuts in calculating S, or what the running time or running space of the fastest/smallest algorithm is. At worst, it's O( |V| * |E| ) on each change to V or E. Unverified. You might be able to store the shortest path in a mapping from 2- tuples to paths. Or better yet, map each node in A to the set of all nodes reachable from it only entering A once. Then, you might save time on either adding a node/vertex to A or G, or removing one from A or G, or both. Maybe mapping each node in G to that relative set. You didn't say whether the graph was directed and/or cyclic. A good place to start would be, if you add a node to G, can S decrease? No, because the original path P still exists. If you remove one from A, still in G, S can stay the same size, might decrease. If you remove a node from G, not in A, same, etc. -- http://mail.python.org/mailman/listinfo/python-list