On Fri, Dec 11, 2009 at 3:12 AM, Wolodja Wentland <wentl...@cl.uni-heidelberg.de> wrote:
> Hi all,
>
> I am writing a library for accessing Wikipedia data and include a module
> that generates graphs from the link structure between articles and other
> pages (like categories).
>
> These graphs could easily contain some million nodes which are frequently
> linked. The graphs I am building right now have around 300,000 nodes
> with an average in/out degree of, say, 4 and already need around 1-2 GB of
> memory. I use networkx to model the graphs and serialise them to files on
> disk (using adjacency list format, pickle and/or graphml).
>
> The recent thread on including a graph library in the stdlib spurred my
> interest and introduced me to a number of libraries I had not seen
> before. I would like to reevaluate my choice of networkx and need some
> help in doing so.
>
> I really like the API of networkx but have no problem switching to
> another one (right now). I have the impression that graph-tool might
> be faster and have a smaller memory footprint than networkx, but am
> unsure about that.
>
> Which library would you choose? This decision is quite important for me,
> as the choice will influence my library's external interface. Or is
> there something like WSGI for graph libraries?
>
> kind regards

I once computed the PageRank of the English Wikipedia. I ended up using the Boost Graph Library, of which there is a parallel implementation that runs on clusters. I tried to do it in pure Python but failed because the memory requirements were so large. Boost and the parallel version both have Python interfaces.
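For concreteness, here is a minimal sketch of the workflow the quoted question describes: build a directed link graph in networkx and serialise it in the three formats mentioned (adjacency list, pickle, graphml). The article names and file paths are made up for illustration; this is not the poster's actual code.

    # Toy version of the Wikipedia link graph described above.
    # Node names and file paths are invented for the example.
    import pickle

    import networkx as nx

    # Directed graph: an edge (u, v) means article u links to article v.
    G = nx.DiGraph()
    G.add_edges_from([
        ("Python_(programming_language)", "Guido_van_Rossum"),
        ("Python_(programming_language)", "Monty_Python"),
        ("Guido_van_Rossum", "Python_(programming_language)"),
    ])

    # The three serialisation formats mentioned in the question.
    nx.write_adjlist(G, "links.adjlist")   # plain-text adjacency lists
    nx.write_graphml(G, "links.graphml")   # XML; keeps node/edge attributes
    with open("links.pickle", "wb") as f:  # Python-only, but quick to reload
        pickle.dump(G, f)

As a rough trade-off: the adjacency list is the most compact and human-readable but drops attributes, GraphML keeps attributes at the cost of XML bloat, and pickle ties the data to Python (and to the networkx version that wrote it).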
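And since the reply above concerns PageRank: at small scale the computation itself is a single call in networkx. The sketch below is only a toy illustration, not the Boost Graph Library route the reply describes; at full English-Wikipedia scale, an in-memory networkx graph is exactly what exhausts memory, which is why the reply reached for Boost's parallel implementation.

    # Small-scale illustration of PageRank with networkx's built-in routine.
    # The toy graph here is invented; it is NOT the Boost setup from the reply.
    import networkx as nx

    G = nx.DiGraph()
    G.add_edges_from([
        ("A", "B"), ("B", "C"), ("C", "A"), ("C", "B"),
    ])

    # nx.pagerank returns a dict mapping each node to its score.
    ranks = nx.pagerank(G, alpha=0.85)
    for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print("%s %.4f" % (node, score))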