On Jun 24, 1:26 am, [EMAIL PROTECTED] wrote:
> I need to represent the hyperlinks between a large number of HTML
> files as a graph. My non-directed graph will have about 63,000 nodes
> and probably close to 500,000 edges.
>
> I have looked into igraph
> (http://cneurocvs.rmki.kfki.hu/igraph/doc/python/index.html) and
> networkX (https://networkx.lanl.gov/wiki) for generating a file to
> store the graph, and I have also looked into Graphviz for
> visualization. I'm just not sure which modules are best. I need to
> be able to do the following:
>
> 1) The names of my nodes are not known ahead of time, so I will
> extract the title from all the HTML files to name the nodes prior to
> parsing the files for hyperlinks (edges).
>
> 2) Every file will be parsed for links and nondirectional connections
> will be drawn between the two nodes.
>
> 3) The files might link to each other so the graph package needs to
> be able to check to see if an edge between two nodes already exists,
> or at least not double draw connections between the two nodes when
> adding edges.
>
> I'm relatively new to graph theory so I would greatly appreciate any
> suggestions for filetypes. I imagine doing this as a Python
> dictionary with a list for the edges and a node:list pairing is out of
> the question for such a large graph?
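
Perhaps a dictionary where each key is a node and the value is a set of
its neighbouring nodes? Adding the same neighbour to a set twice has no
effect, so duplicate edges take care of themselves, and a graph of this
size should fit in memory without trouble.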
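
To make the dictionary-of-sets idea concrete, here is a minimal sketch. The
helper names (add_edge, has_edge, edge_count) and the page titles are made up
for illustration; they are not taken from igraph or networkX. It assumes node
names are plain strings (the HTML titles) and stores each undirected edge under
both endpoints.

```python
from collections import defaultdict

# Adjacency map: node title -> set of neighbouring node titles.
graph = defaultdict(set)

def add_edge(graph, a, b):
    """Record an undirected edge; repeating a call changes nothing."""
    graph[a].add(b)
    graph[b].add(a)

def has_edge(graph, a, b):
    """True if an edge between a and b is already recorded."""
    # .get() avoids creating an empty entry for an unknown node.
    return b in graph.get(a, ())

def edge_count(graph):
    """Count each undirected edge once, although it is stored twice."""
    return sum(1 for a, nbrs in graph.items() for b in nbrs if a <= b)

# Hypothetical titles: two pages that link to each other.
add_edge(graph, "Home", "About")
add_edge(graph, "About", "Home")   # the reverse link is not double counted
add_edge(graph, "Home", "Contact")

print(has_edge(graph, "About", "Home"))
print(edge_count(graph))
```

Membership tests and insertions on a set take constant time on average, so the
"does this edge already exist?" check from point 3 stays cheap even with
several hundred thousand edges.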