Re: Graph library recommendations for large graphs

Diez B. Roggisch Mon, 24 Aug 2009 15:27:40 -0700

VanL schrieb:

I am working on a project that will require building and querying largegraph objects (initially 8M nodes, 30-40M edges; eventually 40M nodes,100M edges). NetworkX seems to be the most popular, but I am concernedthat a dict representation for nodes would use too much memory -- myinitial tests suggest that a graph with 3M nodes and 12M edges createssubstantial memory pressure on my machine.
Can anybody who has worked with large graphs before give a recommendation?

My initial tests show otherwise. The below test-script creates 3 millionnodes with 12 million adjacencies, on my 2GB Notebook.

The theoretical limit for this (if we assume pointer-basedadjacency-references which makes sense if we have sparse graphs as yournumbers indicate) is (32 bits assumed):

- 8 bytes per node (4 byte pointer to adjacency list, 4 byte int forcounting the number of adjacencies in that list)

 - 4 bytes per adjacency

This is 60.000.000 for your example - roughly 60MB. On my machine, theprocess has 320.000.000MB - (roughly) a factor five. Given the muchricher properties a Python-object (and python-lists) have thas is prettygood I'd say.

So for your eventual size of 40M nodes, 100M edges, we have atheoretical amount of 560MB, times 5 makes 2.5 GB. Not exactly a lowmemory profile, but manageable on modern hardware.

I don't know anything about NetworkX - it still might be the bettersolution, given the underlying C-based algorithms. But if all you needis to represent a graph of that size, it appears to be working.



---- test.py ----

import random
import gc
import time

class Node(object):

    __slots__ = ["adjacencies", "value", "id"]

    def __init__(self, id):
        id = id
        value = random.random()
        self.adjacencies = []


nodes = []

gc.disable()
nc = 3000000

for i in xrange(nc):
    nodes.append(Node(i))
    if (i % 1000) == 0:
        print i

for i in xrange(12000000):
    a = random.randint(0, nc - 1)
    b = random.randint(0, nc - 1)
    while a == b:
        b = random.randint(0, nc)
    nodes[a].adjacencies.append(nodes[b])
    if (i % 1000) == 0:
        print "e", i


gc.enable()
while True:
    time.sleep(1)
    traversed = set()
    def depth_search(node, depth=0):
        traversed.add(node)
        if depth == 4:
            return
        for child in node.adjacencies:
            if child not in traversed:
                depth_search(child, depth+1)

    depth_search(nodes[random.randint(0, nc - 1)])

------


Diez
--
http://mail.python.org/mailman/listinfo/python-list

Re: Graph library recommendations for large graphs

Reply via email to