Re: Social network data / Graph properties of Riak

Aphyr Fri, 18 Nov 2011 12:28:43 -0800

On 11/18/2011 11:50 AM, Jeroen van Dijk wrote:

And I also didn't include the riak user list for this reply:



On Fri, Nov 18, 2011 at 7:04 PM, Aphyr <ap...@aphyr.com
<mailto:ap...@aphyr.com>> wrote:

    Depending on whether you think it will be more efficient to store
    the graph or its dual, consider each node a vertex and write the
    adjacency list as a part of its data. You can store whatever
    weights, etc. you need on the edges there.

    Don't use links; they're just a thin layer on top of mapreduce, so
    there's really not much advantage to using them. Links are subject
    to the HTTP header lengths too, so storing more than a thousand on
    any node is likely to break down.


Thank you for this suggestion. Also thanks for the warning on not using
links for what I want. So you are saying each vertex will have a list of
the other vertices that it is connected to? And save each edge as
key/value pair? Or are you saying each vertex should embed the adjacent
edges, meaning duplicated edges?

Depends on whether you want to store the graph or its dual. If you'redealing with a sparse DAG where the input keys are likely to be vertexnames, the natural choice is to store each vertex as an object in riakand each outbound edge from it as a property of that vertex.


/users/user1:
  owns: [item1, ...]
  follows: [user2, ...]

Of course, this method doesn't work well when you need to follow edgesin reverse, so you may need to store the reciprocal relationship ontarget nodes:


/items/item1:
  owner: user1,

/users/user2:
  followed-by: [user1, ...]

and so forth. That requires two writes and potentially lossy conflictresolution strategies, e.g. a follows b but b is not followed by a. Weuse processes which walk the graph continuously and enforce relationshipintegrity as they go. We have a social graph of several hundred millionobjects stored this way in Riak, and it works reasonably well.

Naturally, you'll want to choose an encoding which is fast andspace-efficient for a large dataset. Depending on your needs, JSON,protocol buffers, or fixed-length record entries might be work well.

Also consider the depth of traversals you'll need to perform. It may bebest to use a graph database like neo4j for deep traversals. You couldstore your primary dataset in Riak and replicate only the importantgraph information to a graph DB via post-commit hooks. That would solvethe reciprocal consistency problem (at the cost of replication lag), andcould reduce the amount of data you need to put into the graph DB.

Given that Linkedin has this problem, you might look into their tools aswell.


--Kyle

I'm guessing you mean the former, because that makes sense to me. So you
would save a graph assuming users and items like the following key/value
pairs:

//Vertices
user1: properties
user2: properties
item1: properties

//Edges
user1-owns-item1: properties
user1-follows-user2: properties
user2-follows-user1: properties

To be able to find the available edges, each vertices would need to
reference the keys of the edges. Is this what you mean?

If so, one more question about a possible problem. Say I have an item
with many many outgoing edges, so it needs to embed these references.
This would make it really costly to fetch this item from Riak I assume,
even if you are only interested in normal properties. Wouldn't that mean
you will have to save the properties seperately from the edges
references to it feasible?

Did I grasp what you were proposing Kyle?

Thanks,
Jeroen

    --Kyle


    On 11/18/2011 07:38 AM, Jeroen van Dijk wrote:

        Hi all,

        I'm currently evaluating whether Riak would fit as the main
        storage of
        my current project, a social network. The reason I am attracted
        to Riak
        and less to a Graph database as main storage is that I want the easy
        horizontal scalability and multi-site replication that Riak
        provides.
        The only thing I doubt is whether the key-value/link model of
        Riak is
        flexible enough to be able to store a property graph
        (http://arxiv.org/abs/1006.__2361
        <http://arxiv.org/abs/1006.2361>). I am not asking whether the
        querying/graph traversing will be easy; I'm probably going to use a
        graph database or a Pregel like platform (e.g.
        http://www.goldenorbos.org/) for that problem. I guess my main
        question
        is whether it would be easy/feasible to import and export a property
        graph in and from Riak? Has someone done this before?

        I realize the above might be too specific, so here are two more
        questions that I think are relevant:

        - Is there a known upper limit of links that can be stored (I
        don't want
        to add them all at once so 1000 per request is fine,
        
http://lists.basho.com/__pipermail/riak-users_lists.__basho.com/2010-March/000786.__html
        
<http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-March/000786.html>)
        - Is there a way to add meta data to links (edges)? E.g. weigths and
        other attributes.

        Any other ideas or advise are also highly appreciated.

        Cheers,

        Jeroen



        _________________________________________________
        riak-users mailing list
        riak-users@lists.basho.com <mailto:riak-users@lists.basho.com>
        http://lists.basho.com/__mailman/listinfo/riak-users___lists.basho.com
        <http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>





_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Social network data / Graph properties of Riak

Reply via email to