On 11/18/2011 11:50 AM, Jeroen van Dijk wrote:
And I also didn't include the riak user list for this reply:
On Fri, Nov 18, 2011 at 7:04 PM, Aphyr <ap...@aphyr.com
<mailto:ap...@aphyr.com>> wrote:
Depending on whether you think it will be more efficient to store
the graph or its dual, consider each node a vertex and write the
adjacency list as a part of its data. You can store whatever
weights, etc. you need on the edges there.
Don't use links; they're just a thin layer on top of mapreduce, so
there's really not much advantage to using them. Links are subject
to the HTTP header lengths too, so storing more than a thousand on
any node is likely to break down.
Thank you for this suggestion. Also thanks for the warning on not using
links for what I want. So you are saying each vertex will have a list of
the other vertices that it is connected to? And save each edge as
key/value pair? Or are you saying each vertex should embed the adjacent
edges, meaning duplicated edges?
Depends on whether you want to store the graph or its dual. If you're
dealing with a sparse DAG where the input keys are likely to be vertex
names, the natural choice is to store each vertex as an object in riak
and each outbound edge from it as a property of that vertex.
/users/user1:
owns: [item1, ...]
follows: [user2, ...]
Of course, this method doesn't work well when you need to follow edges
in reverse, so you may need to store the reciprocal relationship on
target nodes:
/items/item1:
owner: user1,
/users/user2:
followed-by: [user1, ...]
and so forth. That requires two writes and potentially lossy conflict
resolution strategies, e.g. a follows b but b is not followed by a. We
use processes which walk the graph continuously and enforce relationship
integrity as they go. We have a social graph of several hundred million
objects stored this way in Riak, and it works reasonably well.
Naturally, you'll want to choose an encoding which is fast and
space-efficient for a large dataset. Depending on your needs, JSON,
protocol buffers, or fixed-length record entries might be work well.
Also consider the depth of traversals you'll need to perform. It may be
best to use a graph database like neo4j for deep traversals. You could
store your primary dataset in Riak and replicate only the important
graph information to a graph DB via post-commit hooks. That would solve
the reciprocal consistency problem (at the cost of replication lag), and
could reduce the amount of data you need to put into the graph DB.
Given that Linkedin has this problem, you might look into their tools as
well.
--Kyle
I'm guessing you mean the former, because that makes sense to me. So you
would save a graph assuming users and items like the following key/value
pairs:
//Vertices
user1: properties
user2: properties
item1: properties
//Edges
user1-owns-item1: properties
user1-follows-user2: properties
user2-follows-user1: properties
To be able to find the available edges, each vertices would need to
reference the keys of the edges. Is this what you mean?
If so, one more question about a possible problem. Say I have an item
with many many outgoing edges, so it needs to embed these references.
This would make it really costly to fetch this item from Riak I assume,
even if you are only interested in normal properties. Wouldn't that mean
you will have to save the properties seperately from the edges
references to it feasible?
Did I grasp what you were proposing Kyle?
Thanks,
Jeroen
--Kyle
On 11/18/2011 07:38 AM, Jeroen van Dijk wrote:
Hi all,
I'm currently evaluating whether Riak would fit as the main
storage of
my current project, a social network. The reason I am attracted
to Riak
and less to a Graph database as main storage is that I want the easy
horizontal scalability and multi-site replication that Riak
provides.
The only thing I doubt is whether the key-value/link model of
Riak is
flexible enough to be able to store a property graph
(http://arxiv.org/abs/1006.__2361
<http://arxiv.org/abs/1006.2361>). I am not asking whether the
querying/graph traversing will be easy; I'm probably going to use a
graph database or a Pregel like platform (e.g.
http://www.goldenorbos.org/) for that problem. I guess my main
question
is whether it would be easy/feasible to import and export a property
graph in and from Riak? Has someone done this before?
I realize the above might be too specific, so here are two more
questions that I think are relevant:
- Is there a known upper limit of links that can be stored (I
don't want
to add them all at once so 1000 per request is fine,
http://lists.basho.com/__pipermail/riak-users_lists.__basho.com/2010-March/000786.__html
<http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-March/000786.html>)
- Is there a way to add meta data to links (edges)? E.g. weigths and
other attributes.
Any other ideas or advise are also highly appreciated.
Cheers,
Jeroen
_________________________________________________
riak-users mailing list
riak-users@lists.basho.com <mailto:riak-users@lists.basho.com>
http://lists.basho.com/__mailman/listinfo/riak-users___lists.basho.com
<http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com