On Fri, Nov 18, 2011 at 9:28 PM, Aphyr <ap...@aphyr.com> wrote: > On 11/18/2011 11:50 AM, Jeroen van Dijk wrote: > >> And I also didn't include the riak user list for this reply: >> >> >> On Fri, Nov 18, 2011 at 7:04 PM, Aphyr <ap...@aphyr.com >> <mailto:ap...@aphyr.com>> wrote: >> >> Depending on whether you think it will be more efficient to store >> the graph or its dual, consider each node a vertex and write the >> adjacency list as a part of its data. You can store whatever >> weights, etc. you need on the edges there. >> >> Don't use links; they're just a thin layer on top of mapreduce, so >> there's really not much advantage to using them. Links are subject >> to the HTTP header lengths too, so storing more than a thousand on >> any node is likely to break down. >> >> >> Thank you for this suggestion. Also thanks for the warning on not using >> links for what I want. So you are saying each vertex will have a list of >> the other vertices that it is connected to? And save each edge as >> key/value pair? Or are you saying each vertex should embed the adjacent >> edges, meaning duplicated edges? >> > > Depends on whether you want to store the graph or its dual. If you're > dealing with a sparse DAG where the input keys are likely to be vertex > names, the natural choice is to store each vertex as an object in riak and > each outbound edge from it as a property of that vertex. > > /users/user1: > owns: [item1, ...] > follows: [user2, ...] > > Of course, this method doesn't work well when you need to follow edges in > reverse, so you may need to store the reciprocal relationship on target > nodes: > > /items/item1: > owner: user1, > > /users/user2: > followed-by: [user1, ...] > > and so forth. That requires two writes and potentially lossy conflict > resolution strategies, e.g. a follows b but b is not followed by a. We use > processes which walk the graph continuously and enforce relationship > integrity as they go. We have a social graph of several hundred million > objects stored this way in Riak, and it works reasonably well. > > Naturally, you'll want to choose an encoding which is fast and > space-efficient for a large dataset. Depending on your needs, JSON, > protocol buffers, or fixed-length record entries might be work well. > > Also consider the depth of traversals you'll need to perform. It may be > best to use a graph database like neo4j for deep traversals. You could > store your primary dataset in Riak and replicate only the important graph > information to a graph DB via post-commit hooks. That would solve the > reciprocal consistency problem (at the cost of replication lag), and could > reduce the amount of data you need to put into the graph DB. > > Given that Linkedin has this problem, you might look into their tools as > well. > > Thanks for your detailed reply. I'm very happy to hear that you are using Riak for a similar use case. My dataset will be of a similar magnitude. It will be a graph where edges have properties (e.g. user 1 rates item x with 2 stars), but I think saving these edges as separate objects will work. I'll use post-commit hooks to get all this data in a graph database (probably Neo4j) and use this for graph traversing.
A side question Kyle, how do you make sure the graph database is always (eventually) consistent with Riak? E.g. how do you correct potential errors during the post-commit hooks for instance? And say this database gets corrupt how do you recover this graph database and get it in sync again with Riak? Cheers, Jeroen > --Kyle > > I'm guessing you mean the former, because that makes sense to me. So you >> would save a graph assuming users and items like the following key/value >> pairs: >> >> //Vertices >> user1: properties >> user2: properties >> item1: properties >> >> //Edges >> user1-owns-item1: properties >> user1-follows-user2: properties >> user2-follows-user1: properties >> >> To be able to find the available edges, each vertices would need to >> reference the keys of the edges. Is this what you mean? >> >> If so, one more question about a possible problem. Say I have an item >> with many many outgoing edges, so it needs to embed these references. >> This would make it really costly to fetch this item from Riak I assume, >> even if you are only interested in normal properties. Wouldn't that mean >> you will have to save the properties seperately from the edges >> references to it feasible? >> >> Did I grasp what you were proposing Kyle? >> >> Thanks, >> Jeroen >> >> --Kyle >> >> >> On 11/18/2011 07:38 AM, Jeroen van Dijk wrote: >> >> Hi all, >> >> I'm currently evaluating whether Riak would fit as the main >> storage of >> my current project, a social network. The reason I am attracted >> to Riak >> and less to a Graph database as main storage is that I want the >> easy >> horizontal scalability and multi-site replication that Riak >> provides. >> The only thing I doubt is whether the key-value/link model of >> Riak is >> flexible enough to be able to store a property graph >> (http://arxiv.org/abs/1006.__**2361<http://arxiv.org/abs/1006.__2361> >> <http://arxiv.org/abs/1006.**2361 <http://arxiv.org/abs/1006.2361>>). >> I am not asking whether the >> >> querying/graph traversing will be easy; I'm probably going to use a >> graph database or a Pregel like platform (e.g. >> http://www.goldenorbos.org/) for that problem. I guess my main >> question >> is whether it would be easy/feasible to import and export a >> property >> graph in and from Riak? Has someone done this before? >> >> I realize the above might be too specific, so here are two more >> questions that I think are relevant: >> >> - Is there a known upper limit of links that can be stored (I >> don't want >> to add them all at once so 1000 per request is fine, >> http://lists.basho.com/__**pipermail/riak-users_lists.__** >> basho.com/2010-March/000786.__**html<http://lists.basho.com/__pipermail/riak-users_lists.__basho.com/2010-March/000786.__html> >> <http://lists.basho.com/**pipermail/riak-users_lists.** >> basho.com/2010-March/000786.**html<http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-March/000786.html> >> >) >> >> - Is there a way to add meta data to links (edges)? E.g. weigths >> and >> other attributes. >> >> Any other ideas or advise are also highly appreciated. >> >> Cheers, >> >> Jeroen >> >> >> >> ______________________________**___________________ >> riak-users mailing list >> riak-users@lists.basho.com >> <mailto:riak-users@lists.**basho.com<riak-users@lists.basho.com> >> > >> http://lists.basho.com/__**mailman/listinfo/riak-users___** >> lists.basho.com<http://lists.basho.com/__mailman/listinfo/riak-users___lists.basho.com> >> <http://lists.basho.com/**mailman/listinfo/riak-users_** >> lists.basho.com<http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com> >> > >> >> >> >> >> >> >> ______________________________**_________________ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/**mailman/listinfo/riak-users_**lists.basho.com<http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com> >> >
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com