Don't fall into the trap of "one size fits all". Riak is an amazing product that can solve a number of tough problems, but I don't think this is one of them. To model this sort of problem correctly, you want a triple store: you need flexible schemas, ontologies, and a graph-based query language.
Take a look at: http://www.bigdata.com/

Regards,
-Eric

On Thu, Apr 22, 2010 at 4:23 AM, Orlin Bozhinov <o...@soundsapiens.com> wrote:
> Riak Users,
>
> Thinking about a data modeling pattern that will allow one to not worry
> about how many links can be had with one-to-many (or many-to-many)
> scenarios. This question has come up before in various places. One answer
> I like is Sean's from this thread
> http://riak.markmail.org/thread/6e7ypt5ndjzjk7mr saying: "... at the point
> where you have that many links it becomes necessary to consider other
> options, including intermediary objects or alternative ways of representing
> the relationship". I wonder if an _intermediary way_ could be baked into
> Ripple (or your client library of choice). This is for the cases when
> one-to-many can become one-to-very-many.
>
> To make it more interesting, let's say we want to add metadata to the
> relationship as described in the second-to-last paragraph of
> http://blog.basho.com/2010/03/25/schema-design-in-riak---relationships/.
> Here is what I have in mind: {from}->{from_association}->{association}->{to}
> -- the {curlied} are bucket / objects and -> are links. For example if
> {from} = "user"; and {association} = "interest"; and {to} = {whatever} there
> is interest in - e.g. "event", "place", "story", another "user" or even
> self-interest :) But I'm getting ahead of myself. Let's use a recent
> example from Basho's blog where a "user" links {to} = "task". So we get:
> user -has--> user_interest -meta--> interest -in--> task.
>
> The "interest" association could imply "ownership" but maybe the
> application allows its "users" to express interest in another's "task". Maybe
> it's a collaborative effort... Reverse-linking from the many interests /
> tasks to their respective owners is easy because it's just a single link for
> task -of--> user or interest -of--> user. In the interests bucket I want to
> put all kinds of useful metadata.
> There I would embed (via Composition as
> Ripple calls it) not only all the "tags", but also "notes", "star", etc.
> Think Delicious bookmarks or Google Reader items and so on. It seems like a
> common pattern. Something that may fit the use case of @botanicus too. One
> could represent all possible links (various associations) between two
> objects as metadata contained in a given "interest". Ownership can be a
> type of interest for the sake of link-walking.
>
> There are three things happening here:
> 1. the "very many" (links through intermediary objects)
> 2. optional metadata (yet another intermediary object) - multiple
>    associations between any two objects can be expressed through extra
>    metadata rather than extra links
> 3. reusing the "very-many" and / or metadata intermediaries -linking--> to
>    objects in different buckets
>
> The real issue (that #1 solves) is not having an easy ability to do "very
> many" links originating from the same object. The #2 metadata object vs a
> few extra links for tags / notes (which are insignificant compared to the
> many interests a user can have) - makes it easier (in my eyes) to put in
> Redis for filtering... Of course interests (#2) could be specialized
> (different metadata models) with regard to what they are about (#3). On
> Delicious that's just bookmarks. I've got close to 6,000 of them. Does
> that approach "very many" in terms of Riak? If "very many" were easier to
> do (with client-library models or otherwise Riak itself) #2 & #3 would be
> indifferent about which intermediary leads to them (an extra link-walk step)
> as they are already possible anyway. How could we step (automagically)
> through an intermediary object (the user_interest "very many" enabler
> bucket) - having a specific target object in mind?
>
> I think it may already be possible with current link-walking. Then it's
> all a matter of managing the intermediary bucket / objects. Not exactly
> sure how the max links are calculated.
> According to one formula from the
> mailing list I may get 1000 headers (limit in mochiweb) * 200 links ("around
> 40 chars per link") = 200,000 links max? That seems like "very many", but
> there was also something about performance burden... If we took those 200
> with just a single header, pointing to 200 intermediary objects, each
> pointing to another 200 target objects we would get 40,000 links. That's
> quite a few. Of course that number could easily get much much bigger
> (square the default limit). What decides how many links per intermediary
> object is ideal? Is it a setting that Basho could recommend a default for?
> Could Ripple automate that? Some link creation logic is needed and if Riak
> doesn't support it, the client libraries that do "associations" are a good
> candidate for the task. Also with link deletion - we'll need to either keep
> track of link count per intermediary or run map-reduce jobs to clean up once
> in a while...
>
> In either case, link creation should be as simple as knowing which
> intermediary object is the last one and whether we should add the next new
> link through it or through a new intermediary (when a certain link count
> _setting_ is reached). If this could be automated then it wouldn't matter
> how many links there are. Otherwise Riak would have to be monitored and if
> certain links begin to get "very many" then a model migration is run to make
> the transition from few straight links to very many. If client libraries
> could work with both kinds of links then this transition would mean tweaking
> the model association (and link walking remaining the same). But when using
> Riak's raw interface there would be a difference, which means a switch from
> one-to-many to one-to-very-many will usually take some thinking / effort.
> Any time I'm in doubt, it seems safer to side with the very-many (just in
> case). What is the cost of an extra step of link-walking as compared to
> changing application code?
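
As a rough illustration of the link-creation logic described in the quoted paragraph (add through the last intermediary until a link-count setting is reached, then start a new one), here is a minimal in-memory Python sketch. It does not talk to Riak or Ripple at all; `IntermediaryLinks`, `max_links`, `add_link`, `targets` and `capacity` are hypothetical names, and 200 is simply the per-object figure from the estimate in the quoted message.

```python
class IntermediaryLinks:
    """In-memory stand-in for one object's links, spread over intermediaries.

    Hypothetical sketch: each inner list plays the role of one intermediary
    object holding at most `max_links` links to target objects.
    """

    def __init__(self, max_links=200):
        if max_links < 1:
            raise ValueError("max_links must be at least 1")
        self.max_links = max_links
        self.intermediaries = []  # list of lists of target keys

    def add_link(self, target):
        # Reuse the last intermediary unless there is none or it is full.
        if not self.intermediaries or len(self.intermediaries[-1]) >= self.max_links:
            self.intermediaries.append([])
        self.intermediaries[-1].append(target)
        # Index of the intermediary that received the link.
        return len(self.intermediaries) - 1

    def targets(self):
        # The extra link-walk step: source -> intermediary -> target.
        for links in self.intermediaries:
            yield from links

    def capacity(self):
        # Two-level limit if the source itself also holds at most max_links links.
        return self.max_links * self.max_links


links = IntermediaryLinks(max_links=200)
for n in range(450):
    links.add_link(f"task/{n}")

print(len(links.intermediaries))  # 3 intermediaries: 200 + 200 + 50 links
print(links.capacity())           # 200 * 200 = 40000
```

In this sketch the per-intermediary link count is just the length of each list; against a real store that count would have to be tracked or recomputed, which is the deletion concern raised in the quoted text.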
>
> As another example, if one were to build GitHub with Riak, how would you
> model the watching & following associations? Many users would use few, but
> some would use many, which in a few cases get to be very many, which means
> everybody will watch and follow in very-many style. If the app UI allows
> it, one has to assume it will happen... Here is the following / watching
> example for very many - different words for having interest in something:
>
> * separate very-many:
>     user -does--> user_following -what--> user
>     user -does--> user_watching -what--> repo (-of--> user)
> * combined very-many:
>     user -has--> user_interest -for--> {whatever}
> * combined very-many + metadata:
>     user -has--> user_interest -meta--> interest -for--> {whatever}
> * and if metadata was different, perhaps:
>     user -has--> user_interest -meta--> interest_user -what--> user
>     user -has--> user_interest -meta--> interest_repo -what--> repo
>
> Whatever the pattern, it would be nice to have the best practices defined
> and implemented for reuse (via association-proficient client libs - a la
> Ripple). After all, Riak users like big data, which is another way of
> saying very many items of stuff -- and why not very many hands / links too
> :)
>
> Orlin
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
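
To make the "combined very-many + metadata" pattern from the quoted message concrete, here is a small Python sketch of walking user -has--> user_interest -meta--> interest -for--> {whatever}. It uses a plain dict as a stand-in for buckets and objects and is not Riak's link-walking API; `store`, `walk`, and the sample keys ("alice", "bob", "i1", and so on) are made up for illustration.

```python
# Hypothetical in-memory store:
# (bucket, key) -> {"data": {...}, "links": [(tag, bucket, key), ...]}
store = {
    ("user", "alice"): {"data": {}, "links": [("has", "user_interest", "alice-1")]},
    ("user_interest", "alice-1"): {
        "data": {},
        "links": [("meta", "interest", "i1"), ("meta", "interest", "i2")],
    },
    ("interest", "i1"): {
        "data": {"tags": ["riak"], "star": True},
        "links": [("for", "repo", "riak")],
    },
    ("interest", "i2"): {
        "data": {"tags": ["friend"], "star": False},
        "links": [("for", "user", "bob")],
    },
    ("repo", "riak"): {"data": {}, "links": []},
    ("user", "bob"): {"data": {}, "links": []},
}


def walk(store, start, tags):
    """Follow one link tag per step; return the object ids reached at the end."""
    current = [start]
    for tag in tags:
        reached = []
        for obj_id in current:
            for link_tag, bucket, key in store[obj_id]["links"]:
                if link_tag == tag:
                    reached.append((bucket, key))
        current = reached
    return current


# Everything alice has an interest in, regardless of target bucket.
print(walk(store, ("user", "alice"), ["has", "meta", "for"]))
# [('repo', 'riak'), ('user', 'bob')]

# Stop one step early to filter on the metadata held by the interest objects.
starred = [i for i in walk(store, ("user", "alice"), ["has", "meta"])
           if store[i]["data"]["star"]]
print(starred)
# [('interest', 'i1')]
```

The metadata lives on the `interest` objects, so filtering on tags or stars costs one walk step less than reaching the targets themselves.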