Re: one-to-very-many link associations

Eric Gaumer Thu, 22 Apr 2010 06:04:38 -0700

Don't fall into the trap of "one size fits all". Riak is an amazing product
that can solve a number of tough problems. I don't think this is one of
them. You need (or at least want) a triple store to model this sort of
problem correctly. You need flexible schemas, ontologies, and a graph-based
query language.


Take a look at: http://www.bigdata.com/

Regards,
-Eric


On Thu, Apr 22, 2010 at 4:23 AM, Orlin Bozhinov <o...@soundsapiens.com> wrote:

> Riak Users,
>
> I'm thinking about a data modeling pattern that would let one stop worrying
> about how many links a one-to-many (or many-to-many) relationship can hold.
> This question has come up before in various places.  One answer I like is
> Sean's in this thread http://riak.markmail.org/thread/6e7ypt5ndjzjk7mr:
> "... at the point where you have that many links it becomes necessary to
> consider other options, including intermediary objects or alternative ways
> of representing the relationship".  I wonder whether such an _intermediary
> way_ could be baked into Ripple (or your client library of choice), for the
> cases when one-to-many can become one-to-very-many.
>
> To make it more interesting, let's say we want to add metadata to the
> relationship, as described in the second-to-last paragraph of
> http://blog.basho.com/2010/03/25/schema-design-in-riak---relationships/.
> Here is what I have in mind: {from}->{from_association}->{association}->{to}
> (the {curlied} names are buckets / objects and the -> arrows are links).
> For example, {from} = "user", {association} = "interest", and {to} =
> {whatever} the user is interested in, e.g. "event", "place", "story",
> another "user" or even self-interest :)  But I'm getting ahead of myself.
> Let's use a recent example from Basho's blog where a "user" links {to} =
> "task".  So we get:
> user -has--> user_interest -meta--> interest -in--> task.
>
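
A minimal in-memory sketch of the chain just described, assuming each object is a (bucket, key) pair carrying a list of tagged links. Store, link and walk are hypothetical names, not Riak's or Ripple's API, and nothing here talks to a server; walking has, meta, in from a user reaches the tasks through both intermediaries.

```python
from collections import defaultdict

# A (bucket, key) pair identifies an object, as in Riak.
Ref = tuple[str, str]


class Store:
    """Hypothetical in-memory stand-in for buckets of objects with tagged links."""

    def __init__(self) -> None:
        # Each object maps to its outgoing links as (tag, target) pairs.
        self.links: dict[Ref, list[tuple[str, Ref]]] = defaultdict(list)

    def link(self, src: Ref, tag: str, dst: Ref) -> None:
        self.links[src].append((tag, dst))

    def walk(self, start: Ref, *steps: tuple[str, str]) -> list[Ref]:
        """Follow (bucket, tag) steps; each step keeps links matching both."""
        frontier = [start]
        for bucket, tag in steps:
            frontier = [
                dst
                for src in frontier
                for t, dst in self.links.get(src, [])
                if t == tag and dst[0] == bucket
            ]
        return frontier


s = Store()
user = ("user", "orlin")
s.link(user, "has", ("user_interest", "orlin-1"))
s.link(("user_interest", "orlin-1"), "meta", ("interest", "i1"))
s.link(("user_interest", "orlin-1"), "meta", ("interest", "i2"))
s.link(("interest", "i1"), "in", ("task", "t1"))
s.link(("interest", "i2"), "in", ("task", "t2"))

tasks = s.walk(user, ("user_interest", "has"), ("interest", "meta"), ("task", "in"))
print(tasks)  # [('task', 't1'), ('task', 't2')]
```
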
> The "interest" association could imply "ownership", but maybe the
> application allows its "users" to express interest in another user's
> "task".  Maybe it's a collaborative effort...  Reverse-linking from the many
> interests / tasks to their respective owners is easy because it's just a
> single link: task -of--> user or interest -of--> user.  In the interests
> bucket I want to put all kinds of useful metadata.  There I would embed (via
> Composition, as Ripple calls it) not only all the "tags", but also "notes",
> "star", etc.  Think Delicious bookmarks, Google Reader items and so on.  It
> seems like a common pattern, one that may fit the use case of @botanicus
> too.  One could represent all possible links (various associations) between
> two objects as metadata contained in a given "interest".  Ownership can be a
> type of interest for the sake of link-walking.
>
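
A sketch of the idea that several associations between the same two objects can live as metadata inside one "interest" instead of as extra links. The Interest fields (kinds, tags, notes, star) and the with_kind helper are hypothetical; ownership is modeled as just another kind of interest, as suggested above.

```python
from dataclasses import dataclass, field


@dataclass
class Interest:
    """Hypothetical metadata object sitting between a user and its target."""

    owner: str                  # key of the user this interest belongs to
    target: tuple[str, str]     # (bucket, key) of the thing of interest
    kinds: set[str] = field(default_factory=set)  # e.g. {"owns", "watches"}
    tags: list[str] = field(default_factory=list)
    notes: str = ""
    star: bool = False


def with_kind(interests: list[Interest], kind: str) -> list[tuple[str, str]]:
    """Filter by association type using embedded metadata, not extra links."""
    return [i.target for i in interests if kind in i.kinds]


interests = [
    Interest("orlin", ("task", "t1"), kinds={"owns"}, tags=["riak"]),
    Interest("orlin", ("task", "t2"), kinds={"watches"}, star=True),
    Interest("orlin", ("task", "t3"), kinds={"owns", "watches"}),
]
print(with_kind(interests, "owns"))  # [('task', 't1'), ('task', 't3')]
```
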
> There are three things happening here:
> 1. the "very many" (links through intermediary objects)
> 2. optional metadata (yet another intermediary object): multiple
> associations between any two objects can be expressed through extra metadata
> rather than extra links
> 3. reusing the "very-many" and / or metadata intermediaries, -linking--> to
> objects in different buckets
>
> The real issue (the one #1 solves) is the lack of an easy way to have "very
> many" links originating from the same object.  Choosing the #2 metadata
> object over a few extra links for tags / notes (which are insignificant
> compared to the many interests a user can have) makes it easier, in my eyes,
> to put the data in Redis for filtering...  Of course interests (#2) could be
> specialized (different metadata models) according to what they are about
> (#3).  On Delicious that's just bookmarks.  I've got close to 6,000 of them.
> Does that approach "very many" in terms of Riak?  If "very many" were easier
> to do (with client-library models, or otherwise in Riak itself), #2 & #3
> would not care which intermediary leads to them (it is just an extra
> link-walk step), as they are already possible anyway.  How could we step
> (automagically) through an intermediary object (the user_interest "very
> many" enabler bucket) with a specific target object in mind?
>
> I think it may already be possible with current link-walking.  Then it's
> all a matter of managing the intermediary bucket / objects.  I'm not exactly
> sure how the max number of links is calculated.  According to one formula
> from the mailing list I may get 1000 headers (the limit in mochiweb) * 200
> links per header ("around 40 chars per link") = 200,000 links max?  That
> seems like "very many", but there was also something about a performance
> burden...  If we took those 200 links in just a single header, pointing to
> 200 intermediary objects, each pointing to another 200 target objects, we
> would get 40,000 links.  That's quite a few.  Of course that number could
> easily get much, much bigger (square the default limit).  What decides how
> many links per intermediary object is ideal?  Is it a setting that Basho
> could recommend a default for?  Could Ripple automate that?  Some link
> creation logic is needed, and if Riak doesn't support it, the client
> libraries that do "associations" are good candidates for the task.  Link
> deletion too: we'll need to either keep track of the link count per
> intermediary or run map-reduce jobs to clean up once in a while...
>
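
The arithmetic in that paragraph, as a quick check. It takes the quoted figures (1000 headers, about 200 links per header) at face value rather than as documented Riak limits; max_targets and intermediaries_needed are hypothetical helpers. Each layer of intermediaries multiplies capacity by the per-object link count.

```python
def max_targets(links_per_object: int, levels: int) -> int:
    """Targets reachable through `levels` layers of intermediary objects.

    levels=0 means the source links straight to its targets; each extra
    layer multiplies capacity by links_per_object (a full fan-out tree).
    """
    return links_per_object ** (levels + 1)


def intermediaries_needed(n_targets: int, links_per_intermediary: int) -> int:
    """First-layer intermediaries required for n_targets (ceiling division)."""
    return -(-n_targets // links_per_intermediary)


print(1000 * 200)                       # 200000: all links on one object
print(max_targets(200, 1))              # 40000: one layer of intermediaries
print(max_targets(200, 2))              # 8000000: two layers
print(intermediaries_needed(6000, 200)) # 30: e.g. ~6,000 bookmarks
```
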
> In either case, link creation should be as simple as knowing which
> intermediary object is the last one and whether the next new link should go
> through it or through a new intermediary (once a certain link-count
> _setting_ is reached).  If this could be automated, it wouldn't matter how
> many links there are.  Otherwise Riak would have to be monitored, and if
> certain links began to get "very many", a model migration would be run to
> make the transition from a few straight links to very many.  If client
> libraries could work with both kinds of links, this transition would mean
> tweaking the model association while link walking stays the same.  But when
> using Riak's raw interface there would be a difference, which means a switch
> from one-to-many to one-to-very-many will usually take some thinking /
> effort.  Whenever I'm in doubt, it seems safer to side with very-many (just
> in case).  What is the cost of an extra step of link-walking compared to
> changing application code?
>
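
A minimal sketch of that creation rule, plus the per-intermediary link count mentioned for deletions, assuming the client keeps this bookkeeping itself. IntermediaryChain, max_links and the "source-N" key scheme are hypothetical; nothing here talks to Riak.

```python
class IntermediaryChain:
    """Hypothetical client-side bookkeeping for one-to-very-many links.

    The source object links to a sequence of intermediary objects; each
    intermediary holds at most `max_links` links to target objects.
    """

    def __init__(self, source_key: str, max_links: int = 200) -> None:
        self.source_key = source_key
        self.max_links = max_links
        # Intermediary key -> set of target keys it links to (insertion-ordered).
        self.intermediaries: dict[str, set[str]] = {}

    def _new_intermediary(self) -> str:
        key = f"{self.source_key}-{len(self.intermediaries) + 1}"
        self.intermediaries[key] = set()
        return key

    def add(self, target: str) -> str:
        """Link target via the last intermediary, or a new one if it is full."""
        last = next(reversed(self.intermediaries), None)
        if last is None or len(self.intermediaries[last]) >= self.max_links:
            last = self._new_intermediary()
        self.intermediaries[last].add(target)
        return last

    def remove(self, target: str) -> None:
        """Drop a link; the tracked counts shrink with it."""
        for targets in self.intermediaries.values():
            targets.discard(target)


chain = IntermediaryChain("orlin", max_links=2)
keys = [chain.add(t) for t in ["t1", "t2", "t3"]]
print(keys)  # ['orlin-1', 'orlin-1', 'orlin-2']
```

The sketch only ever appends to the last intermediary, so deletions leave gaps in earlier ones; refilling or compacting them is the kind of periodic clean-up the message mentions.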
> As another example, if you were to build GitHub with Riak, how would you
> model the watching & following associations?  Most users would have few,
> but some would have many, and in a few cases very many, which means
> everybody would have to watch and follow in very-many style.  If the app UI
> allows it, one has to assume it will happen...  Here is the following /
> watching example done very-many style (following and watching being
> different words for having interest in something):
>
> * separate very-many:
> user -does--> user_following -what--> user
> user -does--> user_watching -what--> repo (-of--> user)
> * combined very-many:
> user -has--> user_interest -for--> {whatever}
> * combined very-many + metadata:
> user -has--> user_interest -meta--> interest -for--> {whatever}
> * and if metadata was different, perhaps:
> user -has--> user_interest -meta--> interest_user -what--> user
> user -has--> user_interest -meta--> interest_repo -what--> repo
>
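
On Riak's raw HTTP interface, each pattern above maps to one link-walk request whose path appends a bucket,tag,keep phase per hop, with _ as a wildcard. The helper below is a hypothetical sketch that only builds such paths (keys are assumed URL-safe and only the last phase is kept); it shows that every intermediary costs exactly one more phase.

```python
def link_walk_path(bucket: str, key: str, *steps: tuple[str, str]) -> str:
    """Build a /riak/bucket/key/bucket,tag,keep/... link-walk path.

    Hypothetical helper: keep=1 only on the last phase, 0 elsewhere.
    """
    phases = [
        f"{b},{tag},{1 if i == len(steps) - 1 else 0}"
        for i, (b, tag) in enumerate(steps)
    ]
    return "/".join([f"/riak/{bucket}/{key}", *phases])


# user -does--> user_watching -what--> repo
path = link_walk_path("user", "orlin", ("user_watching", "does"), ("repo", "what"))
print(path)  # /riak/user/orlin/user_watching,does,0/repo,what,1
```
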
> Whatever the pattern, it would be nice to have best practices defined and
> implemented for reuse (via association-proficient client libs, à la
> Ripple).  After all, Riak users like big data, which is another way of
> saying very many items of stuff, so why not very many hands / links too :)
>
> Orlin
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
