Riak Users,
I've been thinking about a data modeling pattern that removes the worry
about how many links one object can hold in one-to-many (or many-to-many)
scenarios. This question has come up before in various places. One
answer I like is Sean's from this thread
http://riak.markmail.org/thread/6e7ypt5ndjzjk7mr saying: "... at the
point where you have that many links it becomes necessary to consider
other options, including intermediary objects or alternative ways of
representing the relationship". I wonder if an _intermediary way_ could
be baked into Ripple (or your client library of choice). This is for
the cases when one-to-many can become one-to-very-many.
To make it more interesting, let's say we want to add metadata to the
relationship, as described in the second-to-last paragraph of
http://blog.basho.com/2010/03/25/schema-design-in-riak---relationships/. Here
is what I have in mind: {from}->{from_association}->{association}->{to}
-- the {curly-braced} names are buckets / objects and the -> are links.
For example, {from} = "user", {association} = "interest", and {to} =
{whatever} there is interest in - e.g. "event", "place", "story", another
"user" or even self-interest :) But I'm getting ahead of myself. Let's
use a recent example from Basho's blog where a "user" links {to} = "task".
So we get: user -has--> user_interest -meta--> interest -in--> task.
The "interest" association could imply "ownership" but maybe the
application allows its "users" to express interest in another's "task".
Maybe it's a collaborative effort... Reverse-linking from the many
interests / tasks to their respective owners is easy because it's just a
single link for task -of--> user or interest -of--> user. In the
interests bucket I want to put all kinds of useful metadata. There I
would embed (via Composition as Ripple calls it) not only all the
"tags", but also "notes", "star", etc. Think delicious bookmarks or
google reader items and so on. It seems like a common pattern.
Something that may fit the use case of @botanicus too. One could
represent all possible links (various associations) between two objects
as metadata contained in a given "interest". Ownership can be a type of
interest for the sake of link-walking.
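
To make the chain concrete, here is a minimal in-memory sketch of user
-has--> user_interest -meta--> interest -in--> task. It does not talk to
Riak or use Ripple; the Obj class, the store dict, the walk helper and
all keys and metadata values are made up for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """In-memory stand-in for a stored object: bucket, key, data, tagged links."""
    bucket: str
    key: str
    data: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (bucket, key, tag) tuples

    def link_to(self, other, tag):
        self.links.append((other.bucket, other.key, tag))


store = {}  # hypothetical stand-in for the database, keyed by (bucket, key)


def put(obj):
    store[(obj.bucket, obj.key)] = obj
    return obj


def walk(start, *tags):
    """Follow one link tag per step and return the objects reached last."""
    current = [start]
    for tag in tags:
        current = [store[(b, k)]
                   for obj in current
                   for (b, k, t) in obj.links
                   if t == tag]
    return current


user = put(Obj("user", "orlin"))
task = put(Obj("task", "t1", {"title": "write docs"}))
hub = put(Obj("user_interest", "orlin-0"))  # the "very many" enabler
interest = put(Obj("interest", "orlin-t1", {
    "tags": ["riak"],        # metadata lives on the interest,
    "notes": "read later",   # not on extra links
    "star": True,
    "kinds": ["ownership"],  # ownership as one type of interest
}))

user.link_to(hub, "has")
hub.link_to(interest, "meta")
interest.link_to(task, "in")
interest.link_to(user, "of")  # reverse links are a single hop
task.link_to(user, "of")

tasks = walk(user, "has", "meta", "in")
```

Walking has / meta / in from the user reaches the task, while the
reverse direction is a single "of" step from either the task or the
interest.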
There are three things happening here:
1. the "very many" (links through intermediary objects)
2. optional metadata (yet another intermediary object) - multiple
associations between any two objects can be expressed through extra
metadata rather than extra links
3. reusing the "very-many" and / or metadata intermediaries -linking-->
to objects in different buckets
The real issue (that #1 solves) is the lack of an easy way to have
"very many" links originating from the same object. As for #2, using a
metadata object rather than a few extra links for tags / notes (which are
insignificant compared to the many interests a user can have) makes it
easier (in my eyes) to put that metadata in Redis for filtering... Of
course interests (#2) could be specialized (different metadata models)
with regard to what they are about (#3). On delicious that's just
bookmarks. I've got close to 6,000 of them. Does that approach "very
many" in terms of Riak? If "very many" were easier to do (with
client-library models or otherwise in Riak itself), #2 & #3 would not
care which intermediary leads to them (just an extra link-walk step), as
they are already possible anyway.
How could we step (automagically) through an intermediary object (the
user_interest "very many" enabler bucket) - having a specific target
object in mind?
I think it may already be possible with current link-walking. Then it's
all a matter of managing the intermediary bucket / objects. Not exactly
sure how the max links are calculated. According to one formula from
the mailing list I may get 1000 headers (limit in mochiweb) * 200 links
("around 40 chars per link") = 200,000 links max? That seems like "very
many", but there was also something about performance burden... If we
took those 200 with just a single header, pointing to 200 intermediary
objects, each pointing to another 200 target objects we would get 40,000
links. That's quite a few. Of course that number could easily get much,
much bigger (two levels give the square of the per-object limit). What decides how many links per
intermediary object is ideal? Is it a setting that Basho could
recommend a default for? Could Ripple automate that? Some link
creation logic is needed and if Riak doesn't support it, the client
libraries that do "associations" are a good candidate for the task.
Also with link deletion - we'll need to either keep track of link count
per intermediary or run map-reduce jobs to clean-up once in a while...
In either case, link creation should be as simple as knowing which
intermediary object is the last one and whether we should add the next
new link through it or through a new intermediary (when a certain link
count _setting_ is reached). If this could be automated then it
wouldn't matter how many links there are. Otherwise Riak would have to be
monitored, and if certain links begin to get "very many" then a model
migration would have to be run to make the transition from few straight
links to very many. If client libraries could work with both kinds of
links then this transition would only mean tweaking the model association
(with link walking remaining the same). But when using Riak's raw
interface there would be
a difference, which means a switch from one-to-many to one-to-very-many
will usually take some thinking / effort. Any time I'm in doubt, it
seems safer to side with the very-many (just in case). What is the cost
of an extra step of link-walking as compared to changing application code?
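
The link creation logic could look something like the sketch below. It
is only an in-memory illustration of "add to the last intermediary, or
start a new one when the count setting is reached"; the LinkFanout
class, its method names and the "owner-N" key scheme are all
hypothetical, and 200 is just the per-header estimate from above, not a
recommended default.

```python
HEADERS_PER_REQUEST = 1000  # mochiweb header limit quoted above
LINKS_PER_HEADER = 200      # estimate at around 40 chars per link

# 1000 * 200 = 200,000 links on a single object, per the formula above
max_links_single_object = HEADERS_PER_REQUEST * LINKS_PER_HEADER


def two_level_capacity(links_per_object):
    """Targets reachable via one intermediary layer: the limit squared."""
    return links_per_object ** 2


class LinkFanout:
    """Spreads one owner's links over intermediaries of bounded size."""

    def __init__(self, owner_key, links_per_intermediary=LINKS_PER_HEADER):
        self.owner_key = owner_key
        self.limit = links_per_intermediary
        self.intermediaries = []  # one list of target keys per intermediary

    def add(self, target_key):
        """Link through the last intermediary, or a new one if it is full.

        Returns the (hypothetical) key of the intermediary that was used.
        """
        if not self.intermediaries or len(self.intermediaries[-1]) >= self.limit:
            self.intermediaries.append([])
        self.intermediaries[-1].append(target_key)
        return f"{self.owner_key}-{len(self.intermediaries) - 1}"

    def remove(self, target_key):
        """Drop a link; freed slots in older intermediaries are not reused."""
        for targets in self.intermediaries:
            if target_key in targets:
                targets.remove(target_key)
                return True
        return False

    def count(self):
        return sum(len(targets) for targets in self.intermediaries)


fanout = LinkFanout("orlin", links_per_intermediary=2)
used = [fanout.add(key) for key in ("a", "b", "c")]
# used == ["orlin-0", "orlin-0", "orlin-1"]
```

Since removed links leave gaps in older intermediaries, this sketch
would still need the occasional clean-up job (or per-intermediary counts
used for refilling) mentioned above.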
As another example, if one were to build GitHub with Riak, how would one
model the watching & following associations? Most users would have few,
some would have many, and in a few cases that gets to be very many, which
means everybody has to watch and follow in very-many style. If the app UI
allows it, one has to assume it will happen... Here are the following /
watching examples for very many (different words for having interest in
something):
* separate very-many:
    user -does--> user_following -what--> user
    user -does--> user_watching -what--> repo (-of--> user)
* combined very-many:
    user -has--> user_interest -for--> {whatever}
* combined very-many + metadata:
    user -has--> user_interest -meta--> interest -for--> {whatever}
* and if metadata was different, perhaps:
    user -has--> user_interest -meta--> interest_user -what--> user
    user -has--> user_interest -meta--> interest_repo -what--> repo
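
For what it's worth, stepping through the intermediaries with the raw
interface might look like the sketch below. It only builds the request
path, assuming the /riak prefix and the bucket,tag,keep step format of
Riak's HTTP link walking (with "_" as the wildcard); the link_walk_path
helper and the "orlin" key are made up for illustration.

```python
from urllib.parse import quote


def link_walk_path(bucket, key, steps, prefix="/riak"):
    """Build a link-walking request path.

    Each step is (bucket, tag, keep); None for bucket or tag becomes the
    "_" wildcard, and keep says whether that step's results are returned.
    """
    segments = [prefix, quote(bucket, safe=""), quote(key, safe="")]
    for step_bucket, tag, keep in steps:
        segments.append("{},{},{}".format(
            quote(step_bucket, safe="") if step_bucket else "_",
            quote(tag, safe="") if tag else "_",
            "1" if keep else "0",
        ))
    return "/".join(segments)


# combined very-many + metadata: keep only the final targets
path = link_walk_path("user", "orlin", [
    ("user_interest", "has", False),
    ("interest", "meta", False),
    (None, "for", True),
])
# path == "/riak/user/orlin/user_interest,has,0/interest,meta,0/_,for,1"
```

Switching from straight links to very-many then only changes the list of
steps, which is the part a client library could hide behind the model
association.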
Whatever the pattern, it would be nice to have the best practices
defined and implemented for reuse (via association-proficient client
libs - a la Ripple). After all, Riak users like big data, which is
another way of saying very many items of stuff -- and why not very many
hands / links too :)
Orlin