Orlin,

One thing that you imply is that you would always be using the Link header to 
represent links.  Another way to cope with large numbers of links is to 
encapsulate them in the object itself, rather than in the headers.  This 
removes the header-length/count limitation, but would require you to have a map 
function that understands the internals of the object.  Also, you would need to 
deal with the larger size of the object, which could potentially slow down your 
request.
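
To make that concrete, here is a minimal Python sketch of what the extraction logic might look like. It assumes the object is stored as JSON and keeps its links in a "links" field; that field name and the bucket/key/tag layout are made up for illustration and are not a Riak convention. A real map function would run inside Riak rather than in the client, so treat this only as the shape of the logic:

```python
import json

def extract_links(raw_value, tag=None):
    """Return (bucket, key) pairs embedded in a JSON object body.

    Assumes a hypothetical layout:
    {"links": [{"bucket": ..., "key": ..., "tag": ...}, ...]}
    """
    doc = json.loads(raw_value)
    return [
        (link["bucket"], link["key"])
        for link in doc.get("links", [])
        # With no tag given, every embedded link is returned.
        if tag is None or link.get("tag") == tag
    ]

# Example object body with two embedded links (illustrative data only).
raw = json.dumps({
    "name": "orlin",
    "links": [
        {"bucket": "tasks", "key": "t1", "tag": "owns"},
        {"bucket": "tasks", "key": "t2", "tag": "watches"},
    ],
})
```

Calling extract_links(raw, tag="owns") would pick out only the links carrying that tag, which is roughly what a link-walk spec does with the Link header today.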

In Ripple, I intend to support secondary link-storing objects through 
associations.  You would associate the secondary object by key (i.e. it would 
have the same key as the owner), and then that secondary object would have a 
link association to the targets.  Then the origin/owner object would have the 
transitive association via a simple method delegation.  As I told the Raleigh 
Ruby group on Tuesday, "I want to provide just enough tools to shoot yourself 
in the foot."  Having a convention is helpful, but won't substitute for 
thoughtful evaluation of your data model.
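
A rough sketch of that arrangement, in plain Python rather than Ripple, in case it helps to picture it. The two dicts stand in for Riak buckets, and every class and method name here (LinkHolder, User, add_task) is hypothetical; none of this is Ripple's actual API:

```python
# In-memory stand-ins for two Riak buckets (assumption: key -> stored value).
users = {"orlin": {"name": "Orlin"}}
user_links = {"orlin": [("tasks", "t1"), ("tasks", "t2")]}  # same key as the owner

class LinkHolder:
    """Secondary object that carries the links on behalf of one owner."""

    def __init__(self, key, store):
        self.key = key
        self._store = store

    @property
    def targets(self):
        # Return a copy so callers cannot mutate the stored list directly.
        return list(self._store.get(self.key, []))

    def add(self, bucket, key):
        self._store.setdefault(self.key, []).append((bucket, key))

class User:
    """Owner object; it holds no links itself."""

    def __init__(self, key):
        self.key = key
        self.data = users[key]
        # The secondary object is found by the owner's own key.
        self._links = LinkHolder(key, user_links)

    @property
    def tasks(self):
        # Simple delegation: the owner exposes the holder's targets as its own.
        return [k for (bucket, k) in self._links.targets if bucket == "tasks"]

    def add_task(self, task_key):
        self._links.add("tasks", task_key)
```

The point of the split is that the owner stays small no matter how many links pile up, while callers still write user.tasks as if the association lived on the user.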

One of the "other options" that I didn't mention was a graph database.  If your 
model seems to beg for lots and lots of links, you might be better off looking at 
something that fits the traversal model better, like Neo4j, AllegroGraph, etc.  
You could still use Riak for storing the primary objects, but keep your tightly 
interrelated stuff in the graph DB.  Remember, NoSQL is about choice and using 
the best tool for the job!

Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Apr 22, 2010, at 4:23 AM, Orlin Bozhinov wrote:

> Riak Users, 
> 
> Thinking about a data modeling pattern that will allow one to not worry about 
> how many links can be had with one-to-many (or many-to-many) scenarios.  This 
> question has come up before in various places.  One answer I like is Sean's 
> from this thread http://riak.markmail.org/thread/6e7ypt5ndjzjk7mr saying: 
> "... at the point where you have that many links it becomes necessary to 
> consider other options, including intermediary objects or alternative ways of 
> representing the relationship".  I wonder if an _intermediary way_ could be 
> baked into Ripple (or your client library of choice).  This is for the cases 
> when one-to-many can become one-to-very-many.  
> 
> To make it more interesting, let's say we want to add metadata to the 
> relationship as described in the second-to-last paragraph of 
> http://blog.basho.com/2010/03/25/schema-design-in-riak---relationships/.  
> Here is what I have in mind: {from}->{from_association}->{association}->{to} 
> -- the {curlied} are bucket / objects and -> are links.  For example if 
> {from} = "user"; and {association} = "interest"; and {to} = {whatever} there 
> is interest in - e.g. "event", "place", "story", another "user" or even 
> self-interest :)  But I'm getting ahead of myself.  Let's use a recent 
> example from Basho's blog where a "user" links {to} = "task".  So we get: 
> user -has--> user_interest -meta--> interest -in--> task.  
> 
> The "interest" association could imply "ownership" but maybe the application 
> allows its "users" to express interest in another's "task".  Maybe it's a 
> collaborative effort...  Reverse-linking from the many interests / tasks to 
> their respective owners is easy because it's just a single link for task 
> -of--> user or interest -of--> user.  In the interests bucket I want to put 
> all kinds of useful metadata.  There I would embed (via Composition as Ripple 
> calls it) not only all the "tags", but also "notes", "star", etc.  Think 
> Delicious bookmarks or Google Reader items and so on.  It seems like a common 
> pattern.  Something that may fit the use case of @botanicus too.  One could 
> represent all possible links (various associations) between two objects as 
> metadata contained in a given "interest".  Ownership can be a type of 
> interest for the sake of link-walking.  
> 
> There are three things happening here:  
> 1. the "very many" (links through intermediary objects)
> 2. optional metadata (yet another intermediary object) - multiple 
> associations between any two objects can be expressed through extra metadata 
> rather than extra links
> 3. reusing the "very-many" and / or metadata intermediaries -linking--> to 
> objects in different buckets
> 
> The real issue (that #1 solves) is not having an easy ability to do "very 
> many" links originating from the same object.  The #2 metadata object vs a 
> few extra links for tags / notes (which are insignificant compared to the 
> many interests a user can have) - makes it easier (in my eyes) to put in 
> Redis for filtering...  Of course interests (#2) could be specialized 
> (different metadata models) with regard to what they are about (#3).  On 
> Delicious that's just bookmarks.  I've got close to 6,000 of them.  Does that 
> approach "very many" in terms of Riak?  If "very many" were easier to do 
> (with client-library models or otherwise Riak itself) #2 & #3 would be 
> indifferent about which intermediary leads to them (an extra link-walk step) 
> as they are already possible anyway.  How could we step (automagically) 
> through an intermediary object (the user_interest "very many" enabler bucket) 
> - having a specific target object in mind?  
> 
> I think it may already be possible with current link-walking.  Then it's all 
> a matter of managing the intermediary bucket / objects.  Not exactly sure how 
> the max links are calculated.  According to one formula from the mailing list 
> I may get 1000 headers (limit in mochiweb) * 200 links ("around 40 chars per 
> link") = 200,000 links max?  That seems like "very many", but there was also 
> something about performance burden...  If we took those 200 with just a 
> single header, pointing to 200 intermediary objects, each pointing to another 
> 200 target objects we would get 40,000 links.  That's quite a few.  Of course 
> that number could easily get much much bigger (square the default limit).  
> What decides how many links per intermediary object is ideal?  Is it a 
> setting that Basho could recommend a default for?  Could Ripple automate 
> that?  Some link creation logic is needed and if Riak doesn't support it, the 
> client libraries that do "associations" are a good candidate for the task.  
> Also with link deletion - we'll need to either keep track of link count per 
> intermediary or run map-reduce jobs to clean-up once in a while...  
> 
> In either case, link creation should be as simple as knowing which 
> intermediary object is the last one and whether we should add the next new 
> link through it or through a new intermediary (when a certain link count 
> _setting_ is reached).  If this could be automated then it wouldn't matter 
> how many links there are.  Otherwise Riak would have to be monitored and if 
> certain links begin to get "very many" then a model migration is run to make 
> the transition from few straight links to very many.  If client libraries 
> could work with both kinds of links then this transition would mean tweaking 
> the model association (and link walking remaining the same).  But when using 
> Riak's raw interface there would be a difference, which means a switch from 
> one-to-many to one-to-very-many will usually take some thinking / effort.  
> Any time I'm in doubt, it seems safer to side with the very-many (just in 
> case).  What is the cost of an extra step of link-walking as compared to 
> changing application code?  
> 
> As another example, if one were to build GitHub with Riak, how would you 
> model the watching & following associations?  Many users would use few, but 
> some would use many, which in a few cases get to be very many, which means 
> everybody will watch and follow in very-many style.  If the app UI allows it, 
> one has to assume it will happen...  Here is the following / watching example 
> for very many - different words for having interest in something:
> 
> * separate very-many:
> user -does--> user_following -what--> user
> user -does--> user_watching -what--> repo (-of--> user)
> * combined very-many:
> user -has--> user_interest -for--> {whatever}
> * combined very-many + metadata:
> user -has--> user_interest -meta--> interest -for--> {whatever}
> * and if metadata was different, perhaps:
> user -has--> user_interest -meta--> interest_user -what--> user
> user -has--> user_interest -meta--> interest_repo -what--> repo
> 
> Whatever the pattern, it would be nice to have the best practices defined and 
> implemented for reuse (via association-proficient client libs - a la Ripple). 
>  After all, Riak users like big data, which is another way of saying very 
> many items of stuff -- and why not very many hands / links too :)
> 
> Orlin
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

