Sean,

Sure, mapping is another way, but even with very few object-encapsulated links it's slower than header link walking, correct? I just wanted to treat them all the same and, more importantly, get decent response times, appropriate for an interactive UI. Perhaps with stored map functions and the 0.9 addition of interleaving map and link phases, the two approaches have become interchangeable / complementary? The question is how map+link would compare to link+link in terms of performance.
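To make the comparison concrete, here is a sketch of the two query shapes as Ruby hashes mirroring the JSON bodies POSTed to Riak's /mapred endpoint; the bucket, key, and tag names are made up for illustration, and Riak.mapValuesJson stands in for whatever stored map function would understand the object internals:

```ruby
require 'json'

# link + link: follow header links two hops, keep the final objects
link_link = {
  "inputs" => [["users", "orlin"]],
  "query"  => [
    {"link" => {"bucket" => "user_interests", "tag" => "has", "keep" => false}},
    {"link" => {"bucket" => "tasks", "tag" => "in", "keep" => true}}
  ]
}

# link + map: one link hop, then a map phase that reads the object
# bodies (a built-in shown here only as a stand-in for a custom
# stored function that extracts encapsulated links)
link_map = {
  "inputs" => [["users", "orlin"]],
  "query"  => [
    {"link" => {"bucket" => "user_interests", "tag" => "has", "keep" => false}},
    {"map"  => {"language" => "javascript", "name" => "Riak.mapValuesJson", "keep" => true}}
  ]
}

puts JSON.generate(link_link)
puts JSON.generate(link_map)
```

Either hash would be serialized and POSTed with Content-Type application/json; the performance question is which pipeline Riak executes faster for a given fan-out.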

Would these secondary key-correspondent link-storing objects use link headers or link encapsulation? I'm guessing it's the latter - which Ripple calls composition (embedding)... That would solve the "very many" issue. Yes, I'm wary of the foot thing; having to do a map for (some of) the links feels a bit like that.

I started this thread out of caution rather than because I expect that many links. Neo4j was something I looked at and didn't like, both because of its licensing and because my links are predictable (i.e. they don't go to unknown depths) as far as traversal goes. Long before that, I had considered an entirely semantic (triple store) solution - which would have been all links and all standards. While that's mentally stimulating (or inherently "good"), it's an even heavier time investment than Riak. Riak does links, so I thought it a good chance to postpone the inevitable web of linked data... Yet I'm all for best-of-breed hybrids. That's why I intend to use Redis for some of the browsing / filtering via sets. That's why I fancy your suggestion of Riak + graphs (in another db) too. Thanks!

What if all the "primary objects" (JSON) had 1:1 key-correspondent RDF objects (e.g. RDF/JSON)? The API or proxy would serve GET requests based on Content-Type. And all the links (whether stored in headers or objects) would have triple-store doubles that interconnect the RDF objects. The "link walking" would then be done entirely with semantic web standards; the map-reduce with Riak, as expected.
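A minimal sketch of the triple "doubles" idea - the base URI, the predicate vocabulary, and the bucket/key/tag links layout are all assumptions, not anything prescribed by Riak:

```ruby
require 'json'

BASE = "http://example.org/riak"   # hypothetical base URI

# Derive the triple-store double of one primary object's links:
# each link becomes a (subject, predicate, object) triple.
def triples_for(bucket, key, links)
  subject = "#{BASE}/#{bucket}/#{key}"
  links.map do |l|
    [subject, "#{BASE}/rel/#{l["tag"]}", "#{BASE}/#{l["bucket"]}/#{l["key"]}"]
  end
end

links = [
  {"bucket" => "tasks", "key" => "t1", "tag" => "in"},
  {"bucket" => "users", "key" => "u2", "tag" => "follows"}
]

triples_for("users", "orlin", links).each { |t| puts t.join(" ") }
```

The resulting triples would be written to the RDF side (by whatever process handles the duplication), so SPARQL-style traversal and Riak map-reduce each work over their own copy.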

This data / link duplication would probably happen via post-commit hooks. Are header link changes part of the new post-commit feature? Could the hook just post the changes somewhere? I'll have a message queue with an HTTP endpoint, so I can post-process the changes in whatever language. Nothing against Erlang - I just have zero experience with it and too many things on my learning plate.

Last but not least, Ruby has recently emerged as a full-fledged semantic web citizen. One very useful consequence is the growing abundance of storage adapters http://lists.w3.org/Archives/Public/public-rdf-ruby/2010Apr/0030.html -- would anyone be interested in writing an adapter for Riak? I don't have the time for such a feat, but could gladly bleed a little on that edge. Riak could thus become its own graph database as well. And if it all ends up being stored in Riak, then Ripple gets yet another reason for this other type of association - i.e. :graph :)

Too far out? It would make Ruby more special. Yet that's kind of the case anyway.

Orlin


Sean Cribbs wrote:
Orlin,

One thing that you imply is that you would always be using the Link header to represent links. Another way to cope with large numbers of links is to encapsulate them in the object itself, rather than in the headers. This removes the header-length/count limitation, but would require you to have a map function that understands the internals of the object. Also, you would need to deal with the larger size of the object, which could potentially slow down your request.
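The encapsulation approach can be sketched like this - the "links" array inside the body is an assumed schema, and the Ruby method below is only a stand-in for the JavaScript or Erlang map function that would need to understand that layout:

```ruby
require 'json'

# An object that carries its own links in the body instead of in
# Link headers, so the header-length/count limits no longer apply.
doc = JSON.generate({
  "name"  => "orlin",
  "links" => [
    {"bucket" => "tasks", "key" => "t1", "tag" => "in"},
    {"bucket" => "tasks", "key" => "t2", "tag" => "in"}
  ]
})

# Ruby stand-in for the map phase: extract [bucket, key] pairs,
# i.e. the inputs the next MapReduce phase would receive.
def map_embedded_links(json_value, tag)
  JSON.parse(json_value)["links"]
    .select { |l| l["tag"] == tag }
    .map    { |l| [l["bucket"], l["key"]] }
end

p map_embedded_links(doc, "in")  # => [["tasks", "t1"], ["tasks", "t2"]]
```

The trade-off Sean mentions is visible here: every map invocation has to parse the (potentially large) object body just to reach the link list.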

In Ripple, I intend to support secondary link-storing objects through associations. You would associate the secondary object by key (i.e. it would have the same key as the owner), and then that secondary object would have a link association to the targets. Then the origin/owner object would have the transitive association via a simple method delegation. As I told the Raleigh Ruby group on Tuesday, "I want to provide just enough tools to shoot yourself in the foot." Having a convention is helpful, but won't substitute for thoughtful evaluation of your data model.
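The delegation idea might look like the following plain-Ruby sketch. This is not Ripple's actual API (the feature was only planned at the time) - just an illustration of key-correspondent secondary objects plus method delegation:

```ruby
require 'forwardable'

class TaskLinks              # the secondary, link-storing object
  attr_reader :key, :task_keys
  def initialize(key)
    @key = key               # same key as its owner
    @task_keys = []          # stands in for the link association to targets
  end
end

class User                   # the primary ("owner") object
  extend Forwardable
  attr_reader :key, :task_links
  def_delegator :@task_links, :task_keys   # the transitive association

  def initialize(key)
    @key = key
    @task_links = TaskLinks.new(key)       # key-correspondent secondary
  end
end

u = User.new("orlin")
u.task_links.task_keys << "t1" << "t2"
p u.task_keys   # => ["t1", "t2"]
```

Because the two objects share a key, resolving the secondary object from the owner is a direct fetch rather than a link walk.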

One of the "other options" that I didn't mention was a graph database. If your model seems to beg lots and lots of links, you might be better off looking at something that fits the traversal model better, like Neo4J, AllegroGraph, etc. You could still use Riak for storing the primary objects, but keep your tightly interrelated stuff in the graph DB. Remember, nosql is about choice and using the best tool for the job!

Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Apr 22, 2010, at 4:23 AM, Orlin Bozhinov wrote:

Riak Users,

I'm thinking about a data modeling pattern that would let one stop worrying about how many links can be had in one-to-many (or many-to-many) scenarios. This question has come up before in various places. One answer I like is Sean's from this thread http://riak.markmail.org/thread/6e7ypt5ndjzjk7mr saying: "... at the point where you have that many links it becomes necessary to consider other options, including intermediary objects or alternative ways of representing the relationship". I wonder if an _intermediary way_ could be baked into Ripple (or your client library of choice). This is for the cases when one-to-many can become one-to-very-many.

To make it more interesting, let's say we want to add metadata to the relationship as described in the second-to-last paragraph of http://blog.basho.com/2010/03/25/schema-design-in-riak---relationships/. Here is what I have in mind: {from}->{from_association}->{association}->{to} -- the {curlied} are buckets / objects and the -> are links. For example, {from} = "user", {association} = "interest", and {to} = {whatever} there is interest in - e.g. "event", "place", "story", another "user", or even self-interest :) But I'm getting ahead of myself. Let's use a recent example from Basho's blog where a "user" links {to} = "task". So we get: user -has--> user_interest -meta--> interest -in--> task.
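Against Riak's HTTP interface, that chain would be walked with one Bucket,Tag,Keep segment per hop. Here is a small Ruby helper that builds such a URL - the bucket and tag names are the hypothetical ones from this example:

```ruby
# Build a Riak HTTP link-walking URL for a chain of hops, keeping
# only the objects from the final hop.
def walk_url(bucket, key, phases)
  segs = phases.each_with_index.map do |(b, tag), i|
    keep = (i == phases.size - 1) ? 1 : 0   # keep only the last hop
    "#{b},#{tag},#{keep}"
  end
  "/riak/#{bucket}/#{key}/" + segs.join("/")
end

url = walk_url("users", "orlin",
               [["user_interests", "has"],
                ["interests", "meta"],
                ["tasks", "in"]])
puts url
# => /riak/users/orlin/user_interests,has,0/interests,meta,0/tasks,in,1
```

Underscore wildcards could be used in place of a bucket or tag to match any link at that hop.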

The "interest" association could imply "ownership" but maybe the application allows its "users" to express interest another's "task". Maybe it's a collaborative effort... Reverse-linking from the many interests / tasks to their respective owners is easy because it's just a single link for task -of--> user or interest -of--> user. In the interests bucket I want to put all kinds of useful metadata. There I would embed (via Composition as Ripple calls it) not only all the "tags", but also "notes", "star", etc. Think delicious bookmarks or google reader items and so on. It seems like a common pattern. Something that may fit the use case of @botanicus too. One could represent all possible links (various associations) between two objects as metadata contained in a given "interest". Ownership can be a type of interest for the sake of link-walking.

There are three things happening here:
1. the "very many" (links through intermediary objects)
2. optional metadata (yet another intermediary object) - multiple associations between any two objects can be expressed through extra metadata rather than extra links
3. reusing the "very-many" and / or metadata intermediaries -linking--> to objects in different buckets

The real issue (which #1 solves) is not having an easy way to do "very many" links originating from the same object. The #2 metadata object, versus a few extra links for tags / notes (which are insignificant compared to the many interests a user can have), makes it easier (in my eyes) to put things in Redis for filtering... Of course, interests (#2) could be specialized (different metadata models) with regard to what they are about (#3). On delicious that's just bookmarks. I've got close to 6,000 of them. Does that approach "very many" in terms of Riak? If "very many" were easier to do (with client-library models or otherwise in Riak itself), #2 & #3 would be indifferent about which intermediary leads to them (an extra link-walk step), as they are already possible anyway. How could we step (automagically) through an intermediary object (the user_interest "very many" enabler bucket) - having a specific target object in mind?
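The "stepping through an intermediary with a specific target in mind" question can be sketched in memory - the data layout below (intermediary keys mapping to lists of target references) is entirely hypothetical:

```ruby
# Each user_interest intermediary holds a slice of one user's links.
user_interests = {
  "orlin:1" => ["tasks/t1", "tasks/t2"],
  "orlin:2" => ["tasks/t3"]
}

# Which intermediary carries the link to a given target?
def intermediary_for(intermediaries, target)
  pair = intermediaries.find { |_key, links| links.include?(target) }
  pair && pair.first
end

p intermediary_for(user_interests, "tasks/t3")  # => "orlin:2"
```

Without some index like this, finding the right intermediary means scanning them all - which is exactly the bookkeeping a client library would have to automate.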

I think it may already be possible with current link-walking. Then it's all a matter of managing the intermediary bucket / objects. I'm not exactly sure how the max links are calculated. According to one formula from the mailing list, I may get 1000 headers (the limit in mochiweb) * 200 links per header ("around 40 chars per link") = 200,000 links max? That seems like "very many", but there was also something about a performance burden... If we used just 200 links in a single header, pointing to 200 intermediary objects, each pointing to another 200 target objects, we would get 40,000 links. That's quite a few. Of course, that number could easily get much, much bigger (square the default limit). What decides how many links per intermediary object is ideal? Is it a setting that Basho could recommend a default for? Could Ripple automate that? Some link-creation logic is needed, and if Riak doesn't support it, the client libraries that do "associations" are a good candidate for the task. The same goes for link deletion - we'll need to either keep track of the link count per intermediary or run map-reduce jobs to clean up once in a while...
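A quick back-of-the-envelope check of those numbers (the 1000-header and 200-links-per-header figures are taken from the mailing-list discussion, not measured):

```ruby
links_per_header = 200    # "around 40 chars per link"
max_headers      = 1000   # mochiweb limit cited on the list

# all links crammed into headers directly
direct_max = links_per_header * max_headers       # => 200000

# one header of 200 links -> 200 intermediaries x 200 targets each
one_hop = links_per_header * links_per_header     # => 40000

puts direct_max
puts one_hop
```

And squaring the full header budget (200,000 intermediaries x 200 targets) is how the number gets "much, much bigger".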

In either case, link creation should be as simple as knowing which intermediary object is the last one and whether to add the next new link through it or through a new intermediary (when a certain link-count _setting_ is reached). If this could be automated, then it wouldn't matter how many links there are. Otherwise Riak would have to be monitored, and if certain links begin to get "very many", a model migration would be run to make the transition from a few straight links to very many. If client libraries could work with both kinds of links, this transition would just mean tweaking the model association (with link walking remaining the same). But when using Riak's raw interface there would be a difference, which means a switch from one-to-many to one-to-very-many will usually take some thinking / effort. Any time I'm in doubt, it seems safer to side with very-many (just in case). What is the cost of an extra step of link-walking, compared to changing application code?
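The link-creation logic described here - append to the last intermediary until a configurable link-count setting is reached, then start a new one - might look like this sketch (all names hypothetical):

```ruby
class IntermediaryAllocator
  attr_reader :objects        # each entry stands for one intermediary's links

  def initialize(links_per_object = 200)
    @limit = links_per_object # the link-count "setting"
    @objects = []
  end

  # Add a link, starting a new intermediary when the last one is full;
  # returns the index of the intermediary the link went into.
  def add_link(target)
    if @objects.empty? || @objects.last.size >= @limit
      @objects << []          # last one is full: start a new intermediary
    end
    @objects.last << target
    @objects.size - 1
  end
end

alloc = IntermediaryAllocator.new(2)   # tiny limit, for illustration
%w[t1 t2 t3].each { |t| alloc.add_link(t) }
p alloc.objects  # => [["t1", "t2"], ["t3"]]
```

Deletion is the harder half: either each intermediary tracks its own count (so empty ones can be reaped on write) or a periodic map-reduce job does the cleanup, as suggested above.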

As another example, if one were to build GitHub with Riak, how would you model the watching & following associations? Many users would have few, some would have many, and in a few cases these would get to be very many - which means everybody would have to watch and follow in very-many style. If the app UI allows it, one has to assume it will happen... Here is the following / watching example for very many - different words for having an interest in something:

* separate very-many:
user -does--> user_following -what--> user
user -does--> user_watching -what--> repo (-of--> user)
* combined very-many:
user -has--> user_interest -for--> {whatever}
* combined very-many + metadata:
user -has--> user_interest -meta--> interest -for--> {whatever}
* and if metadata was different, perhaps:
user -has--> user_interest -meta--> interest_user -what--> user
user -has--> user_interest -meta--> interest_repo -what--> repo

Whatever the pattern, it would be nice to have the best practices defined and implemented for reuse (via association-proficient client libs - a la Ripple). After all, Riak users like big data, which is another way of saying very many items of stuff -- and why not very many hands / links too :)

Orlin

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

