Eric,

Thanks for the bigdata.com link. It's something I had missed.

The Semantic Web doesn't fit it all either. I expect I would much rather search Riak (with its upcoming query language) than rely entirely on SPARQL, though I've always had that option in mind. If you look at my previous reply you'll see I'm considering it. I wonder what you think about the combined idea...

If Riak doesn't also become a real graph backend, then I'll likely go with an additional hosted database. It could be a relational db (the most readily available kind), MongoDB (e.g. MongoHQ), Talis, etc.

Best,

Orlin


Eric Gaumer wrote:
Don't fall into the trap of "one size fits all". Riak is an amazing product that can solve a number of tough problems. I don't think this is one of them. You need/want a triple store to (correctly) model this sort of problem. You need flexible schemas, ontologies, and a graph based query language.

Take a look at: http://www.bigdata.com/

Regards,
-Eric

On Thu, Apr 22, 2010 at 4:23 AM, Orlin Bozhinov <o...@soundsapiens.com> wrote:

    Riak Users,

    Thinking about a data modeling pattern that will allow one to not
    worry about how many links can be had with one-to-many (or
    many-to-many) scenarios.  This question has come up before in
    various places.  One answer I like is Sean's from this thread
    http://riak.markmail.org/thread/6e7ypt5ndjzjk7mr saying: "... at
    the point where you have that many links it becomes necessary to
    consider other options, including intermediary objects or
    alternative ways of representing the relationship".  I wonder if
    an _intermediary way_ could be baked into Ripple (or your client
    library of choice).  This is for the cases when one-to-many can
    become one-to-very-many.

    To make it more interesting, let's say we want to add metadata to
    the relationship as described in the second-to-last paragraph of
    http://blog.basho.com/2010/03/25/schema-design-in-riak---relationships/.
    Here is what I have in mind:
    {from}->{from_association}->{association}->{to} -- the {curlied}
    are buckets / objects and -> are links.  For example, {from} =
    "user", {association} = "interest", and {to} = {whatever} there
    is interest in - e.g. "event", "place", "story", another "user"
    or even self-interest :)  But I'm getting ahead of myself.  Let's
    use a recent example from Basho's blog where a "user" links
    {to} = "task".  So we get: user -has--> user_interest -meta-->
    interest -in--> task.
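A minimal sketch of the chain above, assuming plain Python dicts stand in for Riak buckets.  The store layout, the keys ("orlin", "orlin-1", "i42", "t7") and the helper names are hypothetical; this is not Ripple's or Riak's API, only an illustration of user -has--> user_interest -meta--> interest -in--> task and the reverse task -of--> user link.

```python
# Hypothetical in-memory stand-in for Riak buckets: {bucket: {key: object}}.
# Each object holds "data" plus a list of links written as (bucket, key, tag).
store = {
    "user": {
        "orlin": {"data": {}, "links": [("user_interest", "orlin-1", "has")]},
    },
    "user_interest": {
        "orlin-1": {"data": {}, "links": [("interest", "i42", "meta")]},
    },
    "interest": {
        "i42": {
            # Metadata about the relationship lives on the interest object.
            "data": {"tags": ["riak"], "notes": "", "star": True},
            "links": [("task", "t7", "in")],
        },
    },
    "task": {
        "t7": {"data": {"title": "model links"}, "links": [("user", "orlin", "of")]},
    },
}


def follow(bucket, key, tag):
    """Return the (bucket, key) pairs linked from one object under a given tag."""
    return [(b, k) for (b, k, t) in store[bucket][key]["links"] if t == tag]


def walk(start, tags):
    """Follow a sequence of link tags, one hop per tag, from a starting object."""
    frontier = [start]
    for tag in tags:
        frontier = [nxt for (b, k) in frontier for nxt in follow(b, k, tag)]
    return frontier


tasks = walk(("user", "orlin"), ["has", "meta", "in"])
owners = walk(("task", "t7"), ["of"])
```

Here the forward walk takes three hops while the reverse link back to the owner is a single hop, which is the asymmetry described above.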

    The "interest" association could imply "ownership", but maybe the
    application allows its "users" to express interest in another's
    "task".  Maybe it's a collaborative effort...  Reverse-linking
    from the many interests / tasks to their respective owners is easy
    because it's just a single link for task -of--> user or interest
    -of--> user.  In the interests bucket I want to put all kinds of
    useful metadata.  There I would embed (via Composition as Ripple
    calls it) not only all the "tags", but also "notes", "star", etc.
    Think delicious bookmarks or google reader items and so on.  It
    seems like a common pattern.  Something that may fit the use case
    of @botanicus too.  One could represent all possible links
    (various associations) between two objects as metadata contained
    in a given "interest".  Ownership can be a type of interest for
    the sake of link-walking.

    There are three things happening here:
    1. the "very many" (links through intermediary objects)
    2. optional metadata (yet another intermediary object) - multiple
    associations between any two objects can be expressed through
    extra metadata rather than extra links
    3. reusing the "very-many" and / or metadata intermediaries
    -linking--> to objects in different buckets

    The real issue (that #1 solves) is not having an easy way to do
    "very many" links originating from the same object.  Using the #2
    metadata object, rather than a few extra links for tags / notes
    (which are insignificant compared to the many interests a user can
    have), makes it easier (in my eyes) to put in Redis for
    filtering...  Of course interests (#2) could be specialized
    (different metadata models) with regards to what they are about
    (#3).  On delicious
    that's just bookmarks.  I've got close to 6,000 of them.  Does
    that approach "very many" in terms of Riak?  If "very many" were
    easier to do (with client-library models or otherwise Riak itself)
    #2 & #3 would be indifferent about which intermediary leads to
    them (an extra link-walk step) as they are already possible
    anyway.  How could we step (automagically) through an intermediary
    object (the user_interest "very many" enabler bucket) - having a
    specific target object in mind?

    I think it may already be possible with current link-walking.
    Then it's all a matter of managing the intermediary bucket /
    objects.  Not exactly sure how the max links are calculated.
    According to one formula from the mailing list I may get 1000
    headers (limit in mochiweb) * 200 links ("around 40 chars per
    link") = 200,000 links max?  That seems like "very many", but
    there was also something about performance burden...  If we took
    those 200 with just a single header, pointing to 200 intermediary
    objects, each pointing to another 200 target objects we would get
    40,000 links.  That's quite a few.  Of course that number could
    easily get much much bigger (square the default limit).  What
    decides how many links per intermediary object is ideal?  Is it a
    setting that Basho could recommend a default for?  Could Ripple
    automate that?  Some link creation logic is needed and if Riak
    doesn't support it, the client libraries that do "associations"
    are a good candidate for the task.  Also with link deletion -
    we'll need to either keep track of link count per intermediary or
    run map-reduce jobs to clean-up once in a while...
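The arithmetic above can be written out as a small sketch.  The two constants are the figures quoted from the mailing list and are treated here as assumptions rather than confirmed Riak limits; the function names are hypothetical.

```python
# Figures quoted above from the mailing list; assumptions, not verified limits.
MAX_HEADERS = 1000       # header limit attributed to mochiweb
LINKS_PER_HEADER = 200   # "around 40 chars per link"


def direct_capacity(headers=MAX_HEADERS, per_header=LINKS_PER_HEADER):
    """Links one object could carry if every allowed header were filled."""
    return headers * per_header


def indirect_capacity(fan_out, levels=1):
    """Targets reachable when each object holds fan_out links and
    `levels` layers of intermediaries sit between source and targets."""
    return fan_out ** (levels + 1)


one_object = direct_capacity()             # 1000 * 200
single_header = indirect_capacity(200)     # 200 intermediaries * 200 targets
squared_limit = indirect_capacity(direct_capacity())
```

With one header's worth of links per object, a single intermediary layer gives 200 * 200 targets; squaring the full 200,000 figure shows how quickly the ceiling grows, leaving the per-object performance burden as the practical constraint.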

    In either case, link creation should be as simple as knowing which
    intermediary object is the last one and whether we should add the
    next new link through it or through a new intermediary (when a
    certain link count _setting_ is reached).  If this could be
    automated then it wouldn't matter how many links there are.
    Otherwise Riak would have to be monitored and if certain links
    begin to get "very many" then a model migration is run to make the
    transition from few straight links to very many.  If client
    libraries could work with both kinds of links then this transition
    would mean tweaking the model association (with link walking
    remaining the same).  But when using Riak's raw interface there
    would be a difference, which means a switch from one-to-many to
    one-to-very-many will usually take some thinking / effort.  Any
    time I'm in doubt, it seems safer to side with the very-many (just
    in case).  What is the cost of an extra step of link-walking as
    compared to changing application code?
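The link creation rule described above (add through the last intermediary until a configured count is reached, then start a new one) might look like this in a client library.  This is a minimal in-memory sketch; the class name, the max_links_per_intermediary setting and the eager clean-up on removal are all assumptions, not anything Riak or Ripple provides.

```python
class VeryManyLinks:
    """Spreads links from one source over intermediary objects of bounded size."""

    def __init__(self, max_links_per_intermediary=200):
        self.max_links = max_links_per_intermediary
        # Each entry is the link list held by one intermediary object.
        self.intermediaries = []

    def add(self, target):
        # Reuse the last intermediary until the configured count is reached,
        # then start a new one.
        if not self.intermediaries or len(self.intermediaries[-1]) >= self.max_links:
            self.intermediaries.append([])
        self.intermediaries[-1].append(target)

    def remove(self, target):
        # Remove the first matching link.  Emptied intermediaries are dropped
        # right away here; a real system might defer this to a periodic job.
        for links in self.intermediaries:
            if target in links:
                links.remove(target)
                break
        self.intermediaries = [links for links in self.intermediaries if links]

    def targets(self):
        """All targets, found by stepping through every intermediary."""
        return [t for links in self.intermediaries for t in links]


links = VeryManyLinks(max_links_per_intermediary=3)
for i in range(7):
    links.add(("task", f"t{i}"))
sizes = [len(chunk) for chunk in links.intermediaries]
```

Seven links with a limit of three end up on three intermediaries, and callers only ever see add, remove and targets, so the application code stays the same whether the links are few or very many.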

    As another example, if one were to build GitHub with Riak, how
    would you model the watching & following associations?  Many users
    would use few, but some would use many, which in a few cases get
    to be very many, which means everybody will watch and follow in
    very-many style.  If the app ui allows it, one has to assume it
    will happen...  Here is the following / watching example for very
    many - different words for having interest in something:

    * separate very-many:
    user -does--> user_following -what--> user
    user -does--> user_watching -what--> repo (-of--> user)
    * combined very-many:
    user -has--> user_interest -for--> {whatever}
    * combined very-many + metadata:
    user -has--> user_interest -meta--> interest -for--> {whatever}
    * and if metadata was different, perhaps:
    user -has--> user_interest -meta--> interest_user -what--> user
    user -has--> user_interest -meta--> interest_repo -what--> repo
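As a rough illustration of the "combined very-many" pattern in the list above, here is a sketch that steps through the user_interest intermediary with a specific target bucket in mind.  The link table, the user and repo keys and the interests function are hypothetical, and a dict stands in for Riak's link-walking.

```python
# Hypothetical link table: (bucket, key) -> list of (bucket, key, tag).
links = {
    ("user", "alice"): [("user_interest", "alice-1", "has")],
    ("user_interest", "alice-1"): [
        ("user", "bob", "for"),
        ("repo", "bob/riak-tools", "for"),
        ("repo", "carol/notes", "for"),
    ],
}


def interests(user_key, target_bucket=None):
    """Walk user -has--> user_interest -for--> target.

    When target_bucket is given, keep only targets in that bucket.
    """
    found = []
    for (ibucket, ikey, tag) in links.get(("user", user_key), []):
        if tag != "has":
            continue
        for (tbucket, tkey, ttag) in links.get((ibucket, ikey), []):
            if ttag == "for" and target_bucket in (None, tbucket):
                found.append((tbucket, tkey))
    return found


watching = interests("alice", target_bucket="repo")   # repos alice watches
following = interests("alice", target_bucket="user")  # users alice follows
everything = interests("alice")
```

Watching and following share one intermediary bucket here and differ only in the bucket filter applied on the last hop.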

    Whatever the pattern, it would be nice to have the best practices
    defined and implemented for reuse (via association-proficient
    client libs - a la Ripple).  After all, Riak users like big data,
    which is another way of saying very many items of stuff -- and why
    not very many hands / links too :)

    Orlin


    _______________________________________________
    riak-users mailing list
    riak-users@lists.basho.com
    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

