Sean,
Sure, mapping is another way, but even with very few object-encapsulated
links it's slower than header link walking, correct? I just wanted to
treat them all the same and, more importantly, get decent response
times, appropriate for an interactive UI. Perhaps with stored map
functions and the 0.9 addition of consecutively interleaving map and
link phases, the two approaches have become interchangeable or
complementary? The question is how map+link would compare to link+link
in terms of performance.
Would these secondary key-correspondent link-storing objects use link
headers or link encapsulation? I'm guessing it's the latter - which
Ripple calls composition (embedding)... That would solve the "very
many" issue. Yes, I'm wary about the foot thing. Having to do map for
(some of the) links feels somewhat like that.
I started this thread out of caution rather than because I expect that
many links. Neo4j was something I looked at and didn't like because of its
licensing and because my links are predictable (i.e. don't go to unknown
depths) as far as traversal. Long before that, I had considered an
entirely semantic (triple store) solution - which would have been
all-links and all-standards. While that's mentally stimulating (or
inherently "good"), it's an even heavier time investment than Riak. Riak
does links, so I saw a good chance to postpone the inevitable web of
linked data... Yet, I'm all for best-of-breed hybrids. That's why I
intend to use redis for some of the browsing / filtering via sets.
That's why I fancy your suggestion of riak + graphs (in another db)
too. Thanks!
What if all the "primary objects" (JSON) had 1:1 key-correspondent RDF
objects (e.g. rdf-json)? The API or proxy would serve GET requests
based on content type. And all the links (whether stored in headers or
objects) would have their triple-store doubles that interconnect the RDF
objects. The "link walking" would be done entirely with semantic web
standards, and the map-reduce with Riak, as expected.
This data / link duplication will probably happen with post-commit
hooks. Are header link changes part of the new post-commit feature?
Could the hook just post the changes somewhere? I'll have a message
queue with an HTTP endpoint, so I can post-process the changes in
whatever language. Nothing against Erlang; I just have zero
experience with it and too many things on my learning plate.
Last but not least, ruby has recently emerged as a full-fledged semantic
web citizen. One of the very useful consequences is the growing
abundance of storage adapters
http://lists.w3.org/Archives/Public/public-rdf-ruby/2010Apr/0030.html --
would anyone be interested in writing an adapter for Riak? I don't
have the time for such a feat, but would gladly bleed a little on that
edge. Thus Riak may become its own graph database as well. And if it
all ends up being stored in Riak, then Ripple gets yet another reason
for this other type of association - i.e. :graph :)
Too far out? It would make ruby more special. Yet that's kind of the
case anyway.
Orlin
Sean Cribbs wrote:
Orlin,
One thing that you imply is that you would always be using the Link
header to represent links. Another way to cope with large numbers of
links is to encapsulate them in the object itself, rather than in the
headers. This removes the header-length/count limitation, but would
require you to have a map function that understands the internals of
the object. Also, you would need to deal with the larger size of the
object, which could potentially slow down your request.
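A rough sketch of what such a map function would have to do, written in
Python purely for illustration (actual Riak map functions would be
JavaScript or Erlang). The "links" field and the shape of its entries are
assumptions made up for this example, not a Riak or Ripple convention.

```python
import json


def extract_links(raw_body, tag=None):
    """Return (bucket, key) pairs stored inside a JSON object body.

    Assumes a hypothetical layout:
    {"links": [{"bucket": ..., "key": ..., "tag": ...}, ...]}
    """
    doc = json.loads(raw_body)
    return [
        (link["bucket"], link["key"])
        for link in doc.get("links", [])
        if tag is None or link.get("tag") == tag
    ]


# Example object whose links live in the body instead of the Link header.
body = json.dumps({
    "name": "orlin",
    "links": [
        {"bucket": "tasks", "key": "t1", "tag": "owns"},
        {"bucket": "tasks", "key": "t2", "tag": "watches"},
    ],
})
```

The trade-off mentioned above shows up directly: the whole body has to be
fetched and parsed before any link can be followed.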
In Ripple, I intend to support secondary link-storing objects through
associations. You would associate the secondary object by key (i.e.
it would have the same key as the owner), and then that secondary
object would have a link association to the targets. Then the
origin/owner object would have the transitive association via a simple
method delegation. As I told the Raleigh Ruby group on Tuesday, "I
want to provide just enough tools to shoot yourself in the foot."
Having a convention is helpful, but won't substitute for thoughtful
evaluation of your data model.
One of the "other options" that I didn't mention was a graph database.
If your model seems to beg lots and lots of links, you might be
better off looking at something that fits the traversal model better,
like Neo4J, AllegroGraph, etc. You could still use Riak for storing
the primary objects, but keep your tightly interrelated stuff in the
graph DB. Remember, nosql is about choice and using the best tool for
the job!
Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/
On Apr 22, 2010, at 4:23 AM, Orlin Bozhinov wrote:
Riak Users,
Thinking about a data modeling pattern that will allow one to not
worry about how many links can be had with one-to-many (or
many-to-many) scenarios. This question has come up before in various
places. One answer I like is Sean's from this thread
http://riak.markmail.org/thread/6e7ypt5ndjzjk7mr saying: "... at the
point where you have that many links it becomes necessary to consider
other options, including intermediary objects or alternative ways of
representing the relationship". I wonder if an _intermediary way_
could be baked into Ripple (or your client library of choice). This
is for the cases when one-to-many can become one-to-very-many.
To make it more interesting, let's say we want to add metadata to the
relationship as described in the second-to-last paragraph of
http://blog.basho.com/2010/03/25/schema-design-in-riak---relationships/.
Here is what I have in mind:
{from}->{from_association}->{association}->{to} -- the {curlied} are
bucket / objects and -> are links. For example if {from} = "user";
and {association} = "interest"; and {to} = {whatever} there is
interest in - e.g. "event", "place", "story", another "user" or even
self-interest :) But I'm getting ahead of myself. Let's use a
recent example from Basho's blog where a "user" links {to} = "task".
So we get: user -has--> user_interest -meta--> interest -in--> task.
The "interest" association could imply "ownership" but maybe the
application allows its "users" to express interest in another's "task".
Maybe it's a collaborative effort... Reverse-linking from the many
interests / tasks to their respective owners is easy because it's
just a single link for task -of--> user or interest -of--> user. In
the interests bucket I want to put all kinds of useful metadata.
There I would embed (via Composition as Ripple calls it) not only all
the "tags", but also "notes", "star", etc. Think delicious bookmarks
or google reader items and so on. It seems like a common pattern.
Something that may fit the use case of @botanicus too. One could
represent all possible links (various associations) between two
objects as metadata contained in a given "interest". Ownership can
be a type of interest for the sake of link-walking.
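To make the chain concrete, here is a small in-memory sketch in Python of
user -has--> user_interest -meta--> interest -in--> task. It does not
talk to Riak; the dictionary layout, keys and metadata fields are all
made up for illustration.

```python
# In-memory stand-in for buckets: {bucket: {key: {"data": ..., "links": [...]}}}.
# Each link is a (bucket, key, tag) tuple. All names are illustrative.
store = {
    "user": {
        "orlin": {"data": {}, "links": [("user_interest", "orlin", "has")]},
    },
    "user_interest": {
        "orlin": {"data": {}, "links": [("interest", "i1", "meta"),
                                        ("interest", "i2", "meta")]},
    },
    "interest": {
        "i1": {"data": {"tags": ["riak"], "star": True},
               "links": [("task", "t1", "in")]},
        "i2": {"data": {"tags": ["ruby"], "star": False},
               "links": [("task", "t2", "in")]},
    },
    "task": {
        "t1": {"data": {"title": "model links"}, "links": []},
        "t2": {"data": {"title": "try ripple"}, "links": []},
    },
}


def walk(store, start, steps):
    """Follow links from `start` (bucket, key) through (bucket, tag) steps."""
    current = [start]
    for bucket, tag in steps:
        following = []
        for b, k in current:
            for link_bucket, link_key, link_tag in store[b][k]["links"]:
                if link_bucket == bucket and link_tag == tag:
                    following.append((link_bucket, link_key))
        current = following
    return current


# All tasks the user is interested in, through both intermediaries.
tasks = walk(store, ("user", "orlin"),
             [("user_interest", "has"), ("interest", "meta"), ("task", "in")])

# Stopping one step earlier gives the metadata objects, e.g. for filtering.
interests = walk(store, ("user", "orlin"),
                 [("user_interest", "has"), ("interest", "meta")])
starred = [k for b, k in interests if store[b][k]["data"]["star"]]
```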
There are three things happening here:
1. the "very many" (links through intermediary objects)
2. optional metadata (yet another intermediary object) - multiple
associations between any two objects can be expressed through extra
metadata rather than extra links
3. reusing the "very-many" and / or metadata intermediaries
-linking--> to objects in different buckets
The real issue (that #1 solves) is the lack of an easy way to have
"very many" links originating from the same object. Using the #2 metadata
object instead of a few extra links for tags / notes (which are
insignificant compared to the many interests a user can have) makes it
easier (in my eyes) to put in Redis for filtering... Of course interests
(#2) could be specialized (different metadata models) with regards to
what they are about (#3). On delicious that's just bookmarks. I've got
close to 6,000 of them. Does that approach "very many" in terms of
Riak? If "very many" were easier to do (with client-library models
or otherwise Riak itself) #2 & #3 would be indifferent about which
intermediary leads to them (an extra link-walk step) as they are
already possible anyway. How could we step (automagically) through
an intermediary object (the user_interest "very many" enabler bucket)
- having a specific target object in mind?
I think it may already be possible with current link-walking. Then
it's all a matter of managing the intermediary bucket / objects. Not
exactly sure how the max links are calculated. According to one
formula from the mailing list I may get 1000 headers (limit in
mochiweb) * 200 links ("around 40 chars per link") = 200,000 links
max? That seems like "very many", but there was also something about
performance burden... If we took those 200 with just a single
header, pointing to 200 intermediary objects, each pointing to
another 200 target objects we would get 40,000 links. That's quite a
few. Of course that number could easily get much, much bigger (square
the default limit). What decides how many links per intermediary
object is ideal? Is it a setting that Basho could recommend a
default for? Could Ripple automate that? Some link creation logic
is needed and if Riak doesn't support it, the client libraries that
do "associations" are a good candidate for the task. Also with link
deletion - we'll need to either keep track of link count per
intermediary or run map-reduce jobs to clean-up once in a while...
In either case, link creation should be as simple as knowing which
intermediary object is the last one and whether we should add the
next new link through it or through a new intermediary (when a
certain link count _setting_ is reached). If this could be automated
then it wouldn't matter how many links there are. Otherwise Riak would
have to be monitored, and if certain links began to get "very many",
a model migration would be run to make the transition from a few
straight links to very many. If client libraries could work with
both kinds of links then this transition would only mean tweaking the
model association (with link walking remaining the same). But when
using Riak's raw interface there would be a difference, which means a
switch from one-to-many to one-to-very-many will usually take some
thinking / effort. Any time I'm in doubt, it seems safer to side
with the very-many (just in case). What is the cost of an extra step
of link-walking as compared to changing application code?
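The link creation logic described above could look roughly like the
Python sketch below. It is a plain in-memory model, not Riak or Ripple
code; the class name and the max_per_intermediary setting are
hypothetical. The helper at the end just restates the 200 * 200
arithmetic.

```python
class ChunkedLinks:
    """Spread one owner's links over intermediary objects of bounded size.

    `max_per_intermediary` stands in for the link count setting discussed
    above; it is an assumption, not an existing Riak or Ripple option.
    """

    def __init__(self, max_per_intermediary=200):
        self.max_per_intermediary = max_per_intermediary
        # Each entry is the list of targets held by one intermediary.
        self.intermediaries = []

    def add(self, target):
        # Reuse the last intermediary until it is full, then start a new one.
        if (not self.intermediaries
                or len(self.intermediaries[-1]) >= self.max_per_intermediary):
            self.intermediaries.append([])
        self.intermediaries[-1].append(target)

    def remove(self, target):
        # Empty intermediaries are dropped right away in this sketch; a real
        # system might instead clean up later with a map-reduce job.
        for chunk in self.intermediaries:
            if target in chunk:
                chunk.remove(target)
                break
        self.intermediaries = [c for c in self.intermediaries if c]

    def targets(self):
        return [t for chunk in self.intermediaries for t in chunk]


def two_level_capacity(links_per_object):
    """Links reachable when one object points at N intermediaries of N links."""
    return links_per_object ** 2


links = ChunkedLinks(max_per_intermediary=3)
for n in range(7):
    links.add(f"task/{n}")
```

With a limit of 3, the seven links above end up in three intermediaries
(3 + 3 + 1). Only the last intermediary is ever appended to, so creation
needs to know nothing more than which one is last and how full it is.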
As another example, if one were to build GitHub with Riak, how would
you model the watching & following associations? Most users would
have few, but some would have many, and in a few cases very many,
which means everybody would have to watch and follow in very-many
style. If the app UI allows it, one has to assume it will happen...
Here is the following / watching example for very many - different
words for having interest in something:
* separate very-many:
user -does--> user_following -what--> user
user -does--> user_watching -what--> repo (-of--> user)
* combined very-many:
user -has--> user_interest -for--> {whatever}
* combined very-many + metadata:
user -has--> user_interest -meta--> interest -for--> {whatever}
* and if metadata was different, perhaps:
user -has--> user_interest -meta--> interest_user -what--> user
user -has--> user_interest -meta--> interest_repo -what--> repo
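For what it's worth, the patterns above could be turned into link-walking
requests along these lines. This is a Python sketch that only builds the
request path; it assumes the raw HTTP interface's /riak prefix and the
bucket,tag,keep phase syntax with _ as a wildcard. The bucket names, tags
and the "orlin" key are taken from the examples or made up.

```python
from urllib.parse import quote


def link_walk_path(bucket, key, phases, prefix="/riak"):
    """Build a link-walking path from (bucket, tag, keep) phases.

    None for a bucket or tag becomes the `_` wildcard; keep is rendered
    as 1 (return this phase's results) or 0 (pass through only).
    """
    path = "/".join([prefix, quote(bucket, safe=""), quote(key, safe="")])
    for phase_bucket, tag, keep in phases:
        b = "_" if phase_bucket is None else quote(phase_bucket, safe="")
        t = "_" if tag is None else quote(tag, safe="")
        path += f"/{b},{t},{1 if keep else 0}"
    return path


# combined very-many: user -has--> user_interest -for--> {whatever}
combined = link_walk_path(
    "user", "orlin",
    [("user_interest", "has", False), (None, "for", True)],
)

# combined very-many + metadata, keeping the interest objects as well
with_meta = link_walk_path(
    "user", "orlin",
    [("user_interest", "has", False),
     ("interest", "meta", True),
     (None, "for", True)],
)
```

The intermediary phases cost one extra step each, but the walk itself
stays a single request.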
Whatever the pattern, it would be nice to have the best practices
defined and implemented for reuse (via association-proficient client
libs - a la Ripple). After all, Riak users like big data, which is
another way of saying very many items of stuff -- and why not very
many hands / links too :)
Orlin
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com