You can get a rough idea of how well your approach will perform with basho_bench. Estimate how big your pages will be, set up a matching benchmark, and run it against the cluster or a staging setup to see what performance to expect.
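For reference, a minimal basho_bench config for this kind of workload could look something like the following. The driver and generator settings mirror the stock riakc_pb example that ships with basho_bench; the IP list, value size, and get/update mix are assumptions you'd tune to your page estimates:

    {mode, max}.
    {duration, 10}.          %% minutes
    {concurrent, 16}.
    {driver, basho_bench_driver_riakc_pb}.
    {key_generator, {int_to_bin_bigendian, {uniform_int, 100000}}}.
    {value_generator, {fixed_bin, 10240}}.  %% ~10 KB pages, assumed
    {riakc_pb_ips, [{127,0,0,1}]}.
    {riakc_pb_replies, 1}.
    {operations, [{get, 4}, {update, 1}]}.

Run it with "basho_bench <config>" and plot the results with the bundled summary R script to see latency and throughput over the run.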
I don't think there's anything fundamentally wrong with your approach; in fact, I'm working on a similar storage scheme and I'm fairly comfortable with it. You can find examples of real-world applications at http://docs.basho.com/riak/latest/cookbooks/use-cases/. The Yammer presentation linked from http://docs.basho.com/riak/latest/cookbooks/use-cases/user-events-timelines/ builds on similar ideas and is worth checking out.

On 22 January 2013 14:56, Bach Le <[email protected]> wrote:
> Hi, I'm currently using Riak for my project. It works well for single
> documents, but I often need to present users with a stream of (loosely)
> time-ordered documents. Riak's keys are unordered by nature, so there's
> no straightforward way to traverse the data. I came up with the
> following approach:
>
> Make a bucket (e.g. "pages") and set allow_mult to true. Inside this
> bucket, store a number that points to the "current" page, initialized
> to 0; I call this a cursor. For every "page" of data, create an object
> in the same bucket: the first page is associated with the key page_0,
> the second with page_1, etc. These page objects are sets modeled with
> statebox for conflict resolution.
>
> When a document is inserted, read the cursor value. Since the cursor
> can only increase, we resolve conflicts by choosing the largest value
> among the siblings. Next, read the page it points to (if the cursor is
> 0, read the key "page_0"; if it is 1, read "page_1"; etc.). If the
> number of objects in this set exceeds the page size, increment the
> cursor and create a new page to insert the object into; otherwise,
> leave the cursor alone and insert into the current page.
>
> To retrieve data in reverse chronological order, read the cursor to
> find the current page, then read the last page (which is shown to users
> as the first page).
>
> Currently, my document ids are monotonically increasing thanks to
> https://github.com/boundary/flake, so I can sort documents within a
> page.
>
> I do realize that a page can exceed its size limit; what I don't know
> is how bad that can get under a heavy write rate. All I need is some
> form of bulk get and chunking without resorting to 2i, which can hit
> the whole cluster.
>
> So, is there any major problem with this approach? Thanks.
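For concreteness, here's a rough sketch of the insert and read paths you describe, using the Erlang riakc client. The cursor key name, page size, and term_to_binary serialization are my assumptions, and the statebox merge of page siblings is elided (riakc_obj:get_value/1 will error if a page object actually has siblings), so treat this as an illustration rather than production code:

    -module(page_store).
    -export([insert/2, latest_page/1]).

    -define(BUCKET, <<"pages">>).
    -define(CURSOR_KEY, <<"cursor">>).  %% assumed key name
    -define(PAGE_SIZE, 100).            %% assumed; tune to your documents

    %% Insert one document into the current page, advancing the cursor
    %% when the page is full.
    insert(Pid, Doc) ->
        Cursor = read_cursor(Pid),
        case riakc_pb_socket:get(Pid, ?BUCKET, page_key(Cursor)) of
            {error, notfound} ->
                put_new_page(Pid, Cursor, [Doc]);
            {ok, Obj} ->
                %% assumes siblings were already merged (statebox elided)
                Page = binary_to_term(riakc_obj:get_value(Obj)),
                case length(Page) >= ?PAGE_SIZE of
                    true ->
                        %% page is full: bump the cursor, start a new page
                        write_cursor(Pid, Cursor + 1),
                        put_new_page(Pid, Cursor + 1, [Doc]);
                    false ->
                        %% reuse the fetched object so its vclock is kept
                        Obj2 = riakc_obj:update_value(
                                 Obj, term_to_binary([Doc | Page])),
                        riakc_pb_socket:put(Pid, Obj2)
                end
        end.

    %% Read the newest page (shown to users as the first page).
    latest_page(Pid) ->
        Cursor = read_cursor(Pid),
        case riakc_pb_socket:get(Pid, ?BUCKET, page_key(Cursor)) of
            {error, notfound} -> [];
            {ok, Obj} -> binary_to_term(riakc_obj:get_value(Obj))
        end.

    %% The cursor only ever grows, so siblings are resolved by taking
    %% the maximum value, exactly as in your description.
    read_cursor(Pid) ->
        case riakc_pb_socket:get(Pid, ?BUCKET, ?CURSOR_KEY) of
            {error, notfound} -> 0;
            {ok, Obj} ->
                lists:max([binary_to_term(V)
                           || V <- riakc_obj:get_values(Obj)])
        end.

    write_cursor(Pid, N) ->
        riakc_pb_socket:put(Pid,
            riakc_obj:new(?BUCKET, ?CURSOR_KEY, term_to_binary(N))).

    put_new_page(Pid, N, Docs) ->
        riakc_pb_socket:put(Pid,
            riakc_obj:new(?BUCKET, page_key(N), term_to_binary(Docs))).

    page_key(N) ->
        list_to_binary("page_" ++ integer_to_list(N)).

Note that the read-modify-write on the cursor is not atomic: two concurrent writers can both decide to bump it. That's survivable here because the max-of-siblings rule converges, but it's also exactly why pages can overshoot their size limit, as you note in your last paragraph.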
