Mike,

Sounds like an interesting project. Here are some things to think about 
(corresponding to your bullet points):

1) What seems fairly natural and obvious for key choice is the timestamp, since 
so many of your operations are time-oriented.
2) For playback of historical data, consider using MapReduce to grab more than 
just single seconds of data... maybe 30 at a time.  You could even put some of 
your preprocessing into the map or reduce phase.  Make sure to generate the key 
list yourself (you'll know the keys in advance) instead of trying to do a 
full-bucket query with filtering.
3) Beware of race conditions and the possibility of not all clients seeing the 
data right away.  This can be somewhat alleviated by using DW=W=quorum when 
writing, but you're still talking about dogpiling a bunch of requests on the 
same key. An in-memory write-through cache of the "latest second" might be what 
you need here.
4) This is another case where you could use MapReduce to crunch the data. 60 
items is not very much, so I think you'll have good results here. The internal 
MapReduce cache will also reduce the pain of multiple computations on the same 
data.
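
As a rough sketch of points 2 and 3, here is one way to generate the key list 
for a 30-second playback window and to keep the "latest second" in memory. The 
key format (copied from the timestamps in your message), the bucket name 
"snapshots", and the dict-like store standing in for a Riak client are all 
assumptions for illustration, not a prescribed layout.

```python
import threading
from datetime import datetime, timedelta

KEY_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed key layout: one key per second
BUCKET = "snapshots"              # hypothetical bucket name


def playback_keys(start, seconds=30):
    """Return the keys for `seconds` consecutive snapshots from `start`."""
    return [(start + timedelta(seconds=i)).strftime(KEY_FORMAT)
            for i in range(seconds)]


def mapreduce_inputs(start, seconds=30):
    """Build explicit [bucket, key] pairs so the job never scans the bucket."""
    return [[BUCKET, key] for key in playback_keys(start, seconds)]


class LatestSnapshotCache:
    """Write-through cache that holds only the most recent snapshot."""

    def __init__(self, store):
        self._store = store           # dict-like stand-in for a Riak client
        self._lock = threading.Lock()
        self._latest = None           # (key, value) of the newest snapshot

    def write(self, taken_at, value):
        key = taken_at.strftime(KEY_FORMAT)
        self._store[key] = value      # write through to the backing store
        with self._lock:
            self._latest = (key, value)
        return key

    def latest(self):
        """Serve the current snapshot from memory, without touching the store."""
        with self._lock:
            return self._latest
```

The list returned by mapreduce_inputs() is shaped to be used as the "inputs" 
of a MapReduce job. The cache assumes a single writer process taking the 
once-per-second snapshot; with several web servers, each would need its own 
copy or a shared cache.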

All in all, I think Riak will be a good fit for your application, with the 
possible exception of the polling-every-second thing.  A couple of tips to make 
sure you have a good experience with Riak:

First, benchmark your usage pattern as best you can to make sure that Riak will 
meet your performance needs.  For example, I might create some basho_bench 
tests with appropriate key and value generators that have:

a) 1 write per second (the snapshot data)
b) X reads per second (where X is the number of expected clients)
c) 1-5 historical replays per second (via MapReduce)
d) X roll-up reports per minute (X = number of clients again)

I'd then run them concurrently, and in different combinations to simulate the 
load.
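
To get a feel for the relative weights of those four operations before 
writing the benchmark configuration, the mix can be worked out per minute. 
This is only arithmetic on the a) to d) figures above; the function and field 
names are made up for illustration and are not basho_bench settings.

```python
def per_minute_workload(clients, replays_per_second):
    """Operations per minute for each part of the benchmark mix."""
    return {
        "snapshot_writes": 60,                # a) 1 write per second
        "latest_reads": 60 * clients,         # b) X reads per second
        "replays": 60 * replays_per_second,   # c) 1-5 replays per second
        "rollup_reports": clients,            # d) X reports per minute
    }
```

For example, per_minute_workload(50, 5) gives 60 writes, 3000 reads, 300 
replays and 50 reports per minute, which makes it clear that the per-second 
client reads dominate the load.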

Second, make sure you start with at least 3 nodes (even in your local developer 
setup).  Because Riak is designed to be distributed, there are certain things 
that are sub-optimal when the number of nodes is less than the replication 
factor (N value, default 3).

Let us know if there's anything else we can help you with.

Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Feb 11, 2011, at 8:49 AM, Mike Stoddart wrote:

> Riak is very appealing for several reasons: scalability, durability,
> open-source, performance etc. I'm currently using PostgreSQL for all
> my storage needs, but I'm investigating nosql (can I use that name?)
> solutions for scalability and to experiment with map/reduce
> functionality for statistics and reporting.
> 
> I have a few requirements that nosql solutions might not be able to meet.
> 
> 1) Every second I take a snapshot of my data and store it in the
> database in one record. Each recorded snapshot includes the timestamp
> it was taken.
> 
> 2) I have a playback feature that lets me retrieve historical data.
> During playback, the browser requests a recorded snapshot every
> second:
> 
>   2011-01-01 08:00:00
>   2011-01-01 08:00:01
>   2011-01-01 08:00:02
>   2011-01-01 08:00:03 ...
> 
> Currently it takes less than 75ms for the server to retrieve the data
> from PostgreSQL and to return it to the browser. Some processing is
> done before the response is sent.
> 
> 3) Every second each client's browser requests the current data
> snapshot (i.e. not in playback mode). The same comment for timing and
> processing applies from 2).
> 
> 4) Every minute I retrieve statistics and a report for a specific type
> of data to present on the browser. Currently with PostgreSQL this
> takes about 2-3s for the web server to retrieve the data, process it
> and return it to the browser.
> 
> The only primary key I use is a serial integer, only because that's
> the default. I don't see anything in my data that would be useful as a
> key when using a key/value database. My data is a good fit for storing
> as a 'document' though.
> 
> I know there might not be enough information here but do you think
> Riak is a good fit?
> 
> Thanks
> Mike
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

