Cool - some very useful information. Much appreciated. I need to digest this a bit before replying with more questions. I'm still trying to understand map/reduce so I'm struggling to get my head around how to apply it to my requirements. I'm obviously used to indexing. :)
Thanks
Mike

On Fri, Feb 11, 2011 at 9:23 AM, Sean Cribbs <s...@basho.com> wrote:
> Mike,
>
> Sounds like an interesting project. Here are some things to think about
> (corresponding to your bullet points):
>
> 1) What seems fairly natural and obvious for key choice is the timestamp,
> since so many of your operations are time-oriented.
> 2) For playback of historical data, consider using MapReduce to grab more
> than just single seconds of data... maybe 30 at a time. You could even put
> some of your preprocessing into the map or reduce phase. Make sure to
> generate the key list (since you'll know them) instead of trying to do a
> full-bucket query with filtering.
> 3) Beware of race-conditions and the possibility of not all clients seeing
> the data right away. This can be somewhat alleviated by using DW=W=quorum
> when writing, but you're still talking about dogpiling a bunch of requests on
> the same key. An in-memory write-through cache of the "latest second" might
> be what you need here.
> 4) This is another case where you could use MapReduce to crunch the data. 60
> items is not very much, so I think you'll have good results here. The
> internal MapReduce cache will also reduce the pain of multiple computations
> on the same data.
>
> All in all, I think Riak will be a good fit for your application, with the
> possible exception of the polling-every-second thing. A couple of tips to
> make sure you have a good experience with Riak:
>
> First, benchmark your usage pattern as best you can to make sure that Riak
> will meet your performance needs.
> For example, I might create some
> basho_bench tests with appropriate key and value generators that have:
>
> a) 1 write per second (the snapshot data)
> b) X reads per second (where X is the number of expected clients)
> c) 1-5 historical replays per second (via MapReduce)
> d) X roll-up reports per minute (X = number of clients again)
>
> I'd then run them concurrently, and in different combinations to simulate the
> load.
>
> Second, make sure you start with at least 3 nodes (even in your local
> developer setup). Because Riak is designed to be distributed, there are
> certain things that are sub-optimal when the number of nodes is less than the
> replication factor (N value, default 3).
>
> Let us know if there's anything else we can help you with.
>
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Feb 11, 2011, at 8:49 AM, Mike Stoddart wrote:
>
>> Riak is very appealing for several reasons; scalability, durability,
>> open-source, performance etc. I'm currently using PostgreSQL for all
>> my storage needs, but I'm investigating nosql (can I use that name?)
>> solutions for scalability and to experiment with map/reduce
>> functionality for statistics and reporting.
>>
>> I have a few requirements that nosql solutions might not be able to meet.
>>
>> 1) Every second I take a snapshot of my data and store it in the
>> database in one record. Each recorded snapshot includes the timestamp
>> it was taken.
>>
>> 2) I have a playback feature that lets me retrieve historical data.
>> During playback, the browser requests a recorded snapshot every
>> second:
>>
>> 2011-01-01 08:00:00
>> 2011-01-01 08:00:01
>> 2011-01-01 08:00:02
>> 2011-01-01 08:00:03 ...
>>
>> Currently it takes less than 75ms for the server to retrieve the data
>> from PostgreSQL and to return it to the browser. Some processing is
>> done before the response is sent.
>>
>> 3) Every second each client's browser requests the current data
>> snapshot (i.e. not in playback mode). The same comment for timing and
>> processing applies from 2).
>>
>> 4) Every minute I retrieve statistics and a report for a specific type
>> of data to present on the browser. Currently with PostgreSQL this
>> takes about 2-3s for the web server to retrieve the data, process it
>> and return it to the browser.
>>
>> The only primary key I use is a serial integer, only because that's
>> the default. I don't see anything in my data that would be useful as a
>> key when using a key/value database. My data is a good fit for storing
>> as a 'document' though.
>>
>> I know there might not be enough information here but do you think
>> Riak is a good fit?
>>
>> Thanks
>> Mike
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
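
For anyone trying to picture point 2 of Sean's reply (generate the key list
rather than filtering the whole bucket), here is a rough sketch in Python. It
assumes one snapshot per second stored under a timestamp key. The bucket name
"snapshots", the key format, and the helper names playback_keys and
playback_job are made up for illustration and do not come from the thread. The
job body follows the JSON shape accepted by Riak's HTTP /mapred resource and
uses the built-in Riak.mapValuesJson map function; nothing is sent to a server
here.

```python
import json
from datetime import datetime, timedelta

BUCKET = "snapshots"               # hypothetical bucket name
KEY_FORMAT = "%Y-%m-%dT%H:%M:%S"   # assumed key format: one key per second


def playback_keys(start, count=30):
    """Return the keys for `count` consecutive one-second snapshots."""
    return [(start + timedelta(seconds=i)).strftime(KEY_FORMAT)
            for i in range(count)]


def playback_job(start, count=30):
    """Build a MapReduce job whose inputs are explicit [bucket, key] pairs."""
    return {
        "inputs": [[BUCKET, key] for key in playback_keys(start, count)],
        "query": [
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson",
                     "keep": True}}
        ],
    }


if __name__ == "__main__":
    # Build the job for 30 seconds of playback starting at 08:00:00.
    job = playback_job(datetime(2011, 1, 1, 8, 0, 0))
    print(json.dumps(job, indent=2))
```

The resulting JSON would be POSTed to a node's /mapred resource. Any
preprocessing could go into a further map or reduce phase appended to "query",
as Sean suggests.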