Cool, some very useful information. Much appreciated. I need to
digest this a bit before replying with more questions. I'm still
trying to understand map/reduce, so I'm struggling to get my head
around how to apply it to my requirements. I'm obviously used to
indexing. :)

Thanks
Mike

On Fri, Feb 11, 2011 at 9:23 AM, Sean Cribbs <s...@basho.com> wrote:
> Mike,
>
> Sounds like an interesting project. Here are some things to think about 
> (corresponding to your bullet points):
>
> 1) What seems fairly natural and obvious for key choice is the timestamp, 
> since so many of your operations are time-oriented.
> 2) For playback of historical data, consider using MapReduce to grab more 
> than just single seconds of data... maybe 30 at a time.  You could even put 
> some of your preprocessing into the map or reduce phase.  Make sure to 
> generate the key list (since you'll know them) instead of trying to do a 
> full-bucket query with filtering.
> 3) Beware of race-conditions and the possibility of not all clients seeing 
> the data right away.  This can be somewhat alleviated by using DW=W=quorum 
> when writing, but you're still talking about dogpiling a bunch of requests on 
> the same key. An in-memory write-through cache of the "latest second" might 
> be what you need here.
> 4) This is another case where you could use MapReduce to crunch the data. 60 
> items is not very much, so I think you'll have good results here. The 
> internal MapReduce cache will also reduce the pain of multiple computations 
> on the same data.
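A minimal sketch of points 1 to 3, assuming one snapshot per second keyed by its timestamp in a hypothetical "YYYY-MM-DDTHH:MM:SS" format (Riak does not prescribe any key format). replay_batches() computes the keys in chunks of 30, so each chunk could be handed to a MapReduce job as explicit inputs instead of filtering the whole bucket. LatestSnapshotCache is a hypothetical in-memory write-through cache for the "latest second"; its store argument stands in for whatever Riak client write you use, and no real client API is assumed.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical key scheme: one snapshot per second, keyed by the
# timestamp it was taken at, truncated to whole seconds.
KEY_FORMAT = "%Y-%m-%dT%H:%M:%S"


def snapshot_key(ts: datetime) -> str:
    """Key for the snapshot taken at `ts` (sub-second part dropped)."""
    return ts.replace(microsecond=0).strftime(KEY_FORMAT)


def replay_batches(start: datetime, seconds: int,
                   batch: int = 30) -> Iterator[List[str]]:
    """Yield lists of up to `batch` consecutive per-second keys.

    The keys are computed, not discovered, so each list can be given to a
    MapReduce job as explicit inputs instead of filtering the whole bucket.
    """
    keys = [snapshot_key(start + timedelta(seconds=i)) for i in range(seconds)]
    for i in range(0, len(keys), batch):
        yield keys[i:i + batch]


class LatestSnapshotCache:
    """Hypothetical in-memory write-through cache for the latest snapshot.

    `store` stands in for the real database write (e.g. a Riak put with
    W/DW = quorum); it is any callable taking (key, data).
    """

    def __init__(self, store: Callable[[str, dict], None]) -> None:
        self._store = store
        self._latest: Optional[Tuple[str, dict]] = None

    def write(self, ts: datetime, data: dict) -> str:
        key = snapshot_key(ts)
        self._store(key, data)      # persist first (write-through)...
        self._latest = (key, data)  # ...then refresh the cached copy
        return key

    def latest(self) -> Optional[Tuple[str, dict]]:
        """What polling clients read, instead of all hitting the same key."""
        return self._latest
```

For example, list(replay_batches(datetime(2011, 1, 1, 8, 0, 0), 90)) gives three lists of 30 keys, running from 2011-01-01T08:00:00 to 2011-01-01T08:01:29. The cache only helps within a single process; several web servers would each hold their own copy.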
>
> All in all, I think Riak will be a good fit for your application, with the 
> possible exception of the polling-every-second thing.  A couple of tips to 
> make sure you have a good experience with Riak:
>
> First, benchmark your usage pattern as best you can to make sure that Riak 
> will meet your performance needs.  For example, I might create some 
> basho_bench tests with appropriate key and value generators that have:
>
> a) 1 write per second (the snapshot data)
> b) X reads per second (where X is the number of expected clients)
> c) 1-5 historical replays per second (via MapReduce)
> d) X roll-up reports per minute (X = number of clients again)
>
> I'd then run them concurrently and in different combinations to simulate the 
> load.
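Before writing the basho_bench tests, the combined request rate of mix a) to d) can be sanity-checked with a little arithmetic. This is a rough sketch: the client count and replay rate are inputs you would have to estimate, and it counts requests only, not their cost.

```python
def expected_load(clients: int, replays_per_sec: float) -> dict:
    """Rough request rates (per second) for the benchmark mix a) to d).

    Counts requests only; a MapReduce replay or roll-up costs far more
    than a single-key read, so weight them separately when benchmarking.
    """
    rates = {
        "writes": 1.0,                  # a) one snapshot write per second
        "reads": float(clients),        # b) each client polls once a second
        "replays": replays_per_sec,     # c) 1-5 historical replays per second
        "rollups": clients / 60.0,      # d) one roll-up per client per minute
    }
    rates["total"] = sum(rates.values())
    return rates
```

With 120 clients and 5 replays per second this gives 1 write, 120 reads, 5 replays and 2 roll-ups per second, 128 requests in total, and it shows that the per-second polling in b) dominates the request count.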
>
> Second, make sure you start with at least 3 nodes (even in your local 
> developer setup).  Because Riak is designed to be distributed, there are 
> certain things that are sub-optimal when the number of nodes is less than the 
> replication factor (N value, default 3).
>
> Let us know if there's anything else we can help you with.
>
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Feb 11, 2011, at 8:49 AM, Mike Stoddart wrote:
>
>> Riak is very appealing for several reasons: scalability, durability,
>> open source, performance, etc. I'm currently using PostgreSQL for all
>> my storage needs, but I'm investigating nosql (can I use that name?)
>> solutions for scalability and to experiment with map/reduce
>> functionality for statistics and reporting.
>>
>> I have a few requirements that nosql solutions might not be able to meet.
>>
>> 1) Every second I take a snapshot of my data and store it in the
>> database in one record. Each recorded snapshot includes the timestamp
>> it was taken.
>>
>> 2) I have a playback feature that lets me retrieve historical data.
>> During playback, the browser requests a recorded snapshot every
>> second:
>>
>>   2011-01-01 08:00:00
>>   2011-01-01 08:00:01
>>   2011-01-01 08:00:02
>>   2011-01-01 08:00:03 ...
>>
>> Currently it takes less than 75ms for the server to retrieve the data
>> from PostgreSQL and to return it to the browser. Some processing is
>> done before the response is sent.
>>
>> 3) Every second each client's browser requests the current data
>> snapshot (i.e. not in playback mode). The timing and processing
>> notes from 2) apply here too.
>>
>> 4) Every minute I retrieve statistics and a report for a specific type
>> of data to present on the browser. Currently with PostgreSQL this
>> takes about 2-3s for the web server to retrieve the data, process it
>> and return it to the browser.
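To make the map/reduce idea concrete for a roll-up like this, here is a plain-Python sketch of the two phases over a minute's worth of snapshots. It assumes a hypothetical numeric "metric" field on each snapshot and only illustrates the shape of the computation; Riak's own map and reduce phases are written in JavaScript or Erlang, not Python. The reduce returns data in the same shape it consumes because Riak may run a reduce phase more than once, feeding earlier results back in.

```python
from typing import Dict, List

# Hypothetical snapshot shape: {"ts": "2011-01-01T08:00:00", "metric": 42}


def map_snapshot(snapshot: Dict) -> List[Dict]:
    """Map phase: turn one snapshot into a one-item partial summary."""
    v = float(snapshot["metric"])
    return [{"count": 1, "sum": v, "min": v, "max": v}]


def reduce_summaries(summaries: List[Dict]) -> List[Dict]:
    """Reduce phase: merge partial summaries into one.

    The output has the same shape as the input, so the function can be
    applied again to its own results (re-reduce).
    """
    if not summaries:
        return []
    return [{
        "count": sum(s["count"] for s in summaries),
        "sum": sum(s["sum"] for s in summaries),
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
    }]


def run_rollup(snapshots: List[Dict]) -> List[Dict]:
    """Apply the map phase to every snapshot, then reduce the results."""
    mapped = [out for snap in snapshots for out in map_snapshot(snap)]
    return reduce_summaries(mapped)
```

Merging two half-minute summaries with reduce_summaries gives the same result as reducing all 60 mapped values at once, which is what makes re-reduce safe; a mean can then be derived as sum / count.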
>>
>> The only primary key I use is a serial integer, only because that's
>> the default. I don't see anything in my data that would be useful as a
>> key when using a key/value database. My data is a good fit for storing
>> as a 'document' though.
>>
>> I know there might not be enough information here, but do you think
>> Riak is a good fit?
>>
>> Thanks
>> Mike
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
