Re: Storage of time-series data

2010-05-18 Thread Alexander Sicular
That is exactly correct. As best I can tell, nearly everything performance-wise in Riak when it comes to m/r (map-reduce) revolves around the total number of objects in a bucket. If your architecture can be constructed in such a way that your buckets will have tens of thousands of keys vs. hundreds of thousands o
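One way to keep the per-bucket key count down is to split a series across time-based buckets. The sketch below is only an illustration of that idea: the `series_YYYYMMDD` naming scheme and the helper names `bucket_for` and `keys_per_bucket` are assumptions made up for this example, not anything Riak prescribes, and it does not talk to Riak at all. It just counts how many keys would land in each bucket.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_for(series, ts):
    """Hypothetical naming scheme: one bucket per series per calendar day."""
    return f"{series}_{ts:%Y%m%d}"


def keys_per_bucket(series, start, count, step):
    """Count how many samples would be stored in each daily bucket."""
    counts = Counter()
    for i in range(count):
        counts[bucket_for(series, start + i * step)] += 1
    return counts


if __name__ == "__main__":
    # Two days of one-sample-per-second data, split by day.
    per_bucket = keys_per_bucket(
        "load", datetime(2010, 5, 18), 2 * 86400, timedelta(seconds=1)
    )
    for name, n in sorted(per_bucket.items()):
        print(name, n)
```

With one sample per second, a daily split keeps each bucket in the tens of thousands of keys; a coarser or finer period could be chosen the same way depending on the sample rate.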

Re: Storage of time-series data

2010-05-18 Thread Daniel Einspanjer
I do a lot of temporal aggregate statistics in the Mozilla Socorro project using HBase. The problem is much easier there because a rowkey can use the timestamp as a prefix, which makes it easy to do a range query, and HBase also has an atomic increment function that can be
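The timestamp-prefix idea can be sketched without HBase. The example below is a toy in-memory stand-in, not the HBase client API: `SortedStore`, `row_key`, and the `YYYYMMDDHHMMSS:suffix` key layout are all assumptions invented for illustration. It shows why a sortable timestamp prefix turns "everything in this time window" into a contiguous key range, and how a per-key counter can be bumped as events arrive.

```python
import bisect
from datetime import datetime


def row_key(ts, suffix):
    """Hypothetical key layout: fixed-width timestamp prefix, then a suffix."""
    return ts.strftime("%Y%m%d%H%M%S") + ":" + suffix


class SortedStore:
    """Toy stand-in for a store that keeps rows ordered by key."""

    def __init__(self):
        self._keys = []    # kept sorted so range scans are a slice
        self._counts = {}  # key -> counter value

    def increment(self, key, amount=1):
        """Add to the counter at key, creating it if needed."""
        if key not in self._counts:
            bisect.insort(self._keys, key)
            self._counts[key] = 0
        self._counts[key] += amount
        return self._counts[key]

    def scan(self, start, stop):
        """Return (key, count) pairs with start <= key < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._counts[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = SortedStore()
    store.increment(row_key(datetime(2010, 5, 17, 23, 59, 0), "crashes"))
    store.increment(row_key(datetime(2010, 5, 18, 9, 0, 0), "crashes"))
    store.increment(row_key(datetime(2010, 5, 18, 9, 0, 0), "crashes"))
    store.increment(row_key(datetime(2010, 5, 19, 0, 0, 1), "crashes"))
    # All rows for 2010-05-18: scan from the day prefix to the next day's prefix.
    print(store.scan("20100518", "20100519"))
```

Because the timestamp is fixed-width and comes first, string order matches time order, so the scan bounds can be plain date prefixes. In this toy version the increment is not atomic across processes; that guarantee is what the real store would have to provide.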

Re: Storage of time-series data

2010-05-18 Thread Sean Cribbs
Buckets are essentially free if you are not changing their properties from the defaults (which you can set globally in app.config). Keep in mind the options I presented are not the only ones, just points of departure for your own schema design. Sean Cribbs Developer Advocate Basho Technologie

Re: Storage of time-series data

2010-05-18 Thread Joel Pitt
Thanks Sean. Looks like 3 might be the best plan. And, pre/post-commit hooks... cool! I didn't see those - that's something I've been looking for (since I'd prefer to keep that kind of stuff happening on the data nodes rather than in the client/app itself). One further question, is there any limi

Re: Storage of time-series data

2010-05-18 Thread Sean Cribbs
Joel, Riak's only query mechanism aside from simple key retrieval is map-reduce. However, there are a number of strategies you could take, depending on what you want to query. I don't know the requirements of your application, but here are some options: 1) Store the data either keyed on the t