Re: Storage of time-series data

2010-05-18 Thread Alexander Sicular
That is exactly correct. As best I can tell, almost everything performance-wise in Riak when it comes to m/r revolves around the total number of objects in a bucket. If your architecture can be constructed in such a way that your buckets will have tens of thousands of keys vs. hundreds of thousands o
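
One way to keep the per-bucket key count down for time-series data is to derive the bucket name from the time window. The sketch below is only an illustration: the `readings` prefix, the one-bucket-per-day granularity, and both function names are assumptions, not anything from the thread. Whether a day's worth of data lands in the "tens of thousands" range depends entirely on your write rate.

```python
from datetime import datetime, timezone


def bucket_for(prefix: str, ts: datetime) -> str:
    """Return a per-day bucket name, e.g. 'readings_2010-05-18' (hypothetical scheme)."""
    # Normalise to UTC so one instant always maps to the same bucket.
    return f"{prefix}_{ts.astimezone(timezone.utc):%Y-%m-%d}"


def key_for(ts: datetime) -> str:
    """Return a key that is unique within the day down to the microsecond."""
    return ts.astimezone(timezone.utc).strftime("%H%M%S.%f")


if __name__ == "__main__":
    sample = datetime(2010, 5, 18, 13, 45, tzinfo=timezone.utc)
    print(bucket_for("readings", sample), key_for(sample))
```

A map-reduce job over a single day then only has to walk that day's bucket rather than the whole data set.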

Re: Storage of time-series data

2010-05-18 Thread Daniel Einspanjer
I do a lot of temporal aggregate statistics in the Mozilla Socorro project using HBase. The problem is made much easier there because you can have a rowkey that uses the timestamp as a prefix making it easy to do a range query, and then HBase also has an atomic increment function that can be
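
The timestamp-prefix idea can be sketched without an HBase client at all. The `SortedStore` class below is a hypothetical in-memory stand-in for an ordered key space, and `row_key` and the `crashes` suffix are made up for illustration. It shows why zero-padded timestamp prefixes make a range query a simple slice over sorted keys. Its `increment` is not atomic; it only mimics the shape of a counter update.

```python
import bisect
from collections import Counter


def row_key(epoch_seconds: int, suffix: str) -> str:
    """Zero-pad the timestamp so lexicographic order matches time order."""
    return f"{epoch_seconds:010d}:{suffix}"


class SortedStore:
    """Hypothetical stand-in for a store whose keys are kept in sorted order."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._counts: Counter = Counter()

    def increment(self, key: str, amount: int = 1) -> int:
        # Insert the key in sorted position the first time it is seen.
        if key not in self._counts:
            bisect.insort(self._keys, key)
        self._counts[key] += amount
        return self._counts[key]

    def scan(self, start_key: str, stop_key: str) -> list[tuple[str, int]]:
        """Return (key, count) pairs with start_key <= key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._counts[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = SortedStore()
    store.increment(row_key(1274140800, "crashes"))
    store.increment(row_key(1274144400, "crashes"))
    print(store.scan(row_key(1274140800, ""), row_key(1274227200, "")))
```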

Re: Storage of time-series data

2010-05-18 Thread Sean Cribbs
Buckets are essentially free if you are not changing their properties from the defaults (which you can set globally in app.config). Keep in mind the options I presented are not the only ones, just points of departure for your own schema design. Sean Cribbs Developer Advocate Basho Technologie

Re: Storage of time-series data

2010-05-18 Thread Joel Pitt
Thanks Sean. Looks like 3 might be the best plan. And, pre/post-commit hooks... cool! I didn't see those - that's something I've been looking for (since I'd prefer to keep that kind of stuff happening on the data nodes rather than in the client/app itself). One further question, is there any limi

Re: Rebalancing (newbie alert)

2010-05-18 Thread Sean Cribbs
> > 1. I vaguely recall one of Riak authors replying to someone here that it is > not possible to know where particular bucket resides, i.e. at which vnode. > > If so, how can one say after one physical node crashed & burned (say, its > hard drive failed totally) or another physical node was ad

Re: loadbalancing

2010-05-18 Thread Ryan Tilder
Any of the various VIP implementations will work, but we don't recommend VRRP behaviour[1] for the VIP because you'll lose the benefit of spreading client query load to all nodes in a ring. For the plain HTTP client interface haproxy, squid, varnish, nginx, lighttpd, and even Apache can be used in a va

loadbalancing

2010-05-18 Thread Johnny Tan
I assume the best practice is to use a virtual IP that is load-balanced to each member of a Riak ring to read/write data? Since there is no state, I assume stickiness is not an issue. Are there any other potential gotchas? johnny

Rebalancing (newbie alert)

2010-05-18 Thread Marcin Krol
Hello everyone, I'm a total newbie with Riak and Erlang, reading the docs for now to see whether it could be evaluated positively for our purposes. I have a few questions, mostly related to rebalancing across vnodes in case of failure of one node or physical addition of new servers. 1. I va

Riak Recap for 5/17

2010-05-18 Thread Mark Phillips
Morning, Afternoon, Evening - Short recap for today - a convo about N and R values, a code snippet for testing, a fix to the Fast Track, and a heads up to anyone at Google I/O. Enjoy - Mark Community Manager Basho Technologies wiki.basho.com twitter.com/pharkmillups - Riak Recap for 5/17

Re: Recovering datas when a node was joining again the cluster (with all node datas lost)

2010-05-18 Thread Justin Sheehy
Hello, Germain. You've already come across read-repair. Between that and hinted-handoff a great deal of passive anti-entropy is performed in a Riak cluster. As long as one doesn't use requests with R=N these mechanisms are generally sufficient. We do have plans for a more "active" anti-entropy

Re: Storage of time-series data

2010-05-18 Thread Sean Cribbs
Joel, Riak's only query mechanism aside from simple key retrieval is map-reduce. However, there are a number of strategies you could take, depending on what you want to query. I don't know the requirements of your application, but here are some options: 1) Store the data either keyed on the t

Re: Tuning Innostore backend (flush_mode option error)

2010-05-18 Thread Sean Cribbs
I'm sorry, that was a typo (one that I seem to make frequently). It's flush_method, not flush_mode. I've corrected the wiki. Sean Cribbs Developer Advocate Basho Technologies, Inc. http://basho.com/ On May 18, 2010, at 5:29 AM, Germain Maurice wrote: > I read this page : > https://wiki.basho.com

Tuning Innostore backend (flush_mode option error)

2010-05-18 Thread Germain Maurice
I read this page: https://wiki.basho.com/display/RIAK/Innostore+Configuration+and+Tuning Then I applied this configuration to my node:

    {innostore, [
        {data_home_dir, "/reiser/riak/innodb"},       %% Where data files go
        {log_group_home_dir, "/reiser/riak/innodb"},  %% Where log files go
        {log_files_in_group

Re: Recovering datas when a node was joining again the cluster (with all node datas lost)

2010-05-18 Thread Germain Maurice
Hi Dan, Thank you for this "trick"; it's faster than a GET operation on objects. HEAD requests on all docs will balance the replication for the node where we make the requests. However, I only manage about 100,000 HEAD requests per hour. Does that seem normal to you? The HEAD requests made the nod
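
For readers who want to try the HEAD-request approach, here is a minimal sketch using only the Python standard library. The base URL, the `/riak/<bucket>/<key>` path layout, the bucket name and the helper names are all assumptions for illustration; check them against your own node's HTTP interface. For scale, 100,000 requests per hour works out to roughly 28 requests per second.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Iterable, Iterator


def head_requests(base_url: str, bucket: str, keys: Iterable[str]) -> Iterator[urllib.request.Request]:
    """Yield one HEAD request per key (URL layout is an assumption)."""
    for key in keys:
        path = "/riak/{}/{}".format(
            urllib.parse.quote(bucket, safe=""),
            urllib.parse.quote(key, safe=""),
        )
        yield urllib.request.Request(base_url.rstrip("/") + path, method="HEAD")


def touch_all(base_url: str, bucket: str, keys: Iterable[str], timeout: float = 5.0) -> int:
    """Send the HEAD requests sequentially and return how many succeeded."""
    ok = 0
    for req in head_requests(base_url, bucket, keys):
        try:
            with urllib.request.urlopen(req, timeout=timeout):
                ok += 1
        except urllib.error.URLError:
            # Missing keys or unreachable nodes are skipped in this sketch.
            pass
    return ok


if __name__ == "__main__":
    for req in head_requests("http://127.0.0.1:8098", "docs", ["doc1", "doc2"]):
        print(req.get_method(), req.full_url)
```

Because `touch_all` issues requests one at a time, its throughput is bounded by per-request latency; running several workers in parallel is the usual way to raise it.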