Replication behavior
Hi all, I just added another node to my cluster (moving from 1 to 2) and was surprised by some behavior. As I understand replication, with an n_val of 3 (default), all nodes will have all data until I add my 4th node. First, is this right? Second, I watched my disk usage on node 1 go from 1GB to 600MB after adding the second node. If I still have a full copy of all data on node 1, what was in that 400MB? Thanks, Jimmy ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Riak Search release date
Hi, It has been a while since Basho announced Riak Search (almost a year ago, if I remember correctly). Do you have a time frame for the release? Or will it be available only to Enterprise DS customers? I am one of the people looking forward to Riak Search after moving from CouchDB to Riak. Is it possible to release at least a minimal API so that others can build on top of it rather than reinventing it from scratch? I asked the Basho team this question privately but have not received a useful response so far. Currently Riak requires a full bucket scan through mapred, because finding particular documents in a bucket is not possible otherwise without implementing another indexing mechanism on top of it. In my case mapred_bucket is killing performance (noting that Basho itself does not recommend that API). This is one of the frustrating stumbling blocks when coming from CouchDB. Riak Search, as per the presentation (http://www.slideshare.net/jmuellerleile/riak-search-presentation-erlang-factory-2010-sf), if available, is going to attract most of the customers looking for a best-in-class KV solution. Thanks, Senthil
Re: Riak Search release date
On Thu, May 13, 2010 at 11:02 AM, Senthilkumar Peelikkampatti <senthilkumar.peelikkampa...@gmail.com> wrote:
> Hi,
> It has been while Basho announced about Riak Search (I guess it is
> almost an year back, if I remember correctly), do you have any time frame of
> release? Or will it be available only to the Enterprise DS customer?

I've been holding back on the same question. I've got a Fortune 100 client that I'm currently working with to prototype a system for doing distributed transaction analysis over several billion transactions. We're obviously leveraging the Map/Reduce framework in Riak. At times, our goal is to run stats across the whole data set but we'd also like to run against subsets that match some search criteria. Using filters in the map/reduce job is a solution but requires visiting every document. Riak Search may be able to help.

We've got a generous amount of funding due to the complexity and urgency of the problem. Since we're prototyping, we can live with bugs. Is the beta trial closed? If we can put a solution together using Riak then my client will be looking at the enterprise support and monitoring.

Regards, -Eric
Re: Replication behavior
Hi, Jimmy. With an n_val of 3, there will be 3 copies of each data item in the cluster even when there are fewer than 3 hosts. With 2 nodes in that situation, each node will have either 1 or 2 copies of each item. Does that help with your understanding? -Justin
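A toy model may make Justin's point concrete. The sketch below assumes a ring of 64 partitions claimed round-robin by the nodes, with each item stored on 3 consecutive partitions. The function names and the round-robin claim are illustrative assumptions, not Riak's actual claim logic.

```python
from collections import Counter


def claim(num_partitions, nodes):
    """Assign partitions to nodes round-robin (a simplifying assumption)."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def copies_per_node(owners, first_partition, n_val=3):
    """Count how many of an item's n_val replicas land on each node.

    Replicas go to n_val consecutive partitions, wrapping around the
    end of the ring.
    """
    ring_size = len(owners)
    replicas = [owners[(first_partition + i) % ring_size] for i in range(n_val)]
    return Counter(replicas)


def share_of_copies(owners, node, n_val=3):
    """Fraction of all replicas in the cluster that are stored on `node`."""
    total = len(owners) * n_val
    held = sum(copies_per_node(owners, p, n_val)[node] for p in range(len(owners)))
    return held / total


if __name__ == "__main__":
    for nodes in (["node1"], ["node1", "node2"]):
        owners = claim(64, nodes)
        patterns = {
            tuple(sorted(copies_per_node(owners, p).values())) for p in range(64)
        }
        print(len(nodes), "node(s): copies per node", sorted(patterns),
              "node1 share", share_of_copies(owners, "node1"))
```

In this model node1 goes from holding every replica to holding about half of them, which is in the same ballpark as the drop from 1GB to 600MB reported in the original question, although real disk usage also depends on factors the sketch ignores.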
Re: Replication behavior
Ah, yes. So with 1 node, I had a guaranteed 3 copies of the data, and when I added another node, some of those got reduced to 1 or 2 on that node. I understand now, thanks. Jimmy
Re: FastTrack Slowing Point
For future reference, the problem was the GNU Coreutils version. My version was relatively old (6.5) and did not have -n as an option. I updated GNU Coreutils to 8.5 via this source: http://ftp.gnu.org/gnu/coreutils/

I no longer get the invalid option error. But I now get the following error when entering make devrel:

    cp -Rn dev/riak dev/dev1
    cp: cannot stat 'dev/riak': No such file or directory
    make: *** [dev1] Error 1

On Tue, May 11, 2010 at 7:39 PM, Ted Karmel wrote:
> OS = Ubuntu Hardy
>
> On Tue, May 11, 2010 at 7:25 PM, Grant Schofield wrote:
>> What OS are you running through the FastTrack tutorial on?
>>
>> Thanks,
>> Grant Schofield
>> Developer Advocate
>> Basho Technologies
>>
>> On May 11, 2010, at 12:22 PM, Ted Karmel wrote:
>>
>>> I am following the Riak FastTrack tutorial:
>>>
>>> https://wiki.basho.com/display/RIAK/Building+a+Development+Environment
>>>
>>> But I am stumbling on one step:
>>> make devrel
>>>
>>> For which I get the following error message:
>>>
>>> cp -Rn dev/riak dev/dev1
>>> cp: invalid option -- n
>>> Try 'cp --help' for more information.
>>> make: *** [dev1] Error 1
>>>
>>> Any suggestions much appreciated.
>>>
>>> Thanks.
Re: FastTrack Slowing Point
Running "make dev" first will create the "dev/riak" directory. The "make devrel" command will copy the generated "dev/riak" directory into three additional release directories "dev/dev1", "dev/dev2", "dev/dev3". The name, web_port, pb_port, and handoff_port settings are then updated in the three releases so each release has a unique value. It's important for each release to have unique values for these settings so that they can run at the same time.
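The uniqueness requirement described above can be sketched as a small check. This is only an illustration: the base port numbers, the node name format, and both function names are hypothetical and are not taken from the Riak Makefile.

```python
def release_settings(index, web_base=8090, pb_base=8080, handoff_base=8100):
    """Build settings for one dev release.

    The base ports and the name format are hypothetical values chosen
    for illustration only.
    """
    return {
        "name": f"dev{index}@127.0.0.1",
        "web_port": web_base + index,
        "pb_port": pb_base + index,
        "handoff_port": handoff_base + index,
    }


def check_no_clashes(releases):
    """Raise ValueError if two releases share a node name or any port."""
    names = [r["name"] for r in releases]
    ports = [
        r[key]
        for r in releases
        for key in ("web_port", "pb_port", "handoff_port")
    ]
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate node name in {names}")
    if len(set(ports)) != len(ports):
        raise ValueError(f"duplicate port in {ports}")


if __name__ == "__main__":
    releases = [release_settings(i) for i in (1, 2, 3)]
    check_no_clashes(releases)
    for settings in releases:
        print(settings)
```

Checking ports across all three settings at once, rather than per setting, also catches the case where one release's web_port equals another release's pb_port.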
Riak Search release date
I'm keen on hearing more about this as well. I'm at the National Snow and Ice Data Center and we're looking at an Enterprise Database that will need to support terabytes (and getting close to a petabyte) of research data. Thinking of storing all this in an RDBMS... wait... just shoot me! I'm trying to pitch Riak as one of the prototype data stores but search is an issue. Initially we're looking at storing indexes externally in Solr. Regards, Dan
Re: Riak Search release date
Senthilkumar, Eric, others - Thanks for your detailed and thoughtful questions about Riak Search. We are well aware that many of you are building and prototyping applications on Riak and that you are expecting to use Riak Search heavily once it is available. This is great. MapReduce is a great feature of Riak but it does not solve exactly the same problems that Riak Search intends to. Our beta program is very limited and is closed to new entrants at this time in order to allow us to focus on delivering a product that we are happy to support. We want to do everything we can to accommodate the community's need for and curiosity about Search. You can expect a blog post from the Search team next week, and further communication from us as progress continues. Thanks for your patience. Mark Mark Phillips Community Manager Basho Technologies wiki.basho.com twitter.com/pharkmillups
Re: install on freebsd and dragonflybsd
Hi Ryan,

I have the same issue under FreeBSD. Your Makefile.test gives this result:

    CURDIR = ""

If I change $(CURDIR) to $(.CURDIR), I get:

    CURDIR = "/usr/home/vitalka"

I made a little progress building Riak, but got stuck on building erlang_js and can't find a way to build it.

FreeBSD 8.0
gmake 3.81

rurlbc> Can you mail the list the output of the attached test Makefile? If the
rurlbc> builtin gmake CURDIR variable isn't being set, there isn't much we can do
rurlbc> for you, I'm afraid. It means that your gmake build is woefully broken or
rurlbc> something, somewhere is managing to set it to an empty string.
rurlbc> --Ryan

--
Best regards,
Vitaly mailto:vita...@vv.net.ua
ICQ: #12963384
CAP controls
I am thinking about how to possibly replace an existing system that has heavy I/O load and low CPU usage with riak. It's a file storage system, with smallish files, a few K normally, but billions of them.

The architecture, I think, would be one riak node per disk on the hardware, and probably about 16 riak nodes per physical machine. Say I had 4 of these machines, which would be 64 riak nodes.

With something like this, if I set W=3 as a CAP tuning, I would want to make sure that at least 2 of those writes were on 2 physically different machines, so that if a hardware failure took out a physical machine, I could still operate with the other 3 machines.

Is something like this possible with riak?

enjoy,

-jeremy

--
Jeremy Hinegardner jer...@hinegardner.org
Re: CAP controls
Hi, Jeremy. It sounds like an interesting project. At this time, there is no way to indicate in Riak that two nodes are actually on the same host (and therefore should not overlap in replica sets). It could certainly be done, but to do so today would require modification to the ring partition claim logic. Best, -Justin
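To illustrate why the claim order matters for Jeremy's layout, here is a toy model rather than Riak's claim logic. It assumes 64 partitions, one per node, 16 nodes per physical machine, and replicas on 3 consecutive partitions; the two host-mapping functions are hypothetical.

```python
def preference_list(first_partition, ring_size, n_val=3):
    """Partitions holding an item's replicas: n_val consecutive ones."""
    return [(first_partition + i) % ring_size for i in range(n_val)]


def hosts_for(preflist, host_of_partition):
    """Set of physical hosts that hold at least one replica."""
    return {host_of_partition(p) for p in preflist}


def worst_case_hosts(ring_size, host_of_partition, n_val=3):
    """Smallest number of distinct hosts used by any replica set."""
    return min(
        len(hosts_for(preference_list(p, ring_size, n_val), host_of_partition))
        for p in range(ring_size)
    )


def grouped(partition):
    """Hypothetical layout: partitions 0-15 on host 0, 16-31 on host 1, ..."""
    return partition // 16


def interleaved(partition):
    """Hypothetical layout: neighbouring partitions sit on different hosts."""
    return partition % 4


if __name__ == "__main__":
    print("grouped:", worst_case_hosts(64, grouped))
    print("interleaved:", worst_case_hosts(64, interleaved))
```

With the grouped layout some replica sets sit entirely on one machine, so losing that machine loses all 3 copies. The interleaved layout spreads every replica set over 3 machines, which is the kind of guarantee a host-aware claim function would have to provide.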
Re: CAP controls
Hi Jeremy, If it were me I would start off with the simplest setup possible: one riak node per physical machine. It'll be easier to see what is going on and reason about performance if there are fewer moving parts. For the storage configuration, if you are using innostore then keep one disk for logging (or use the drive the o/s is on if it is doing little else) and create a RAID0 disk out of the remaining disks. The setup for bitcask is similar, except you don't need to worry about the log data. Try that as a baseline and benchmark it. If it isn't up to the task then you can explore more complex options. Jon.
Re: CAP controls
Thanks Justin, Good to know. I'll throw it out there as a feature request :-). Or at least something to think about. I believe it is HDFS that has something along these lines, where you can say which nodes are in which racks and which data centers. This is so it knows who its nearby neighbors are for block replication and job distribution. I think the same sort of meta-node location knowledge could be used for ring partitioning so the paranoid (like me :-)) could tune for catastrophic events. enjoy, -jeremy -- Jeremy Hinegardner jer...@hinegardner.org
question about postcommit hooks and patch
I've been playing with using postcommit hooks in some code. I couldn't find an example, so looking at the source, I think the right way to set one up is something like:

    PHook = {struct, [ {<<"mod">>, <>}, {<<"fun">>, <<"notify_change">>}]},
    RiakClient:set_bucket(<>, [{postcommit, [PHook]}]),

Is there a better way?

Also, in debugging my hook, I found that wrapping the hook so I could see exceptions made it much easier. I made the following patch that I think might be useful to others, as well:

diff -r e836ea266eca apps/riak_kv/src/riak_kv_put_fsm.erl
--- a/apps/riak_kv/src/riak_kv_put_fsm.erl Thu May 13 17:28:01 2010 -0400
+++ b/apps/riak_kv/src/riak_kv_put_fsm.erl Thu May 13 14:50:07 2010 -0700
@@ -314,7 +314,7 @@
 invoke_hook(precommit, Mod0, Fun0, undefined, RObj) ->
     Mod = binary_to_atom(Mod0, utf8),
     Fun = binary_to_atom(Fun0, utf8),
-    Mod:Fun(RObj);
+    wrap_hook(Mod, Fun, RObj);
 invoke_hook(precommit, undefined, undefined, JSName, RObj) ->
     case riak_kv_js_manager:blocking_dispatch({{jsfun, JSName}, RObj}) of
         {ok, <<"fail">>} ->
@@ -331,13 +331,22 @@
 invoke_hook(postcommit, Mod0, Fun0, undefined, Obj) ->
     Mod = binary_to_atom(Mod0, utf8),
     Fun = binary_to_atom(Fun0, utf8),
-    proc_lib:spawn(fun() -> Mod:Fun(Obj) end);
+    proc_lib:spawn(fun() -> wrap_hook(Mod,Fun,Obj) end);
 invoke_hook(postcommit, undefined, undefined, _JSName, _Obj) ->
     error_logger:warning_msg("Javascript post-commit hooks aren't implemented");
 %% NOP to handle all other cases
 invoke_hook(_, _, _, _, RObj) ->
     RObj.
 
+wrap_hook(Mod, Fun, Obj)->
+    try Mod:Fun(Obj)
+    catch
+        EType:X ->
+            error_logger:error_msg("problem invoking hook ~p:~p -> ~p:~p~n~p~n",
+                                   [Mod, Fun, EType, X, erlang:get_stacktrace()]),
+            fail
+    end.
+
 merge_robjs(RObjs0,AllowMult) ->
     RObjs1 = [X || X <- [riak_kv_util:obj_not_deleted(O) ||
                             O <- RObjs0], X /= undefined],

Bruce
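For readers less familiar with Erlang, the wrapping idea in the patch can be sketched in Python. This is an analogy only, not part of Riak: the logger name and the "fail" return value mirror the patch, and everything else is an assumption made for illustration.

```python
import logging

log = logging.getLogger("hooks")  # hypothetical logger name


def wrap_hook(hook, obj):
    """Call hook(obj); on any exception, log it with a traceback and
    return "fail" instead of letting the error vanish silently."""
    try:
        return hook(obj)
    except Exception as exc:
        name = getattr(hook, "__name__", repr(hook))
        log.error(
            "problem invoking hook %s -> %s: %s",
            name, type(exc).__name__, exc,
            exc_info=True,  # attach the stack trace to the log record
        )
        return "fail"


if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)

    def broken_hook(obj):
        raise ValueError("hook blew up")

    print(wrap_hook(lambda obj: obj, {"key": "value"}))
    print(wrap_hook(broken_hook, {"key": "value"}))
```

The design point is the same as in the patch: a hook that runs in a spawned process or deep inside a write path fails invisibly unless something catches the exception and reports it.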
Re: CAP controls
Thanks Jon, I was thinking in terms of what we have right now, which is many processes per physical machine, each one of them managing a dedicated disk to optimize I/O throughput. If I get a chance to do some prototyping on this, I'll try out the RAID0 option. I'm a bit suspicious at the moment, though, because with something like a 3TB RAID0 partition, it seems the data would be stored linearly down that stripe instead of evenly across all the disks. Although the RAID controller might take care of that; I'm not sure. Good idea, I appreciate it. -jeremy -- Jeremy Hinegardner jer...@hinegardner.org
Re: question about postcommit hooks and patch
Hi Bruce, Thanks for the patch - it's definitely worthwhile and we'll likely commit it to tip soon. - Andy -- Andy Gross VP, Engineering Basho Technologies, Inc.
returning multiple documents
hey guys, given that i have an array of keys and want to return the related documents, what would be the best way to do this? so far i can think of two ways:

1. open a socket and call RpbGetReq asynchronously for each key
2. run a map reduce through the rest interface and deserialise the results

is there a 3rd way?

regards, gareth stokes
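Both options can be sketched without committing to a particular client library. In the sketch below, fetch_many takes a caller-supplied fetch function (a stand-in for whatever per-key get call your client exposes), and mapred_job only builds a JSON body with explicit bucket/key inputs; it assumes the built-in Riak.mapValuesJson JavaScript function is available in your Riak version. The bucket name and keys are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def fetch_many(fetch, bucket, keys, max_workers=8):
    """Option 1: issue one get per key concurrently.

    `fetch(bucket, key)` is supplied by the caller. Results are returned
    as a dict keyed by the requested keys.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = pool.map(lambda key: fetch(bucket, key), keys)
        return dict(zip(keys, values))


def mapred_job(bucket, keys):
    """Option 2: a MapReduce job whose inputs are explicit [bucket, key]
    pairs, so only the listed keys are visited rather than the bucket."""
    return {
        "inputs": [[bucket, key] for key in keys],
        "query": [
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
        ],
    }


if __name__ == "__main__":
    # In-memory stand-in for a real client, for demonstration only.
    store = {("docs", "k1"): {"n": 1}, ("docs", "k2"): {"n": 2}}
    print(fetch_many(lambda b, k: store.get((b, k)), "docs", ["k1", "k2"]))
    print(json.dumps(mapred_job("docs", ["k1", "k2"])))
```

The JSON body would be sent to the MapReduce endpoint of the REST interface. Which option is faster depends on the number of keys and the size of the documents, so it is worth measuring both.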