Replication behavior

2010-05-13 Thread Jimmy Thrasher
Hi all,

I just added another node to my cluster (moving from 1 to 2) and was
surprised by some behavior.

As I understand replication, with an n_val of 3 (default), all nodes will
have all data until I add my 4th node.  First, is this right?

Second, I watched my disk usage on node 1 go from 1GB to 600MB after adding
the second node.  If I still have a full copy of all data on node 1, what
was in that 400MB?


riak-users mailing list

Riak Search release date

2010-05-13 Thread Senthilkumar Peelikkampatti
  It has been a while since Basho announced Riak Search (almost a year
ago, if I remember correctly). Do you have any time frame for its
release? Or will it be available only to Enterprise DS customers? I am
one of the people waiting for Riak Search after moving from CouchDB to
Riak. Would it be possible to release at least a minimal API so that
others can build on top of it rather than reinventing it from scratch? I
asked this question privately to the Basho team but have not received a
useful response so far. Currently Riak forces a full bucket scan through
mapred, because searching for particular documents in a bucket is not
possible without implementing another indexing mechanism on top of it.
In my case mapred_bucket is killing performance (noting that Basho
itself does not recommend that API). This is one of the frustrating
stumbling blocks when coming from CouchDB. Riak Search, as per the
presentation, if available, is going to pull in most of the customers
looking for a best-in-class KV solution.


Re: Riak Search release date

2010-05-13 Thread Eric Gaumer
On Thu, May 13, 2010 at 11:02 AM, Senthilkumar Peelikkampatti wrote:

> Hi,
>   It has been a while since Basho announced Riak Search (almost a year
> ago, if I remember correctly). Do you have any time frame for its
> release? Or will it be available only to Enterprise DS customers?

I've been holding back on the same question. I've got a Fortune 100 client
that I'm currently working with to prototype a system for doing distributed
transaction analysis over several billion transactions. We're obviously
leveraging the Map/Reduce framework in Riak. At times, our goal is to run
stats across the whole data set but we'd also like to run against subsets
that match some search criteria. Using filters in the map/reduce job is a
solution but requires visiting every document.
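To make the cost difference concrete, here is a minimal Python sketch. It is not Riak code: the sample transactions, the `merchant` field, and the helper names are all made up for illustration. A filter inside a map phase has to touch every document, while a search-style inverted index touches only the matching ones.

```python
from collections import defaultdict

# Hypothetical sample data: key -> document.
docs = {
    "t1": {"merchant": "acme", "amount": 120},
    "t2": {"merchant": "globex", "amount": 75},
    "t3": {"merchant": "acme", "amount": 310},
    "t4": {"merchant": "initech", "amount": 42},
}

def scan_filter(docs, field, value):
    """Map-phase style filter: visits every document once."""
    visited = 0
    hits = []
    for key, doc in docs.items():
        visited += 1
        if doc.get(field) == value:
            hits.append(key)
    return sorted(hits), visited

def build_index(docs, field):
    """Inverted index (field value -> set of keys), maintained at write time."""
    index = defaultdict(set)
    for key, doc in docs.items():
        index[doc[field]].add(key)
    return index

def index_lookup(index, value):
    """Index lookup: touches only the keys that match."""
    hits = sorted(index.get(value, set()))
    return hits, len(hits)

scan_hits, scan_visited = scan_filter(docs, "merchant", "acme")
index = build_index(docs, "merchant")
idx_hits, idx_visited = index_lookup(index, "acme")
print(scan_hits, scan_visited)  # same hits, but every document visited
print(idx_hits, idx_visited)    # same hits, only matching keys visited
```

With several billion transactions the scan cost grows with the whole data set, while the lookup cost grows only with the size of the matching subset.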

Riak Search may be able to help. We've got a generous amount of funding due
to the complexity and urgency of the problem. Since we're prototyping, we
can live with bugs. Is the beta trial closed? If we can put a solution
together using Riak then my client will be looking at the enterprise support
and monitoring.


Re: Replication behavior

2010-05-13 Thread Justin Sheehy
Hi, Jimmy.

With an n_val of 3, there will be 3 copies of each data item in the
cluster even when there are fewer than 3 hosts.  With 2 nodes in that
situation, each node will have either 1 or 2 copies of each item.

Does that help with your understanding?
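A small Python sketch of that placement, under simplifying assumptions: a 64-partition ring, partitions handed out round-robin, and replicas stored on n_val consecutive partitions. The function names and the round-robin assignment are illustrative only; Riak's real claim logic is more involved.

```python
from collections import Counter

def ring_owners(num_partitions, nodes):
    """Assign partitions to nodes round-robin (a simplification of Riak's claim)."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]

def copies_per_node(owners, first_partition, n_val=3):
    """Count how many of an item's n_val replicas land on each node.

    Replicas go to n_val consecutive partitions, wrapping around the ring.
    """
    size = len(owners)
    return Counter(owners[(first_partition + i) % size] for i in range(n_val))

one_node = ring_owners(64, ["node1"])
two_nodes = ring_owners(64, ["node1", "node2"])

print(copies_per_node(one_node, 0))   # all three replicas on node1
print(copies_per_node(two_nodes, 0))  # two on node1, one on node2
print(copies_per_node(two_nodes, 1))  # one on node1, two on node2

# Average number of replicas that node1 holds per item in the 2-node ring.
average_on_node1 = sum(copies_per_node(two_nodes, p)["node1"] for p in range(64)) / 64
print(average_on_node1)
```

In this model node 1 goes from holding 3 copies of every item to an average of 1.5, which is in the same ballpark as disk usage dropping from 1GB to 600MB.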



Re: Replication behavior

2010-05-13 Thread Jimmy Thrasher
Ah, yes.  So with 1 node, I had a guaranteed 3 copies of the data, and when
I added another node, some of those got reduced to 1 or 2 on that node.  I
understand now, thanks.


On Thu, May 13, 2010 at 11:34 AM, Justin Sheehy wrote:

> Hi, Jimmy.
> With an n_val of 3, there will be 3 copies of each data item in the
> cluster even when there are fewer than 3 hosts.  With 2 nodes in that
> situation, each node will have either 1 or 2 copies of each item.
> Does that help with your understanding?
> -Justin

Re: FastTrack Slowing Point

2010-05-13 Thread Ted Karmel
For future reference, the problem was the GNU Coreutils version.  My
version was relatively old (6.5) and did not have -n as an option.  I
updated GNU Coreutils to 8.5 via this source:

I no longer get the invalid option error.  But I now get the following
error when entering make devrel :

cp -Rn dev/riak dev/dev1
cp: cannot stat 'dev/riak': No such file or directory
make: *** [dev1] Error 1

On Tue, May 11, 2010 at 7:39 PM, Ted Karmel wrote:
> OS = Ubuntu Hardy
> On Tue, May 11, 2010 at 7:25 PM, Grant Schofield wrote:
>> What OS are you running through the FastTrack tutorial on?
>> Thanks,
>> Grant Schofield
>> Developer Advocate
>> Basho Technologies
>> On May 11, 2010, at 12:22 PM, Ted Karmel wrote:
>>> I am following the Riak FastTrack tutorial:
>>> But I am stumbling on one step:
>>> make devrel
>>> For which I get the following error message:
>>> cp -Rn dev/riak dev/dev1
>>> cp: invalid option -- n
>>> Try 'cp --help' for more information.
>>> make: *** [dev1] Error 1
>>> Any suggestions much appreciated.
>>> Thanks.

Re: FastTrack Slowing Point

2010-05-13 Thread Dan Reverri
Running "make dev" first will create the "dev/riak" directory.

The "make devrel" command will copy the generated "dev/riak" directory into
three additional release directories "dev/dev1", "dev/dev2", "dev/dev3". The
name, web_port, pb_port, and handoff_port settings are then updated in the
three releases so each release has a unique value. It's important for each
release to have unique values for these settings so that they can run at the
same time.
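As a rough illustration of the "unique value per release" point, the Python sketch below models the four settings for three releases. The setting names come from the description above; the node-name pattern and the port numbers are placeholders chosen for the example, not values read from the Riak Makefile.

```python
def devrel_settings(count=3, web_base=8090, pb_base=8080, handoff_base=8100):
    """Return one settings dict per dev release, each with unique values.

    The base ports and the name pattern are illustrative assumptions.
    """
    releases = {}
    for i in range(1, count + 1):
        releases[f"dev/dev{i}"] = {
            "name": f"dev{i}@127.0.0.1",
            "web_port": web_base + i,
            "pb_port": pb_base + i,
            "handoff_port": handoff_base + i,
        }
    return releases

def all_ports(releases):
    """Flatten every port used by every release into one list."""
    return [cfg[key]
            for cfg in releases.values()
            for key in ("web_port", "pb_port", "handoff_port")]

releases = devrel_settings()
for path, cfg in releases.items():
    print(path, cfg)

# If any two releases shared a name or a port, they could not run side by side.
ports = all_ports(releases)
names = [cfg["name"] for cfg in releases.values()]
print(len(ports) == len(set(ports)), len(names) == len(set(names)))
```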

On Thu, May 13, 2010 at 10:02 AM, Ted Karmel wrote:

> For future reference, the problem was the GNU Coreutils version.  My
> version was relatively old (6.5) and did not have -n as an option.  I
> updated GNU Coreutils to 8.5 via this source:
> I no longer get the invalid option error.  But I now get the following
> error when entering make devrel :
> cp -Rn dev/riak dev/dev1
> cp: cannot stat 'dev/riak': No such file or directory
> make: *** [dev1] Error 1
> On Tue, May 11, 2010 at 7:39 PM, Ted Karmel wrote:
> > OS = Ubuntu Hardy
> >
> > On Tue, May 11, 2010 at 7:25 PM, Grant Schofield wrote:
> >> What OS are you running through the FastTrack tutorial on?
> >>
> >> Thanks,
> >> Grant Schofield
> >> Developer Advocate
> >> Basho Technologies
> >>
> >>
> >> On May 11, 2010, at 12:22 PM, Ted Karmel wrote:
> >>
> >>> I am following the Riak FastTrack tutorial:
> >>>
> >>>
> >>>
> >>> But I am stumbling on one step:
> >>>
> >>> make devrel
> >>>
> >>> For which I get the following error message:
> >>>
> >>> cp -Rn dev/riak dev/dev1
> >>> cp: invalid option -- n
> >>> Try 'cp --help' for more information.
> >>> make: *** [dev1] Error 1
> >>>
> >>>
> >>> Any suggestions much appreciated.
> >>>
> >>> Thanks.

Riak Search release date

2010-05-13 Thread Dan Young
I'm keen on hearing more about this as well. I'm at the National Snow
and Ice Data Center and we're looking at an Enterprise Database that
will need to support terabytes (and getting close to a petabyte) of
research data. Thinking of storing all this in an RDBMS...wait...just
shoot me!

I'm trying to pitch Riak as one of the prototype data stores but
search is an issue.  Initially we're looking at storing indexes
externally in Solr.




Re: Riak Search release date

2010-05-13 Thread Mark Phillips
Senthilkumar, Eric, others -

Thanks for your detailed and thoughtful questions about Riak Search.

We are well aware that many of you are building and prototyping
applications on Riak and that you are expecting to use Riak Search
heavily once it is available. This is great. MapReduce is a great
feature of Riak but it does not solve exactly the same problems that
Riak Search intends to.

Our beta program is very limited and is closed to new entrants at this
time in order to allow us to focus on delivering a product that we are
happy to support.

We want to do everything we can to accommodate the community's need
for and curiosity about Search. You can expect a blog post from the
Search team next week, and further communication from us as progress
is made.

Thanks for your patience.


Mark Phillips
Community Manager
Basho Technologies


Re: install on freebsd and dragonflybsd

2010-05-13 Thread Vitaly Martynenko
Hi Ryan,

  I have the same issue under FreeBSD.
  And your Makefile.test gives this result:
  CURDIR = ""

  If I change $(CURDIR) to $(.CURDIR) I get
  CURDIR = "/usr/home/vitalka"

  I made a little progress building Riak, but got stuck on
  erlang_js; I can't find a way to build it.

  FreeBSD 8.0
  gmake 3.81

rurlbc> Can you mail the list the output of the attached test Makefile?  If the
rurlbc> builtin gmake CURDIR variable isn't being set, there isn't much we can do
rurlbc> for you, I'm afraid.  It means that your gmake build is woefully broken or
rurlbc> something, somewhere is managing to set it to an empty string.

rurlbc> --Ryan

Best regards,

ICQ: #12963384


CAP controls

2010-05-13 Thread Jeremy Hinegardner
I am thinking about how to possibly replace an existing system that has heavy
I/O load, low CPU usage, with riak.  It's a file storage system, with smallish
files, a few K normally, but billions of them.

The architecture, I think, would be one riak node per disk on the hardware,
and probably run about 16 riak nodes per physical machine.  Say I had
4 of these machines, which would be 64 riak nodes.

With something like this, if I set W=3 as a CAP tuning, I would want to make
sure that at least 2 of those writes were on 2 physically different machines,
so in case I had a hardware failure, and it took out a physical machine, I could
still operate with the other 3 machines.

Is something like this possible with riak?




 Jeremy Hinegardner 


Re: CAP controls

2010-05-13 Thread Justin Sheehy
Hi, Jeremy.

It sounds like an interesting project.  At this time, there is no way
to indicate in Riak that two nodes are actually on the same host (and
therefore should not overlap in replica sets).  It could certainly be
done, but to do so today would require modification to the ring
partition claim logic.
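A rough Python model of why host awareness matters for the 4-machine, 64-node layout described in the question. Everything here is a simplifying assumption: one partition per node, replicas on 3 consecutive partitions, and made-up node names such as "m1-n00". It only contrasts two orderings of the same nodes around the ring; it does not reproduce Riak's claim logic.

```python
def assign(order, ring_size):
    """Hand out partitions to nodes in the given order, round-robin."""
    return [order[i % len(order)] for i in range(ring_size)]

def hosts_in_preflist(owners, node_host, first, n_val=3):
    """Set of physical hosts covered by the n_val consecutive partitions."""
    size = len(owners)
    return {node_host[owners[(first + i) % size]] for i in range(n_val)}

def single_host_preflists(owners, node_host, n_val=3):
    """Partitions whose whole replica set lives on one physical machine."""
    return [p for p in range(len(owners))
            if len(hosts_in_preflist(owners, node_host, p, n_val)) == 1]

machines = ["m1", "m2", "m3", "m4"]
# Same 64 nodes, two orderings: grouped by machine vs. interleaved across machines.
grouped = [f"{m}-n{i:02d}" for m in machines for i in range(16)]
interleaved = [f"{m}-n{i:02d}" for i in range(16) for m in machines]
node_host = {node: node.split("-")[0] for node in grouped}

ring_grouped = assign(grouped, 64)
ring_interleaved = assign(interleaved, 64)

print(len(single_host_preflists(ring_grouped, node_host)))      # 56
print(len(single_host_preflists(ring_interleaved, node_host)))  # 0
```

In the grouped ordering, losing one machine would take out every replica for 14 of its 16 partitions; in the interleaved ordering every replica set spans three machines. Without a way to tell Riak which nodes share a host, nothing guarantees the second layout.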



On Thu, May 13, 2010 at 4:57 PM, Jeremy Hinegardner wrote:
> I am thinking about how to possibly replace an existing system that has heavy
> I/O load, low CPU usage, with riak.  It's a file storage system, with smallish
> files, a few K normally, but billions of them.
> The architecture, I think, would be one riak node per disk on the hardware,
> and probably run about 16 riak nodes per physical machine.  Say I had
> 4 of these machines, which would be 64 riak nodes.
> With something like this, if I set W=3 as a CAP tuning, I would want to make
> sure that at least 2 of those writes were on 2 physically different machines,
> so in case I had a hardware failure, and it took out a physical machine, I could
> still operate with the other 3 machines.
> Is something like this possible with riak?
> enjoy,
> -jeremy
> --
>  Jeremy Hinegardner                    

Re: CAP controls

2010-05-13 Thread Jon Meredith

Hi Jeremy,

If it were me I would start off with the simplest set up possible - one
riak node per physical machine. It'll be easier to see what is going on
and reason about performance if there are fewer moving parts.

For the storage configuration - if you are using innostore then keep one
disk for logging (or use the drive the o/s is on if it is doing
little else) and create a RAID0 disk out of the remaining disks.
Similar set up for bitcask except you don't need to worry about the log
data.

Try that as a baseline and benchmark it.  If it isn't up to the task 
then you can explore more complex options.


On 5/13/10 2:57 PM, Jeremy Hinegardner wrote:

> I am thinking about how to possibly replace an existing system that has heavy
> I/O load, low CPU usage, with riak.  It's a file storage system, with smallish
> files, a few K normally, but billions of them.
>
> The architecture, I think, would be one riak node per disk on the hardware,
> and probably run about 16 riak nodes per physical machine.  Say I had
> 4 of these machines, which would be 64 riak nodes.
>
> With something like this, if I set W=3 as a CAP tuning, I would want to make
> sure that at least 2 of those writes were on 2 physically different machines,
> so in case I had a hardware failure, and it took out a physical machine, I could
> still operate with the other 3 machines.
>
> Is something like this possible with riak?





Re: CAP controls

2010-05-13 Thread Jeremy Hinegardner
Thanks Justin,

Good to know.  I'll throw it out there as a feature request :-).   Or at least
something to think about.

I believe it is HDFS that has something along these lines, where you can say
which nodes are in which racks and which data centers.  This is so it knows who
its nearby neighbors are for block replication and job distribution.

I think the same sort of meta-node location knowledge could be used for ring
partitioning so the paranoid (like me :-)) could tune for catastrophic events.


On Thu, May 13, 2010 at 05:05:19PM -0400, Justin Sheehy wrote:
> Hi, Jeremy.
> It sounds like an interesting project.  At this time, there is no way
> to indicate in Riak that two nodes are actually on the same host (and
> therefore should not overlap in replica sets).  It could certainly be
> done, but to do so today would require modification to the ring
> partition claim logic.
> Best,
> -Justin
> On Thu, May 13, 2010 at 4:57 PM, Jeremy Hinegardner wrote:
> > I am thinking about how to possibly replace an existing system that has
> > heavy I/O load, low CPU usage, with riak.  It's a file storage system,
> > with smallish files, a few K normally, but billions of them.
> >
> > The architecture, I think, would be one riak node per disk on the hardware,
> > and probably run about 16 riak nodes per physical machine.  Say I had
> > 4 of these machines, which would be 64 riak nodes.
> >
> > With something like this, if I set W=3 as a CAP tuning, I would want to make
> > sure that at least 2 of those writes were on 2 physically different
> > machines, so in case I had a hardware failure, and it took out a physical
> > machine, I could still operate with the other 3 machines.
> >
> > Is something like this possible with riak?
> >
> > enjoy,
> >
> > -jeremy


 Jeremy Hinegardner 


question about postcommit hooks and patch

2010-05-13 Thread Bruce Lowekamp
I've been playing with using postcommit hooks in some code.  I
couldn't find an example, so looking at the source, I think the right
way to set one up is something like:

PHook = {struct, [ {<<"mod">>, <>}, {<<"fun">>, <<"notify_change">>}]},
RiakClient:set_bucket(<>, [{postcommit, [PHook]}]),

Is there a better way?

Also, in debugging my hook, I found that wrapping the hook so I could
see exceptions made it much easier.  I made the following patch that I
think might be useful to others, as well:

diff -r e836ea266eca apps/riak_kv/src/riak_kv_put_fsm.erl
--- a/apps/riak_kv/src/riak_kv_put_fsm.erl  Thu May 13 17:28:01 2010 -0400
+++ b/apps/riak_kv/src/riak_kv_put_fsm.erl  Thu May 13 14:50:07 2010 -0700
@@ -314,7 +314,7 @@
 invoke_hook(precommit, Mod0, Fun0, undefined, RObj) ->
     Mod = binary_to_atom(Mod0, utf8),
     Fun = binary_to_atom(Fun0, utf8),
-    Mod:Fun(RObj);
+    wrap_hook(Mod, Fun, RObj);
 invoke_hook(precommit, undefined, undefined, JSName, RObj) ->
     case riak_kv_js_manager:blocking_dispatch({{jsfun, JSName}, RObj}) of
         {ok, <<"fail">>} ->
@@ -331,13 +331,22 @@
 invoke_hook(postcommit, Mod0, Fun0, undefined, Obj) ->
     Mod = binary_to_atom(Mod0, utf8),
     Fun = binary_to_atom(Fun0, utf8),
-    proc_lib:spawn(fun() -> Mod:Fun(Obj) end);
+    proc_lib:spawn(fun() -> wrap_hook(Mod,Fun,Obj) end);
 invoke_hook(postcommit, undefined, undefined, _JSName, _Obj) ->
     error_logger:warning_msg("Javascript post-commit hooks aren't implemented");
 %% NOP to handle all other cases
 invoke_hook(_, _, _, _, RObj) ->
     RObj.

+wrap_hook(Mod, Fun, Obj)->
+    try Mod:Fun(Obj)
+    catch
+        EType:X ->
+            error_logger:error_msg("problem invoking hook ~p:~p -> ~p:~p~n~p~n",
+                                   [Mod, Fun, EType, X, erlang:get_stacktrace()]),
+            fail
+    end.
+
 merge_robjs(RObjs0,AllowMult) ->
     RObjs1 = [X || X <- [riak_kv_util:obj_not_deleted(O) ||
                              O <- RObjs0], X /= undefined],
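The wrapping idea is not specific to Erlang. As an illustration only (this is Python, not Riak code, and the hook body is a made-up stand-in), the same pattern is: call the hook, and if it raises, log the exception with its stack trace and return a failure marker instead of letting the error vanish inside a spawned process.

```python
import logging
import traceback

log = logging.getLogger("hooks")

def wrap_hook(hook, obj):
    """Call hook(obj); on an exception, log it with a stack trace and return "fail"."""
    try:
        return hook(obj)
    except Exception as exc:
        log.error("problem invoking hook %s -> %s: %s\n%s",
                  getattr(hook, "__name__", repr(hook)),
                  type(exc).__name__, exc, traceback.format_exc())
        return "fail"

def notify_change(obj):
    """Hypothetical hook: raises KeyError when the object has no "key" field."""
    return obj["key"].upper()

ok_result = wrap_hook(notify_change, {"key": "a"})
bad_result = wrap_hook(notify_change, {})
print(ok_result)   # the hook's own return value
print(bad_result)  # the failure marker; the error itself goes to the log
```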



Re: CAP controls

2010-05-13 Thread Jeremy Hinegardner
Thanks Jon,

I was thinking in terms of what we have right now, which is many processes per
physical machine, each one of them managing a dedicated disk to optimize I/O.

If I get a chance to do some prototyping on this, I'll try out the RAID0 option.
Although I'm a bit skeptical at the moment, because with something like that,
say a 3TB RAID0 partition, it seems the data would be stored linearly down that
stripe, instead of across all the disks evenly. Although the RAID controller
might take care of that, not sure.
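For what it's worth, striping is what separates RAID0 from a plain concatenated volume: consecutive stripe units rotate across the member disks. A generic Python sketch of that mapping follows; the disk count and stripe size are illustrative, and real controllers differ in details such as stripe unit size.

```python
from collections import Counter

def raid0_location(logical_block, num_disks, stripe_blocks=1):
    """Map a logical block number to (disk index, block offset on that disk)."""
    stripe_index = logical_block // stripe_blocks  # which stripe unit overall
    within = logical_block % stripe_blocks         # offset inside that unit
    disk = stripe_index % num_disks                # units rotate across disks
    row = stripe_index // num_disks                # full rotations completed
    return disk, row * stripe_blocks + within

# 3 disks, 2 blocks per stripe unit: blocks 0-1 on disk 0, 2-3 on disk 1, ...
for block in range(8):
    print(block, raid0_location(block, num_disks=3, stripe_blocks=2))

# A sequential run of 600 blocks ends up spread evenly over the 3 disks.
spread = Counter(raid0_location(b, 3, 2)[0] for b in range(600))
print(spread)
```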

good idea, I appreciate it.


On Thu, May 13, 2010 at 03:11:43PM -0600, Jon Meredith wrote:
> Hi Jeremy,
> If it were me I would start off with the simplest set up possible - one
> riak node per physical machine. It'll be easier to see what is going on and
> reason about performance if there are fewer moving parts.
> For the storage configuration - if you are using innostore then keep one
> disk for logging (or use the drive the o/s is on if it is doing little
> else) and create a RAID0 disk out of the remaining disks.  Similar set up
> for bitcask except you don't need to worry about the log data.
> Try that as a baseline and benchmark it.  If it isn't up to the task then
> you can explore more complex options.
> Jon.
> On 5/13/10 2:57 PM, Jeremy Hinegardner wrote:
>> I am thinking about how to possibly replace an existing system that has
>> heavy I/O load, low CPU usage, with riak.  It's a file storage system,
>> with smallish files, a few K normally, but billions of them.
>> The architecture, I think, would be one riak node per disk on the
>> hardware, and probably run about 16 riak nodes per physical machine.
>> Say I had 4 of these machines, which would be 64 riak nodes.
>> With something like this, if I set W=3 as a CAP tuning, I would want to
>> make sure that at least 2 of those writes were on 2 physically different
>> machines, so in case I had a hardware failure, and it took out a physical
>> machine, I could still operate with the other 3 machines.
>> Is something like this possible with riak?
>> enjoy,
>> -jeremy


 Jeremy Hinegardner 


Re: question about postcommit hooks and patch

2010-05-13 Thread Andy Gross
Hi Bruce,

Thanks for the patch - it's definitely worthwhile and we'll likely commit it
to tip soon.

- Andy

Andy Gross 
VP, Engineering
Basho Technologies, Inc.

On Thu, May 13, 2010 at 6:16 PM, Bruce Lowekamp wrote:

> I've been playing with using postcommit hooks in some code.  I
> couldn't find an example, so looking at the source, I think the right
> way to set one up is something like:
> PHook = {struct, [ {<<"mod">>, <>}, {<<"fun">>,
> <<"notify_change">>}]},
> RiakClient:set_bucket(<>, [{postcommit, [PHook]}]),
> Is there a better way?
> Also, in debugging my hook, I found that wrapping the hook so I could
> see exceptions made it much easier.  I made the following patch that I
> think might be useful to others, as well:
> diff -r e836ea266eca apps/riak_kv/src/riak_kv_put_fsm.erl
> --- a/apps/riak_kv/src/riak_kv_put_fsm.erl  Thu May 13 17:28:01 2010
> -0400
> +++ b/apps/riak_kv/src/riak_kv_put_fsm.erl  Thu May 13 14:50:07 2010
> -0700
> @@ -314,7 +314,7 @@
>  invoke_hook(precommit, Mod0, Fun0, undefined, RObj) ->
> Mod = binary_to_atom(Mod0, utf8),
> Fun = binary_to_atom(Fun0, utf8),
> -Mod:Fun(RObj);
> +wrap_hook(Mod, Fun, RObj);
>  invoke_hook(precommit, undefined, undefined, JSName, RObj) ->
> case riak_kv_js_manager:blocking_dispatch({{jsfun, JSName}, RObj}) of
> {ok, <<"fail">>} ->
> @@ -331,13 +331,22 @@
>  invoke_hook(postcommit, Mod0, Fun0, undefined, Obj) ->
> Mod = binary_to_atom(Mod0, utf8),
> Fun = binary_to_atom(Fun0, utf8),
> -proc_lib:spawn(fun() -> Mod:Fun(Obj) end);
> +proc_lib:spawn(fun() -> wrap_hook(Mod,Fun,Obj) end);
>  invoke_hook(postcommit, undefined, undefined, _JSName, _Obj) ->
> error_logger:warning_msg("Javascript post-commit hooks aren't
> implemented");
>  %% NOP to handle all other cases
>  invoke_hook(_, _, _, _, RObj) ->
> RObj.
> +wrap_hook(Mod, Fun, Obj)->
> +try Mod:Fun(Obj)
> +catch
> +EType:X ->
> +error_logger:error_msg("problem invoking hook ~p:~p ->
> ~p:~p~n~p~n",
> +   [Mod, Fun, EType, X,
> erlang:get_stacktrace()]),
> +fail
> +end.
> +
>  merge_robjs(RObjs0,AllowMult) ->
> RObjs1 = [X || X <- [riak_kv_util:obj_not_deleted(O) ||
> O <- RObjs0], X /= undefined],
> Bruce

returning multiple documents

2010-05-13 Thread Gareth Stokes
hey guys,

given that i have an array of keys and want to return the related documents.
what would be the best way to do this?
so far i can think of two ways

1. open a socket and call RpbGetReq asynchronously for each key
2. run a map reduce through the rest interface and deserialise the results

is there a 3rd way?
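For option 2, a MapReduce job can take explicit bucket/key pairs as its inputs, so it only reads the listed keys rather than scanning the bucket. The Python sketch below just builds such a request body. The bucket and key names are made up, and the `Riak.mapValuesJson` built-in (and posting the body to the `/mapred` resource) reflects my understanding of the REST interface, so check both against the Riak version you are running.

```python
import json

def mapred_body(bucket, keys):
    """Build a MapReduce job whose inputs are explicit [bucket, key] pairs."""
    return json.dumps({
        "inputs": [[bucket, key] for key in keys],
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson"}}],
    })

# Hypothetical bucket and keys, for illustration only.
body = mapred_body("docs", ["k1", "k2", "k3"])
print(body)
```

The response would then be a single JSON array to deserialise, at the cost of one round trip instead of one request per key.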

gareth stokes