Building a native Riak executable

2011-10-04 Thread Olivier Rossel
Hi all.

I need to deploy Riak on a Linux server with no Erlang VM.
Is there a procedure to build a native Riak executable on my own Ubuntu
so I can deploy it on this server?
Which Erlang .deb packages are required on my Ubuntu to build this native Riak?

Any help is very welcome.

PS: I am an *absolute* newbie in Erlang :)

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Erlang Map/Reduce error

2011-10-04 Thread Lyes Amazouz
Hi everybody,

I'm learning how to execute a map/reduce using the riak_client module, so I
tried this:

>Map = fun(O, undefined, none) -> [riak_object:get_value(O)] end.
>C:mapred([{<<"buck">>, <<"key1">>}, {<<"buck">>, <<"key2">>}], [{map,
{qfun, Map}, none, true}]).

but I got  this error:

{error,{error,{badfun,#Fun},
  [{riak_kv_mapper,run_map,9},
   {riak_kv_mapper,'-do_map/2-fun-0-',9},
   {lists,foldl,3},
   {riak_kv_mapper,do_map,2},
   {gen_fsm,handle_msg,7},
   {proc_lib,init_p_do_apply,3}]}}

What is wrong with my code?
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Erlang Map/Reduce error

2011-10-04 Thread Kresten Krab Thorup
The error badfun most likely means that the function's code is not available 
inside Riak.

You need to add {add_paths, "/path/to/your/ebin"} in the riak_kv section in 
app.config.  Read more about it here 
http://wiki.basho.com/MapReduce.html#MapReduce-via-the-Erlang-API

For an explanation of why the badfun happens, you can read this blog:
http://www.javalimit.com/2010/05/passing-funs-to-other-erlang-nodes.html
... which can also give you an idea of how to do it without loading the code
into Riak. Something like the following will likely work, but it invokes the
interpreter inside Riak, which may be too slow for your needs.

Kresten


1> FunStr = "fun(O, undefined, none) -> [riak_object:get_value(O)] end.".
2> {ok, Tokens, _} = erl_scan:string(FunStr).
3> {ok, [Form]} = erl_parse:parse_exprs(Tokens).
4> Bindings = erl_eval:add_binding('B', 2, erl_eval:new_bindings()).
5> {value, Fun, _} = erl_eval:expr(Form, Bindings).
6> C:mapred([{<<"buck">>, <<"key1">>}, {<<"buck">>, <<"key2">>}], [{map, {qfun, 
Fun}, none, true}]).


Kresten




On Oct 4, 2011, at 11:00 AM, Lyes Amazouz wrote:

Hi everybody,

I'm learning how to execute a map/reduce using the riak_client module, so I 
tried this:

>Map = fun(O, undefined, none) -> [riak_object:get_value(O)] end.
>C:mapred([{<<"buck">>, <<"key1">>}, {<<"buck">>, <<"key2">>}], [{map, {qfun, 
>Map}, none, true}]).

but I got  this error:

{error,{error,{badfun,#Fun},
  [{riak_kv_mapper,run_map,9},
   {riak_kv_mapper,'-do_map/2-fun-0-',9},
   {lists,foldl,3},
   {riak_kv_mapper,do_map,2},
   {gen_fsm,handle_msg,7},
   {proc_lib,init_p_do_apply,3}]}}

What is wrong with my code?
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Delete takes 5 seconds in riak 1.0 ???

2011-10-04 Thread Roland Karlsson
OK - thanx. I wrote a put function that looks like this.
It works in 1.0 - but I get disconnected if I connect to a
0.14 server and try to use it.

Does it look OK?




put(Pid, Bucket, Key, Value) ->
    %% Fetch the current object (or the tombstone's vclock) so the write
    %% carries causal context instead of creating siblings.
    Obj2 =
        case riakc_pb_socket:get(Pid, Bucket, Key, [deletedvclock]) of
            {ok, Obj1} ->
                %% Object exists: update it in place.
                riakc_obj:update_value(Obj1, Value);
            {error, notfound, VClock} ->
                %% Tombstone still around: reuse its vclock to overwrite it.
                riakc_obj:set_vclock(riakc_obj:new(Bucket, Key, Value), VClock);
            {error, notfound} ->
                %% Key fully gone: write a fresh object.
                riakc_obj:new(Bucket, Key, Value)
        end,
    riakc_pb_socket:put(Pid, Obj2).

-


/Roland






- Original Message -
From: "Jon Meredith" 
To: "Roland Karlsson" 
Cc: riak-users@lists.basho.com
Sent: Monday, October 3, 2011 5:02:27 PM
Subject: Re: Delete takes 5 seconds in riak 1.0 ???

Hi Roland, 


Riak deletes by first writing a tombstone and then when all replicas are in 
sync removing the object from the underlying key/value store. We have made some 
changes in 1.0.0 to increase the length of time the tombstones are around when 
all nodes are up (in the Delete Changes section of the release notes here 
https://github.com/basho/riak/blob/1.0/RELEASE-NOTES.org ). 


If you just want to make sure the object is deleted, you could use the 
deletedvclock option on the get request. You will get a 3-tuple {error, 
notfound, VClock} until the tombstone is removed, then a 2-tuple of {error, 
notfound} when it has gone. 


If you wish to rewrite the key, you can use the returned vclock on a new object 
O = riakc_obj:set_vclock(riakc_obj:new(<<"b">>, <<"k">>, <<"v">>), VClock)


and it will overwrite the tombstone for you. 
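For illustration only, a rough shell-level sketch of that flow (this assumes a
local node on the default PB port, and that the tombstone is still present so
the 3-tuple comes back):

1> {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
2> {error, notfound, VClock} = riakc_pb_socket:get(Pid, <<"b">>, <<"k">>, [deletedvclock]).
3> O = riakc_obj:set_vclock(riakc_obj:new(<<"b">>, <<"k">>, <<"v">>), VClock).
4> ok = riakc_pb_socket:put(Pid, O).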


Best Regards, 


Jon Meredith 
Basho Technologies 


On Mon, Oct 3, 2011 at 6:49 AM, Roland Karlsson < 
roland.karls...@erlang-solutions.com > wrote: 


Hi Basho, 

I had written some simple tests using riakc client library. 

The tests are attached to this mail. 

In particular I made this call in the tests at the start of 
of each test in order to get a clean DB. 

riakc_pb_socket:delete(Pid, Bucket, Key), 

In 0.14 that worked just fine. But ... in 1.0 it did not work. 
The tests failed and complained about siblings. 

But ... after some time of experimenting I found that if I 
added a sleep of 5 seconds after the delete ... then the tests 
were OK again. 

Is this correct? Does delete take 5 seconds? 

NOTE that the attached file assumes inits is started. 

/Roland 

___ 
riak-users mailing list 
riak-users@lists.basho.com 
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Delete takes 5 seconds in riak 1.0 ???

2011-10-04 Thread Roland Karlsson
Another comment

I also tried an implementation of the test code where delete
waited in a loop as long as the tombstone was still there.
But as far as I can see it might then wait forever. Or at least
several minutes - until I did not care to wait any longer.

Is that so - can a tombstone exist forever?

/Roland



- Original Message -
From: "Jon Meredith" 
To: "Roland Karlsson" 
Cc: riak-users@lists.basho.com
Sent: Monday, October 3, 2011 5:02:27 PM
Subject: Re: Delete takes 5 seconds in riak 1.0 ???

Hi Roland, 


Riak deletes by first writing a tombstone and then when all replicas are in 
sync removing the object from the underlying key/value store. We have made some 
changes in 1.0.0 to increase the length of time the tombstones are around when 
all nodes are up (in the Delete Changes section of the release notes here 
https://github.com/basho/riak/blob/1.0/RELEASE-NOTES.org ). 


If you just want to make sure the object is deleted, you could use the 
deletedvclock option on the get request. You will get a 3-tuple {error, 
notfound, VClock} until the tombstone is removed, then a 2-tuple of {error, 
notfound} when it has gone. 


If you wish to rewrite the key, you can use the returned vclock on a new object 
O = riakc_obj:set_vclock(riakc_obj:new(<<"b">>, <<"k">>, <<"v">>), VClock)


and it will overwrite the tombstone for you. 


Best Regards, 


Jon Meredith 
Basho Technologies 


On Mon, Oct 3, 2011 at 6:49 AM, Roland Karlsson < 
roland.karls...@erlang-solutions.com > wrote: 


Hi Basho, 

I had written some simple tests using riakc client library. 

The tests are attached to this mail. 

In particular I made this call in the tests at the start of 
of each test in order to get a clean DB. 

riakc_pb_socket:delete(Pid, Bucket, Key), 

In 0.14 that worked just fine. But ... in 1.0 it did not work. 
The tests failed and complained about siblings. 

But ... after some time of experimenting I found that if I 
added a sleep of 5 seconds after the delete ... then the tests 
were OK again. 

Is this correct? Does delete take 5 seconds? 

NOTE that the attached file assumes inits is started. 

/Roland 

___ 
riak-users mailing list 
riak-users@lists.basho.com 
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Building a native Riak executable

2011-10-04 Thread OJ Reeves
Hi Olivier,

What I do in this scenario is

   - Build a VM which is running the same operating system as your target
   server. Let's say you're deploying to a 64-bit Arch Linux machine, that's
   the exact flavour of OS that you set up on the VM.
   - Install the required version of Erlang.
   - Install all the build tools, git, etc.
   - Download Riak from Github and checkout the version that you're looking
   to deploy.

At this point you're ready to create a release. Riak comes with a
feature-rich makefile that should cover what you're looking for. To generate
a self-contained release using rebar, change to the Riak directory and type:

make && make rel

This should pull all the required dependencies down, compile Riak and
generate a release for you in a folder called rel/riak. When finished, this
folder can be archived (using tar/zip/etc) and copied over to your server.
This archive is a self-contained instance of Riak and all the required bits
and pieces to run on the target machine. Extract on your server and you can
run it directly from there. Bear in mind that you'll probably have to modify
settings in etc/app.config to whatever you may need for your environment.

Pardon the high-level/crass response :) I'm no expert, but this has worked
for me just fine. I'm sure the more seasoned Riak lads will be able to chime
in and correct me if I've said something stupid or if they have a better
process.

Hope that helps!
OJ



On 4 October 2011 18:29, Olivier Rossel  wrote:

> Hi all.
>
> I need to deploy Riak on a Linux server with no Erlang VM.
> Is there a procedure to build a native Riak executable on my own Ubuntu
> so I can deploy it on this server?
> Which Erlang .deb packages are required on my Ubuntu to build this native
> Riak?
>
> Any help is very welcome.
>
> PS: I am *absolute* newbie in Erlang :)
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 

OJ Reeves
http://buffered.io/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Delete takes 5 seconds in riak 1.0 ???

2011-10-04 Thread Roland Karlsson
I use riakc vsn=1.2.0.
The same as I used for riak 0.14
That's the current "master" branch on github.

Is that the right version for riak 0.14 ?
Is that the right version for riak 1.0 ?

NOTE - this client can connect to both 0.14 and 1.0.
But on 0.14 the get with the argument deletedvclock disconnects.


/Roland




- Original Message -
From: "Kresten Krab Thorup" 
To: "Roland Karlsson" 
Sent: Tuesday, October 4, 2011 12:28:59 PM
Subject: Re: Delete takes 5 seconds in riak 1.0 ???

Hi roland,

I think there were some protocol changes from 0.14 to 1.0, and so the new 
client probably can't be connected to an old server.  Kind of annoying, but 
that's the only reason I can see for this.

Kresten


Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
Trifork A/S  |  Margrethepladsen 4  | DK- 8000 Aarhus C |  Phone : +45 8732 
8787  |  www.trifork.com





On Oct 4, 2011, at 12:15 PM, Roland Karlsson wrote:

   case riakc_pb_socket:get(Pid, Bucket, Key, [deletedvclock]) of


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Building a native Riak executable

2011-10-04 Thread Bryan Fink
On Tue, Oct 4, 2011 at 4:29 AM, Olivier Rossel  wrote:
> I need to deploy Riak on a Linux server with no Erlang VM.
> Is there a procedure to build a native Riak executable on my own Ubuntu
> so I can deploy it on this server?
> Which Erlang .deb packages are required on my Ubuntu to build this native 
> Riak?

Hi, Olivier.  If you just need to deploy Riak, and you don't need to
recompile changes you have made, you can just grab the Ubuntu Riak
package from

http://downloads.basho.com/riak/

That .deb contains its own Erlang system, prebuilt and embedded, so there's no
need to install one separately.  See the instructions on our wiki for
next steps:

http://wiki.basho.com/Installing-on-Debian-and-Ubuntu.html

Good Luck,
Bryan

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Not updateable buckets

2011-10-04 Thread Anton Podviaznikov
Hi,

I have question regarding buckets configuration.

I want to be able to save documents to a bucket but not update or delete
them, e.g. a history bucket in which only read-only revisions will be
stored.

Is it possible to make it via bucket configuration inside riak?


-- 
Best regards,

Anton Podviaznikov
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Delete takes 5 seconds in riak 1.0 ???

2011-10-04 Thread Ian Plosker
Roland,

There were a number of changes to the protocol buffers API between versions 
0.14 and 1.0. Therefore, if you send messages using fields introduced in 1.0, 
0.14.2 will have trouble decoding the message. Riak 0.14.2 should be used with 
riakc 1.1; Riak 1.0 should be used with riakc 1.2.

Ian Plosker
Developer Advocate
Basho Technologies


On Oct 4, 2011, at 6:42 AM, Roland Karlsson wrote:

> I use riakc vsn=1.2.0.
> The same as I used for riak 0.14
> Thats the current "master" branch on github.
> 
> Is that the right version for riak 0.14 ?
> Is that the right version for riak 1.0 ?
> 
> NOTE - this client can connect to both 0.14 and 1.0.
> But on 0.14 the get with the argument deletedvclock disconnects.
> 
> 
> /Roland
> 
> 
> 
> 
> - Original Message -
> From: "Kresten Krab Thorup" 
> To: "Roland Karlsson" 
> Sent: Tuesday, October 4, 2011 12:28:59 PM
> Subject: Re: Delete takes 5 seconds in riak 1.0 ???
> 
> Hi roland,
> 
> I think there were some protocol changes from 0.14 to 1.0, and so the new 
> client probably can't be connected to an old server.  Kind of annoying, but 
> that's the only reason I can see for this.
> 
> Kresten
> 
> 
> Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
> Trifork A/S  |  Margrethepladsen 4  | DK- 8000 Aarhus C |  Phone : +45 8732 
> 8787  |  www.trifork.com
> 
> 
> 
> 
> 
> On Oct 4, 2011, at 12:15 PM, Roland Karlsson wrote:
> 
>   case riakc_pb_socket:get(Pid, Bucket, Key, [deletedvclock]) of
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: The power of the siblings....

2011-10-04 Thread Ryan Zezeski
On Tue, Oct 4, 2011 at 12:07 AM, Mike Oxford  wrote:

> SSDs are an option, sure.  I have one in my laptop; we have a bunch
> of X25s on the way already for the servers.  Yes, they're good.  But
> IOPS is not the core issue since the whole thing can sit in RAM
> which is faster yet.  Disk-flush "later" isn't time critical.  Getting the
> data into the buckets is.
>

If you're writing to bitcask, which I assumed, then IOPS is very much an
issue.  If using bitcask and the I/O throughput is not there you are going
to have major backups on the vnode mailbox.  If there are any selective
receives in the vnode implementation things will really get nasty.

Are you saying you're using an in-memory backend for these keys?


>
> 5k per second per key, over multiple concurrent writers (3-6 initially,
> possibly more later.) Pre-cache+flush doesn't work because you
> lose the interleave from the multiple writers.  NTP's resolution is only
> "so good." :)
>

So for each key you have 3-6 concurrent writers averaging around 5kw/s.  How
many keys do you have like this?


>
> The buckets can be cycled/sharded based on time, so slicing it into
> "5 second buckets of children" is possible but this is just a
> specialization
> of the sharding ideology.
>

I assume you mean that every 5s you would change the bucket name to avoid
overloading the same key space?  Yea, that would probably help but I still
think you'll have trouble with 5kw/s on a single key if using a durable
backend.


>
> Point being: If it's basically used as an append-only-bucket (throw it
> all in, sort it out later) how painful, underneath, is the child resolution
> vs
> the traditional "get it, write it" and then dealing with children ANYWAY
> when you do get collisions (which, at 5kps, you ARE going to end up with.
>

Yea, I agree either way you'll end up with children.  I would imagine you'd
have faster writes without the get/modify/put cycle but I've also never seen
anyone explode siblings that high on purpose so for all I know it will be
worse.  I'd be curious to see how Riak handles large sibling counts like
that but my gut says it won't do so well.


> This was touched on that it uses lists underneath.  Given high-end modern
> hardware, (6 core CPUs, SSDs, etc.) ballpark, where would you guess the
> red-line is?  10k children? 25k? 100k?  I won't hold anyone to it, but if
> you say "hell no, children are really expensive" then I'll abort the idea
> right here compared to "they're pretty efficient underneath, it might be
> doable."
>

I think it's a bad idea, no matter what the practical limit is.  Siblings,
when possible, are to be avoided.  They only exist because when you write a
distributed application like Riak there are certain scenarios where they
can't be avoided.  You can certainly try to use them as you describe, but I
can tell you the code was not written with that in mind.  Like I said, I'd
be curious to see the results.

>
> I'm familiar with all the HA/clustering "normal stuff" but I'm curious
> about Riak in particular because while Riak isn't built to be fast,
> I'm curious about how much load you can push a ring through before
> the underlying architecture stresses.
>

In most cases we expect Riak to be I/O bound.  If you're not stressing I/O
then my first instinct would be to raise the ring size so that each node has
more partitions.  There is no hard and fast rule about how many partitions a
node should have but is dependent on the type of disk you have.  Obviously,
SSDs and the like will handle more.  We even have some people that run SSDs
RAID 0.

Also, since ring size is something that you can't change once a cluster has
been created you need to do some form of capacity planning ahead of time to
guess what will be the best node/partition ratio.  In 1.0 we did some work
to make better use of I/O without relying on the ring size (such as async
folding and whatnot) but I'm not sure on all the details and I'm hoping one
of my colleagues can help me out if I'm missing something.


> I know Yammer was putting some load on theirs; something around 4k
> per sec over a few boxes but not to a single key.
>

The key part of that sentence: _not to a single key_.  Writing to a single
key is serialized and therefore it can only move as fast as the vnodes that
map to it.


>
> The big "problem" is that you have to have "knowledge of the buckets"
> to later correlate them. Listing buckets is expensive.  I don't want to
> hard-code bucket names into the application space if I can help it.
> Writing "list of buckets" to another key simply moves the bottleneck
> from one key to another.  Shifting buckets based on time works, but
> it's obnoxious to have to correlate at 10 second intervals 
> 8640 buckets worth of obnoxious.  Every day.  Much easier to sort a
> large dataset all at once from a single bucket.
>

I'm not sure if you realize this but "bucket" is really just a namespace in
the key.  Said another way, <bucket>/<key>.  The <bucket, key> pair is
what's hashed and determines the ring position.  There are no special
provisions for a bucket for the most part (one exception I can think of is
custom properties which get stored in the gossiped ring).
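As an illustration only, here is a sketch of roughly what the default chash
key function does (SHA-1 of the term_to_binary of the pair; a custom
chash_keyfun bucket property changes this, and the modern crypto:hash API is
used here rather than the older crypto:sha/1):

%% Sketch: bucket and key are hashed together as one term to pick the ring position.
ring_position(Bucket, Key) ->
    crypto:hash(sha, term_to_binary({Bucket, Key})).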

Re: Erlang Map/Reduce error

2011-10-04 Thread Sean Cribbs
Note that add_paths must be a list of paths, e.g. {add_paths,
["/path/to/your/ebin","/path/to/your/other/ebin"]}.

On Tue, Oct 4, 2011 at 5:24 AM, Kresten Krab Thorup wrote:

> The error badfun most likely means that the function's code is not
> available inside Riak.
>
> You need to add {add_paths, "/path/to/your/ebin"} in the riak_kv section in
> app.config.  Read more about it here
> http://wiki.basho.com/MapReduce.html#MapReduce-via-the-Erlang-API
>
> For an explanation of why the badfun happens, you can read this blog:
> http://www.javalimit.com/2010/05/passing-funs-to-other-erlang-nodes.html
> ... which can also give you an idea to how to do it without loading the
> code into riak, i.e. something like this may likely work, but will invoke
> the interpreter inside riak; which may be too slow for your needs.
>
> Kresten
>
>
> 1> FunStr = "fun(O, undefined, none) -> [riak_object:get_value(O)] end.".
> 2> {ok, Tokens, _} = erl_scan:string(FunStr).
> 3> {ok, [Form]} = erl_parse:parse_exprs(Tokens).
> 4> Bindings = erl_eval:add_binding('B', 2, erl_eval:new_bindings()).
> 5> {value, Fun, _} = erl_eval:expr(Form, Bindings).
> 6> C:mapred([{<<"buck">>, <<"key1">>}, {<<"buck">>, <<"key2">>}], [{map,
> {qfun, Fun}, none, true}]).
>
>
> Kresten
>
>
>
>
> On Oct 4, 2011, at 11:00 AM, Lyes Amazouz wrote:
>
> Hi everybody,
>
> I'm learning how to execute a map/reduce using the riak_client module, so I
> tried this:
>
> >Map = fun(O, undefined, none) -> [riak_object:get_value(O)] end.
> >C:mapred([{<<"buck">>, <<"key1">>}, {<<"buck">>, <<"key2">>}], [{map,
> {qfun, Map}, none, true}]).
>
> but I got  this error:
>
> {error,{error,{badfun,#Fun},
>  [{riak_kv_mapper,run_map,9},
>   {riak_kv_mapper,'-do_map/2-fun-0-',9},
>   {lists,foldl,3},
>   {riak_kv_mapper,do_map,2},
>   {gen_fsm,handle_msg,7},
>   {proc_lib,init_p_do_apply,3}]}}
>
> What is wrong with my code?
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://www.basho.com/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Not updateable buckets

2011-10-04 Thread Sean Cribbs
Anton,

No, that is not possible directly in Riak. However, if you use a reverse
proxy in front of Riak, you could restrict HTTP methods to only POST, for
instance, and filter out requests where the key is included in the URL.  An
appropriate response from the proxy might be a 403 for invalid URL/method
combinations.

On Tue, Oct 4, 2011 at 7:09 AM, Anton Podviaznikov
wrote:

> Hi,
>
> I have question regarding buckets configuration.
>
> I want to be able just to save some document to bucket but not update it or
> delete. E.x. history bucket in which just read-only revisions will be
> stored.
>
> Is it possible to make it via bucket configuration inside riak?
>
>
> --
> Best regards,
>
> Anton Podviaznikov
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>


-- 
Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://www.basho.com/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Not updateable buckets

2011-10-04 Thread Anton Podviaznikov
Thank you for the tip with reverse proxy.

Best regards,
Anton Podviaznikov

On 4 October 2011 15:26, Sean Cribbs  wrote:

> Anton,
>
> No, that is not possible directly in Riak. However, if you use a reverse
> proxy in front of Riak, you could restrict HTTP methods to only POST, for
> instance, and filter out requests where the key is included in the URL.  An
> appropriate response from the proxy might be a 403 for invalid URL/method
> combinations.
>
> On Tue, Oct 4, 2011 at 7:09 AM, Anton Podviaznikov  > wrote:
>
>> Hi,
>>
>> I have question regarding buckets configuration.
>>
>> I want to be able just to save some document to bucket but not update it
>> or delete. E.x. history bucket in which just read-only revisions will be
>> stored.
>>
>> Is it possible to make it via bucket configuration inside riak?
>>
>>
>> --
>> Best regards,
>>
>> Anton Podviaznikov
>>
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
>
> --
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://www.basho.com/
>
>


-- 
Best regards,

Anton Podviaznikov
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Delete takes 5 seconds in riak 1.0 ???

2011-10-04 Thread Roland Karlsson
Thanx!

Hmm ... is this mapping (riak version <--> riakc version) documented anywhere?
Maybe via a protocol number ...

/Roland



- Original Message -
From: "Ian Plosker" 
To: "Roland Karlsson" 
Cc: riak-users@lists.basho.com
Sent: Tuesday, October 4, 2011 2:50:30 PM
Subject: Re: Delete takes 5 seconds in riak 1.0 ???


Roland, 


There were a number of changes to the protocol buffers API between versions 
0.14 and 1.0. Therefore, if you send messages using fields introduced in 1.0, 
0.14.2 will have trouble decoding the message. Riak 0.14.2 should be used with 
riakc 1.1; Riak 1.0 should be used with riakc 1.2. 



Ian Plosker 
Developer Advocate 
Basho Technologies 




On Oct 4, 2011, at 6:42 AM, Roland Karlsson wrote: 



I use riakc vsn=1.2.0. 
The same as I used for riak 0.14 
Thats the current "master" branch on github. 

Is that the right version for riak 0.14 ? 
Is that the right version for riak 1.0 ? 

NOTE - this client can connect to both 0.14 and 1.0. 
But on 0.14 the get with the argument deletedvclock disconnects. 


/Roland 




- Original Message - 
From: "Kresten Krab Thorup" < k...@trifork.com > 
To: "Roland Karlsson" < roland.karls...@erlang-solutions.com > 
Sent: Tuesday, October 4, 2011 12:28:59 PM 
Subject: Re: Delete takes 5 seconds in riak 1.0 ??? 

Hi roland, 

I think there were some protocol changes from 0.14 to 1.0, and so the new 
client probably can't be connected to an old server. Kind of annoying, but 
that's the only reason I can see for this. 

Kresten 


Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab 
Trifork A/S | Margrethepladsen 4 | DK- 8000 Aarhus C | Phone : +45 8732 8787 | 
www.trifork.com < http://www.trifork.com/ > 





On Oct 4, 2011, at 12:15 PM, Roland Karlsson wrote: 

case riakc_pb_socket:get(Pid, Bucket, Key, [deletedvclock]) of 


___ 
riak-users mailing list 
riak-users@lists.basho.com 
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Delete takes 5 seconds in riak 1.0 ???

2011-10-04 Thread Ian Plosker
On Oct 4, 2011, at 10:16 AM, Roland Karlsson wrote:

> Thanx!
> 
> H ... is this mapping / riak-version <--> riakc-version documented 
> anywhere?
> Maybe via a protocol number ...
> 
> /Roland


Roland,

You can check the `rebar.config` in the riak_kv repository 
(https://github.com/basho/riak_kv) for the version of Riak you're working 
against:

https://github.com/basho/riak_kv/blob/riak_kv-0.14.2/rebar.config
https://github.com/basho/riak_kv/blob/1.0.0/rebar.config
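If your own application is built with rebar, one hedged way to keep the client
in step is to pin it to a matching tag in your rebar.config (the repo URL and
tag below are placeholders to verify against the riak-erlang-client repository):

%% Sketch of a rebar dependency pinned to a specific client tag.
{deps, [
    {riakc, ".*",
     {git, "https://github.com/basho/riak-erlang-client.git", {tag, "1.2.0"}}}
]}.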

Ian Plosker
Developer Advocate
Basho Technologies___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: The power of the siblings....

2011-10-04 Thread Mike Oxford
Good info; thanks for taking the time to respond!
I can use now/0 and shard on the mod to spread them out in a
sub-second spread.
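Roughly what I have in mind, as a sketch (the base name and slice length are
made up; os:timestamp/0 returns the same {Mega, Sec, Micro} shape as now/0):

%% Sketch: derive a bucket name from the current time slice so writes spread
%% across buckets instead of piling onto a single key space.
bucket_for_slice(Base, SliceMillis) ->
    {Mega, Sec, Micro} = os:timestamp(),
    Millis = (Mega * 1000000 + Sec) * 1000 + Micro div 1000,
    list_to_binary([Base, "-", integer_to_list(Millis div SliceMillis)]).

e.g. bucket_for_slice(<<"events">>, 5000) names a fresh bucket every 5 seconds.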

The only thing stopping me from doing so was that I'm lazy and
didn't want to have to write the correlation aspect over that many
buckets if I didn't have to.  Enough other things to do.  :)

Rock on.

-mox

On Tue, Oct 4, 2011 at 6:18 AM, Ryan Zezeski  wrote:
>
>
> On Tue, Oct 4, 2011 at 12:07 AM, Mike Oxford  wrote:
>>
>> SSDs are an option, sure.  I have one in my laptop; we have a bunch
>> of X25s on the way already for the servers.  Yes, they're good.  But
>> IOPS is not the core issue since the whole thing can sit in RAM
>> which is faster yet.  Disk-flush "later" isn't time critical.  Getting the
>> data into the buckets is.
>
> If you're writing to bitcask, which I assumed, then IOPS is very much an
> issue.  If using bitcask and the I/O throughput is not there you are going
> to have major backups on the vnode mailbox.  If there are any selective
> receives in the vnode implementation things will really get nasty.
> Are you saying you're using an in-memory backend for these keys?
>
>>
>> 5k per second per key, over multiple concurrent writers (3-6 initially,
>> possibly more later.) Pre-cache+flush doesn't work because you
>> lose the interleave from the multiple writers.  NTP's resolution is only
>> "so good." :)
>
> So for each key you have 3-6 concurrent writers averaging around 5kw/s.  How
> many keys do you have like this?
>
>>
>> The buckets can be cycled/sharded based on time, so slicing it into
>> "5 second buckets of children" is possible but this is just a
>> specialization
>> of the sharding ideology.
>
> I assume you mean that every 5s you would change the bucket name to avoid
> overloading the same key space?  Yea, that would probably help but I still
> think you'll have trouble with 5kw/s on a single key if using a durable
> backend.
>
>>
>> Point being: If it's basically used as an append-only-bucket (throw it
>> all in, sort it out later) how painful, underneath, is the child
>> resolution vs
>> the traditional "get it, write it" and then dealing with children ANYWAY
>> when you do get collisions (which, at 5kps, you ARE going to end up with.
>
> Yea, I agree either way you'll end up with children.  I would imagine you'd
> have faster writes without the get/modify/put cycle but I've also never seen
> anyone explode siblings that high on purpose so for all I know it will be
> worse.  I'd be curious to see how Riak handles large sibling counts like
> that but my gut says it won't do so well.
>>
>> This was touched on that it uses lists underneath.  Given high-end modern
>> hardware, (6 core CPUs, SSDs, etc.) ballpark, where would you guess the
>> red-line is?  10k children? 25k? 100k?  I won't hold anyone to it, but if
>> you say "hell no, children are really expensive" then I'll abort the idea
>> right here compared to "they're pretty efficient underneath, it might be
>> doable."
>
> I think it's a bad idea, no matter what the practical limit is.  Siblings,
> when possible, are to be avoided.  They only exist because when you write a
> distributed application like Riak there are certain scenarios where they
> can't be avoided.  You can certainly try to use them as you describe, but I
> can tell you the code was not written with that in mind.  Like I said, I'd
> be curious to see the results.
>>
>> I'm familiar with all the HA/clustering "normal stuff" but I'm curious
>> about Riak in particular because while Riak isn't built to be fast,
>> I'm curious about how much load you can push a ring through before
>> the underlying architecture stresses.
>
> In most cases we expect Riak to be I/O bound.  If you're not stressing I/O
> then my first instinct would be to raise the ring size so that each node has
> more partitions.  There is no hard and fast rule about how many partitions a
> node should have but is dependent on the type of disk you have.  Obviously,
> SSDs and the like will handle more.  We even have some people that run SSDs
> RAID 0.
> Also, since ring size is something that you can't change once a cluster has
> been created you need to do some form of capacity planning ahead of time to
> guess what will be the best node/partition ratio.  In 1.0 we did some work
> to make better use of I/O without relying on the ring size (such as async
> folding and whatnot) but I'm not sure on all the details and I'm hoping one
> of my colleagues can help me out if I'm missing something.
>>
>> I know Yammer was putting some load on theirs; something around 4k
>> per sec over a few boxes but not to a single key.
>
> The key part of that sentence: _not to a single key_.  Writing to a single
> key is serialized and therefore it can only move as fast as the vnodes that
> map to it.
>
>>
>> The big "problem" is that you have to have "knowledge of the buckets"
>> to later correlate them. Listing buckets is expensive.  I don't want to
>> hard-code bucket names into the

Re: The power of the siblings....

2011-10-04 Thread Andy Skelton
Ryan Zezeski wrote:
> Mike Oxford wrote:
>> The big "problem" is that you have to have "knowledge of the buckets"
>> to later correlate them. Listing buckets is expensive.
>
> I'm not sure if you realize this but "bucket" is really just a namespace in
> the key.  Said another way, <bucket>/<key>.  The <bucket, key> pair is
> what's hashed and determines the ring position.  There are no special
> provisions for a bucket for the most part (one exception I can think of is
> custom properties which get stored in the gossiped ring).

Right. There is no list of buckets. Computing the list is ludicrously
expensive because it involves folding over all of the keys in the
backend, extracting the bucket name from each, and accumulating these
in a set.

Listing all the keys in a bucket is similarly expensive. It folds over
all the keys, extracts the bucket name, matches against the desired
bucket (if any), and then accumulates the keys. However, if you
specify the bucket when listing keys there is an optimization
available to key listing that is impossible for bucket listing.

I've given it some thought because I intend to regularly MR over
entire buckets which involves listing keys. The best solution I've
found so far is to partition the keyspace by using the multi backend.
When you ask for all of the keys in a given bucket, only the backend
that stores that bucket is consulted. Ideally, any bucket that will
need to produce its key list gets its own keyspace (backend).
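For example, a sketch of the kind of multi-backend configuration I mean in
etc/app.config (backend names and data_root paths are placeholders):

{riak_kv, [
    {storage_backend, riak_kv_multi_backend},
    {multi_backend_default, <<"bitcask_default">>},
    {multi_backend, [
        {<<"bitcask_default">>, riak_kv_bitcask_backend,
         [{data_root, "/var/lib/riak/bitcask_default"}]},
        {<<"keylisted">>, riak_kv_bitcask_backend,
         [{data_root, "/var/lib/riak/bitcask_keylisted"}]}
    ]}
]},

Then any bucket whose keys need listing gets its backend bucket property set
to <<"keylisted">>, so (as I understand it) a key listing only folds over that
backend's data.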

Andy

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread O'Brien-Strain, Eamonn
I am contemplating two different architectures for deploying Riak nodes and web 
servers.

Option A:  Riak nodes are in their own cluster of dedicated machines behind a 
load balancer.  Web servers talk to the Riak nodes via the load balancer. (See 
diagram http://eamonn.org/i/riak-arch-A.png )

Option B: Each web server machine also has a Riak node, and there are also some 
Riak-only machines.  Each web server only talks to its own localhost Riak node. 
(See diagram http://eamonn.org/i/riak-arch-B.png )


All machines will be deployed as elastic cloud instances.  I will want to spin up 
and spin down instances, particularly the web servers, as demand varies.  Both 
load balancers are non-sticky.  Web servers are currently talking to Riak via 
HTTP (though might change that to protocol buffers in the future).  Currently 
Riak is configured with the default options.

Here is my thinking of the comparative advantages:

Option A:

 - Better for security, because can lock down the Riak load balancer to only 
open a single port and only for connections from the web servers.
 - Less churn for Riak of nodes entering and leaving the Riak cluster (as web 
servers spin up and down)
 - More flexibility in scaling storage and web tiers independently of each other

Option B:

 - Faster localhost connection from web server to Riak

I think availability is similar for the two options.

The web server response time is the primary metric I want to optimize.  Most 
web server requests will cause several requests to Riak.

What other factors should I take into account?  What measurements could I make 
to help me decide between the architectures?  Are there other architectures I 
should consider? Should I add memcached? Does anyone have any experiences they 
could share in deploying such systems?

Thanks.
__
Eamonn

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Aphyr
Option C: Deploy your web servers with a list of hosts to connect to. 
Have the clients fail over when a riak node goes down. Lower latency 
without sacrificing availability. If you're using protobufs, this may 
not be as big of an issue.
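A rough sketch of that client-side failover with the Erlang PB client (the
host list is a placeholder, and error handling is kept minimal on purpose):

%% Sketch: try each Riak node in turn until one accepts a connection.
%% Hosts is something like [{"10.0.0.1", 8087}, {"10.0.0.2", 8087}].
connect_any([{Host, Port} | Rest]) ->
    case riakc_pb_socket:start(Host, Port) of
        {ok, Pid} ->
            {ok, Pid};
        {error, _Reason} when Rest =/= [] ->
            connect_any(Rest);
        Error ->
            Error
    end.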


--Kyle

On 10/04/2011 02:04 PM, O'Brien-Strain, Eamonn wrote:

I am contemplating two different architectures for deploying Riak nodes and web 
servers.

Option A:  Riak nodes are in their own cluster of dedicated machines behind a 
load balancer.  Web servers talk to the Riak nodes via the load balancer. (See 
diagram http://eamonn.org/i/riak-arch-A.png )

Option B: Each web server machine also has a Riak node, and there are also some 
Riak-only machines.  Each web server only talks to its own localhost Riak node. 
(See diagram http://eamonn.org/i/riak-arch-B.png )


All machines will deployed as elastic cloud instances.  I will want to spin up 
and spin down instances, particularly the web servers, as demand varies.  Both 
load balancers are non-sticky.  Web servers are currently talking to Riak via 
HTTP (though might change that to protocol buffers in the future).  Currently 
Riak is configured with the default options.

Here is my thinking of the comparative advantages:

Option A:

  - Better for security, because can lock down the Riak load balancer to only 
open a single port and only for connections from the web servers.
  - Less churn for Riak of nodes entering and leaving the Riak cluster (as web 
servers spin up and down)
  - More flexibility in scaling storage and web tiers independently of each 
other

Option B:

  - Faster localhost connection from web server to Riak

I think availability is similar for the two options.

The web server response time is the primary metric I want to optimize.  Most 
web server requests will cause several requests to Riak.

What other factors should I take into account?  What measurements could I make 
to help me decide between the architectures?  Are there other architectures I 
should consider? Should I add memcached? Does anyone have any experiences they 
could share in deploying such systems?

Thanks.
__
Eamonn

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Greg Stein
I'm with Kyle on this one. Even better, my 'newhttp' branch on Github
enables this kind of multiple-connection and automatic fail-over.

That branch does have a basic sketch for automatic addition/removal of
Riak nodes as you manipulate your cluster. I'll need it one day, but
not "now", so I haven't finished it yet (the monitor.py background
thread).

Regarding security: it is the same for option A and B and C (you're
just shifting stuff around, but it is pretty much all the same). Put
your webservers in one security group, and the Riak nodes in another.
Open the Riak ports *only* to the webserver security group and to each
other.

Avoiding two services on one machine (e.g web + riak) is also much
easier to manage/maintain. Just have web machines and riak machines.

Cheers,
-g

On Tue, Oct 4, 2011 at 17:09, Aphyr  wrote:
> Option C: Deploy your web servers with a list of hosts to connect to. Have
> the clients fail over when a riak node goes down. Lower latency without
> sacrificing availability. If you're using protobufs, this may not be as big
> of an issue.
>
> --Kyle
>
> On 10/04/2011 02:04 PM, O'Brien-Strain, Eamonn wrote:
>>
>> I am contemplating two different architectures for deploying Riak nodes
>> and web servers.
>>
>> Option A:  Riak nodes are in their own cluster of dedicated machines
>> behind a load balancer.  Web servers talk to the Riak nodes via the load
>> balancer. (See diagram http://eamonn.org/i/riak-arch-A.png )
>>
>> Option B: Each web server machine also has a Riak node, and there are also
>> some Riak-only machines.  Each web server only talks to its own localhost
>> Riak node. (See diagram http://eamonn.org/i/riak-arch-B.png )
>>
>>
>> All machines will deployed as elastic cloud instances.  I will want to
>> spin up and spin down instances, particularly the web servers, as demand
>> varies.  Both load balancers are non-sticky.  Web servers are currently
>> talking to Riak via HTTP (though might change that to protocol buffers in
>> the future).  Currently Riak is configured with the default options.
>>
>> Here is my thinking of the comparative advantages:
>>
>> Option A:
>>
>>  - Better for security, because can lock down the Riak load balancer to
>> only open a single port and only for connections from the web servers.
>>  - Less churn for Riak of nodes entering and leaving the Riak cluster (as
>> web servers spin up and down)
>>  - More flexibility in scaling storage and web tiers independently of each
>> other
>>
>> Option B:
>>
>>  - Faster localhost connection from web server to Riak
>>
>> I think availability is similar for the two options.
>>
>> The web server response time is the primary metric I want to optimize.
>>  Most web server requests will cause several requests to Riak.
>>
>> What other factors should I take into account?  What measurements could I
>> make to help me decide between the architectures?  Are there other
>> architectures I should consider? Should I add memcached? Does anyone have
>> any experiences they could share in deploying such systems?
>>
>> Thanks.
>> __
>> Eamonn
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Mike Oxford
You'll want to run protobufs if you're looking to optimize your
response time; HTTP sockets (even to localhost) will require much more
overhead and time.
Even better would be unix sockets if they're available, and you can
bypass the whole TCP stack.

I would go A over B.  Less thrash, more control, better ACL/security
but it is slightly slower (probably a few ms per request.)
How much optimization do you care about?  Is an extra 100ms (random number)
too much to give up?

-mox

On Tue, Oct 4, 2011 at 2:04 PM, O'Brien-Strain, Eamonn
 wrote:
> I am contemplating two different architectures for deploying Riak nodes and 
> web servers.
>
> Option A:  Riak nodes are in their own cluster of dedicated machines behind a 
> load balancer.  Web servers talk to the Riak nodes via the load balancer. 
> (See diagram http://eamonn.org/i/riak-arch-A.png )
>
> Option B: Each web server machine also has a Riak node, and there are also 
> some Riak-only machines.  Each web server only talks to its own localhost 
> Riak node. (See diagram http://eamonn.org/i/riak-arch-B.png )
>
>
> All machines will deployed as elastic cloud instances.  I will want to spin 
> up and spin down instances, particularly the web servers, as demand varies.  
> Both load balancers are non-sticky.  Web servers are currently talking to 
> Riak via HTTP (though might change that to protocol buffers in the future).  
> Currently Riak is configured with the default options.
>
> Here is my thinking of the comparative advantages:
>
> Option A:
>
>  - Better for security, because can lock down the Riak load balancer to only 
> open a single port and only for connections from the web servers.
>  - Less churn for Riak of nodes entering and leaving the Riak cluster (as web 
> servers spin up and down)
>  - More flexibility in scaling storage and web tiers independently of each 
> other
>
> Option B:
>
>  - Faster localhost connection from web server to Riak
>
> I think availability is similar for the two options.
>
> The web server response time is the primary metric I want to optimize.  Most 
> web server requests will cause several requests to Riak.
>
> What other factors should I take into account?  What measurements could I 
> make to help me decide between the architectures?  Are there other 
> architectures I should consider? Should I add memcached? Does anyone have any 
> experiences they could share in deploying such systems?
>
> Thanks.
> __
> Eamonn
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Jeremiah Peschka
The only thing I have to add is that I'd avoid option B like the plague. 
Spinning up and spinning down nodes in a cluster is going to result in a lot of 
gossip around the ring, especially when we already know that comms in AWS are a 
bit spotty.

I'm assuming that you're not talking about elastic load balancers - those don't 
work for communication within AWS.

You could also use a virtual private cloud for your setup to minimize traffic 
from the outside world.
---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Oct 4, 2011, at 3:59 PM, Greg Stein wrote:

> I'm with Kyle on this one. Even better, my 'newhttp' branch on Github
> enables this kind of multiple-connection and automatic fail-over.
> 
> That branch does have a basic sketch for automatic addition/removal of
> Riak nodes as you manipulate your cluster. I'll need it one day, but
> not "now", so I haven't finished it yet (the monitor.py background
> thread).
> 
> Regarding security: it is the same for option A and B and C (you're
> just shifting stuff around, but it is pretty much all the same). Put
> your webservers in one security group, and the Riak nodes in another.
> Open the Riak ports *only* to the webserver security group and to each
> other.
> 
> Avoiding two services on one machine (e.g web + riak) is also much
> easier to manage/maintain. Just have web machines and riak machines.
> 
> Cheers,
> -g
> 
> On Tue, Oct 4, 2011 at 17:09, Aphyr  wrote:
>> Option C: Deploy your web servers with a list of hosts to connect to. Have
>> the clients fail over when a riak node goes down. Lower latency without
>> sacrificing availability. If you're using protobufs, this may not be as big
>> of an issue.
>> 
>> --Kyle
>> 
>> On 10/04/2011 02:04 PM, O'Brien-Strain, Eamonn wrote:
>>> 
>>> I am contemplating two different architectures for deploying Riak nodes
>>> and web servers.
>>> 
>>> Option A:  Riak nodes are in their own cluster of dedicated machines
>>> behind a load balancer.  Web servers talk to the Riak nodes via the load
>>> balancer. (See diagram http://eamonn.org/i/riak-arch-A.png )
>>> 
>>> Option B: Each web server machine also has a Riak node, and there are also
>>> some Riak-only machines.  Each web server only talks to its own localhost
>>> Riak node. (See diagram http://eamonn.org/i/riak-arch-B.png )
>>> 
>>> 
>>> All machines will deployed as elastic cloud instances.  I will want to
>>> spin up and spin down instances, particularly the web servers, as demand
>>> varies.  Both load balancers are non-sticky.  Web servers are currently
>>> talking to Riak via HTTP (though might change that to protocol buffers in
>>> the future).  Currently Riak is configured with the default options.
>>> 
>>> Here is my thinking of the comparative advantages:
>>> 
>>> Option A:
>>> 
>>>  - Better for security, because can lock down the Riak load balancer to
>>> only open a single port and only for connections from the web servers.
>>>  - Less churn for Riak of nodes entering and leaving the Riak cluster (as
>>> web servers spin up and down)
>>>  - More flexibility in scaling storage and web tiers independently of each
>>> other
>>> 
>>> Option B:
>>> 
>>>  - Faster localhost connection from web server to Riak
>>> 
>>> I think availability is similar for the two options.
>>> 
>>> The web server response time is the primary metric I want to optimize.
>>>  Most web server requests will cause several requests to Riak.
>>> 
>>> What other factors should I take into account?  What measurements could I
>>> make to help me decide between the architectures?  Are there other
>>> architectures I should consider? Should I add memcached? Does anyone have
>>> any experiences they could share in deploying such systems?
>>> 
>>> Thanks.
>>> __
>>> Eamonn
>>> 
>>> ___
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>> 
>> 
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Mike Oxford
On Tue, Oct 4, 2011 at 3:59 PM, Greg Stein  wrote:
> Regarding security: it is the same for option A and B and C (you're
> just shifting stuff around, but it is pretty much all the same). Put
> your webservers in one security group, and the Riak nodes in another.
> Open the Riak ports *only* to the webserver security group and to each
> other.

Not quite the same.  If you get rooted on a webhead you don't want your
data there (esp with an erl shell.)

> Avoiding two services on one machine (e.g web + riak) is also much
> easier to manage/maintain. Just have web machines and riak machines.

I disagree; it's more work to maintain two machines correctly.  However
the extra work is worth it for security/scalability.

-mox

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Kev Burns
I'd also choose option A for infrastructure simplicity and more reliable
stats for capacity planning. 5 node n=3 is a very well understood starting
point.

- Kev
On Oct 4, 2011 2:06 PM, "O'Brien-Strain, Eamonn" <
eamonn.obrien-str...@hp.com> wrote:
> I am contemplating two different architectures for deploying Riak nodes
and web servers.
>
> Option A: Riak nodes are in their own cluster of dedicated machines behind
a load balancer. Web servers talk to the Riak nodes via the load balancer.
(See diagram http://eamonn.org/i/riak-arch-A.png )
>
> Option B: Each web server machine also has a Riak node, and there are also
some Riak-only machines. Each web server only talks to its own localhost
Riak node. (See diagram http://eamonn.org/i/riak-arch-B.png )
>
>
> All machines will deployed as elastic cloud instances. I will want to spin
up and spin down instances, particularly the web servers, as demand varies.
Both load balancers are non-sticky. Web servers are currently talking to
Riak via HTTP (though might change that to protocol buffers in the future).
Currently Riak is configured with the default options.
>
> Here is my thinking of the comparative advantages:
>
> Option A:
>
> - Better for security, because can lock down the Riak load balancer to
only open a single port and only for connections from the web servers.
> - Less churn for Riak of nodes entering and leaving the Riak cluster (as
web servers spin up and down)
> - More flexibility in scaling storage and web tiers independently of each
other
>
> Option B:
>
> - Faster localhost connection from web server to Riak
>
> I think availability is similar for the two options.
>
> The web server response time is the primary metric I want to optimize.
Most web server requests will cause several requests to Riak.
>
> What other factors should I take into account? What measurements could I
make to help me decide between the architectures? Are there other
architectures I should consider? Should I add memcached? Does anyone have
any experiences they could share in deploying such systems?
>
> Thanks.
> __
> Eamonn
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Greg Stein
On Oct 4, 2011 7:04 PM, "Mike Oxford"  wrote:
>
> On Tue, Oct 4, 2011 at 3:59 PM, Greg Stein  wrote:
> > Regarding security: it is the same for option A and B and C (you're
> > just shifting stuff around, but it is pretty much all the same). Put
> > your webservers in one security group, and the Riak nodes in another.
> > Open the Riak ports *only* to the webserver security group and to each
> > other.
>
> Not quite the same.  If you get rooted on a webhead you don't want your
> data there (esp with an erl shell.)

Ah. Yeah. Quite true.

> > Avoiding two services on one machine (e.g web + riak) is also much
> > easier to manage/maintain. Just have web machines and riak machines.
>
> I disagree; it's more work to maintain two machines correctly.  However
> the extra work is worth it for security/scalability.

Note that his original description had two machine types: web+riak, and
riak-only. My point was about that two service box being a pain. Given that
you have two types, then break up the boxes into the two -only formats and
increase your security.

Cheers,
-g
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Greg Stein
On Oct 4, 2011 7:01 PM, "Mike Oxford"  wrote:
>
> You'll want to run protobufs if you're looking to optimize your
> response time; HTTP sockets (even to localhost) will require much more
> overhead and time.

Hmm? The protocol seems moot, compared to inter-node comms when r > 1.
Protocol parsing just doesn't seem like much of a factor. On my laptop, I
was seeing a 3ms response time against one node. I can't imagine that
parsing was more than a few percent, no matter the protocol.

(and no, I have no specific numbers to confirm/deny my thought experiment
here)

> Even better would be unix sockets if they're available, and you can
> bypass the whole TCP stack.

What? Is that even an option for Riak? I haven't seen anything about that.

>...

Cheers,
-g
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Aphyr
Internode times in our datacenter at SL are indistinguishable from 
loopback; TCP/IP processing dominates. HTTP, on the other hand, involves 
either in-depth connection management/multiplexing, or TCP/IP 
setup/teardown latency at either end of a request. In read-write heavy 
apps, protobufs outperforms HTTP in throughput by 2x or more, against 
objects of 500-4000 bytes. That's with the ruby client; ymmv.


--Kyle

On 10/04/2011 07:18 PM, Greg Stein wrote:


On Oct 4, 2011 7:01 PM, "Mike Oxford" <moxf...@gmail.com> wrote:
 >
 > You'll want to run protobufs if you're looking to optimize your
 > response time; HTTP sockets (even to localhost) will require much more
 > overhead and time.

Hmm? The protocol seems moot, compared to inter-node comms when r > 1
Protocol parsing just doesn't seem like much of a factor. On my laptop,
I was seeing a 3ms response time against one node. I can't imagine that
parsing was more than a few percent, no matter the protocol.

(and no, I have no specific numbers to confirm/deny my thought
experiment here)

 > Even better would be unix sockets if they're available, and you can
 > bypass the whole TCP stack.

What? Is that even an option for Riak? I haven't seen anything about that.

 >...

Cheers,
-g



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Greg Stein
I don't see that multiplexing or TCP setup is specific to HTTP.

The only difference between protobuf and HTTP is what goes on the wire. Not
how the wire is managed.

(and with that said, the Python client managed the wire in the most horrible
ways imaginable for the HTTP Client; I've since fixed that on my branch)
On Oct 4, 2011 11:37 PM, "Aphyr"  wrote:
> Internode times in our datacenter at SL are indistinguishible from
> loopback; TCP/IP processing dominates. HTTP, on the other hand, involves
> either in-depth connection management/multiplexing, or TCP/IP
> setup/teardown latency at either end of a request. In read-write heavy
> apps, protobufs outperforms HTTP in throughput by 2x or more, against
> objects of 500-4000 bytes. That's with the ruby client; ymmv.
>
> --Kyle
>
> On 10/04/2011 07:18 PM, Greg Stein wrote:
>>
>> On Oct 4, 2011 7:01 PM, "Mike Oxford" > > wrote:
>> >
>> > You'll want to run protobufs if you're looking to optimize your
>> > response time; HTTP sockets (even to localhost) will require much more
>> > overhead and time.
>>
>> Hmm? The protocol seems moot, compared to inter-node comms when r > 1
>> Protocol parsing just doesn't seem like much of a factor. On my laptop,
>> I was seeing a 3ms response time against one node. I can't imagine that
>> parsing was more than a few percent, no matter the protocol.
>>
>> (and no, I have no specific numbers to confirm/deny my thought
>> experiment here)
>>
>> > Even better would be unix sockets if they're available, and you can
>> > bypass the whole TCP stack.
>>
>> What? Is that even an option for Riak? I haven't seen anything about
that.
>>
>> >...
>>
>> Cheers,
>> -g
>>
>>
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Aphyr
It's not an intrinsic property of HTTP... more that some of the HTTP 
libraries the clients are built on can have awkward semantics for using 
connections efficiently. Sounds like you've already addressed this, 
which is great. Mochiweb + HTTP parsing + mime-multipart will introduce 
some time/space overhead compared to tagged values in protobufs, but it 
may be negligible. Try it and see!


--Kyle

On 10/04/2011 09:09 PM, Greg Stein wrote:

I don't see that multiplexing or TCP setup is specific to HTTP.

The only difference between protobuf and HTTP is what goes on the wire.
Not how the wire is managed.

(and with that said, the Python client managed the wire in the most
horrible ways imaginable for the HTTP Client; I've since fixed that on
my branch)

On Oct 4, 2011 11:37 PM, "Aphyr" <ap...@aphyr.com> wrote:
 > Internode times in our datacenter at SL are indistinguishible from
 > loopback; TCP/IP processing dominates. HTTP, on the other hand, involves
 > either in-depth connection management/multiplexing, or TCP/IP
 > setup/teardown latency at either end of a request. In read-write heavy
 > apps, protobufs outperforms HTTP in throughput by 2x or more, against
 > objects of 500-4000 bytes. That's with the ruby client; ymmv.
 >
 > --Kyle
 >
 > On 10/04/2011 07:18 PM, Greg Stein wrote:
 >>
 >> On Oct 4, 2011 7:01 PM, "Mike Oxford" <moxf...@gmail.com> wrote:
 >> >
 >> > You'll want to run protobufs if you're looking to optimize your
 >> > response time; HTTP sockets (even to localhost) will require much more
 >> > overhead and time.
 >>
 >> Hmm? The protocol seems moot, compared to inter-node comms when r > 1
 >> Protocol parsing just doesn't seem like much of a factor. On my laptop,
 >> I was seeing a 3ms response time against one node. I can't imagine that
 >> parsing was more than a few percent, no matter the protocol.
 >>
 >> (and no, I have no specific numbers to confirm/deny my thought
 >> experiment here)
 >>
 >> > Even better would be unix sockets if they're available, and you can
 >> > bypass the whole TCP stack.
 >>
 >> What? Is that even an option for Riak? I haven't seen anything about
that.
 >>
 >> >...
 >>
 >> Cheers,
 >> -g
 >>
 >>
 >>
 >> ___
 >> riak-users mailing list
 >> riak-users@lists.basho.com 
 >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com