zip mapreduce results

2011-05-29 Thread Malka Feldman
Hi,
I get long strings from my map functions, and I want to get much smaller
responses to reduce the latency. Do you know of an option to zip my strings in
a reduce phase?

Thanks,
-- 

*Malka Feldman*
Tribase LTD.

41 Shimon Hatzadik St.

Elad, Israel

Tel. 074-7122704  Fax. 03-9075211 Cell. 972-54-8370828

m al...@3basegroup.com

www.3basegroup.com 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Store whole database in memory

2011-05-29 Thread Nico Meyer

Hi Michael,

Greg's advice is probably the best, if you really always want to read 
back or update predefined groups of 1000 keys at once. It will increase 
the rate at which you can write and read by a factor of 1000 ;-).


But if that's not what you want to do, and we really don't know what 
your design goals are, I honestly think you are trying to put a screw in 
with a hammer here. Maybe you should look for alternatives to Riak, 
since you exploit all its weaknesses and don't care about most of its 
strengths.


Namely storing very small values is a weak spot, as Greg mentioned. 
There is an overhead of at least around 400 bytes per entry at the 
moment. Even if there are plans to reduce this overhead, I would 
estimate it will never get below around 100 bytes if the thing is still 
Riak afterwards. Also this overhead exists for all storage backends. So 
with the ets backend you will only be able to store about 2-3 million 
entries per GB of RAM right now.
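
The capacity estimate above can be checked with a few lines of Python. This is only a rough sketch: the 400 and 100 byte overheads are the figures quoted in this message, while the 12-byte payload, the choice of 2**30 bytes for one GB, and the function name are assumptions made for illustration.

```python
# Rough capacity estimate for an in-memory backend.
# Assumptions (not measured): per-entry overhead as quoted in the thread,
# a 12-byte payload per entry, and 1 GB taken as 2**30 bytes.
GIB = 2**30
PAYLOAD_BYTES = 12


def entries_per_gib(overhead_bytes, payload_bytes=PAYLOAD_BYTES):
    """Return how many entries fit into one GiB of RAM."""
    return GIB // (overhead_bytes + payload_bytes)


current = entries_per_gib(400)      # today's overhead of roughly 400 bytes
best_case = entries_per_gib(100)    # hoped-for floor of roughly 100 bytes

print(f"~{current / 1e6:.1f} million entries per GiB at 400 bytes overhead")
print(f"~{best_case / 1e6:.1f} million entries per GiB at 100 bytes overhead")
```

With 400 bytes of overhead the result lands in the 2-3 million range mentioned above; even at 100 bytes it stays below 10 million entries per GiB.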


Which brings me to the part where you don't care/use most of Riak's 
strengths. You don't seem to care about persistence of data, otherwise 
you wouldn't use a memory only backend. (Btw, as Mike pointed out, with 
enough RAM bitcask is essentially a memory store, especially where the 
write performance is concerned)
You also don't care about eventual consistency, evidenced by the fact 
that you do bulk inserts (only?), and that 12 bytes wouldn't allow for 
enough information to resolve conflicts. So you probably want a last 
write wins behaviour (which can be set as a bucket property in Riak, but 
kind of defeats the purpose in my opinion).


But let's assume for a moment that Riak was the right tool for your job.
The limiting factor for writing your data is almost certainly not the 
disk. Writing 100,000 keys per second with a size of 12 bytes each requires 
only about 1MB/s, so even the crappiest disk should have no problem with 
that. But as I said, there is quite a large overhead for storing values in 
Riak, so in reality the required rate will be 50MB/s per node (3 nodes, n=3 
presumably). Still not a big deal, and this only becomes a limiting factor 
once the filesystem cache uses all available RAM.
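
The disk bandwidth numbers above follow from simple arithmetic, sketched here in Python. The function name and its parameters are hypothetical, and the model assumes that every write is stored n_val times, spread evenly over the nodes. With the roughly 400 bytes of overhead mentioned earlier it gives about 40MB/s per node; the 50MB/s quoted above corresponds to roughly 500 bytes per stored entry.

```python
def write_rate_mb_per_s(keys_per_s, bytes_per_entry, n_val=1, nodes=1):
    """Bytes written per second on each node, in MB/s (1 MB = 10**6 bytes).

    Assumes each entry is stored n_val times, spread evenly over `nodes`.
    """
    return keys_per_s * bytes_per_entry * n_val / nodes / 1e6


# Raw payload only: 100,000 keys/s of 12 bytes each.
raw = write_rate_mb_per_s(100_000, 12)

# Including an assumed ~400 bytes of per-entry overhead, 3 nodes, n=3,
# so every node ends up storing every entry.
with_overhead = write_rate_mb_per_s(100_000, 400 + 12, n_val=3, nodes=3)

print(f"raw payload:   {raw:.1f} MB/s")
print(f"with overhead: {with_overhead:.1f} MB/s per node")
```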


On the other hand, network latency is a problem at such high rates, even 
in a LAN. As far as my experience and my short Google research tell me, 
the lowest roundtrip time you can expect on standard Gigabit 
Ethernet is on the order of 0.1msec, or 1/10,000 of a second. For each 
operation you need at least one roundtrip (one request packet, one 
response packet), so that means with one connection you can never go 
beyond 10,000 writes per second. This assumes no processing time 
whatsoever, so a more realistic number is 2000-5000 ops/s. Therefore you 
need at least 20-50 parallel connections or clients to achieve your 
target write rate. If you use the REST API these numbers need to be 
doubled, since one additional roundtrip is already needed to set up the 
TCP connection.
In general, without a lot of tuning and maybe specialized hardware 
(multiple NICs or special low latency NICs), any server will have a hard 
time handling 100,000 ops/s, regardless of the software that is used.
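
The latency argument above can be written down as a small Python sketch. The function names are made up for illustration, and the model is deliberately crude: one operation at a time per connection, a fixed roundtrip time, and an optional fixed processing time.

```python
import math


def max_ops_per_connection(rtt_s, processing_s=0.0, roundtrips=1):
    """Upper bound on sequential operations per second on one connection."""
    return 1.0 / (roundtrips * rtt_s + processing_s)


def connections_needed(target_ops_per_s, ops_per_connection):
    """Parallel connections required to reach the target rate."""
    return math.ceil(target_ops_per_s / ops_per_connection)


RTT = 0.0001  # 0.1 msec roundtrip, as assumed in the message above

print(max_ops_per_connection(RTT))                 # ideal bound, no processing
print(max_ops_per_connection(RTT, roundtrips=2))   # extra roundtrip for TCP setup
print(connections_needed(100_000, 5000))           # optimistic realistic rate
print(connections_needed(100_000, 2000))           # pessimistic realistic rate
```

The last two lines reproduce the 20-50 connections estimate for the realistic range of 2000-5000 ops/s per connection.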



Cheers,
Nico


On 28.05.2011 20:36, Greg Nelson wrote:

Depending on the n_val you have set for that bucket, Riak will store the
objects n times on n different nodes. There are two other parameters you
should know about, r and w. When writing, Riak will wait for w of the n
nodes to finish the write before returning. When reading, Riak will wait
for r of the n nodes to respond before returning. These are the basics of
how Riak does fault and partition tolerance, i.e. if one node is down
your cluster still functions, and the r and w vals define a sort of
"majority vote" threshold to handle a split-brain problem.

Anyway, for your purposes you could set w=1 and r=3 for faster writes at
the expense of potentially slower reads. I've never tried this (or any
of the backends besides bitcask) so I don't know what you should expect.
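
To make the n, r and w parameters a bit more concrete, here is a simplified Python model of them. The function names are hypothetical, and the model ignores how a real cluster routes around failed nodes, so treat it as an illustration of the counting only, not of Riak's actual behaviour.

```python
def can_complete(required, n_val, replicas_down):
    """True if enough of the n replicas are reachable to satisfy r or w."""
    return n_val - replicas_down >= required


def quorums_overlap(n_val, r, w):
    """True if every read set must share at least one replica with every write set."""
    return r + w > n_val


N = 3

# w=1, r=3 as suggested above: writes return after a single replica answers,
# and reads wait for all three.
print(quorums_overlap(N, r=3, w=1))          # reads always see the latest write
print(can_complete(1, N, replicas_down=2))   # writes survive two replicas being down
print(can_complete(3, N, replicas_down=1))   # strict r=3 reads do not survive one
```

In this simplified view, w=1 with r=3 buys fast writes at the price of reads that depend on every replica answering.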

As for bulk insert and preserving locality, I don't know of a way to do
that with Riak except to batch your 1000 keys into a single object,
identified by one key. As far as Riak is concerned, it's just a 12KB
opaque object, which your application would need to always write and
read all at once.
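
A minimal Python sketch of such batching, assuming fixed-size 12-byte records that are simply concatenated into one opaque blob. The helper names are made up for illustration, and storing the blob under a single key is left to whatever client library is in use.

```python
RECORD_SIZE = 12  # bytes per record, as in the use case described in this thread


def pack_batch(records):
    """Concatenate fixed-size records into a single opaque blob."""
    for record in records:
        if len(record) != RECORD_SIZE:
            raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(record)}")
    return b"".join(records)


def unpack_batch(blob):
    """Split a blob produced by pack_batch back into its records."""
    if len(blob) % RECORD_SIZE != 0:
        raise ValueError("blob length is not a multiple of the record size")
    return [blob[i:i + RECORD_SIZE] for i in range(0, len(blob), RECORD_SIZE)]


# 1000 dummy records become one 12,000-byte value to store under one key.
records = [i.to_bytes(RECORD_SIZE, "big") for i in range(1000)]
blob = pack_batch(records)
print(len(blob))
```

The trade-off is the one described above: the application always has to read and write the whole batch, since individual records can no longer be addressed.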

If you don't batch like that, you should look for a discussion on this
mailing list from last week regarding capacity planning and very small
objects. There's a bit of overhead associated with each object that will
be significant for objects as small as 12 bytes. You could skip over the
parts about Bitcask overhead...

On Saturday, May 28, 2011 at 9:59 AM, Michael McClain wrote:


Thank you, Mike and Greg, for the response.
I've just replied to the list.
In my use case, I need to be able to write 100,000 keys per second.
Where the key is very small (12 bytes). And I always insert 1000 keys
at once, in a bulk insert. I would also like to preserve the locality
of the keys inserted at once 

Re: zip mapreduce results

2011-05-29 Thread Ben Tilly
Google uses http://code.google.com/p/snappy/ internally for exactly this
sort of thing.  There is an Erlang binding available.
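
As an illustration of how much compression can shrink a batch of result strings, here is a small Python sketch. Note that it uses zlib from the Python standard library as a stand-in, not Snappy, and that it runs outside Riak; the function names and the sample data are made up. Inside a reduce phase the same idea would have to be expressed with whatever compression library is available to the map/reduce functions.

```python
import json
import zlib


def compress_results(strings):
    """Serialize a list of strings to JSON and compress it with zlib."""
    return zlib.compress(json.dumps(strings).encode("utf-8"))


def decompress_results(blob):
    """Inverse of compress_results."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical map output: many long, repetitive strings.
results = [f"user:{i};status=active;region=eu-west" for i in range(1000)]

blob = compress_results(results)
raw_size = len(json.dumps(results).encode("utf-8"))
print(f"{raw_size} bytes raw, {len(blob)} bytes compressed")
```

How much is saved depends entirely on how repetitive the strings are, and the client has to decompress the response before using it.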



Re: zip mapreduce results

2011-05-29 Thread Eric Moritz
Sounds like a great contrib.basho.com contribution :)



Re: Load Balancing With Riak Ruby Client

2011-05-29 Thread Scott M. Likens
Hey,

In my Chef recipes for AppCloud (Engine Yard's PaaS Product) I actually 
configured haproxy to listen on 8098 on the application instances and redirect 
to all the riak nodes in a roundrobin fashion. (Had httpchk for /ping to ensure 
the node is up)

In my own testing with basho_bench this seemed to work. I'm unsure what 
drawbacks there would be, because I could not find any other than HTTP being 
slower than PBC.

I did find that Protobuffers did not roundrobin correctly with haproxy 
using tcp mode... darn :(

So if we're just speaking HTTP, we could totally use HAProxy or a hardware load 
balancer to spread out the load.
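
For readers who want to see what round-robin with a health check amounts to, here is a minimal Python sketch of the selection logic. It is not HAProxy's implementation; the class name, the host names and the injectable is_up callback (standing in for something like an HTTP check against /ping) are assumptions made for illustration.

```python
import itertools


class RoundRobinPool:
    """Hand out hosts in rotation, skipping those the health check rejects."""

    def __init__(self, hosts, is_up=lambda host: True):
        if not hosts:
            raise ValueError("need at least one host")
        self._hosts = list(hosts)
        self._cycle = itertools.cycle(self._hosts)
        self._is_up = is_up  # hypothetical health check, e.g. a /ping request

    def next_host(self):
        # Try each host at most once per call before giving up.
        for _ in range(len(self._hosts)):
            host = next(self._cycle)
            if self._is_up(host):
                return host
        raise RuntimeError("no healthy hosts available")


# Hypothetical node names; riak2 is pretended to be down.
pool = RoundRobinPool(["riak1", "riak2", "riak3"],
                      is_up=lambda host: host != "riak2")
print([pool.next_host() for _ in range(4)])
```

A dedicated load balancer does the same thing outside the application, which keeps this logic out of every client.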

Scott
-- 
Scott M. Likens

On Friday, May 27, 2011 at 12:40 PM, Sean Cribbs wrote:

> This is one thing I desperately want to refactor. The Ruby client still 
> contains some things that reflect my earlier, less astute understanding of 
> how a Riak client should behave and doesn't include obvious things like 
> retrying requests (possibly on other nodes), conflict resolution strategies, 
> and mutators.
> 
> In the past, I have recommended that users put a lightweight load-balancer 
> (e.g. haproxy, pound) between their application and Riak, and simply have the 
> app connect to the locally-running instance of the LB. I realize this is not 
> a great solution, but it also avoids a little NIH for now.
> 
> Sean Cribbs
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
> 
> On May 27, 2011, at 3:20 PM, Keith Bennett wrote:
> 
> > Hi, all. If I have several riak servers on a cluster, and want to 
> > distribute load fairly evenly, and am using the Ruby Riak client, what is 
> > the best way to balance load?
> > 
> > With the HTTP interface, I can randomize the choice of host for a request. 
> > How would I do the same with the ruby client? Would I create a Riak::Client 
> > for each host, and then just randomize the selection of those for a given 
> > call? Do the clients contain any state that would make this a bad idea? Or 
> > is there a better way to do this?
> > 
> > Thanks,
> > Keith


Re: Load Balancing With Riak Ruby Client

2011-05-29 Thread Alexey Prohorenko
Is there any solution for round robin through HAProxy to Riak using
protobufs? I believe the Yammer guys are using it in their setup. I might be
wrong, though.



Re: Load Balancing With Riak Ruby Client

2011-05-29 Thread Sean Cribbs
I think for the protocol buffers, one should probably use the "least connected" 
strategy, but PBC connections tend to be more long-lived than HTTP connections 
-- which would describe what you saw.
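
A small Python sketch of why long-lived connections favour a "least connected" strategy. The host names, connection counts and the helper function are made up for illustration, and this is not how any particular load balancer implements it.

```python
def pick_least_connected(open_conns):
    """Return the host that currently has the fewest open connections."""
    return min(open_conns, key=open_conns.get)


# Hypothetical state after some long-lived PBC connections have piled up
# unevenly: rotating blindly ignores how many are still open on each node.
open_conns = {"riak1": 5, "riak2": 1, "riak3": 3}

for _ in range(4):
    host = pick_least_connected(open_conns)
    open_conns[host] += 1  # the new connection stays open for a long time
    print(host, open_conns)
```

With short-lived HTTP requests the counts even out quickly on their own, which is why plain round-robin works well enough there.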

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/



ssh tunnel between nodes in a cluster?

2011-05-29 Thread Jeremy Bornstein
 Greetings all!  I'm a riak newbie, trying to figure out the details of
setting up a cluster in my environment.

We have machines in different data centers and no VPN between them.  I'm
a little confused by the details of the required ports, and would love
to see a concrete example from someone who has a setup that uses ssh to
tunnel between nodes in a riak cluster.

Jeremy



Re: ssh tunnel between nodes in a cluster?

2011-05-29 Thread Matt Ranney
Are you sure you want this?  Riak will spread your data across all nodes in
the cluster with no consideration for the network topology.



Re: ssh tunnel between nodes in a cluster?

2011-05-29 Thread Jeremy Bornstein
 Yes, in this case I do!

